Metadata-Version: 2.1
Name: bamt
Version: 1.0.2
Summary: data modeling and analysis tool based on Bayesian networks
Home-page: https://github.com/ITMO-NSS-team/BAMT
License: MIT
Author: Roman Netrogolov
Author-email: romius2001@gmail.com
Maintainer: N Mramorov
Requires-Python: >=3.9,<3.11
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Dist: gmr (==1.6.2)
Requires-Dist: matplotlib (==3.5.1)
Requires-Dist: missingno (>=0.5.1,<0.6.0)
Requires-Dist: numpy (==1.22.2)
Requires-Dist: pandas (==1.4.0)
Requires-Dist: pgmpy (>=0.1.17,<0.2.0)
Requires-Dist: pomegranate (==0.14.8)
Requires-Dist: pyvis (>=0.1.9,<0.2.0)
Requires-Dist: scikit-learn (==1.0.2)
Requires-Dist: scipy (>=1.8.0,<2.0.0)
Project-URL: Repository, https://github.com/ITMO-NSS-team/BAMT
Description-Content-Type: text/markdown

# BAMT
Repository of a data modeling and analysis tool based on Bayesian networks


BAMT - Bayesian Analytical and Modelling Toolkit. This repository contains a data modeling and analysis tool based on Bayesian networks. It can be divided into two main parts - algorithms for constructing and training Bayesian networks on data and algorithms for applying Bayesian networks for filling gaps, generating synthetic data, and searching for anomalous values.

### Installation

BAMT package is available via PyPi: ``pip install bamt``

Bayesian network learning
=========================
In terms of training Bayesian networks on data, the following algorithms are implemented:
- Building the structure of a Bayesian network based on expert knowledge by directly specifying the structure of the network;
- Building the structure of a Bayesian network on data using three algorithms - Hill Climbing, evolutionary and PC. For Hill Climbing, the following score functions are implemented - MI, K2, BIC, AIC. The algorithms work on both discrete and mixed data.
- Training the parameters of distributions in the nodes of the network on the basis of data.

Difference from existing implementations:
- Algorithms work on mixed data;
- Structural learning implements score-functions for mixed data;
- Parametric learning implements the use of a mixture of Gaussian distributions to approximate continuous distributions;
- The algorithm for structural training of large Bayesian networks (> 10 nodes) is based on local training of small networks with their subsequent algorithmic connection.
![title](img/BN_gif.gif)

Generating synthetic data
=========================
In terms of data analysis and modeling using Bayesian networks, a pipeline has been implemented to generate synthetic data by sampling from Bayesian networks.
![title](img/synth_gen.png)

Oil and Gas Reservoirs Parameters Analysis
==========================================
Bayesian networks can be used to restore gaps in reservoirs data, search for anomalous values, and also to search for analogous reservoirs.
![title](img/concept.png)

# Project structure
## Utils
  1. GraphUtils consist of functions for:
  - Finding nodes types/signs
  - Topological ordering
  2. MathUtils consist of functions for:
  - Additional function to support calculation of metrics from group1 ('MI', 'LL', 'BIC', 'AIC')

## Preprocessing
Preprocessor module allows user to transform data according pipeline (in analogy with pipeline in scikit-learn).
## Networks
There are 3 general types of networks: discrete, gaussian, hybrid. All implemented in networks.py  
## Nodes
Contains nodes' classes and their methods.

