Metadata-Version: 2.1
Name: mlblocks
Version: 0.4.1
Summary: Pipelines and primitives for machine learning and data science.
Home-page: https://github.com/MLBazaar/MLBlocks
Author: MIT Data To AI Lab
Author-email: dailabmit@gmail.com
License: MIT license
Keywords: auto machine learning classification regression data science pipeline
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6,<3.9
Description-Content-Type: text/markdown
Provides-Extra: dev
Provides-Extra: test
Provides-Extra: examples
Provides-Extra: mlprimitives
License-File: LICENSE
License-File: AUTHORS.rst

<p align="left">
  <a href="https://dai.lids.mit.edu">
    <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
  </a>
  <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
</p>

<p align="left">
<img width=20% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/mlblocks-icon.png" alt=“MLBlocks” />
</p>

<p align="left">
Pipelines and Primitives for Machine Learning and Data Science.
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPi](https://img.shields.io/pypi/v/mlblocks.svg)](https://pypi.python.org/pypi/mlblocks)
[![Tests](https://github.com/MLBazaar/MLBlocks/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/MLBlocks/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![CodeCov](https://codecov.io/gh/MLBazaar/MLBlocks/branch/master/graph/badge.svg)](https://codecov.io/gh/MLBazaar/MLBlocks)
[![Downloads](https://pepy.tech/badge/mlblocks)](https://pepy.tech/project/mlblocks)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials)

<br>

# MLBlocks

* Documentation: https://mlbazaar.github.io/MLBlocks
* Github: https://github.com/MLBazaar/MLBlocks
* License: [MIT](https://github.com/MLBazaar/MLBlocks/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)

## Overview

MLBlocks is a simple framework for composing end-to-end tunable Machine Learning Pipelines by
seamlessly combining tools from any python library with a simple, common and uniform interface.

Features include:

* Build Machine Learning Pipelines combining **any Machine Learning Library in Python**.
* Access a repository with hundreds of primitives and pipelines ready to be used with little to
  no python code to write, carefully curated by Machine Learning and Domain experts.
* Extract machine-readable information about which hyperparameters can be tuned and within
  which ranges, allowing automated integration with Hyperparameter Optimization tools like
  [BTB](https://github.com/MLBazaar/BTB).
* Complex multi-branch pipelines and DAG configurations, with unlimited number of inputs and
  outputs per primitive.
* Easy save and load Pipelines using JSON Annotations.

# Install

## Requirements

**MLBlocks** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/)

## Install with `pip`

The easiest and recommended way to install **MLBlocks** is using [pip](
https://pip.pypa.io/en/stable/):

```bash
pip install mlblocks
```

This will pull and install the latest stable release from [PyPi](https://pypi.org/).

If you want to install from source or contribute to the project please read the
[Contributing Guide](https://mlbazaar.github.io/MLBlocks/contributing.html#get-started).

## MLPrimitives

In order to be usable, MLBlocks requires a compatible primitives library.

The official library, required in order to follow the following MLBlocks tutorial,
is [MLPrimitives](https://github.com/MLBazaar/MLPrimitives), which you can install
with this command:

```bash
pip install mlprimitives
```

# Quickstart

Below there is a short example about how to use **MLBlocks** to solve the [Adult Census
Dataset](https://archive.ics.uci.edu/ml/datasets/Adult) classification problem using a
pipeline which combines primitives from [MLPrimitives](https://github.com/MLBazaar/MLPrimitives),
[scikit-learn](https://scikit-learn.org/) and [xgboost](https://xgboost.readthedocs.io/).

```python3
from mlblocks import MLPipeline
from mlprimitives.datasets import load_dataset

dataset = load_dataset('census')
X_train, X_test, y_train, y_test = dataset.get_splits(1)

primitives = [
    'mlprimitives.custom.preprocessing.ClassEncoder',
    'mlprimitives.custom.feature_extraction.CategoricalEncoder',
    'sklearn.impute.SimpleImputer',
    'xgboost.XGBClassifier',
    'mlprimitives.custom.preprocessing.ClassDecoder'
]
pipeline = MLPipeline(primitives)

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

dataset.score(y_test, predictions)
```

# What's Next?

If you want to learn more about how to tune the pipeline hyperparameters, save and load
the pipelines using JSON annotations or build complex multi-branched pipelines, please
check our [documentation site](https://mlbazaar.github.io/MLBlocks).

Also do not forget to have a look at the [notebook tutorials](
https://github.com/MLBazaar/MLBlocks/tree/master/examples/tutorials)!

# Citing MLBlocks

If you use MLBlocks for your research, please consider citing our related papers.

For the current design of MLBlocks and its usage within the larger *Machine Learning Bazaar* project at
the MIT Data To AI Lab, please see:

Micah J. Smith, Carles Sala, James Max Kanter, and Kalyan Veeramachaneni. ["The Machine Learning Bazaar:
Harnessing the ML Ecosystem for Effective System Development."](https://arxiv.org/abs/1905.08942) arXiv
Preprint 1905.08942. 2019.

```bibtex
@article{smith2019mlbazaar,
  author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
  title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
  journal = {arXiv e-prints},
  year = {2019},
  eid = {arXiv:1905.08942},
  pages = {arXiv:1905.08942},
  archivePrefix = {arXiv},
  eprint = {1905.08942},
}
```

For the first MLBlocks version from 2015, designed for only multi table, multi entity temporal data, please
refer to Bryan Collazo’s thesis:

* [Machine learning blocks](https://dai.lids.mit.edu/wp-content/uploads/2018/06/Mlblocks_Bryan.pdf).
  Bryan Collazo. Masters thesis, MIT EECS, 2015.

With recent availability of a multitude of libraries and tools, we decided it was time to integrate
them and expand the library to address other data types: images, text, graph, time series and
integrate with deep learning libraries.


Changelog
=========

0.4.1 - 2021-10-08
------------------

* Update NumPy dependency - [Issue #136](https://github.com/MLBazaar/MLBlocks/issues/136) by @sarahmish
* Support dynamic inputs and outputs - [Issue #134](https://github.com/MLBazaar/MLBlocks/issues/134) by @pvk-developer

0.4.0 - 2021-01-09
------------------

* Stop pipeline fitting after the last block - [Issue #131](https://github.com/MLBazaar/MLBlocks/issues/131) by @sarahmish
* Add memory debug and profiling - [Issue #130](https://github.com/MLBazaar/MLBlocks/issues/130) by @pvk-developer
* Update Python support - [Issue #129](https://github.com/MLBazaar/MLBlocks/issues/129) by @csala
* Get execution time for each block - [Issue #127](https://github.com/MLBazaar/MLBlocks/issues/127) by @sarahmish
* Allow loading a primitive or pipeline directly from the JSON path - [Issue #114](https://github.com/MLBazaar/MLBlocks/issues/114) by @csala
* Pipeline Diagrams - [Issue #113](https://github.com/MLBazaar/MLBlocks/issues/113) by @erica-chiu
* Get Pipeline Inputs - [Issue #112](https://github.com/MLBazaar/MLBlocks/issues/112) by @erica-chiu

0.3.4 - 2019-11-01
------------------

* Ability to return intermediate context - [Issue #110](https://github.com/MLBazaar/MLBlocks/issues/110) by @csala
* Support for static or class methods - [Issue #107](https://github.com/MLBazaar/MLBlocks/issues/107) by @csala

0.3.3 - 2019-09-09
------------------

* Improved intermediate outputs management - [Issue #105](https://github.com/MLBazaar/MLBlocks/issues/105) by @csala

0.3.2 - 2019-08-12
------------------

* Allow passing fit and produce arguments as `init_params` - [Issue #96](https://github.com/MLBazaar/MLBlocks/issues/96) by @csala
* Support optional fit and produce args and arg defaults - [Issue #95](https://github.com/MLBazaar/MLBlocks/issues/95) by @csala
* Isolate primitives from their hyperparameters dictionary - [Issue #94](https://github.com/MLBazaar/MLBlocks/issues/94) by @csala
* Add functions to explore the available primitives and pipelines - [Issue #90](https://github.com/MLBazaar/MLBlocks/issues/90) by @csala
* Add primitive caching - [Issue #22](https://github.com/MLBazaar/MLBlocks/issues/22) by @csala

0.3.1 - Pipelines Discovery
---------------------------

* Support flat hyperparameter dictionaries - [Issue #92](https://github.com/MLBazaar/MLBlocks/issues/92) by @csala
* Load pipelines by name and register them as `entry_points` - [Issue #88](https://github.com/MLBazaar/MLBlocks/issues/88) by @csala
* Implement partial re-fit -[Issue #61](https://github.com/MLBazaar/MLBlocks/issues/61) by @csala
* Move argument parsing to MLBlock - [Issue #86](https://github.com/MLBazaar/MLBlocks/issues/86) by @csala
* Allow getting intermediate outputs - [Issue #58](https://github.com/MLBazaar/MLBlocks/issues/58) by @csala

0.3.0 - New Primitives Discovery
--------------------------------

* New primitives discovery system based on `entry_points`.
* Conditional Hyperparameters filtering in MLBlock initialization.
* Improved logging and exception reporting.

0.2.4 - New Datasets and Unit Tests
-----------------------------------

* Add a new multi-table dataset.
* Add Unit Tests up to 50% coverage.
* Improve documentation.
* Fix minor bug in newsgroups dataset.

0.2.3 - Demo Datasets
---------------------

* Add new methods to Dataset class.
* Add documentation for the datasets module.

0.2.2 - MLPipeline Load/Save
----------------------------

* Implement save and load methods for MLPipelines
* Add more datasets

0.2.1 - New Documentation
-------------------------

* Add mlblocks.datasets module with demo data download functions.
* Extensive documentation, including multiple pipeline examples.

0.2.0 - New MLBlocks API
------------------------

A new MLBlocks API and Primitive format.

This is a summary of the changes:

* Primitives JSONs and Python code has been moved to a different repository, called MLPrimitives
* Optional usage of multiple JSON primitive folders.
* JSON format has been changed to allow more flexibility and features:
    * input and output arguments, as well as argument types, can be specified for each method
    * both classes and function as primitives are supported
    * multitype and conditional hyperparameters fully supported
    * data modalities and primitive classifiers introduced
    * metadata such as documentation, description and author fields added
* Parsers are removed, and now the MLBlock class is responsible for loading and reading the
  JSON primitive.
* Multiple blocks of the same primitive are supported within the same pipeline.
* Arbitrary inputs and outputs for both pipelines and blocks are allowed.
* Shared variables during pipeline execution, usable by multiple blocks.

0.1.9 - Bugfix Release
----------------------

* Disable some NetworkX functions for incompatibilities with some types of graphs.

0.1.8 - New primitives and some improvements
--------------------------------------------

* Improve the NetworkX primitives.
* Add String Vectorization and Datetime Featurization primitives.
* Refactor some Keras primitives to work with single dimension `y` arrays and be compatible with `pickle`.
* Add XGBClassifier and XGBRegressor primitives.
* Add some `keras.applications` pretrained networks as preprocessing primitives.
* Add helper class to allow function primitives.

0.1.7 - Nested hyperparams dicts
--------------------------------

* Support passing hyperparams as nested dicts.

0.1.6 - Text and Graph Pipelines
--------------------------------

* Add LSTM classifier and regressor primitives.
* Add OneHotEncoder and MultiLabelEncoder primitives.
* Add several NetworkX graph featurization primitives.
* Add `community.best_partition` primitive.

0.1.5 - Collaborative Filtering Pipelines
-----------------------------------------

* Add LightFM primitive.

0.1.4 - Image pipelines improved
--------------------------------

* Allow passing `init_params` on `MLPipeline` creation.
* Fix bug with MLHyperparam types and Keras.
* Rename `produce_params` as `predict_params`.
* Add SingleCNN Classifier and Regressor primitives.
* Simplify and improve Trivial Predictor

0.1.3 - Multi Table pipelines improved
--------------------------------------

* Improve RandomForest primitive ranges
* Improve DFS primitive
* Add Tree Based Feature Selection primitives
* Fix bugs in TrivialPredictor
* Improved documentation

0.1.2 - Bugfix release
----------------------

* Fix bug in TrivialMedianPredictor
* Fix bug in OneHotLabelEncoder

0.1.1 - Single Table pipelines improved
---------------------------------------

* New project structure and primitives for integration into MIT-TA2.
* MIT-TA2 default pipelines and single table pipelines fully working.

0.1.0
-----

* First release on PyPI.


