Metadata-Version: 2.1
Name: mlblocks
Version: 0.5.0
Summary: Pipelines and primitives for machine learning and data science.
Home-page: https://github.com/MLBazaar/MLBlocks
Author: MIT Data To AI Lab
Author-email: dailabmit@gmail.com
License: MIT license
Description: <p align="left">
          <a href="https://dai.lids.mit.edu">
            <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
          </a>
          <i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
        </p>
        
        <p align="left">
        <img width=20% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/mlblocks-icon.png" alt=“MLBlocks” />
        </p>
        
        <p align="left">
        Pipelines and Primitives for Machine Learning and Data Science.
        </p>
        
        [![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
        [![PyPi](https://img.shields.io/pypi/v/mlblocks.svg)](https://pypi.python.org/pypi/mlblocks)
        [![Tests](https://github.com/MLBazaar/MLBlocks/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/MLBlocks/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
        [![CodeCov](https://codecov.io/gh/MLBazaar/MLBlocks/branch/master/graph/badge.svg)](https://codecov.io/gh/MLBazaar/MLBlocks)
        [![Downloads](https://pepy.tech/badge/mlblocks)](https://pepy.tech/project/mlblocks)
        [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/MLBazaar/MLBlocks/master?filepath=examples/tutorials)
        
        <br>
        
        # MLBlocks
        
        * Documentation: https://mlbazaar.github.io/MLBlocks
        * Github: https://github.com/MLBazaar/MLBlocks
        * License: [MIT](https://github.com/MLBazaar/MLBlocks/blob/master/LICENSE)
        * Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
        
        ## Overview
        
        MLBlocks is a simple framework for composing end-to-end tunable Machine Learning Pipelines by
        seamlessly combining tools from any python library with a simple, common and uniform interface.
        
        Features include:
        
        * Build Machine Learning Pipelines combining **any Machine Learning Library in Python**.
        * Access a repository with hundreds of primitives and pipelines ready to be used with little to
          no python code to write, carefully curated by Machine Learning and Domain experts.
        * Extract machine-readable information about which hyperparameters can be tuned and within
          which ranges, allowing automated integration with Hyperparameter Optimization tools like
          [BTB](https://github.com/MLBazaar/BTB).
        * Complex multi-branch pipelines and DAG configurations, with unlimited number of inputs and
          outputs per primitive.
        * Easy save and load Pipelines using JSON Annotations.
        
        # Install
        
        ## Requirements
        
        **MLBlocks** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/)
        
        ## Install with `pip`
        
        The easiest and recommended way to install **MLBlocks** is using [pip](
        https://pip.pypa.io/en/stable/):
        
        ```bash
        pip install mlblocks
        ```
        
        This will pull and install the latest stable release from [PyPi](https://pypi.org/).
        
        If you want to install from source or contribute to the project please read the
        [Contributing Guide](https://mlbazaar.github.io/MLBlocks/contributing.html#get-started).
        
        ## MLPrimitives
        
        In order to be usable, MLBlocks requires a compatible primitives library.
        
        The official library, required in order to follow the following MLBlocks tutorial,
        is [MLPrimitives](https://github.com/MLBazaar/MLPrimitives), which you can install
        with this command:
        
        ```bash
        pip install mlprimitives
        ```
        
        # Quickstart
        
        Below there is a short example about how to use **MLBlocks** to solve the [Adult Census
        Dataset](https://archive.ics.uci.edu/ml/datasets/Adult) classification problem using a
        pipeline which combines primitives from [MLPrimitives](https://github.com/MLBazaar/MLPrimitives),
        [scikit-learn](https://scikit-learn.org/) and [xgboost](https://xgboost.readthedocs.io/).
        
        ```python3
        from mlblocks import MLPipeline
        from mlprimitives.datasets import load_dataset
        
        dataset = load_dataset('census')
        X_train, X_test, y_train, y_test = dataset.get_splits(1)
        
        primitives = [
            'mlprimitives.custom.preprocessing.ClassEncoder',
            'mlprimitives.custom.feature_extraction.CategoricalEncoder',
            'sklearn.impute.SimpleImputer',
            'xgboost.XGBClassifier',
            'mlprimitives.custom.preprocessing.ClassDecoder'
        ]
        pipeline = MLPipeline(primitives)
        
        pipeline.fit(X_train, y_train)
        predictions = pipeline.predict(X_test)
        
        dataset.score(y_test, predictions)
        ```
        
        # What's Next?
        
        If you want to learn more about how to tune the pipeline hyperparameters, save and load
        the pipelines using JSON annotations or build complex multi-branched pipelines, please
        check our [documentation site](https://mlbazaar.github.io/MLBlocks).
        
        Also do not forget to have a look at the [notebook tutorials](
        https://github.com/MLBazaar/MLBlocks/tree/master/examples/tutorials)!
        
        # Citing MLBlocks
        
        If you use MLBlocks for your research, please consider citing our related papers.
        
        For the current design of MLBlocks and its usage within the larger *Machine Learning Bazaar* project at
        the MIT Data To AI Lab, please see:
        
        Micah J. Smith, Carles Sala, James Max Kanter, and Kalyan Veeramachaneni. ["The Machine Learning Bazaar:
        Harnessing the ML Ecosystem for Effective System Development."](https://arxiv.org/abs/1905.08942) arXiv
        Preprint 1905.08942. 2019.
        
        ```bibtex
        @article{smith2019mlbazaar,
          author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
          title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
          journal = {arXiv e-prints},
          year = {2019},
          eid = {arXiv:1905.08942},
          pages = {arXiv:1905.08942},
          archivePrefix = {arXiv},
          eprint = {1905.08942},
        }
        ```
        
        For the first MLBlocks version from 2015, designed for only multi table, multi entity temporal data, please
        refer to Bryan Collazo’s thesis:
        
        * [Machine learning blocks](https://dai.lids.mit.edu/wp-content/uploads/2018/06/Mlblocks_Bryan.pdf).
          Bryan Collazo. Masters thesis, MIT EECS, 2015.
        
        With recent availability of a multitude of libraries and tools, we decided it was time to integrate
        them and expand the library to address other data types: images, text, graph, time series and
        integrate with deep learning libraries.
        
        
        Changelog
        =========
        
        0.5.0 - 2023-01-22
        ------------------
        
        * Update `numpy` dependency and isolate tests - [Issue #139](https://github.com/MLBazaar/MLBlocks/issues/139) by @sarahmish
        
        0.4.1 - 2021-10-08
        ------------------
        
        * Update NumPy dependency - [Issue #136](https://github.com/MLBazaar/MLBlocks/issues/136) by @sarahmish
        * Support dynamic inputs and outputs - [Issue #134](https://github.com/MLBazaar/MLBlocks/issues/134) by @pvk-developer
        
        0.4.0 - 2021-01-09
        ------------------
        
        * Stop pipeline fitting after the last block - [Issue #131](https://github.com/MLBazaar/MLBlocks/issues/131) by @sarahmish
        * Add memory debug and profiling - [Issue #130](https://github.com/MLBazaar/MLBlocks/issues/130) by @pvk-developer
        * Update Python support - [Issue #129](https://github.com/MLBazaar/MLBlocks/issues/129) by @csala
        * Get execution time for each block - [Issue #127](https://github.com/MLBazaar/MLBlocks/issues/127) by @sarahmish
        * Allow loading a primitive or pipeline directly from the JSON path - [Issue #114](https://github.com/MLBazaar/MLBlocks/issues/114) by @csala
        * Pipeline Diagrams - [Issue #113](https://github.com/MLBazaar/MLBlocks/issues/113) by @erica-chiu
        * Get Pipeline Inputs - [Issue #112](https://github.com/MLBazaar/MLBlocks/issues/112) by @erica-chiu
        
        0.3.4 - 2019-11-01
        ------------------
        
        * Ability to return intermediate context - [Issue #110](https://github.com/MLBazaar/MLBlocks/issues/110) by @csala
        * Support for static or class methods - [Issue #107](https://github.com/MLBazaar/MLBlocks/issues/107) by @csala
        
        0.3.3 - 2019-09-09
        ------------------
        
        * Improved intermediate outputs management - [Issue #105](https://github.com/MLBazaar/MLBlocks/issues/105) by @csala
        
        0.3.2 - 2019-08-12
        ------------------
        
        * Allow passing fit and produce arguments as `init_params` - [Issue #96](https://github.com/MLBazaar/MLBlocks/issues/96) by @csala
        * Support optional fit and produce args and arg defaults - [Issue #95](https://github.com/MLBazaar/MLBlocks/issues/95) by @csala
        * Isolate primitives from their hyperparameters dictionary - [Issue #94](https://github.com/MLBazaar/MLBlocks/issues/94) by @csala
        * Add functions to explore the available primitives and pipelines - [Issue #90](https://github.com/MLBazaar/MLBlocks/issues/90) by @csala
        * Add primitive caching - [Issue #22](https://github.com/MLBazaar/MLBlocks/issues/22) by @csala
        
        0.3.1 - Pipelines Discovery
        ---------------------------
        
        * Support flat hyperparameter dictionaries - [Issue #92](https://github.com/MLBazaar/MLBlocks/issues/92) by @csala
        * Load pipelines by name and register them as `entry_points` - [Issue #88](https://github.com/MLBazaar/MLBlocks/issues/88) by @csala
        * Implement partial re-fit -[Issue #61](https://github.com/MLBazaar/MLBlocks/issues/61) by @csala
        * Move argument parsing to MLBlock - [Issue #86](https://github.com/MLBazaar/MLBlocks/issues/86) by @csala
        * Allow getting intermediate outputs - [Issue #58](https://github.com/MLBazaar/MLBlocks/issues/58) by @csala
        
        0.3.0 - New Primitives Discovery
        --------------------------------
        
        * New primitives discovery system based on `entry_points`.
        * Conditional Hyperparameters filtering in MLBlock initialization.
        * Improved logging and exception reporting.
        
        0.2.4 - New Datasets and Unit Tests
        -----------------------------------
        
        * Add a new multi-table dataset.
        * Add Unit Tests up to 50% coverage.
        * Improve documentation.
        * Fix minor bug in newsgroups dataset.
        
        0.2.3 - Demo Datasets
        ---------------------
        
        * Add new methods to Dataset class.
        * Add documentation for the datasets module.
        
        0.2.2 - MLPipeline Load/Save
        ----------------------------
        
        * Implement save and load methods for MLPipelines
        * Add more datasets
        
        0.2.1 - New Documentation
        -------------------------
        
        * Add mlblocks.datasets module with demo data download functions.
        * Extensive documentation, including multiple pipeline examples.
        
        0.2.0 - New MLBlocks API
        ------------------------
        
        A new MLBlocks API and Primitive format.
        
        This is a summary of the changes:
        
        * Primitives JSONs and Python code has been moved to a different repository, called MLPrimitives
        * Optional usage of multiple JSON primitive folders.
        * JSON format has been changed to allow more flexibility and features:
            * input and output arguments, as well as argument types, can be specified for each method
            * both classes and function as primitives are supported
            * multitype and conditional hyperparameters fully supported
            * data modalities and primitive classifiers introduced
            * metadata such as documentation, description and author fields added
        * Parsers are removed, and now the MLBlock class is responsible for loading and reading the
          JSON primitive.
        * Multiple blocks of the same primitive are supported within the same pipeline.
        * Arbitrary inputs and outputs for both pipelines and blocks are allowed.
        * Shared variables during pipeline execution, usable by multiple blocks.
        
        0.1.9 - Bugfix Release
        ----------------------
        
        * Disable some NetworkX functions for incompatibilities with some types of graphs.
        
        0.1.8 - New primitives and some improvements
        --------------------------------------------
        
        * Improve the NetworkX primitives.
        * Add String Vectorization and Datetime Featurization primitives.
        * Refactor some Keras primitives to work with single dimension `y` arrays and be compatible with `pickle`.
        * Add XGBClassifier and XGBRegressor primitives.
        * Add some `keras.applications` pretrained networks as preprocessing primitives.
        * Add helper class to allow function primitives.
        
        0.1.7 - Nested hyperparams dicts
        --------------------------------
        
        * Support passing hyperparams as nested dicts.
        
        0.1.6 - Text and Graph Pipelines
        --------------------------------
        
        * Add LSTM classifier and regressor primitives.
        * Add OneHotEncoder and MultiLabelEncoder primitives.
        * Add several NetworkX graph featurization primitives.
        * Add `community.best_partition` primitive.
        
        0.1.5 - Collaborative Filtering Pipelines
        -----------------------------------------
        
        * Add LightFM primitive.
        
        0.1.4 - Image pipelines improved
        --------------------------------
        
        * Allow passing `init_params` on `MLPipeline` creation.
        * Fix bug with MLHyperparam types and Keras.
        * Rename `produce_params` as `predict_params`.
        * Add SingleCNN Classifier and Regressor primitives.
        * Simplify and improve Trivial Predictor
        
        0.1.3 - Multi Table pipelines improved
        --------------------------------------
        
        * Improve RandomForest primitive ranges
        * Improve DFS primitive
        * Add Tree Based Feature Selection primitives
        * Fix bugs in TrivialPredictor
        * Improved documentation
        
        0.1.2 - Bugfix release
        ----------------------
        
        * Fix bug in TrivialMedianPredictor
        * Fix bug in OneHotLabelEncoder
        
        0.1.1 - Single Table pipelines improved
        ---------------------------------------
        
        * New project structure and primitives for integration into MIT-TA2.
        * MIT-TA2 default pipelines and single table pipelines fully working.
        
        0.1.0
        -----
        
        * First release on PyPI.
        
Keywords: auto machine learning classification regression data science pipeline
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6,<3.9
Description-Content-Type: text/markdown
Provides-Extra: dev
Provides-Extra: unit
Provides-Extra: test
Provides-Extra: examples
Provides-Extra: mlprimitives
