Metadata-Version: 2.1
Name: rdt
Version: 0.2.6.dev0
Summary: Reversible Data Transforms
Home-page: https://github.com/sdv-dev/RDT
Author: MIT Data To AI Lab
Author-email: dailabmit@gmail.com
License: MIT license
Description: <p align="left">
        <img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
        <i>An open source project from Data to AI Lab at MIT.</i>
        </p>
        
        [![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
        [![PyPi Shield](https://img.shields.io/pypi/v/RDT.svg)](https://pypi.python.org/pypi/RDT)
        [![Travis CI Shield](https://travis-ci.org/sdv-dev/RDT.svg?branch=master)](https://travis-ci.org/sdv-dev/RDT)
        [![Coverage Status](https://codecov.io/gh/sdv-dev/RDT/branch/master/graph/badge.svg)](https://codecov.io/gh/sdv-dev/RDT)
        [![Downloads](https://pepy.tech/badge/rdt)](https://pepy.tech/project/rdt)
        
        # RDT: Reversible Data Transforms
        
        * License: [MIT](https://github.com/sdv-dev/RDT/blob/master/LICENSE)
        * Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
        * Homepage: https://github.com/sdv-dev/RDT
        
        ## Overview
        
        **RDT** is a Python library used to transform data for data science libraries and preserve
        the transformations in order to revert them as needed.
        
        # Install
        
        ## Requirements
        
        **RDT** has been developed and tested on [Python 3.5, 3.6, 3.7 and 3.8](https://www.python.org/downloads/).
        
        Also, although it is not strictly required, the usage of a [virtualenv](
        https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
        interfering with other software installed in the system where **RDT** is run.
        
        ## Install with pip
        
        The easiest and recommended way to install **RDT** is using [pip](
        https://pip.pypa.io/en/stable/):
        
        ```bash
        pip install rdt
        ```
        
        This will pull and install the latest stable release from [PyPI](https://pypi.org/).
        
        If you want to install from source or contribute to the project please read the
        [Contributing Guide](CONTRIBUTING.rst).
        
        
        # Quickstart
        
        This short series of tutorials walks you through the steps needed to get started
        using **RDT** to transform columns, tables and datasets.
        
        ## Transforming a column
        
        In this first guide, you will learn how to use **RDT** in its simplest form: transforming
        a single column of a `pandas.DataFrame`.
        
        ### 1. Load the demo data
        
        You can load some demo data using the `rdt.get_demo` function, which will return some random
        data for you to play with.
        
        ```python3
        from rdt import get_demo
        
        data = get_demo()
        ```
        
        This will return a `pandas.DataFrame` with 10 rows and 4 columns, one of each data type supported:
        
        ```
           0_int    1_float 2_str          3_datetime
        0   38.0  46.872441     b 2021-02-10 21:50:00
        1   77.0  13.150228   NaN 2021-07-19 21:14:00
        2   21.0        NaN     b                 NaT
        3   10.0  37.128869     c 2019-10-15 21:39:00
        4   91.0  41.341214     a 2020-10-31 11:57:00
        5   67.0  92.237335     a                 NaT
        6    NaN  51.598682   NaN 2020-04-01 01:56:00
        7    NaN  42.204396     c 2020-03-12 22:12:00
        8   68.0        NaN     c 2021-02-25 16:04:00
        9    7.0  31.542918     a 2020-07-12 03:12:00
        ```
        
        Note that the data is random, so your output may look slightly different. Also note that
        RDT has introduced some null values at random.
        
        ### 2. Load the transformer
        
        In this example we will use the datetime column, so let's load a `DatetimeTransformer`.
        
        ```python3
        from rdt.transformers import DatetimeTransformer
        
        transformer = DatetimeTransformer()
        ```
        
        ### 3. Fit the Transformer
        
        Before being able to transform the data, we need the transformer to learn from it.
        
        We will do this by calling its `fit` method, passing the column that we want to transform.
        
        ```python3
        transformer.fit(data['3_datetime'])
        ```
        
        ### 4. Transform the data
        
        Once the transformer is fitted, we can pass the data again to its `transform` method in order
        to get the transformed version of the data.
        
        ```python3
        transformed = transformer.transform(data['3_datetime'])
        ```
        
        The output will be a `numpy.ndarray` with two columns, one with the datetimes transformed
        to integer timestamps, and another one indicating with 1s which values were null in the
        original data.
        
        ```
        array([[1.61299380e+18, 0.00000000e+00],
               [1.62672924e+18, 0.00000000e+00],
               [1.59919923e+18, 1.00000000e+00],
               [1.57117554e+18, 0.00000000e+00],
               [1.60414542e+18, 0.00000000e+00],
               [1.59919923e+18, 1.00000000e+00],
               [1.58570616e+18, 0.00000000e+00],
               [1.58405112e+18, 0.00000000e+00],
               [1.61426904e+18, 0.00000000e+00],
               [1.59452352e+18, 0.00000000e+00]])
        ```
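        
        To make the two output columns more concrete, here is a minimal standard-library sketch
        of the same idea. It is not RDT's implementation: `DatetimeSketch` and `to_nanoseconds`
        are hypothetical names, `None` stands in for `NaT`, naive datetimes are assumed to be
        UTC, and filling nulls with the mean timestamp is an assumption inferred from the
        sample output above.
        
        ```python
        from datetime import datetime
        
        EPOCH = datetime(1970, 1, 1)
        
        
        def to_nanoseconds(value):
            """Return nanoseconds since the Unix epoch for a naive datetime."""
            delta = value - EPOCH
            seconds = delta.days * 86400 + delta.seconds
            return seconds * 10**9 + delta.microseconds * 1000
        
        
        class DatetimeSketch:
            """Hypothetical stand-in, not RDT's DatetimeTransformer."""
        
            def fit(self, column):
                # Learn the fill value: the mean timestamp of the non-null entries.
                known = [to_nanoseconds(v) for v in column if v is not None]
                self.fill_value = sum(known) / len(known)
        
            def transform(self, column):
                # One row per value: [timestamp, null flag].
                rows = []
                for value in column:
                    if value is None:
                        rows.append([self.fill_value, 1.0])
                    else:
                        rows.append([float(to_nanoseconds(value)), 0.0])
                return rows
        
        
        column = [datetime(2021, 2, 10, 21, 50), datetime(2021, 7, 19, 21, 14), None]
        sketch = DatetimeSketch()
        sketch.fit(column)
        transformed = sketch.transform(column)
        ```
        
        The null entry still receives a numeric timestamp, so the result stays fully numeric,
        while the second column records that the value was originally missing.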
        
        ### 5. Revert the column transformation
        
        In order to revert the previous transformation, the transformed data can be passed to
        the `reverse_transform` method of the transformer:
        
        ```python3
        reversed_data = transformer.reverse_transform(transformed)
        ```
        
        The output will be a `pandas.Series` containing the reverted values, which should match
        the original ones exactly.
        
        ```
        0   2021-02-10 21:50:00
        1   2021-07-19 21:14:00
        2                   NaT
        3   2019-10-15 21:39:00
        4   2020-10-31 11:57:00
        5                   NaT
        6   2020-04-01 01:56:00
        7   2020-03-12 22:12:00
        8   2021-02-25 16:04:00
        9   2020-07-12 03:12:00
        dtype: datetime64[ns]
        ```
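        
        The reverse direction can be sketched in the same spirit. The function below is a
        hypothetical illustration, not RDT's code: it assumes rows of the form
        `[timestamp_ns, null_flag]`, uses `None` in place of `NaT`, and treats any flag of 0.5
        or more as null (the threshold is an assumption).
        
        ```python
        from datetime import datetime, timedelta
        
        EPOCH = datetime(1970, 1, 1)
        
        
        def reverse_rows(rows):
            """Rebuild datetimes from [timestamp_ns, null_flag] rows."""
            restored = []
            for timestamp_ns, null_flag in rows:
                if null_flag >= 0.5:
                    # The flag wins: the timestamp here was only a placeholder.
                    restored.append(None)
                else:
                    microseconds = int(timestamp_ns) // 1000
                    restored.append(EPOCH + timedelta(microseconds=microseconds))
            return restored
        
        
        # Three rows copied from the transformed array shown earlier.
        rows = [
            [1.6129938e18, 0.0],
            [1.59919923e18, 1.0],
            [1.57117554e18, 0.0],
        ]
        restored = reverse_rows(rows)
        ```
        
        Because the null flag is stored next to the timestamp, the placeholder value written
        for a missing entry never leaks back into the reverted column.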
        
        ## Transforming a table
        
        Once we know how to transform a single column, we can go to the next level and transform
        a table with multiple columns.
        
        ### 1. Load the HyperTransformer
        
        In order to manipulate a complete table we will need to load a `rdt.HyperTransformer`.
        
        ```python3
        from rdt import HyperTransformer
        
        ht = HyperTransformer()
        ```
        
        ### 2. Fit the HyperTransformer
        
        Just like the transformer, the HyperTransformer needs to be fitted before it can transform
        data.
        
        This is done by calling its `fit` method, passing the `data` DataFrame.
        
        ```python3
        ht.fit(data)
        ```
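        
        Fitting a whole table implies choosing a suitable transformation for every column.
        The following is a rough, standard-library-only sketch of a type-based choice. The
        function `infer_kind` and its kind labels are hypothetical, and it inspects plain Python
        values, whereas RDT's own selection is based on pandas dtypes.
        
        ```python
        from datetime import datetime
        
        
        def infer_kind(values):
            """Guess a column kind from the Python type of its non-null values."""
            kinds = set()
            for value in values:
                if value is None:
                    continue
                # bool must be checked before int because bool subclasses int.
                if isinstance(value, bool):
                    kinds.add("boolean")
                elif isinstance(value, (int, float)):
                    kinds.add("numerical")
                elif isinstance(value, datetime):
                    kinds.add("datetime")
                else:
                    kinds.add("categorical")
            if len(kinds) != 1:
                raise ValueError("cannot infer a single kind from: %r" % sorted(kinds))
            return kinds.pop()
        
        
        # A few rows in the shape of the demo table, with None standing in for nulls.
        table = {
            "0_int": [38.0, 77.0, None],
            "1_float": [46.872441, 13.150228, None],
            "2_str": ["b", None, "b"],
            "3_datetime": [datetime(2021, 2, 10, 21, 50), None, None],
        }
        plan = {name: infer_kind(values) for name, values in table.items()}
        ```
        
        Null values are skipped during inference, so a column with missing entries is still
        assigned a single kind.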
        
        ### 3. Transform the table data
        
        Once the HyperTransformer is fitted, we can pass the data again to its `transform` method in order
        to get the transformed version of the data.
        
        ```python3
        transformed = ht.transform(data)
        ```
        
        The output will now be another `pandas.DataFrame` with the numerical representation of our
        data.
        
        ```
            0_int  0_int#1    1_float  1_float#1  2_str    3_datetime  3_datetime#1
        0  38.000      0.0  46.872441        0.0   0.70  1.612994e+18           0.0
        1  77.000      0.0  13.150228        0.0   0.90  1.626729e+18           0.0
        2  21.000      0.0  44.509511        1.0   0.70  1.599199e+18           1.0
        3  10.000      0.0  37.128869        0.0   0.15  1.571176e+18           0.0
        4  91.000      0.0  41.341214        0.0   0.45  1.604145e+18           0.0
        5  67.000      0.0  92.237335        0.0   0.45  1.599199e+18           1.0
        6  47.375      1.0  51.598682        0.0   0.90  1.585706e+18           0.0
        7  47.375      1.0  42.204396        0.0   0.15  1.584051e+18           0.0
        8  68.000      0.0  44.509511        1.0   0.15  1.614269e+18           0.0
        9   7.000      0.0  31.542918        0.0   0.45  1.594524e+18           0.0
        ```
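        
        The `2_str` column above now holds floats instead of labels. One way to arrive at such
        values, inferred from the sample output rather than taken from RDT's source, is to give
        each category a slice of the `[0, 1]` interval proportional to its frequency and to
        represent it by the midpoint of that slice. The helper names below are hypothetical,
        `None` stands in for `NaN`, and the tie-breaking rule is an assumption.
        
        ```python
        from collections import Counter
        
        
        def fit_intervals(values):
            """Assign each category a slice of [0, 1] sized by its frequency."""
            counts = Counter(values)
            first_seen = {}
            for index, value in enumerate(values):
                first_seen.setdefault(value, index)
            # Most frequent first; ties broken by first appearance (an assumption).
            ordered = sorted(counts, key=lambda cat: (-counts[cat], first_seen[cat]))
            intervals = {}
            start = 0.0
            for category in ordered:
                end = start + counts[category] / len(values)
                intervals[category] = (start, end)
                start = end
            return intervals
        
        
        def transform(values, intervals):
            """Replace each category with the midpoint of its interval."""
            return [sum(intervals[value]) / 2 for value in values]
        
        
        def reverse(numbers, intervals):
            """Map each number back to the category whose interval contains it."""
            restored = []
            for number in numbers:
                for category, (start, end) in intervals.items():
                    if start <= number < end:
                        restored.append(category)
                        break
                else:
                    raise ValueError("value outside every interval: %r" % number)
            return restored
        
        
        # The 2_str column from the demo table.
        column = ["b", None, "b", "c", "a", "a", None, "c", "c", "a"]
        intervals = fit_intervals(column)
        encoded = transform(column, intervals)
        decoded = reverse(encoded, intervals)
        ```
        
        Since every category owns a distinct interval, any number inside that interval maps
        back to the same category, which is what keeps the encoding reversible.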
        
        ### 4. Revert the table transformation
        
        In order to revert the transformation and recover the original data from the transformed one,
        we need to call the `reverse_transform` method of the `HyperTransformer` instance, passing it
        the transformed data.
        
        ```python3
        reversed_data = ht.reverse_transform(transformed)
        ```
        
        This should again output a table that looks exactly like the original one.
        
        ```
           0_int    1_float 2_str          3_datetime
        0   38.0  46.872441     b 2021-02-10 21:50:00
        1   77.0  13.150228   NaN 2021-07-19 21:14:00
        2   21.0        NaN     b                 NaT
        3   10.0  37.128869     c 2019-10-15 21:39:00
        4   91.0  41.341214     a 2020-10-31 11:57:00
        5   67.0  92.237335     a                 NaT
        6    NaN  51.598682   NaN 2020-04-01 01:56:00
        7    NaN  42.204396     c 2020-03-12 22:12:00
        8   68.0        NaN     c 2021-02-25 16:04:00
        9    7.0  31.542918     a 2020-07-12 03:12:00
        ```
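        
        The null values can be recovered because the transformed table carries an extra `#1`
        column for each column that contained nulls. Here is a minimal sketch of that round
        trip for one numerical column, under stated assumptions: the helper names are
        hypothetical, `None` stands in for `NaN`, and the mean fill value is inferred from the
        sample output, not from RDT's source.
        
        ```python
        def transform_column(name, values):
            """Split one numeric column into a filled column and a null-flag column."""
            known = [v for v in values if v is not None]
            fill_value = sum(known) / len(known)
            return {
                name: [fill_value if v is None else float(v) for v in values],
                name + "#1": [1.0 if v is None else 0.0 for v in values],
            }
        
        
        def reverse_column(name, table):
            """Use the flag column to put the nulls back where they were."""
            return [
                None if flag == 1.0 else value
                for value, flag in zip(table[name], table[name + "#1"])
            ]
        
        
        # The 0_int column from the demo table.
        original = [38.0, 77.0, 21.0, 10.0, 91.0, 67.0, None, None, 68.0, 7.0]
        table = transform_column("0_int", original)
        restored = reverse_column("0_int", table)
        ```
        
        In this sketch the two missing entries are filled with 47.375, the mean of the eight
        known values, and the flag column is what allows them to be turned back into nulls.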
        
        
        # History
        
        ## 0.2.5 - 2020-09-18
        
        Minor bugfixing release.
        
        ### Bugs Fixed
        
        * Handle NaNs in OneHotEncodingTransformer - Issue [#118](https://github.com/sdv-dev/RDT/issues/118) by @csala
        * OneHotEncodingTransformer fails if there is only one category - Issue [#119](https://github.com/sdv-dev/RDT/issues/119) by @csala
        * All NaN column produces NaN values enhancement - Issue [#121](https://github.com/sdv-dev/RDT/issues/121) by @csala
        * Make the CategoricalTransformer learn the column dtype and restore it back - Issue [#122](https://github.com/sdv-dev/RDT/issues/122) by @csala
        
        ## 0.2.4 - 2020-08-08
        
        ### General Improvements
        
        * Support Python 3.8 - Issue [#117](https://github.com/sdv-dev/RDT/issues/117) by @csala
        * Support pandas >1 - Issue [#116](https://github.com/sdv-dev/RDT/issues/116) by @csala
        
        ## 0.2.3 - 2020-07-09
        
        * Implement OneHot and Label encoding as transformers - Issue [#112](https://github.com/sdv-dev/RDT/issues/112) by @csala
        
        ## 0.2.2 - 2020-06-26
        
        ### Bugs Fixed
        
        * Escape `column_name` in hypertransformer - Issue [#110](https://github.com/sdv-dev/RDT/issues/110) by @csala
        
        ## 0.2.1 - 2020-01-17
        
        ### Bugs Fixed
        
        * Boolean Transformer fails to revert when there are NO nulls - Issue [#103](https://github.com/sdv-dev/RDT/issues/103) by @JDTheRipperPC
        
        ## 0.2.0 - 2019-10-15
        
        This version comes with a brand new API and internal implementation, removing the old
        metadata JSON from the user provided arguments, and making each transformer work only
        with `pandas.Series` of their corresponding data type.
        
        As part of this change, several transformer names have been changed and a new BooleanTransformer
        and a feature to automatically decide which transformers to use based on dtypes have been added.
        
        Unit test coverage has also been increased to 100%.
        
        Special thanks to @JDTheRipperPC and @csala for the big efforts put in making this
        release possible.
        
        ### Issues
        
        * Drop the usage of meta - Issue [#72](https://github.com/sdv-dev/RDT/issues/72) by @JDTheRipperPC
        * Make CatTransformer.probability_map deterministic - Issue [#25](https://github.com/sdv-dev/RDT/issues/25) by @csala
        
        ## 0.1.3 - 2019-09-24
        
        ### New Features
        
        * Add attributes NullTransformer and col_meta - Issue [#30](https://github.com/sdv-dev/RDT/issues/30) by @ManuelAlvarezC
        
        ### General Improvements
        
        * Integrate with CodeCov - Issue [#89](https://github.com/sdv-dev/RDT/issues/89) by @csala
        * Remake Sphinx Documentation - Issue [#96](https://github.com/sdv-dev/RDT/issues/96) by @JDTheRipperPC
        * Improve README - Issue [#92](https://github.com/sdv-dev/RDT/issues/92) by @JDTheRipperPC
        * Document RELEASE workflow - Issue [#93](https://github.com/sdv-dev/RDT/issues/93) by @JDTheRipperPC
        * Add support to Python 3.7 - Issue [#38](https://github.com/sdv-dev/RDT/issues/38) by @ManuelAlvarezC
        * Create way to pass HyperTransformer table dict - Issue [#45](https://github.com/sdv-dev/RDT/issues/45) by @ManuelAlvarezC
        
        ## 0.1.2
        
        * Add a numerical transformer for positive numbers.
        * Add option to anonymize data on categorical transformer.
        * Move the `col_meta` argument from method-level to class-level.
        * Move the logic for missing values from the transformers into the `HyperTransformer`.
        * Remove unreachable lines in `NullTransformer`.
        * `NumberTransformer` to set default value to 0 when the column is null.
        * Add a CLA for collaborators.
        * Refactor performance-wise the transformers.
        
        ## 0.1.1
        
        * Improve handling of NaN in NumberTransformer and CatTransformer.
        * Add unittests for HyperTransformer.
        * Remove unused methods `get_types` and `impute_table` from HyperTransformer.
        * Make NumberTransformer enforce dtype int on integer data.
        * Make DTTransformer check data format before transforming.
        * Add minimal API Reference.
        * Merge `rdt.utils` into `HyperTransformer` class. 
        
        ## 0.1.0
        
        * First release on PyPI.
        
Keywords: rdt
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.5,<3.9
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: dev
