Metadata-Version: 2.1
Name: pyhard
Version: 1.1
Summary: Instance hardness package
Home-page: https://gitlab.com/ita-ml/instance-hardness
Author: Pedro Paiva
Author-email: paiva@ita.br
License: MIT
Download-URL: https://gitlab.com/ita-ml/pyhard/-/archive/v1.1/pyhard-v1.1.tar.gz
Description: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gl/ita-ml%2Finstance-hardness/binder?filepath=notebooks%2F)
        [![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
        [![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://en.wikipedia.org/wiki/MIT_License)
        
        # PyHard
        
        _Instance Hardness Python package_
        
        <!--![picture](docs/img/circle-fs.png)-->
        
        ## Getting Started
        
        Python 3.7 is required. Matlab is also required in order to run [matilda](https://matilda.unimelb.edu.au/matilda/). As far as we know, only recent versions of Matlab offer an engine for Python 3; we have tested only versions R2019b and later.
        
        Alternatively, take a look at [_Graphene_](https://gitlab.com/ita-ml/graphene), the Instance Hardness Analytics Tool. Matlab, and even Python, are not required in this case!
        
        ### Installation
        1. __PyHard package__
        ```
        pip install -e git+https://gitlab.com/ita-ml/instance-hardness#egg=pyhard
        ```
        Alternatively, clone the repository and install from source:
        ```
        git clone https://gitlab.com/ita-ml/instance-hardness.git
        cd instance-hardness/
        pip install -e .
        ```
        
        2. __Install Matlab engine for Python__  
        Refer to this [link](https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html), which contains detailed instructions.
        
        ### Usage
        
        Running the command:
        
        ```
        python3 -m pyhard
        ```
        
        performs the following steps:
        
        1. Calculate the _hardness measures_;
        
        2. Evaluate classification performance at instance level for each algorithm;
        
        3. Select the most relevant hardness measures with respect to the instance classification error;
        
        4. Join the outputs of steps 1, 2 and 3 to build the _metadata_ file (`metadata.csv`);
        
        5. Run __matilda__, which generates the _Instance Space_ (IS) representation and the _footprint_ areas;
        
        6. To explore the results from step 5, launch the visualization dashboard: 
        `python3 -m pyhard --app`
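
        As an illustration of what step 1 computes, below is a minimal, self-contained sketch of one classic hardness measure, k-Disagreeing Neighbors (kDN), described by Smith et al. (2014): the fraction of an instance's k nearest neighbours that carry a different label. This is an illustrative reimplementation, not the package's own code.

        ```python
        # Illustrative sketch of the k-Disagreeing Neighbors (kDN) hardness
        # measure (Smith et al., 2014); not the package's implementation.

        def kdn(X, y, k=3):
            """Fraction of each instance's k nearest neighbours (Euclidean
            distance) whose label differs from the instance's own label."""
            n = len(X)
            scores = []
            for i in range(n):
                # Sort all other instances by distance to instance i.
                dists = sorted(
                    (sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5, j)
                    for j in range(n) if j != i
                )
                neighbour_labels = [y[j] for _, j in dists[:k]]
                scores.append(sum(lbl != y[i] for lbl in neighbour_labels) / k)
            return scores

        # A mislabelled-looking instance (index 5) sits among the other class
        # and receives the maximum score; well-separated instances score low.
        X = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [0.5, 0.5]]
        y = [0, 0, 0, 1, 1, 1]
        print(kdn(X, y, k=2))
        ```

        Instances with high scores across several such measures are the "hard" ones the metadata file is built to characterize.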
        
        
        Individual steps can be disabled with the following command line options:
        
        * `--no-meta`: does not attempt to build the metadata file
        
        * `--no-matilda`: does not run matilda
        
        
        To see all command line options, run `python3 -m pyhard --help`.
        
        
        ### Configuration
        
        The file `instance-hardness/conf/config.yaml` is used to configure steps 1–4. It sets options for file paths, measures, classifiers, feature selection and hyper-parameter optimization. Further instructions can be found inside the file.
        
        A configuration file in another location can be specified in the command line: 
        `python3 -m pyhard -c path/to/new_config.yaml`
        
        
        ### Visualization
        
        #### Demo
        
        ![picture](docs/img/demo.png)
        
        The demo visualization app can display any dataset located in `instance-hardness/data/`. Each folder within this directory (whose name is the problem name) should contain these three files:
        
        - `data.csv`: the dataset itself;
        
        - `metadata.csv`: the metadata with measures and algorithm performances (`feature_` and `algo_` columns);
        
        - `coordinates.csv`: the instance space coordinates generated by Matilda.
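
        For orientation, here is a small sketch of how the `metadata.csv` columns can be split by their prefixes. The column names below are hypothetical; only the `feature_` and `algo_` prefixes come from the convention above.

        ```python
        import csv
        import io

        # Hypothetical excerpt of a metadata.csv; only the feature_/algo_
        # column prefixes follow the convention described above.
        sample = "\n".join([
            "instances,feature_kDN,feature_LSC,algo_svm,algo_knn",
            "0,0.2,0.5,1.0,0.8",
            "1,0.8,0.1,0.4,0.6",
        ])

        rows = list(csv.DictReader(io.StringIO(sample)))
        feature_cols = [c for c in rows[0] if c.startswith("feature_")]
        algo_cols = [c for c in rows[0] if c.startswith("algo_")]
        print(feature_cols, algo_cols)
        ```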
        
        The displayed dataset can be chosen through the app interface. To run the demo, use the command:
        
        ```
        python3 -m pyhard --demo
        ```
        
        New problems may be added as a new folder in `data/`. Data with more than two dimensions will be reduced with the chosen dimensionality reduction method.
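
        The reduction step can be sketched with PCA, one common choice of dimensionality reduction method. This NumPy snippet is illustrative only, not the app's own code:

        ```python
        import numpy as np

        # Minimal PCA projection to 2D, as one example of a dimensionality
        # reduction method; illustrative only, not the app's own code.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 6))   # 50 instances, 6 features

        Xc = X - X.mean(axis=0)        # centre the data
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        X2d = Xc @ Vt[:2].T            # project onto the top two components

        print(X2d.shape)
        ```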
        
        #### App
        
        ![picture](docs/img/animation.gif)
        
        Through the command line it is also possible to launch an app that visualizes 2D datasets along with their respective instance space. The plots are linked, and options for color and hover information are available. To run the app, use the command:
        
        ```
        python3 -m pyhard --app
        ```
        
        It should open the browser automatically and display the data.
        
        
        ## References
        
        _Base_
        
        1. Michael R. Smith, Tony Martinez, and Christophe Giraud-Carrier. 2014. __An instance level analysis of data complexity__. Mach. Learn. 95, 2 (May 2014), 225–256.
        
        2. Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin Kam Ho. 2019. __How Complex Is Your Classification Problem? A Survey on Measuring Classification Complexity__. ACM Comput. Surv. 52, 5, Article 107 (October 2019), 34 pages.
        
        3. Mario A. Muñoz, Laura Villanova, Davaatseren Baatar, and Kate Smith-Miles. 2018. __Instance spaces for machine learning classification__. Mach. Learn. 107, 1 (January 2018), 109–147.
        
        _Feature selection_
        
        4. Luiz H. Lorena, André C. Carvalho, and Ana C. Lorena. 2015. __Filter Feature Selection for One-Class Classification__. Journal of Intelligent and Robotic Systems 80, 1 (October 2015), 227–243.
        
        5. Artur J. Ferreira and Mário A. T. Figueiredo. 2012. __Efficient feature selection filters for high-dimensional data__. Pattern Recognition Letters 33, 13 (October 2012), 1794–1804.
        
        6. Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. 2017. __Feature Selection: A Data Perspective__. ACM Comput. Surv. 50, 6, Article 94 (January 2018), 45 pages.
        
        7. Shuyang Gao, Greg Ver Steeg, and Aram Galstyan. 2015. __Efficient Estimation of Mutual Information for Strongly Dependent Variables__. In AISTATS, 2015. Available at http://arxiv.org/abs/1411.2003.
        
        _Hyper-parameter optimization_
        
        8. James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. __Algorithms for hyper-parameter optimization__. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS’11). Curran Associates Inc., Red Hook, NY, USA, 2546–2554.
        
        9. Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. __Practical Bayesian optimization of machine learning algorithms__. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2 (NIPS’12). Curran Associates Inc., Red Hook, NY, USA, 2951–2959.
          
        10. J. Bergstra, D. Yamins, and D. D. Cox. 2013. __Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures__. In Proceedings of the 30th International Conference on Machine Learning (ICML'13). JMLR.org, I-115–I-123.
          
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6, <3.8
Description-Content-Type: text/markdown
