Metadata-Version: 2.1
Name: pyhard
Version: 0.3
Summary: Instance hardness package
Home-page: https://gitlab.com/ita-ml/instance-hardness
Author: Pedro Paiva
Author-email: paiva@ita.br
License: UNKNOWN
Description: # PyHard
        
        _Instance Hardness Python package_
        
        ## Getting Started
        
        Python 3.7 or later is required. Matlab is also required in order to run [matilda](https://matilda.unimelb.edu.au/matilda/). As far as we know, only recent versions of Matlab provide an engine for Python 3; we have tested only with R2020a.
        
        ### Installation
        1. __Clone repository__
        ```
        git clone https://gitlab.com/ita-ml/instance-hardness.git
        ```
        
        2. __Install package via pip__
        ```
        cd instance-hardness/
        pip install -e .
        ```
        
        3. __Install Matlab engine for Python__  
        Refer to this [link](https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html), which contains detailed instructions.
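        
        Once the engine is installed, one quick way to confirm that the interpreter can see it is to look up the `matlab.engine` module without starting Matlab. This is only a minimal sketch; the helper name `matlab_engine_available` is an assumption made for illustration and is not part of PyHard.
        
        ```python
        import importlib.util
        
        
        def matlab_engine_available() -> bool:
            """Return True if the `matlab.engine` module can be located."""
            try:
                # find_spec imports the parent package `matlab` to resolve the
                # submodule, so a missing parent raises an ImportError subclass
                # instead of returning None.
                return importlib.util.find_spec("matlab.engine") is not None
            except ImportError:
                return False
        
        
        if __name__ == "__main__":
            print("Matlab engine found:", matlab_engine_available())
        ```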
        
        ### Usage
        
        In the command line (terminal):
        
        ```
        cd your/path/instance-hardness
        python pyhard
        ```
        
        Or run it from elsewhere with:
        
        ```
        python -m pyhard
        ```
        
        It should generate the `metadata.csv` file and run the Matilda software.
        
        Individual steps can be disabled with flags such as `--no-meta` or `--no-matilda`. To see all command-line options, run `python pyhard -h`.
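        
        When PyHard is driven from a script, the same flags can be assembled programmatically. The sketch below only builds the argument list for `python -m pyhard`; the helper `build_pyhard_command` and its parameter names are hypothetical, and the comments on what each flag skips are inferred from the flag names.
        
        ```python
        import sys
        
        
        def build_pyhard_command(run_meta=True, run_matilda=True):
            """Assemble the argument list for `python -m pyhard`."""
            cmd = [sys.executable, "-m", "pyhard"]
            if not run_meta:
                cmd.append("--no-meta")     # presumably skips the metadata step
            if not run_matilda:
                cmd.append("--no-matilda")  # presumably skips the Matilda step
            return cmd
        
        
        if __name__ == "__main__":
            # Show the command that would run everything except Matilda.
            print(" ".join(build_pyhard_command(run_matilda=False)))
        ```
        
        The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the run.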
        
        ### Visualization
        
        #### Demo
        
        The demo visualization app can display any dataset located in `your-path/instance-hardness/data/`. Each folder within this directory (named after the problem) should contain the following three files:
        
        - `data.csv`: the dataset itself;
        
        - `metadata.csv`: the metadata with measures and algorithm performances (`feature_` and `algo_` columns);
        
        - `coordinates.csv`: the instance space coordinates generated by Matilda.
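        
        To illustrate the `feature_` and `algo_` naming convention, the sketch below groups the header of a metadata file by prefix using only the standard library. The function name `split_metadata_columns` and the sample column names are hypothetical; real names depend on the configured measures and classifiers.
        
        ```python
        import csv
        import io
        
        
        def split_metadata_columns(csv_file):
            """Group header names of a metadata CSV by the `feature_`/`algo_` prefixes."""
            header = next(csv.reader(csv_file))
            features = [col for col in header if col.startswith("feature_")]
            algos = [col for col in header if col.startswith("algo_")]
            return features, algos
        
        
        if __name__ == "__main__":
            # Made-up header standing in for a real `metadata.csv`.
            sample = io.StringIO("id,feature_a,feature_b,algo_x\n0,0.1,0.2,0.9\n")
            print(split_metadata_columns(sample))
        ```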
        
        The data shown can be selected through the app interface. To run the demo, use the command:
        
        ```
        python -m pyhard --demo
        ```
        
        New problems can be added as new folders in `data/`. Multidimensional data will be reduced with the chosen dimensionality reduction method.
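        
        Before launching the demo, it can help to check that each problem folder holds the three expected files. The following is a minimal sketch under that assumption; `missing_files` and `list_problems` are illustrative helpers, not PyHard functions.
        
        ```python
        from pathlib import Path
        
        REQUIRED_FILES = ("data.csv", "metadata.csv", "coordinates.csv")
        
        
        def missing_files(problem_dir):
            """Return the required files that are absent from one problem folder."""
            problem_dir = Path(problem_dir)
            return [name for name in REQUIRED_FILES if not (problem_dir / name).is_file()]
        
        
        def list_problems(data_dir):
            """Map each problem folder in `data_dir` to its list of missing files."""
            return {
                path.name: missing_files(path)
                for path in sorted(Path(data_dir).iterdir())
                if path.is_dir()
            }
        ```
        
        A folder mapped to an empty list is complete; any other entry names the files still to be added.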
        
        #### App
        
        From the command line it is possible to launch an app that visualizes 2D datasets alongside their respective instance spaces. The plots are linked, and options for coloring and hover information are available. To run only the app:
        
        ```
        python -m pyhard --no-meta --no-matilda --app
        ```
        
        It should open the browser automatically and display the data.
        
        
        ### Configuration
        
        See the file `config.yaml` in `/instance-hardness/conf/`. It contains options for file paths, the measures to be calculated, and the classifiers to use along with their parametrization.
        
        
        ## References
        
        1. Michael R. Smith, Tony Martinez, and Christophe Giraud-Carrier. 2014. __An instance level analysis of data complexity__. Mach. Learn. 95, 2 (May 2014), 225–256.
        
        2. Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin Kam Ho. 2019. __How Complex Is Your Classification Problem? A Survey on Measuring Classification Complexity__. ACM Comput. Surv. 52, 5, Article 107 (October 2019), 34 pages.
        
        3. Mario A. Muñoz, Laura Villanova, Davaatseren Baatar, and Kate Smith-Miles. 2018. __Instance spaces for machine learning classification__. Mach. Learn. 107, 1 (January 2018), 109–147.
        
        4. Luiz H. Lorena, André C. Carvalho, and Ana C. Lorena. 2015. __Filter Feature Selection for One-Class Classification__. Journal of Intelligent and Robotic Systems 80, 1 (October 2015), 227–243.
        
        5. Artur J. Ferreira and Mário A. T. Figueiredo. 2012. __Efficient feature selection filters for high-dimensional data__. Pattern Recognition Letters 33, 13 (October 2012), 1794–1804.
        
        6. Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. 2017. __Feature Selection: A Data Perspective__. ACM Comput. Surv. 50, 6, Article 94 (January 2018), 45 pages.
        
        7. Shuyang Gao, Greg Ver Steeg, and Aram Galstyan. 2015. __Efficient Estimation of Mutual Information for Strongly Dependent Variables__. AISTATS 2015. Available at http://arxiv.org/abs/1411.2003.
        
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
