Metadata-Version: 2.1
Name: datalabs
Version: 0.1.3.dev0
Summary: Datalabs
Home-page: https://github.com/expressai/datalabs
Author: expressai
Author-email: stefanpengfei@gamil.com
License: Apache 2.0
Download-URL: https://github.com/expressai/datalabs/tags
Keywords: dataset
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
Provides-Extra: audio
Provides-Extra: apache-beam
Provides-Extra: tensorflow
Provides-Extra: tensorflow_gpu
Provides-Extra: torch
Provides-Extra: s3
Provides-Extra: streaming
Provides-Extra: dev
Provides-Extra: tests
Provides-Extra: quality
Provides-Extra: benchmarks
Provides-Extra: docs
License-File: LICENSE

# DataLab

## Installation
#### Install

   
    pip install --upgrade pip
    pip install datalabs
 

   or 

    # This is suitable for SDK developers
    pip install --upgrade pip
    git clone git@github.com:ExpressAI/DataLab.git
    cd Datalab
    pip install .
 

 
#### Dataset Operation
 

 

```python

# pip install datalabs
from datalabs import operations, load_dataset
from featurize import *

 
dataset = load_dataset("ag_news")

# print(task schema)
print(dataset['test']._info.task_templates)

# data operators
res = dataset["test"].apply(get_text_length)
print(next(res))


# get entity
res = dataset["test"].apply(get_entities_spacy)
print(next(res))

# get postag
res = dataset["test"].apply(get_postag_spacy)
print(next(res))

from edit import *
# add typos
res = dataset["test"].apply(add_typo)
print(next(res))

#  change person name
res = dataset["test"].apply(change_person_name)
print(next(res))



```

### Task Schema

* `text-classification`
    * `text`:str
    * `label`:ClassLabel
    
* `text-matching`
    * `text1`:str
    * `text2`:str
    * `label`:ClassLabel
    
* `summarization`
    * `text`:str
    * `summary`:str
    
* `sequence-labeling`
    * `tokens`:List[str]
    * `tags`:List[ClassLabel]
    
* `question-answering-extractive`:
    * `context`:str
    * `question`:str
    * `answers`:List[{"text":"","answer_start":""}]


one can use `dataset[SPLIT]._info.task_templates` to get more useful task-dependent information, where
`SPLIT` could be `train` or `validation` or `test`.


### Supported Datasets
* [here](https://github.com/ExpressAI/DataLab/tree/main/datasets)

   

## Acknowledgment
DataLab originated from a fork of the awesome Huggingface Datasets and TensorFlow Datasets. We highly thank the Huggingface/TensorFlow Datasets for building this amazing library. More details on the differences between DataLab and them can be found in the section



