Metadata-Version: 2.1
Name: dataquality
Version: 0.8.11
Summary: dataquality
Home-page: https://www.github.com/rungalileo/dataquality
Author: Galileo Technologies, Inc
Author-email: team@rungalileo.io
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: pydantic==1.8.2
Requires-Dist: requests==2.26.0
Requires-Dist: types-requests==2.25.2
Requires-Dist: pandas>=0.20.0
Requires-Dist: pyarrow>=5.0.0
Requires-Dist: vaex-core==4.16.0
Requires-Dist: vaex-hdf5>=0.12,<0.13
Requires-Dist: diskcache==5.2.1
Requires-Dist: resource==0.2.1
Requires-Dist: tqdm==4.62.3
Requires-Dist: blake3==0.2.1
Requires-Dist: wrapt==1.13.3
Requires-Dist: scipy>=1.7.0
Requires-Dist: cachetools==5.2.0
Requires-Dist: importlib-metadata==4.12.0
Requires-Dist: evaluate
Requires-Dist: datasets==2.6.1
Requires-Dist: transformers<4.25.0
Requires-Dist: seqeval
Requires-Dist: sentence-transformers>=2.2
Requires-Dist: Pillow
Requires-Dist: h5py >=3.1.0
Requires-Dist: numpy<1.24.0
Requires-Dist: flake8==3.9.2 ; extra == "dev"
Requires-Dist: black==22.3.0 ; extra == "dev"
Requires-Dist: isort==5.9.3 ; extra == "dev"
Requires-Dist: autoflake==1.4 ; extra == "dev"
Requires-Dist: jupyter==1.0.0 ; extra == "dev"
Requires-Dist: mkdocs >=1.1.2,<2.0.0 ; extra == "doc"
Requires-Dist: mkdocs-material >=5.4.0,<6.0.0 ; extra == "doc"
Requires-Dist: pytest==6.2.5 ; extra == "test"
Requires-Dist: mypy==0.971 ; extra == "test"
Requires-Dist: freezegun==1.2.2 ; extra == "test"
Requires-Dist: coverage==6.1.1 ; extra == "test"
Requires-Dist: pytest-cov==3.0.0 ; extra == "test"
Requires-Dist: scikit-learn==1.0 ; extra == "test"
Requires-Dist: tensorflow==2.9.1 ; extra == "test"
Requires-Dist: pytest-env==0.6.2 ; extra == "test"
Requires-Dist: spacy==3.2.1 ; extra == "test"
Requires-Dist: types-setuptools==65.5.0.2 ; extra == "test"
Requires-Dist: types-cachetools==5.2.1 ; extra == "test"
Requires-Dist: torchvision==0.13.1 ; extra == "test"
Requires-Dist: torch==1.12.1 ; extra == "test"
Requires-Dist: torchtext==0.13.1 ; extra == "test"
Requires-Dist: torchdata==0.4.1 ; extra == "test"
Project-URL: Documentation, https://rungalileo.gitbook.io/galileo
Provides-Extra: dev
Provides-Extra: doc
Provides-Extra: test

# dataquality

The Official Python Client for [Galileo](https://rungalileo.io).

Galileo is a tool for understanding and improving the quality of your NLP (and soon CV!) data.

Galileo gives you access to all of the information you need, at a UI and API level, to continuously build better and more robust datasets and models.

`dataquality` is your entrypoint to Galileo. It helps you start and complete the loop of data quality improvements.

## Getting Started

Install the package.
```sh
pip install dataquality
```

Create an account at [Galileo](https://console.cloud.rungalileo.io/sign-up).

Grab your [token](https://console.cloud.rungalileo.io/get-token).

Get your dataset and analyze it with `dq.auto`. You will be prompted for your token at this step.
```python
import dataquality as dq

dq.auto(
    train_data="/path/to/train.csv",
    val_data="/path/to/val.csv",
    test_data="/path/to/test.csv",
    project_name="my_first_project",
    run_name="my_first_run",
)
```

☕️ Wait for Galileo to train your model and analyze the results.  
✨ A link to your run will be provided automatically.

### What kinds of datasets can I analyze?
Currently, you can analyze **Text Classification** and **Named Entity Recognition (NER)** datasets.

If you want support for other kinds, [reach out!](https://github.com/rungalileo/dataquality/issues/new?assignees=ben-epstein&labels=enhancement&template=feature.md&title=%5BFEATURE%5D)

### Can I use auto with other data forms?
Yes! The `auto` params `train_data`, `val_data`, and `test_data` also accept pandas DataFrames and Hugging Face datasets.
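
As a minimal sketch of the DataFrame route, the snippet below builds three small frames and passes them to `dq.auto` using the same parameters as the CSV example. The `text` and `label` column names and the sample rows are assumptions made for illustration, as is the `run_auto` helper; check `help(dq.auto)` for the columns your task actually requires.

```python
import pandas as pd

# Assumption: a text classification task with "text" and "label" columns.
# These names and rows are illustrative only; confirm with help(dq.auto).
train_df = pd.DataFrame(
    {
        "text": ["great movie", "terrible plot", "loved it", "fell asleep"],
        "label": ["positive", "negative", "positive", "negative"],
    }
)
val_df = pd.DataFrame(
    {
        "text": ["would watch again", "waste of time"],
        "label": ["positive", "negative"],
    }
)
test_df = pd.DataFrame(
    {
        "text": ["a pleasant surprise", "not for me"],
        "label": ["positive", "negative"],
    }
)


def run_auto() -> None:
    """Hypothetical helper: send the frames above to Galileo via dq.auto."""
    # Imported here so the frames can be built and inspected offline.
    import dataquality as dq

    # Calling dq.auto prompts for your Galileo token.
    dq.auto(
        train_data=train_df,
        val_data=val_df,
        test_data=test_df,
        project_name="my_first_project",
        run_name="my_first_run",
    )
```

Call `run_auto()` once you have your token at hand.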

### What if all my data is in huggingface?
Use the `hf_data` param to point to a dataset on the Hugging Face Hub.
```python
import dataquality as dq

dq.auto(hf_data="rungalileo/emotion")
```

### Anything else? Can I learn more?
Run `help(dq.auto)` for more information on usage.<br>
Check out our [docs](https://rungalileo.gitbook.io/galileo/getting-started/add-your-data-to-galileo/dq-auto) for the inspiration behind this methodology.


## Can I analyze data using a custom model?
Yes! Check out our [full documentation](https://rungalileo.gitbook.io/galileo/getting-started/byom-bring-your-own-model) and [example notebooks](https://rungalileo.gitbook.io/galileo/example-notebooks) to learn how to integrate your own model with Galileo.

## Contributing

Read our [contributing doc](./CONTRIBUTING.md)!


