Metadata-Version: 2.1
Name: document-tools
Version: 0.1.1
Summary: 🔧 Tools to automate your document understanding tasks.
Home-page: https://github.com/deeptools-ai/document-tools
License: Apache-2.0
Author: deeptools.ai
Author-email: contact@deeptools.ai
Requires-Python: >=3.7,<4.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Provides-Extra: dev
Provides-Extra: doc
Provides-Extra: test
Requires-Dist: Jinja2 (==3.0.3); extra == "doc"
Requires-Dist: Pillow (>=9.1.1,<10.0.0)
Requires-Dist: black (==22.3.0); extra == "test"
Requires-Dist: bump2version (>=1.0.1,<2.0.0); extra == "dev"
Requires-Dist: datasets (>=2.3.2,<3.0.0)
Requires-Dist: flake8 (>=3.9.2,<4.0.0); extra == "test"
Requires-Dist: flake8-docstrings (>=1.6.0,<2.0.0); extra == "test"
Requires-Dist: ipykernel (>=6.15.0,<7.0.0); extra == "dev"
Requires-Dist: isort (>=5.10.1,<6.0.0); extra == "test"
Requires-Dist: mkdocs (>=1.3.0,<2.0.0); extra == "doc"
Requires-Dist: mkdocs-autorefs (>=0.4.1,<0.5.0); extra == "doc"
Requires-Dist: mkdocs-include-markdown-plugin (>=1.0.0,<2.0.0); extra == "doc"
Requires-Dist: mkdocs-material (>=8.3.6,<9.0.0); extra == "doc"
Requires-Dist: mkdocs-material-extensions (>=1.0.3,<2.0.0)
Requires-Dist: mkdocstrings[python] (>=0.19.0,<0.20.0); extra == "doc"
Requires-Dist: mypy (>=0.961,<0.962); extra == "test"
Requires-Dist: pip (>=20.3.1,<21.0.0); extra == "dev"
Requires-Dist: pre-commit (>=2.19.0,<3.0.0); extra == "dev"
Requires-Dist: pytesseract (>=0.3.9,<0.4.0); extra == "test" or extra == "dev"
Requires-Dist: pytest (>=7.1.2,<8.0.0); extra == "test"
Requires-Dist: pytest-cov (>=3.0.0,<4.0.0); extra == "test"
Requires-Dist: toml (>=0.10.2,<0.11.0); extra == "dev"
Requires-Dist: tox (>=3.25.0,<4.0.0); extra == "dev"
Requires-Dist: transformers (>=4.20.0,<5.0.0)
Requires-Dist: twine (>=4.0.1,<5.0.0); extra == "dev"
Requires-Dist: virtualenv (>=20.2.2,<21.0.0); extra == "dev"
Description-Content-Type: text/markdown

# Document Tools


[![pypi](https://img.shields.io/pypi/v/document-tools.svg)](https://pypi.org/project/document-tools/)
[![python](https://img.shields.io/pypi/pyversions/document-tools.svg)](https://pypi.org/project/document-tools/)
[![Build Status](https://github.com/deeptools-ai/document-tools/actions/workflows/dev.yml/badge.svg)](https://github.com/deeptools-ai/document-tools/actions/workflows/dev.yml)
[![codecov](https://codecov.io/gh/deeptools-ai/document-tools/branch/main/graphs/badge.svg)](https://codecov.io/github/deeptools-ai/document-tools)



🔧 Tools to automate your document understanding tasks.

This package automates document understanding tasks by building on
[🤗 Datasets](https://github.com/huggingface/datasets) and [🤗 Transformers](https://github.com/huggingface/transformers).

With this package, you can (✅) or will be able to (🚧):

- 🚧 **Create** a dataset from a collection of documents.
- ✅ **Transform** a dataset to a format that is suitable for training a model.
- 🚧 **Train** a model on a dataset.
- 🚧 **Evaluate** the performance of a model on a dataset of documents.
- 🚧 **Export** a model to a format that is suitable for inference.


## Features

This project is under active development and is still in the pre-alpha stage, so it is not ready for production use.
If you find a bug or have a suggestion, please let us know by opening an [issue](https://github.com/deeptools-ai/document-tools/issues)
or a [pull request](https://github.com/deeptools-ai/document-tools/pulls).

### Featured models

- [ ] [DiT](https://huggingface.co/docs/transformers/model_doc/dit)
- [x] [LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)
- [x] [LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)
- [ ] [LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)

## Usage

Tokenizing a dataset from the 🤗 Hub takes a single call:

```python
from datasets import load_dataset
from document_tools import tokenize_dataset

# Load a dataset from 🤗 Hub
dataset = load_dataset("deeptools-ai/test-document-invoice", split="train")

# Tokenize the dataset
tokenized_dataset = tokenize_dataset(dataset, target_model="layoutlmv3")
```
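Since only some of the featured models are checked off so far, it may help to validate the model name before calling `tokenize_dataset`. The following is a minimal sketch, not part of the package: `MODEL_STATUS` and `ensure_supported` are hypothetical names, and the lowercase keys are an assumption modelled on the `"layoutlmv3"` string used above.

```python
# Model status copied from the "Featured models" checklist in this README.
# The lowercase keys are an assumption, modelled on the "layoutlmv3" string
# passed as target_model in the usage example; they are not a documented API.
MODEL_STATUS = {
    "dit": False,         # [ ] not yet supported
    "layoutlmv2": True,   # [x] supported
    "layoutlmv3": True,   # [x] supported
    "layoutxlm": False,   # [ ] not yet supported
}


def ensure_supported(target_model: str) -> str:
    """Return the normalized model name, or raise if it is not checked off."""
    name = target_model.strip().lower()
    if not MODEL_STATUS.get(name, False):
        supported = ", ".join(sorted(k for k, ok in MODEL_STATUS.items() if ok))
        raise ValueError(
            f"{target_model!r} is not supported yet; choose one of: {supported}"
        )
    return name
```

With this helper, the call above could be written as `tokenize_dataset(dataset, target_model=ensure_supported("layoutlmv3"))`, so an unsupported name fails early with a readable message.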

For more information, please see the [documentation](https://deeptools-ai.github.io/document-tools/).

## Credits

This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [waynerv/cookiecutter-pypackage](https://github.com/waynerv/cookiecutter-pypackage) project template.

