Metadata-Version: 2.1
Name: zeef
Version: 0.1.2
Summary: A Python Framework for Deep Active Learning
Home-page: https://github.com/MLSysOps/zeef
Author: Yizheng Huang
Author-email: huangyz0918@gmail.com
License: UNKNOWN
Download-URL: https://github.com/MLSysOps/zeef/archive/master.zip
Keywords: active learning,deep learning,data processing,data mining,neural networks
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE

# Zeef: Active Learning for Data-Centric AI

![PyPI](https://img.shields.io/pypi/v/zeef?color=blue) ![PyPI - Downloads](https://img.shields.io/pypi/dm/zeef) [![Testing](https://github.com/MLSysOps/zeef/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/MLSysOps/zeef/actions/workflows/main.yml) [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FMLSysOps%2Fdeepal.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2FMLSysOps%2Fdeepal?ref=badge_shield)

An active learning framework for real-world scenarios that lack labeled data.

## Installation

```shell
pip install zeef
```

For local development, you can create an [Anaconda](https://www.anaconda.com/) environment from the provided `environment.yml`:

```shell
conda env create -f environment.yml
```

## Quick Start

We can start with the simplest example: randomly selecting data points from an unlabeled data pool.

```python
from sklearn import svm

from zeef.data import Pool
from zeef.learner.sklearn import Learner
from zeef.strategy import RandomSampling

# `unlabeled_data`, `data_labels`, and `test_data` are placeholders for your own data.
learner = Learner(net=svm.SVC(probability=True))  # define the learner.
data_pool = Pool(unlabeled_data)  # generate the data pool.
strategy = RandomSampling(data_pool, learner=learner)  # define the sampling strategy.

query_ids = strategy.query(1000)  # query 1k samples for labeling.
strategy.update(query_ids, data_labels)  # label the 1k samples.
strategy.learn()  # train the model using all the labeled data.
strategy.infer(test_data)  # evaluate the model.
```
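
To make the query / update / learn / infer cycle above more concrete, here is a minimal, library-independent sketch of random sampling written with only NumPy and scikit-learn. It does not use Zeef's API and is not how Zeef implements it internally. The synthetic dataset, the `random_query` helper, the batch size of 50, and the three rounds are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in for a real pool (assumption); y_oracle plays the human annotator.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_pool, y_oracle = X[:500], y[:500]
X_test, y_test = X[500:], y[500:]

rng = np.random.default_rng(0)
labeled = np.zeros(len(X_pool), dtype=bool)  # True once a point has been labeled


def random_query(labeled_mask, n, rng):
    """Hypothetical helper: pick n indices uniformly from the unlabeled points."""
    candidates = np.flatnonzero(~labeled_mask)
    return rng.choice(candidates, size=n, replace=False)


model = svm.SVC()
for round_id in range(3):
    query_ids = random_query(labeled, 50, rng)      # query
    labeled[query_ids] = True                       # update: reveal the oracle labels
    model.fit(X_pool[labeled], y_oracle[labeled])   # learn on all labeled data so far
    accuracy = model.score(X_test, y_test)          # infer and evaluate
    print(f"round {round_id}: {labeled.sum()} labeled, accuracy={accuracy:.3f}")
```

Random sampling is mainly useful as a baseline: a more informed strategy would only need to replace the selection rule inside `random_query`, while the rest of the loop stays the same.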

A quick MNIST CNN example can be found [here](examples/mnist/main_torch.py). From the `examples/mnist` directory, run

```shell
python main_torch.py
```

to start the quick demonstration.

## License

[Apache License 2.0](./LICENSE)


