Metadata-Version: 2.1
Name: data-quality-tests
Version: 0.1.5.3
Summary: Data Quality Check Library
Home-page: https://github.com/beekiran00/Data-Quality
Author: Bhanu Venkata Kiran Velpula
Author-email: beekiran00@gmail.com
License: MIT
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown

## DATA QUALITY

A library that provides test cases for pandas DataFrames. Pass in your DataFrame right after the initial import, or at any stage of your EDA, to check data quality with one line of code.

The test cases currently include:
1. Check for null values
2. Check for duplicates
3. Check for dtype matching
4. Check for outliers
5. Check for leading and trailing whitespace in column headers
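The library's internals are not shown in this README, so the following is only a minimal plain-pandas sketch of two of these checks: duplicates and dtype matching. The helper names `has_no_duplicates` and `dtypes_match` are hypothetical and not part of `data_quality_tests`. The sketch also assumes that "dtype matching" means comparing each column against an expected dtype that you supply.

```python
import pandas as pd

# Hypothetical helpers for illustration; NOT the data_quality_tests API.

def has_no_duplicates(df: pd.DataFrame) -> bool:
    """True when no row is an exact copy of an earlier row."""
    return not df.duplicated().any()


def dtypes_match(df: pd.DataFrame, expected: dict) -> bool:
    """True when every column named in `expected` exists and has that dtype.

    Assumption: "dtype matching" means comparing against a user-supplied
    mapping of column name -> dtype string.
    """
    return all(
        col in df.columns and str(df[col].dtype) == dtype
        for col, dtype in expected.items()
    )


# Rows 1 and 2 are identical, so the duplicate check should fail.
df = pd.DataFrame({"id": [1, 2, 2], "score": [0.5, 0.7, 0.7]}).astype({"id": "int64"})
print(has_no_duplicates(df))                                   # False
print(dtypes_match(df, {"id": "int64", "score": "float64"}))  # True
```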

Each test case reports either Passed or Failed. Passed indicates good data quality, and Failed indicates a data quality problem.

Example:

TEST CASE FOR NULL VALUES: Passed means that the DataFrame has no null values. Failed means that it contains at least one null value.
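The following is a minimal sketch of how a Pass/Fail test case for null values might work in plain pandas. `null_value_test` is a hypothetical name and is not part of `data_quality_tests`. The printed labels only imitate the format described above.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of a Pass/Fail test case; not the library's code.

def null_value_test(df: pd.DataFrame) -> str:
    """Return "Passed" when df has no null values, otherwise "Failed"."""
    return "Failed" if df.isnull().values.any() else "Passed"


clean = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
dirty = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
print("TEST CASE FOR NULL VALUES:", null_value_test(clean))  # Passed
print("TEST CASE FOR NULL VALUES:", null_value_test(dirty))  # Failed
```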

## Requirements

* Python 3+
* pandas
* NumPy


## Installation

```bash
pip install data-quality-tests
```

## Updates & Changes

1. The import statement changed from:

```python
from data_quality import DataQuality
```

to the following:

```python
from data_quality_tests import DataQuality
```

2. A new function, `outlier_columns`, has been added in this update. It displays all the columns that have outliers.  
*For usage, refer to the Get Started section.*

3. `data_quality_check` now checks column headers for leading and trailing whitespace.
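The header whitespace check can be approximated in plain pandas. In the sketch below, `headers_with_whitespace` is a hypothetical helper and not the library's function. The last lines show one common way to clean the headers with `str.strip`.

```python
import pandas as pd

# Hypothetical helper for illustration; not part of data_quality_tests.

def headers_with_whitespace(df: pd.DataFrame) -> list:
    """Return the column headers that have leading or trailing whitespace."""
    return [col for col in df.columns if isinstance(col, str) and col != col.strip()]


df = pd.DataFrame(columns=["id", " name", "price "])
print(headers_with_whitespace(df))  # [' name', 'price ']

# One common fix: strip whitespace from every header.
df.columns = df.columns.str.strip()
print(headers_with_whitespace(df))  # []
```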

## Get Started

How to use this library:

### Data quality check

The most basic usage of this library is a single call to `data_quality_check`. For simplicity, this example uses the iris dataset from the seaborn library, so it also requires seaborn.

You can use any dataset.

```python
from data_quality_tests import DataQuality as dq
import seaborn as sns

# load any DataFrame (seaborn is used here only for the example dataset)

df = sns.load_dataset("iris")

# pass the DataFrame to run the data quality test cases

dq.data_quality_check(df)
```

### Outlier columns

If the test case for outliers fails, the dataset contains outliers.

Use the `outlier_columns(df)` function to display all the columns that have outliers.

*NOTE: If the dataset has no outlier columns, the output is an empty list.*


```python
# display the columns that have outliers
# (dq and df are defined in the previous example)
dq.outlier_columns(df)
```
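This README does not say how outliers are defined. As an assumption, the sketch below uses the common 1.5 * IQR rule (Tukey fences) on numeric columns. `iqr_outlier_columns` and its `k` parameter are hypothetical, and the library's results may differ. Like `outlier_columns`, it returns an empty list when no column has outliers.

```python
import pandas as pd

# Hypothetical sketch using the 1.5 * IQR (Tukey fence) rule; the library's
# own outlier definition may differ.

def iqr_outlier_columns(df: pd.DataFrame, k: float = 1.5) -> list:
    """Return numeric columns with at least one value outside the IQR fences."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Each Series aligns on column names, so each column uses its own fences.
    mask = (numeric < lower) | (numeric > upper)
    return [col for col in numeric.columns if mask[col].any()]


df = pd.DataFrame({"x": [1, 2, 3, 4, 100], "y": [1, 2, 3, 4, 5], "label": list("abcde")})
print(iqr_outlier_columns(df))  # ['x']
```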
