Metadata-Version: 2.1
Name: datayoga-core
Version: 1.63.0
Summary: DataYoga for Python
License: Apache-2.0
Author: DataYoga
Author-email: admin@datayoga.io
Requires-Python: >=3.7,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Database
Provides-Extra: cassandra
Provides-Extra: mssql
Provides-Extra: mysql
Provides-Extra: oracle
Provides-Extra: pg
Provides-Extra: redis
Provides-Extra: test
Requires-Dist: PyMySQL (>=1.0.2,<2.0.0) ; extra == "mysql" or extra == "test"
Requires-Dist: PyYAML (>=6.0,<7.0)
Requires-Dist: SQLAlchemy (>=1.4.44,<2.0.0) ; extra == "mssql" or extra == "mysql" or extra == "pg" or extra == "oracle" or extra == "test"
Requires-Dist: cassandra-driver (>=3.25.0,<4.0.0) ; extra == "cassandra" or extra == "test"
Requires-Dist: certifi (>=2022.12.7,<2023.0.0)
Requires-Dist: cryptography (>=38.0.4,<39.0.0)
Requires-Dist: jmespath (>=1.0.0,<2.0.0)
Requires-Dist: jsonschema (>=4.4.0,<5.0.0)
Requires-Dist: mock (>=4.0.3,<5.0.0) ; extra == "test"
Requires-Dist: oracledb (>=1.2.2,<2.0.0) ; extra == "oracle" or extra == "test"
Requires-Dist: psycopg2-binary (>=2.9.5,<3.0.0) ; extra == "pg" or extra == "test"
Requires-Dist: pymssql (>=2.2.7,<3.0.0) ; extra == "mssql" or extra == "test"
Requires-Dist: pysqlite3-binary (>=0.4.0,<0.5.0) ; sys_platform == "linux"
Requires-Dist: pytest (>=7.1.2,<8.0.0) ; extra == "test"
Requires-Dist: pytest-asyncio (>=0.20.2,<0.21.0) ; extra == "test"
Requires-Dist: pytest-describe (>=2.0.1,<3.0.0) ; extra == "test"
Requires-Dist: pytest-mock (>=3.7.0,<4.0.0) ; extra == "test"
Requires-Dist: pytest-timeout (>=2.1.0,<3.0.0) ; extra == "test"
Requires-Dist: redis (>=4.3.5,<5.0.0) ; extra == "redis" or extra == "test"
Requires-Dist: requests-mock (>=1.9.3,<2.0.0) ; extra == "test"
Requires-Dist: sqlglot (>=10.4.3,<11.0.0)
Requires-Dist: testcontainers (>=3.7.0,<4.0.0) ; extra == "test"
Project-URL: url, https://datayoga.io
Description-Content-Type: text/markdown

# DataYoga Core

## Introduction

`datayoga-core` is the transformation engine used in `DataYoga`, a framework for building and generating data pipelines.

## Installation

```bash
pip install datayoga-core
```

## Quick Start

This demonstrates how to transform data using a DataYoga job.

### Create a Job

Use this `example.yaml`:

```yaml
steps:
  - uses: add_field
    with:
      fields:
        - field: full_name
          language: jmespath
          expression: concat([fname, ' ' , lname])
        - field: country
          language: sql
          expression: country_code || ' - ' || UPPER(country_name)
  - uses: rename_field
    with:
      fields:
        - from_field: fname
          to_field: first_name
        - from_field: lname
          to_field: last_name
  - uses: remove_field
    with:
      fields:
        - field: credit_card
        - field: country_name
        - field: country_code
  - uses: map
    with:
      expression:
        {
          first_name: first_name,
          last_name: last_name,
          greeting: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name",
          country: country,
          full_name: full_name
        }
      language: sql
```

### Transform Data Using `datayoga-core`

Use this code snippet to transform a data record using the job defined [above](#create-a-job). The transform method returns a tuple of processed, filtered, and rejected records:

```python
import datayoga_core as dy
from datayoga_core.job import Job
from datayoga_core.result import Result, Status
from datayoga_core.utils import read_yaml

job_settings = read_yaml("example.yaml")
job = dy.compile(job_settings)

assert job.transform([{"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa", "credit_card": "1234-5678-0000-9999", "gender": "F"}]).processed == [
  Result(status=Status.SUCCESS, payload={"first_name": "jane", "last_name": "smith", "country": "1 - USA", "full_name": "jane smith", "greeting": "Hello Ms. jane smith"})]
```

The job can also be provided as a parsed json inline:

```python
import datayoga_core as dy
from datayoga_core.job import Job
from datayoga_core.result import Result, Status
import yaml
import textwrap

job_settings = textwrap.dedent("""
  steps:
    - uses: add_field
      with:
        fields:
          - field: full_name
            language: jmespath
            expression: concat([fname, ' ' , lname])
          - field: country
            language: sql
            expression: country_code || ' - ' || UPPER(country_name)
    - uses: rename_field
      with:
        fields:
          - from_field: fname
            to_field: first_name
          - from_field: lname
            to_field: last_name
    - uses: remove_field
      with:
        fields:
          - field: credit_card
          - field: country_name
          - field: country_code
    - uses: map
      with:
        expression:
          {
            first_name: first_name,
            last_name: last_name,
            greeting: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name",
            country: country,
            full_name: full_name
          }
        language: sql
""")
job = dy.compile(yaml.safe_load(job_settings))

assert job.transform([{"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa", "credit_card": "1234-5678-0000-9999", "gender": "F"}]).processed == [
  Result(status=Status.SUCCESS, payload={"first_name": "jane", "last_name": "smith", "country": "1 - USA", "full_name": "jane smith", "greeting": "Hello Ms. jane smith"})]
```

As can be seen, the record has been transformed based on the job:

- `fname` field renamed to `first_name`.
- `lname` field renamed to `last_name`.
- `country` field added based on an SQL expression.
- `full_name` field added based on a [JMESPath](https://jmespath.org/) expression.
- `greeting` field added based on an SQL expression.

### Examples

- Add a new field `country` out of an SQL expression that concatenates `country_code` and `country_name` fields after upper case the later:

  ```yaml
  uses: add_field
  with:
    field: country
    language: sql
    expression: country_code || ' - ' || UPPER(country_name)
  ```

- Rename `fname` field to `first_name` and `lname` field to `last_name`:

  ```yaml
  uses: rename_field
  with:
    fields:
      - from_field: fname
        to_field: first_name
      - from_field: lname
        to_field: last_name
  ```

- Remove `credit_card` field:

  ```yaml
  uses: remove_field
  with:
    field: credit_card
  ```

For a full list of supported block types [see reference](https://datayoga-io.github.io/library).

## Expression Language

DataYoga supports both SQL and [JMESPath](https://jmespath.org/) expressions. JMESPath are especially useful to handle nested JSON data, while SQL is more suited to flat row-like structures.

For more information about custom functions and supported expression language syntax [see reference](https://datayoga-io.github.io/expressions).

