Metadata-Version: 2.1
Name: dagrules
Version: 0.1.0
Summary: dagrules - dbt DAG rule creator and validator
Home-page: https://github.com/gnilrets/dagrules
Author: Sterling Paramore
Author-email: gnilrets@gmail.com
License: MIT
Keywords: dbt dag
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3
Description-Content-Type: text/markdown
Provides-Extra: dev
Provides-Extra: test
License-File: LICENSE.txt

# dagrules

dagrules is a tool that lets you write your own dbt DAG rules and check
that your dbt project conforms to them.

## Overview

While the dbt community has established some excellent guidelines for
[how to structure dbt
projects](https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355),
those conventions are not automatically enforced.  They are simply
guidelines, and each team may settle on a slightly different set of
conventions that works best for its particular setup.  dagrules was
developed to let you write your own conventions in a simple `yaml`
document and have them enforced by your CI system.

To use dagrules, all you need is a dbt project and a `dagrules.yml`
file located in the root of the dbt project (e.g.,
`dbt/dagrules.yml`).  The yaml file should look like the following (for a
more complete example, see [tests/dagrules.yml](test/dagrules.yml)):

````yaml
---
version: '1'
rules:
  - name: The name of my rule
    subject:
       ... # How to select nodes to check that they satisfy the rules
    must:
       ... # Define the rules that must be followed
  - name: Another one of my rules
    ...
````

### Installation and running rules

dagrules can be installed using pip:

````bash
pip install dagrules
````

And then run `dagrules` with the `--check` argument from your dbt project root:

````bash
dagrules --check
````

dagrules assumes that it is being executed from the dbt project root and that a
`target/manifest.json` file is already present (so the dbt project must be recompiled
whenever the dag changes before dagrules can be run).  These defaults can
be overridden by setting the `DBT_ROOT` and `DAGRULES_YAML` environment variables to
point to other locations.

## Subjects

For every rule, a subject should be declared that defines how to
select nodes of the dbt dag to use for rule validation.  Omitting the
subject means that the rule will be applied to every dbt model.
dagrules currently supports two ways to select subjects: 1) by node
type (source, snapshot, model) and 2) by tags.  For example, the
following subject includes all models that are tagged "staging":

````yaml
rules:
  - name: All staging models must ...
    subject:
      type: model
      tags: staging
    must:
      ...
````
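
A subject can also select by node type alone.  The sketch below is a
hypothetical convention rather than a recommended one: it selects every source
node and uses the `have-child-relationship` check described under "Musts" to
require that any children be tagged `base`.

````yaml
rules:
  # Hypothetical rule: the name and the `base` tag are illustrative only
  - name: Sources may only feed base models
    subject:
      type: source        # select by node type only, no tag filter
    must:
      have-child-relationship:
        required: false   # a source with no children still passes
        require-tags-any:
          - base
````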


## Tag selection

Tag selection applies to both the `subject` and `must` sections of the
dagrules yaml spec.  Tags can be defined in several ways.

**Single string** - Selecting with a single tag can be expressed as a simple string:

````yaml
tags: staging
````

**List of tags: match any** - A list of tags can also be specified, and
dagrules will match nodes with **any** of the tags in the list.  The
example below will match nodes having either `base` or `intermediate`
tags.

````yaml
tags:
  - base
  - intermediate
````

**Include: match all with exclusions** - When you need to select nodes
that match **all** tags in a list, and possibly exclude nodes with
some tags as well, you can use include/exclude.  The example below
will select any nodes that have both "staging" and "finance" tags, but
that don't also have the `base` tag.

````yaml
tags:
  include:
    - staging
    - finance
  exclude:
    - base
````

The arguments to `include` and `exclude` can either be a list or single strings.
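
As a small illustration of the single-string form, the fragment below selects
nodes tagged `staging` that are not tagged `base`.  It is assumed to behave the
same as writing each value as a one-item list.

````yaml
tags:
  include: staging   # single string instead of a list
  exclude: base
````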


**Combine any/all** - We can also combine the **any** and **all** syntaxes
at once.  The following will select all nodes that are either
"non-base staging", "core", or "mart" models:

````yaml
tags:
  - include: staging
    exclude: base
  - core
  - mart
````

## Musts

"Musts" define the rules that must be adhered to by the subjects defined in the `subject`
section.  Multiple "musts" may be included in a rule definition, and all must be
satisfied for the rule to pass.

**Match name** - The `match-name` rule requires that each subject adhere to a
particular naming pattern.  dagrules currently only supports regular expression matching.
For example, the following rule enforces that all snapshot models must be named with
a `snap_` prefix:

````yaml
rules:
  - name: Snapshot must be prefixed with snap_
    subject:
      type: snapshot
    must:
      match-name: /snap_.*/
````

**Have tags** - The `have-tags-any` rule requires that each subject have at
least one of the listed tags.  The following example specifies that all nodes in the dag
must have at least one of the tags listed:

````yaml
rules:
  - name: All models must be tagged either snapshot, base, staging, intermediate, core, mart
    # Omit subject to include all nodes
    must:
      have-tags-any:
        - snapshot
        - base
        - staging
        - intermediate
        - core
        - mart
````
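
Because several "musts" may appear in one rule, the two checks above can be
combined.  The following is a minimal sketch under assumed conventions: the
`stg_` prefix and the `finance` and `marketing` tags are hypothetical and only
serve to show both checks in a single rule.

````yaml
rules:
  # Hypothetical rule: both musts have to pass for the rule to pass
  - name: Staging models must be prefixed with stg_ and carry a domain tag
    subject:
      type: model
      tags: staging
    must:
      match-name: /stg_.*/
      have-tags-any:
        - finance
        - marketing
````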

**Have parent or child relationship** - The `have-child-relationship`
and `have-parent-relationship` rules require that the subjects have a
certain kind of relationship to either their **immediate** children or
parents.  The relationship can be constrained with the following options:
  * `cardinality` - The cardinality of the relationship between a subject and its child/parent
    can either be `one_to_one` or `one_to_many` (default).  If `one_to_one` is selected,
    then a subject may only have one child/parent.
  * `required` - Indicates whether a child/parent relationship is required or not.  The default
    is `True`, meaning that if a relationship is defined, all subjects must have at least
    one child or parent node.  If `False`, then a subject may have 0 children/parents.
  * `require-tags-any` - Contains a list of tags that the parent/child
    must have (with syntax defined in the "Tag selection" section
    above).
  * `require-node-type` - Indicates the node type (source, snapshot, model) that the child/parent
    must be in order to pass.
  * `select-tags-any` - Contains a list of tags that restricts the selection of parents/children
    involved in the rule.
  * `select-node-type` - Indicates that only the parents/children with the specified node
    type are to be considered when checking the rule.

For example,

````yaml
rules:
  - name: Snapshots must have 0 or 1 children, which must all be base models
    subject:
      type: snapshot
    must:
      have-child-relationship:
        cardinality: one_to_one
        required: false
        require-tags-any:
          - base

  - name: Intermediate models may only depend on non-base staging, core, mart, or other intermediate models
    subject:
      tags:
        include: intermediate
    must:
      have-parent-relationship:
        require-tags-any:
          - include: staging
            exclude: base
          - core
          - mart
          - intermediate
````
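
The `select-node-type` and `require-node-type` options are not used in the
example above, so here is a hedged sketch of how they might be written.  The
rule names and tags are hypothetical, and the sketch assumes that each option
takes a single node type string (`source`, `snapshot`, or `model`).

````yaml
rules:
  # Only parents that are models are checked; source and snapshot
  # parents are left out of this rule.
  - name: Model parents of core models must be tagged staging or core
    subject:
      tags: core
    must:
      have-parent-relationship:
        select-node-type: model
        require-tags-any:
          - staging
          - core

  # With the default `required: True` and one_to_one cardinality,
  # every base model needs exactly one parent, and it must be a source.
  - name: Base models must have a single source parent
    subject:
      tags: base
    must:
      have-parent-relationship:
        cardinality: one_to_one
        require-node-type: source
````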


## Contributing

We welcome contributors!  Please submit any suggestions or pull requests on GitHub.

### Developer setup

Create an appropriate python environment.  I like [miniconda](https://conda.io/miniconda.html),
but use whatever you like:

    conda create --name dagrules python=3.9
    conda activate dagrules

Then install pip packages:

    pip install pip-tools
    pip install --ignore-installed -r requirements.txt

Run tests via

    inv test

and the linter via

    inv lint


