Metadata-Version: 2.1
Name: lsf-stats
Version: 0.2.2
Summary: Summarize LSF job properties by parsing log files.
Home-page: https://github.com/kpj/lsf_stats
License: MIT
Author: kpj
Author-email: kim.philipp.jablonski@gmail.com
Requires-Python: >=3.8.0,<4.0.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: click (>=8.0.1,<9.0.0)
Requires-Dist: humanize (>=3.9.0,<4.0.0)
Requires-Dist: ipython (>=7.24.1,<8.0.0)
Requires-Dist: matplotlib (>=3.4.2,<4.0.0)
Requires-Dist: pandas (>=1.2.4,<2.0.0)
Requires-Dist: pyskim (==0.1.3)
Requires-Dist: seaborn (>=0.11.1,<0.12.0)
Requires-Dist: tqdm (>=4.61.1,<5.0.0)
Project-URL: Repository, https://github.com/kpj/lsf_stats
Description-Content-Type: text/markdown

# lsf_stats

[![PyPI](https://img.shields.io/pypi/v/lsf_stats.svg?style=flat)](https://pypi.python.org/pypi/lsf_stats)
[![Tests](https://github.com/kpj/lsf_stats/workflows/Tests/badge.svg)](https://github.com/kpj/lsf_stats/actions)

Summarize [LSF](https://www.ibm.com/support/pages/what-lsf-cluster) job properties by parsing log files of workflows executed by [Snakemake](https://github.com/snakemake/snakemake/).


## Installation

```python
$ pip install lsf_stats
```


## Usage

```bash
$ lsf_stats --help
Usage: lsf_stats [OPTIONS] COMMAND [ARGS]...

  Summarize LSF job properties by parsing log files.

Options:
  --help  Show this message and exit.

Commands:
  gather     Aggregate information from log files in single dataframe.
  summarize  Summarize and visualize aggregated information.
```

### Example

Assume that you executed your Snakemake workflow using the [lsf-profile](https://github.com/Snakemake-Profiles/lsf) and all generated log files are stored in the directory `./logs/`:
```bash
$ snakemake --profile lsf
[..]
```

You can then quickly aggregate resource, rule and other types of information about the workflow execution into a single dataframe:
```bash
$ lsf_stats gather -o workflow_stats.csv.gz ./logs/
[..]
```

This dataframe can then be summarized in various ways:
```bash
$ lsf_stats summarize \
    --query 'status == "Successfully completed."' \
    --split-wildcards \
    --grouping-variable category \
    workflow_stats.csv.gz
[..]
```

For example, the following plots will be generated:
Job execution                                 |  Job resources
:--------------------------------------------:|:----------------------------------------:
![Job execution](gallery/job_completions.png) | ![Job resources](gallery/scatterplot.png)

