Metadata-Version: 2.1
Name: d2ssect
Version: 0.0.4
Summary: Calculate d2s scores from short reads
Author-email: Jia Zhang <jia.zhang2@my.jcu.edu.au>, Ira Cooke <ira.cooke@jcu.edu.au>
License: The MIT License (MIT)
        Copyright © 2022 <copyright holders>
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: feed,reader,tutorial
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# d2ssect

![example workflow](https://github.com/bakeronit/d2ssect/actions/workflows/run_test_v0.yml/badge.svg)

A tool to calculate d2s scores from short FASTQ reads.
This repo tests and benchmarks the existing [alignment-free tools](https://github.com/chanlab-genomics/alignment-free-tools) and improved versions of them.

The original version of this pipeline included three main steps:
1. Get jellyfish count results for each sample.
2. Calculate d2s for every pair of samples from their jellyfish dump results.
3. Generate a matrix.

Our goal is to integrate these three steps and try to increase the speed of d2s calculation.
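
As a rough illustration of the three steps, here is a minimal pure-Python sketch on toy data. It is not the `d2ssect` implementation. It assumes the commonly cited form of the D2S statistic, in which each k-mer count is centred by its expected count under a single-base background model and the score sums `x_w * y_w / sqrt(x_w^2 + y_w^2)` over all k-mers. All function names are hypothetical, and enumerating every possible k-mer is only feasible for small `k`.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import sqrt


def count_kmers(reads, k):
    """Step 1 stand-in: count every k-mer in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def base_freqs(reads):
    """Single-base frequencies, used here as a simple background model."""
    bases = Counter("".join(reads))
    total = sum(bases.values())
    return {base: n / total for base, n in bases.items()}


def centred_counts(counts, freqs, k):
    """Observed minus expected count for every possible k-mer over ACGT."""
    total = sum(counts.values())
    centred = {}
    for letters in product("ACGT", repeat=k):
        prob = 1.0
        for base in letters:
            prob *= freqs.get(base, 0.0)
        word = "".join(letters)
        centred[word] = counts.get(word, 0) - total * prob
    return centred


def d2s(x, y):
    """Step 2: sum of x_w * y_w / sqrt(x_w^2 + y_w^2) over all k-mers w."""
    score = 0.0
    for word, xw in x.items():
        yw = y[word]
        denom = sqrt(xw * xw + yw * yw)
        if denom > 0:  # skip words where both centred counts are zero
            score += xw * yw / denom
    return score


def d2s_matrix(samples, k):
    """Step 3: symmetric matrix of pairwise scores, keyed by sample name."""
    names = sorted(samples)
    centred = {
        name: centred_counts(count_kmers(samples[name], k), base_freqs(samples[name]), k)
        for name in names
    }
    matrix = {name: {} for name in names}
    for a, b in combinations_with_replacement(names, 2):
        matrix[a][b] = matrix[b][a] = d2s(centred[a], centred[b])
    return matrix


# Toy reads; real input would be the jellyfish counts for each sample.
samples = {
    "s1": ["ACGTACGTAC", "TTGACGTAAC"],
    "s2": ["ACGTTCGTAC", "TTGACCTAAC"],
    "s3": ["GGGGCCCCGG", "CCGGGGCCCC"],
}
matrix = d2s_matrix(samples, k=3)
for name in sorted(matrix):
    print(name, [round(matrix[name][other], 3) for other in sorted(matrix)])
```

Each pair of samples is scored independently, so the pairwise step dominates the running time as the number of samples grows.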


## Installation

`d2ssect` relies heavily on [jellyfish](https://github.com/gmarcais/Jellyfish). You need both the jellyfish program and the jellyfish libraries. To check that jellyfish is installed, run:
```bash
jellyfish --version
```
This should report version 2 or later. You also need the jellyfish libraries and headers. If you installed jellyfish via `conda` or compiled it from source, these will be present in the right locations. If you installed it with your Linux package manager, they probably won't be present.
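
One way to check for both the program and the development files is sketched below. It assumes that `pkg-config` is available and that Jellyfish registers itself under the module name `jellyfish-2.0`. Both are assumptions about your installation, and the function name `jellyfish_status` is hypothetical.

```python
import shutil
import subprocess


def jellyfish_status(pkg="jellyfish-2.0"):
    """Report whether the jellyfish binary and development files look available.

    `pkg` is the pkg-config module name; "jellyfish-2.0" is an assumption
    that may differ between installations.
    """
    status = {"binary": shutil.which("jellyfish") is not None, "dev_files": False}
    if shutil.which("pkg-config") is not None:
        # `pkg-config --exists` exits with 0 only if the module's .pc file is found.
        result = subprocess.run(["pkg-config", "--exists", pkg])
        status["dev_files"] = result.returncode == 0
    return status


print(jellyfish_status())
```

A `False` value for `dev_files` is not conclusive, since the headers may be installed somewhere `pkg-config` does not search.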

If you do not want to use `conda`, we recommend installing Jellyfish from source. Once that is done, you should be able to install `d2ssect` using pip:

```bash
pip3 install d2ssect
```



## Usage

Let's say we have a collection of FASTA files corresponding to sequencing reads from samples that we want to compare with `d2ssect`. First count k-mers in these files using `jellyfish`:

```bash
for f in *.fasta; do jellyfish count -m 21 -s 10000000 "$f" -o "${f%.fasta}.jf"; done
```

Note that the command above creates a corresponding `.jf` file for every `.fasta` file in the current directory. By keeping the base names of the `.jf` and `.fasta` files identical, we can then run `d2ssect` as follows:

```bash
python3 ../d2ssect/d2ssect/main.py -l *.jf -f *.fasta
```
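
Because the command above relies on the `.jf` and `.fasta` files sharing base names, it can help to confirm that every file has a partner before running it. The following is a minimal sketch of such a check; the helper name `unpaired_samples` is hypothetical and not part of `d2ssect`.

```python
from pathlib import Path


def unpaired_samples(directory="."):
    """Return (fasta_without_jf, jf_without_fasta) as sorted lists of base names."""
    folder = Path(directory)
    fasta = {path.stem for path in folder.glob("*.fasta")}
    jf = {path.stem for path in folder.glob("*.jf")}
    return sorted(fasta - jf), sorted(jf - fasta)


missing_jf, missing_fasta = unpaired_samples(".")
if missing_jf or missing_fasta:
    print("FASTA files without a .jf file:", missing_jf)
    print(".jf files without a FASTA file:", missing_fasta)
else:
    print("Every sample has both a .fasta and a .jf file.")
```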


## Building from source

```bash
CC=g++ pip install .
```
