Metadata-Version: 2.1
Name: format_conversation_dataset
Version: 0.0.1
Summary: tools for formatting datasets for fine tuning
Home-page: https://github.com/russedavid/format_conversation_dataset
Author: David Russell
Author-email: david.russell04@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev

# format_conversation_dataset


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

Convert your diarized content into a dataset that can be used to
finetune a model!

## Install

``` sh
pip install format_conversation_dataset
```

## How to use

Designate a speaker number as the ‘assistant’ and supply input and
output file paths, and this module with do the rest.

from format_covnersation_dataset.core import \*

convert_file(‘input/file/path’, ‘output/file/path’, 1, “You are
participating in a conversation”)

This will output a json format like so:

{‘messages’: \[ { ‘role’ : ‘system’, ‘content’ : ‘You are participating
in a conversation’ }, { ‘role’ : ‘user’, ‘content’ : ‘SPEAKER_02 : Hello
everyone SPEAKER_03 : Good morning SPEAKER_04 : Hi’ }, { ‘role’ :
‘assistant’, ‘content’ : ‘Hi!’ },  
\]}
