Metadata-Version: 2.3
Name: anyvec
Version: 0.1.0
Summary: A Python package for seamless vectorization for any content type
Author: Mark Shteyn
Author-email: markshteyn1@gmail.com
Requires-Python: >=3.13,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: beautifulsoup4 (>=4.13.4,<5.0.0)
Requires-Dist: ebooklib (>=0.18,<0.19)
Requires-Dist: mammoth (>=1.9.0,<2.0.0)
Requires-Dist: numpy (>=2.2.5,<3.0.0)
Requires-Dist: odfpy (>=1.4.1,<2.0.0)
Requires-Dist: openpyxl (>=3.1.5,<4.0.0)
Requires-Dist: pdfplumber (>=0.11.6,<0.12.0)
Requires-Dist: pillow (>=11.2.1,<12.0.0)
Requires-Dist: pymupdf (>=1.25.5,<2.0.0)
Requires-Dist: python-docx (>=1.1.0,<2.0.0)
Requires-Dist: python-pptx (>=1.0.2,<2.0.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: ruff (>=0.11.7,<0.12.0)
Requires-Dist: xlrd (>=2.0.1,<3.0.0)
Description-Content-Type: text/markdown

# anyvec

AnyVec is an open-source Python package that vectorizes any type of file (text, images, audio, video, or code) through a single interface. Embedding different data types, such as text and images, traditionally requires different models and separate code paths. AnyVec hides that complexity behind one API, regardless of file type.

## Building the CLIP Docker Image

To build the Docker image for the CLIP component, run the following commands from the project root:

```bash
cd clip
LOCAL_REPO="multi2vec-clip" \
  TEXT_MODEL_NAME="sentence-transformers/clip-ViT-B-32-multilingual-v1" \
  CLIP_MODEL_NAME="clip-ViT-B-32" \
  ./scripts/build.sh
```

## Running the CLIP Docker Container

After building the image, run the container and map port 8000 on your host to port 8080 in the container (where the API runs):

```bash
docker run --rm -it -p 8000:8080 multi2vec-clip
```

The API will then be available at http://localhost:8000.
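
If a script needs to wait for the container before sending requests, a small polling helper can do it. The sketch below is only an illustration: `wait_for_port` is a hypothetical helper, not part of AnyVec, and it checks only that a TCP connection to the published port succeeds. An open port does not guarantee that the models inside the container have finished loading, so treat a `True` result as "the port is reachable", not "the API is ready".

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it does not call any AnyVec or CLIP endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the port mapping above, calling `wait_for_port("localhost", 8000)` returns `True` as soon as the host port accepts connections, or `False` after 30 seconds.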

To run the container in detached mode (in the background), use:

```bash
docker run -d -p 8000:8080 multi2vec-clip
```

The API remains available at http://localhost:8000 while the container runs in the background.
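
A detached container keeps running until it is stopped, so it can help to give it a name. The following is a minimal sketch of driving the same `docker run` command from Python. The helper names and the container name `anyvec-clip` are assumptions made for this example and do not come from the project; only the image name and the 8000:8080 port mapping are taken from the commands above.

```python
import subprocess
from typing import List, Optional


def docker_run_args(image: str, host_port: int, container_port: int,
                    name: Optional[str] = None, detached: bool = True) -> List[str]:
    """Build the argument list for `docker run` with one published port."""
    args = ["docker", "run"]
    if detached:
        args.append("-d")           # run in the background
    else:
        args += ["--rm", "-it"]     # interactive, removed on exit
    if name is not None:
        args += ["--name", name]    # hypothetical name, used to stop it later
    # -p takes HOST_PORT:CONTAINER_PORT
    args += ["-p", f"{host_port}:{container_port}", image]
    return args


def start_container(name: str = "anyvec-clip") -> str:
    """Start the container in detached mode and return the ID printed by Docker."""
    result = subprocess.run(
        docker_run_args("multi2vec-clip", 8000, 8080, name=name),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def stop_container(name: str = "anyvec-clip") -> None:
    """Stop the named container, then remove it so the name can be reused."""
    subprocess.run(["docker", "stop", name], check=True)
    subprocess.run(["docker", "rm", name], check=True)
```

Calling `start_container()` followed later by `stop_container()` mirrors running `docker run -d --name anyvec-clip -p 8000:8080 multi2vec-clip` and then `docker stop anyvec-clip` and `docker rm anyvec-clip` by hand.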

