Metadata-Version: 2.1
Name: hashed-list
Version: 1.0.0
Summary: An auto-indexed Python List data structure with constant time .index() method
Home-page: https://github.com/imranariffin/hashed-list
Author: Imran Ariffin
Author-email: ariffin.imran@gmail.com
License: UNKNOWN
Project-URL: Bug Tracker, https://github.com/imranariffin/hashed-list/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.5
Description-Content-Type: text/markdown
License-File: LICENSE

# HashedList
A List with O(1) time complexity for `.index()` method, instead of O(N).

# Description

A data structure that is pretty much a Python list, except that:
1. The method `.index(value)` is O(1) (Good)
2. The data structure uses twice more memory due to indexing
   (Not good but still okay)
3. Items must be unique. It will raise DuplicateValueError if
   duplicate item is provided

## Main use case:
You have a huge list of unique items that:
1. You may update the list (remove, insert, set value etc) from
   time to time
2. You may get the index of a specific item in the list from
   time to time

In this case, using just a regular list definitely works but will cost
you O(N) each time you get the index of a specific item. Or, you can
maintain along the list a dictionary of item => index ,but that will cost
you the burden of updating the dictionary everytime the list is updated.

HashedList will make the work easy for you.

## Installation
```bash
pip install hashed-list
```

## Usage

Simply instantiate it from an iterable, and use it as you normally would a Python list
```python
from hashed_list import HashedList

# From list
hashed_list_1 = HashedList(["a", "b", "c"])
# From generator
hashed_list_2 = HashedList(x for x in range(100) if x % 2 == 0)
# From sequence
hashed_list_3 = HashedList(range(1, 100, 2))

# Exceptions
from hashed_list import DuplicateValueError
try:
    hashed_list_4 = HashedList([1, 1, 2])
except DuplicateValueError as e:
    print(e)

# Use it like a normal Python list
hashed_list_1[0]  # => "a"
len(hashed_list_3) == len(list(range(1, 100, 2)))  # => True
hashed_list_1.index("c")  # => 2
# The same for .extend(), .append(), .insert(), etc ...
```

## Simple benchmark
On a large list, `HashedList.index()` should be more than a thousand times faster 
than `list.index()` but you can try it yourself by copy-pasting the code below on 
your Python shell

```python
import time
from random import randint
from hashed_list import HashedList

list_size = 99999999

print("Testing list.index()")
very_huge_list = list(range(list_size))
t0 = time.time()
_ = very_huge_list.index(randint(0, list_size))
d1 = time.time() - t0
print(f"list.index took {d1} seconds")  # list.index took 0.0004763603210449219 seconds

print("Testing HashedList.index()")
very_huge_hashed_list = HashedList(range(list_size))
t0 = time.time()
_ = very_huge_hashed_list.index(randint(0, list_size))
d2 = time.time() - t0
print(f"HashList.index took {d2} seconds")  # HashList.index took 0.0004763603210449219 seconds
print(f"HashList.index is {d1 // d2} times faster than list.index!")
```

## Caveats
1. `HashedList` consumes 2 times more memory than Python list
2. `HashedList` consumes more time during initialization due to hashing

Given these caveats, use `HashedArray` only when you know that `.index` is going to be used a lot.

## Development
To start developing for and contributing to this repository:
```bash
# Start a virtual environment
python -m venv .venv
# Install the package from source
python -m install -e .
# Run all tests
python -m unittest discover tests/
# Now you can start developing. Please see src/ and tests/ 
# folders for source code and tests respectively
```


