Exercise 2 - Persist and Reload

Goal: Save your FAISS index to disk and reload it in a fresh process, without re-embedding the document.

Background

In Exercise 1 you built the index every time the script ran. That means re-embedding the entire document on each run - slow and wasteful. In a real pipeline you build the index once, save it, and reload it from disk, which is typically far faster than re-embedding.

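FAISS persists an index with faiss.write_index and loads it back with faiss.read_index. But the index file holds only the vectors and the index structure, not the text. You need to save the chunk list separately and keep it in the same order as the vectors, because by default a search result refers to a chunk by its position in the index.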

Assignment

Open 02_persist.py.

Part A - Save

1. If index.faiss does not already exist, build the index as you did in Exercise 1.
2. Save the index: faiss.write_index(index, "index.faiss").
3. Save the chunks list to chunks.json using the json module.
4. Print the sizes of both files.
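The existence check, the JSON step, and the file-size report need only the standard library. The sketch below covers those parts; the helper names save_chunks and file_size_report are hypothetical, the chunk list is toy data, and the faiss.write_index call from the steps above is deliberately left out.

```python
import json
import os
import tempfile


def save_chunks(chunks, path):
    """Write the list of chunk strings to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(chunks, f, ensure_ascii=False)


def file_size_report(paths):
    """Return {path: size in bytes} for each path that exists on disk."""
    return {p: os.path.getsize(p) for p in paths if os.path.exists(p)}


# Toy data standing in for the chunks from Exercise 1.
chunks = ["alpha", "beta"]

with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "index.faiss")
    chunks_path = os.path.join(tmp, "chunks.json")

    # True on the first run; this is where the Exercise 1 build would go.
    needs_build = not os.path.exists(index_path)

    save_chunks(chunks, chunks_path)
    sizes = file_size_report([index_path, chunks_path])
    for path, size in sizes.items():
        print(f"{os.path.basename(path)}: {size} bytes")
```

In 02_persist.py the same report would cover both index.faiss and chunks.json once the index has been written.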

Part B - Load

1. Load the index: faiss.read_index("index.faiss").
2. Load the chunks list from chunks.json.
3. Run the same query as in Exercise 1 and confirm that the results are identical.
4. Time the load. Compare it to the time it took to build the index from scratch.
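One way to do the timing is a small wrapper around time.perf_counter. The sketch below is an illustration with hypothetical helper names (timed, load_chunks) and toy data; it times only the JSON load so that it runs without FAISS.

```python
import json
import os
import tempfile
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def load_chunks(path):
    """Read the list of chunk strings back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunks.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(["alpha", "beta"], f)  # toy data

    loaded, load_seconds = timed(load_chunks, path)

print(f"loaded {len(loaded)} chunks in {load_seconds:.6f} s")
```

The same wrapper could be applied to faiss.read_index("index.faiss") and to whatever function builds the index in your Exercise 1 script, which gives two numbers to compare directly.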

Thinking questions

You saved chunks as plain text in JSON. What metadata would you want to save alongside the text in a production system?
faiss.write_index only saves vectors and index structure - not the original Document objects or their page numbers. Design a simple file format (JSON or otherwise) that would store everything you need to fully reconstruct the search results.
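As a starting point for the second question, and not the only reasonable answer, here is one possible JSON layout: one record per vector, keyed by its row position in the index. The field names (id, text, page, source, embedding_model) and the helpers build_store and lookup are assumptions made for illustration.

```python
import json


def build_store(chunks, pages, source, model_name):
    """Bundle chunk text and metadata; record id i matches FAISS row i."""
    return {
        "embedding_model": model_name,  # needed to embed queries consistently
        "records": [
            {"id": i, "text": text, "page": page, "source": source}
            for i, (text, page) in enumerate(zip(chunks, pages))
        ],
    }


def lookup(store, ids):
    """Map ids returned by a search back to full records, skipping unknown ids."""
    by_id = {record["id"]: record for record in store["records"]}
    return [by_id[int(i)] for i in ids if int(i) in by_id]


# Toy data; the source file and model name are placeholders.
store = build_store(
    chunks=["alpha", "beta"],
    pages=[1, 2],
    source="example.pdf",
    model_name="example-embedding-model",
)

# Round-trip through JSON text, as it would be when saved to disk.
restored = json.loads(json.dumps(store))

# FAISS pads missing results with -1; lookup ignores such ids.
hits = lookup(restored, [1, -1])
print(hits)
```

Keeping the id explicit rather than relying on list order makes it easier to notice when the metadata file and the index file have drifted apart.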
