Exercise 2 - First API Calls
Goal: Make a real embedding call and a real generation call. Confirm both return sensible output.
Background
Ollama exposes two endpoints you will use throughout this curriculum:
- POST /api/embeddings - converts text to a vector
- POST /api/generate - generates text given a prompt
Both are simple JSON over HTTP. No SDK required.
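To make "JSON over HTTP" concrete, here is a minimal sketch that uses only the Python standard library. The helper names build_request and post_json are hypothetical, and the base URL assumes Ollama is listening on its default local address; adjust it if your setup differs.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_request(path, payload, base_url=OLLAMA_URL):
    """Build a POST request that carries `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=OLLAMA_URL):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, post_json("/api/generate", {"model": "llama3.2", "prompt": "Hi", "stream": False}) should return a dict parsed from the reply. Keeping build_request separate lets you inspect the URL and body without touching the network.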
Assignment
Open 02_first_calls.py.
Part A - Embeddings
- Call /api/embeddings with the model nomic-embed-text and any short sentence.
- Convert the returned list to a numpy float32 array.
- Print the shape. It should be (768,).
- Print the min, max, and L2 norm of the raw vector.
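One possible shape for Part A is sketched below, not a reference solution. It assumes the endpoint takes the text under a "prompt" key and returns the vector under an "embedding" key, and that the server is on the default local address. The function names fetch_embedding and describe are hypothetical.

```python
import json
import urllib.request

import numpy as np


def fetch_embedding(text, model="nomic-embed-text",
                    base_url="http://localhost:11434"):
    """POST to /api/embeddings and return the raw list of floats."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/embeddings",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Assumption: the vector is returned under the "embedding" key.
        return json.loads(resp.read())["embedding"]


def describe(values):
    """Convert to float32 and report shape, min, max, and L2 norm."""
    vec = np.asarray(values, dtype=np.float32)
    return {
        "shape": vec.shape,
        "min": float(vec.min()),
        "max": float(vec.max()),
        "l2_norm": float(np.linalg.norm(vec)),
    }
```

With the server running, print(describe(fetch_embedding("The sky is blue."))) prints all four values at once. Because describe is separate from the HTTP call, you can check the arithmetic on a small hand-made list first.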
Part B - Generation
- Call /api/generate with the model llama3.2, a simple question as the prompt, and "stream": false.
- Print the "response" field from the JSON.
- Time how long the call takes using time.time(). Print the duration.
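A sketch of how Part B might be laid out follows, again assuming the default local address. The names generate and timed are hypothetical. The timing wrapper uses time.time(), as the assignment asks, and works with any callable.

```python
import json
import time
import urllib.request


def generate(prompt, model="llama3.2", base_url="http://localhost:11434"):
    """POST to /api/generate with streaming off; return the "response" text."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start
```

With the server running, answer, seconds = timed(generate, "Why is the sky blue?") gives you both values to print. The first call after starting Ollama may be noticeably slower if the model still has to load, so consider timing a second call as well.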
Part C - Reflect
Add a comment at the bottom of the file answering:
- Is the embedding vector already normalised (norm ≈ 1.0)?
- How many seconds did generation take on your machine?
Thinking questions
- What happens if you pass an empty string to the embedding endpoint?
- The generation endpoint also accepts "stream": true (the default). What would you need to change in your code to handle a streamed response?