Learning LLM Basics with OpenAI

For some time now, we’ve been using tools like ChatGPT and GitHub Copilot for day-to-day tasks, but mostly through the conversational AI chatbots they provide. We’ve used them for everything from drafting emails to type-ahead coding assistance. We haven’t, however, gone a step further and integrated them into a development project. Honestly, we weren’t sure where to start: looking into the available options, you quickly run into a dozen new concepts, from vector stores to agents, and different SDKs that all seem to solve similar problems.
We want to document our learning as we go, and hopefully provide a useful resource for others who are also looking to get started. We’ll start with the basics, using the OpenAI API directly, and then move on to more complex scenarios. Additionally, if you’d like to follow along, all the code is available in Jupyter Notebooks on GitHub.
Setting up your environment
To start, we’ll use OpenAI, so you’ll need an API key. You can get one by signing up at platform.openai.com. Once you have your key, set it as an environment variable in your terminal:
export OPENAI_API_KEY="<your_api_key_here>"
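The curl examples below reference this variable directly, and the openai Python SDK picks it up automatically. As a quick sanity check before making any requests, a short snippet like this confirms the key is visible from Python:
import os
# The openai SDK reads OPENAI_API_KEY from the environment by default;
# here we just confirm it's set before going further.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
print("API key found, starting with:", os.environ["OPENAI_API_KEY"][:8])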
Raw JSON Response using curl
Let’s try something simple: ask for a short kid-friendly joke. We’ll use curl to make a request to the OpenAI API directly, specifically the newer Responses API. This way, we can see the raw JSON response from the model:
curl https://api.openai.com/v1/responses -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-5-nano",
    "input": "Tell me a silly joke for a kid in 8 words or less"
  }'
There’s a lot in the JSON response, but we can see the actual text response in output[1].content:
{
  "type": "output_text",
  "annotations": [],
  "logprobs": [],
  "text": "Why did the scarecrow win? He was outstanding."
}
That’s easy enough! We got a response back from the model, and can see how to extract the actual text from the JSON.
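Before moving on to the SDK, here’s a minimal sketch of the same request made from Python with only the standard library, to make the JSON-in, JSON-out nature of the API concrete. As in the curl response above, the text sits at output[1].content[0].text, after the reasoning item at output[0]:
import json
import os
import urllib.request
# Build the same request we sent with curl, by hand.
request = urllib.request.Request(
    "https://api.openai.com/v1/responses",
    data=json.dumps(
        {
            "model": "gpt-5-nano",
            "input": "Tell me a silly joke for a kid in 8 words or less",
        }
    ).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    },
)
with urllib.request.urlopen(request) as resp:
    body = json.load(resp)
# output[0] is the reasoning item; output[1] holds the assistant message.
print(body["output"][1]["content"][0]["text"])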
Using the Python SDK
The same can be done with the official openai Python SDK, again using the Responses API:
import openai
client = openai.Client()
response = client.responses.create(
    model="gpt-5-nano",
    input="Tell me a silly joke for a kid.",
    reasoning={"effort": "minimal"},
)
print(response)
The response object will be similar to the JSON we saw above:
Response(
id='resp_0e26a5811b5bcb540068c9971e7b008196a141320e0b520202',
created_at=1758041886.0,
error=None,
incomplete_details=None,
instructions=None,
metadata={},
model='gpt-5-nano-2025-08-07',
object='response',
output=[
ResponseReasoningItem(
id='rs_0e26a5811b5bcb540068c9971f15248196a3ff83acab83afe3',
summary=[],
type='reasoning',
content=None,
encrypted_content=None,
status=None
),
ResponseOutputMessage(
id='msg_0e26a5811b5bcb540068c9971f2f848196b6b1fca2ee9d22e0',
content=[
ResponseOutputText(
annotations=[],
text='Why did the tomato turn red?\n\nBecause it saw the salad dressing!',
type='output_text',
logprobs=[]
)
],
role='assistant',
status='completed',
type='message'
)
],
parallel_tool_calls=True,
temperature=1.0,
tool_choice='auto',
tools=[],
top_p=1.0,
background=False,
conversation=None,
max_output_tokens=None,
max_tool_calls=None,
previous_response_id=None,
prompt=None,
prompt_cache_key=None,
reasoning=Reasoning(effort='minimal', generate_summary=None, summary=None),
safety_identifier=None,
service_tier='default',
status='completed',
text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'),
top_logprobs=0,
truncation='disabled',
usage=ResponseUsage(
input_tokens=15,
input_tokens_details=InputTokensDetails(cached_tokens=0),
output_tokens=20,
output_tokens_details=OutputTokensDetails(reasoning_tokens=0),
total_tokens=35
),
user=None,
store=True
)
As we can see, the response is very similar to what we got using curl, and the API is quite straightforward to use.
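If we only want the joke itself, we don’t need to print the whole object. A minimal sketch, assuming a recent version of the SDK that exposes the output_text convenience property:
# Manual extraction, mirroring the structure in the raw JSON above:
# output[0] is the reasoning item, output[1] the assistant message.
print(response.output[1].content[0].text)
# Recent SDK versions also expose a convenience property that joins all
# output_text parts for us (assumed available in the installed version).
print(response.output_text)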
Token probabilities
We can also ask for the top token probabilities for each token in the response. This can be useful for understanding how the model works and how confident it is in its responses. We’ve always heard that LLMs are just “guessing” the next word based on probabilities, so this should be interesting to see.
OpenAI makes this easy to do by adding the top_logprobs parameter to our request and asking for the logprobs in the include list:
response = client.responses.create(
    model="gpt-4",
    input="Tell me a silly joke for a kid in 5 words or less.",
    top_logprobs=5,
    include=["message.output_text.logprobs"],
)
print(response.output[0].content[0])
The response will include the top token probabilities for each token in the response:
ResponseOutputText(
annotations=[],
text='"Why don\'t skeletons fight? Spineless!"',
type='output_text',
logprobs=[
Logprob(
token='"Why',
bytes=[34, 87, 104, 121],
logprob=-2.385021,
top_logprobs=[
LogprobTopLogprob(token='Why', bytes=[87, 104, 121], logprob=-0.099057),
LogprobTopLogprob(token='"Why', bytes=[34, 87, 104, 121], logprob=-2.385021),
LogprobTopLogprob(token='"', bytes=[34], logprob=-6.253838),
LogprobTopLogprob(token='"W', bytes=[34, 87], logprob=-9.471165),
LogprobTopLogprob(token='What', bytes=[87, 104, 97, 116], logprob=-10.46451)
]
),
Logprob(
token=' don',
bytes=[32, 100, 111, 110],
logprob=-0.003614,
top_logprobs=[
LogprobTopLogprob(token=' don', bytes=[32, 100, 111, 110], logprob=-0.003614),
LogprobTopLogprob(token=' did', bytes=[32, 100, 105, 100], logprob=-6.201395),
LogprobTopLogprob(token=' was', bytes=[32, 119, 97, 115], logprob=-6.978532),
LogprobTopLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-8.329646),
LogprobTopLogprob(token=' didn', bytes=[32, 100, 105, 100, 110], logprob=-8.952241)
]
),
Logprob(
token="'t",
bytes=[39, 116],
logprob=-0.008196,
top_logprobs=[
LogprobTopLogprob(token="'t", bytes=[39, 116], logprob=-0.008196),
LogprobTopLogprob(token='’t', bytes=[226, 128, 153, 116], logprob=-4.808213),
LogprobTopLogprob(token="'", bytes=[39], logprob=-16.455168),
LogprobTopLogprob(token='‘', bytes=[226, 128, 152], logprob=-16.503437),
LogprobTopLogprob(token='´t', bytes=[194, 180, 116], logprob=-16.78155)
]
),
Logprob(
token=' skeletons',
bytes=[32, 115, 107, 101, 108, 101, 116, 111, 110, 115],
logprob=-0.848137,
top_logprobs=[
LogprobTopLogprob(
token=' skeletons',
bytes=[32, 115, 107, 101, 108, 101, 116, 111, 110, 115],
logprob=-0.848137
),
LogprobTopLogprob(token=' eggs', bytes=[32, 101, 103, 103, 115], logprob=-1.243387),
LogprobTopLogprob(
token=' elephants',
bytes=[32, 101, 108, 101, 112, 104, 97, 110, 116, 115],
logprob=-2.184046
),
LogprobTopLogprob(
token=' scientists',
bytes=[32, 115, 99, 105, 101, 110, 116, 105, 115, 116, 115],
logprob=-2.718372
),
LogprobTopLogprob(token=' cats', bytes=[32, 99, 97, 116, 115], logprob=-3.315445)
]
),
Logprob(
token=' fight',
bytes=[32, 102, 105, 103, 104, 116],
logprob=-1.9e-05,
top_logprobs=[
LogprobTopLogprob(token=' fight', bytes=[32, 102, 105, 103, 104, 116], logprob=-1.9e-05),
LogprobTopLogprob(token=' pick', bytes=[32, 112, 105, 99, 107], logprob=-11.475693),
LogprobTopLogprob(token=' start', bytes=[32, 115, 116, 97, 114, 116], logprob=-12.405285),
LogprobTopLogprob(token=' battle', bytes=[32, 98, 97, 116, 116, 108, 101], logprob=-14.309147),
LogprobTopLogprob(token=' get', bytes=[32, 103, 101, 116], logprob=-14.33321)
]
),
Logprob(
token='?',
bytes=[63],
logprob=-0.002157,
top_logprobs=[
LogprobTopLogprob(token='?', bytes=[63], logprob=-0.002157),
LogprobTopLogprob(token=' each', bytes=[32, 101, 97, 99, 104], logprob=-6.623376),
LogprobTopLogprob(token='?\n', bytes=[63, 10], logprob=-8.196818),
LogprobTopLogprob(token='?"', bytes=[63, 34], logprob=-8.674199),
LogprobTopLogprob(token='?"\n\n', bytes=[63, 34, 10, 10], logprob=-8.889853)
]
),
Logprob(
token=' Sp',
bytes=[32, 83, 112],
logprob=-6.169154,
top_logprobs=[
LogprobTopLogprob(token=' No', bytes=[32, 78, 111], logprob=-0.404456),
LogprobTopLogprob(token=' They', bytes=[32, 84, 104, 101, 121], logprob=-1.38358),
LogprobTopLogprob(token=' Bone', bytes=[32, 66, 111, 110, 101], logprob=-2.637419),
LogprobTopLogprob(
token=' Because',
bytes=[32, 66, 101, 99, 97, 117, 115, 101],
logprob=-5.129202
),
LogprobTopLogprob(token=' Sp', bytes=[32, 83, 112], logprob=-6.169154)
]
),
Logprob(
token='ine',
bytes=[105, 110, 101],
logprob=-0.000145,
top_logprobs=[
LogprobTopLogprob(token='ine', bytes=[105, 110, 101], logprob=-0.000145),
LogprobTopLogprob(token='ooky', bytes=[111, 111, 107, 121], logprob=-8.955933),
LogprobTopLogprob(token='ook', bytes=[111, 111, 107], logprob=-11.477196),
LogprobTopLogprob(token='ines', bytes=[105, 110, 101, 115], logprob=-12.532326),
LogprobTopLogprob(token='ooks', bytes=[111, 111, 107, 115], logprob=-13.523472)
]
),
Logprob(
token='less',
bytes=[108, 101, 115, 115],
logprob=-0.001359,
top_logprobs=[
LogprobTopLogprob(token='less', bytes=[108, 101, 115, 115], logprob=-0.001359),
LogprobTopLogprob(token='-less', bytes=[45, 108, 101, 115, 115], logprob=-6.637013),
LogprobTopLogprob(
token='lessness',
bytes=[108, 101, 115, 115, 110, 101, 115, 115],
logprob=-10.479989
),
LogprobTopLogprob(token='-ch', bytes=[45, 99, 104], logprob=-11.556344),
LogprobTopLogprob(token='-', bytes=[45], logprob=-12.172013)
]
),
Logprob(
token='!"',
bytes=[33, 34],
logprob=-0.069934,
top_logprobs=[
LogprobTopLogprob(token='!"', bytes=[33, 34], logprob=-0.069934),
LogprobTopLogprob(token='."', bytes=[46, 34], logprob=-2.748925),
LogprobTopLogprob(token='!"\n', bytes=[33, 34, 10], logprob=-5.926631),
LogprobTopLogprob(token='"', bytes=[34], logprob=-7.845021),
LogprobTopLogprob(token='."\n', bytes=[46, 34, 10], logprob=-8.335368)
]
)
]
)
This is helpful. We can see the tokens that make up the response, along with the top tokens that the model considered at each step, and their associated probabilities. Let’s take this a step further and put this data into a Pandas DataFrame for easier analysis:
import pandas as pd
rows = []
# Loop through the response probabilities
for prob in response.output[0].content[0].logprobs:
    # Loop through each token's top logprobs
    for top_logprob in prob.top_logprobs:
        rows.append(
            {
                "token": prob.token,
                "top_logprob_token": top_logprob.token,
                "top_logprob": top_logprob.logprob,
            }
        )
pd.DataFrame(rows)
This will give us a nice table to analyze the token probabilities:
token top_logprob_token top_logprob
0 "Why Why -0.099057
1 "Why "Why -2.385021
2 "Why " -6.253838
3 "Why "W -9.471165
4 "Why What -10.464510
5 don don -0.003614
6 don did -6.201395
7 don was -6.978532
8 don can -8.329646
9 don didn -8.952241
10 't 't -0.008196
11 't ’t -4.808213
12 't ' -16.455168
13 't ‘ -16.503437
14 't ´t -16.781550
15 skeletons skeletons -0.848137
16 skeletons eggs -1.243387
17 skeletons elephants -2.184046
18 skeletons scientists -2.718372
19 skeletons cats -3.315445
20 fight fight -0.000019
21 fight pick -11.475693
22 fight start -12.405285
23 fight battle -14.309147
24 fight get -14.333210
25 ? ? -0.002157
26 ? each -6.623376
27 ? ?\n -8.196818
28 ? ?" -8.674199
29 ? ?"\n\n -8.889853
30 Sp No -0.404456
31 Sp They -1.383580
32 Sp Bone -2.637419
33 Sp Because -5.129202
34 Sp Sp -6.169154
35 ine ine -0.000145
36 ine ooky -8.955933
37 ine ook -11.477196
38 ine ines -12.532326
39 ine ooks -13.523472
40 less less -0.001359
41 less -less -6.637013
42 less lessness -10.479989
43 less -ch -11.556344
44 less - -12.172013
45 !" !" -0.069934
46 !" ." -2.748925
47 !" !"\n -5.926631
48 !" " -7.845021
49 !" ."\n -8.335368
Here we can see the top tokens the model considered at each step. OpenAI provides a helpful cookbook on logprobs:
Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is log(p), where p = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about logprobs:
- Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model’s confidence in its output or explore alternative responses the model considered.
- Logprob can be any negative number or 0.0. 0.0 corresponds to 100% probability.
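Since a logprob is just log(p), exponentiating it recovers the probability the model assigned to each token. A small follow-up sketch that prints each chosen token in the joke alongside that probability:
import math
# Convert each emitted token's logprob back into a probability (p = e^logprob)
# and print it as a percentage.
for prob in response.output[0].content[0].logprobs:
    print(f"{prob.token!r:>15}  {math.exp(prob.logprob):7.2%}")
For the joke above, “fight” comes out near 100% while “Sp” is well under 1%, which matches the logprobs in the table.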
We found this helpful to see how the model is “thinking” as it generates a response. For some tokens the model was very confident (e.g., “fight”, at nearly 100%), while for others it was much less so (e.g., “Sp”, where “No” was actually the most likely candidate). This gives us some insight into how the model generates its responses.
Next steps
In summary, we found it helpful to see that, at its core, this is just sending JSON and receiving JSON. Seeing the raw responses from the model, along with the token probabilities behind them, shows how the model is “thinking” as it generates text.
Next up, we’ll explore running models locally using Ollama.