Learning LLM Basics with OpenAI

For some time now, we’ve been using tools like ChatGPT and Copilot for day-to-day tasks, but mostly through the conversational AI chatbots they provide. We’ve used them for everything from drafting emails to providing type-ahead coding assistance. We haven’t, however, gone a step further and integrated them into a development project. Honestly, we weren’t sure where to start: looking into the available options, you quickly run into a dozen new concepts, from vector stores to agents, and different SDKs that all seem to solve similar problems.

We want to document our learning as we go, and hopefully provide a useful resource for others who are also looking to get started. We’ll start with the basics, using the OpenAI API directly, and then move on to more complex scenarios. Additionally, if you’d like to follow along, all the code is available in Jupyter Notebooks on GitHub.

Setting up your environment

To start, we’ll use OpenAI, so you’ll need an API key. You can get one by signing up at platform.openai.com. Once you have your key, set it as an environment variable in your terminal:

export OPENAI_API_KEY="<your_api_key_here>"

Raw JSON Response using curl

Let’s try something simple: ask for a short kid-friendly joke. We’ll use curl to make a request to the OpenAI API directly, specifically the newer Responses API. This way, we can see the raw JSON response from the model:

curl https://api.openai.com/v1/responses -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-5-nano",
    "input": "Tell me a silly joke for a kid in 8 words or less"
  }'

There’s a lot in the JSON response, but we can see the actual text response in output[1].content[0]:

{
  "type": "output_text",
  "annotations": [],
  "logprobs": [],
  "text": "Why did the scarecrow win? He was outstanding."
}

That’s easy enough! We got a response back from the model, and can see how to extract the actual text from the JSON.
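
Extracting the text from the raw JSON is just ordinary JSON handling. Here’s a minimal sketch in Python, assuming the curl output above was saved to a file called response.json (a hypothetical filename used for illustration):

import json

# Load the raw JSON saved from the curl call above
with open("response.json") as f:
    data = json.load(f)

# The assistant's text lives in the last output item's content list
print(data["output"][-1]["content"][0]["text"])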

Using the Python SDK

The same can be done using Python, again using the Responses API:

import openai

# Create a client; by default it reads the OPENAI_API_KEY environment variable
client = openai.Client()

response = client.responses.create(
    model="gpt-5-nano",
    input="Tell me a silly joke for a kid.",
    reasoning={"effort": "minimal"},  # keep reasoning effort low for a quick reply
)
print(response)

The response object will be similar to the JSON we saw above:

Response(
    id='resp_0e26a5811b5bcb540068c9971e7b008196a141320e0b520202',
    created_at=1758041886.0,
    error=None,
    incomplete_details=None,
    instructions=None,
    metadata={},
    model='gpt-5-nano-2025-08-07',
    object='response',
    output=[
        ResponseReasoningItem(
            id='rs_0e26a5811b5bcb540068c9971f15248196a3ff83acab83afe3',
            summary=[],
            type='reasoning',
            content=None,
            encrypted_content=None,
            status=None
        ),
        ResponseOutputMessage(
            id='msg_0e26a5811b5bcb540068c9971f2f848196b6b1fca2ee9d22e0',
            content=[
                ResponseOutputText(
                    annotations=[],
                    text='Why did the tomato turn red?\n\nBecause it saw the salad dressing!',
                    type='output_text',
                    logprobs=[]
                )
            ],
            role='assistant',
            status='completed',
            type='message'
        )
    ],
    parallel_tool_calls=True,
    temperature=1.0,
    tool_choice='auto',
    tools=[],
    top_p=1.0,
    background=False,
    conversation=None,
    max_output_tokens=None,
    max_tool_calls=None,
    previous_response_id=None,
    prompt=None,
    prompt_cache_key=None,
    reasoning=Reasoning(effort='minimal', generate_summary=None, summary=None),
    safety_identifier=None,
    service_tier='default',
    status='completed',
    text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'),
    top_logprobs=0,
    truncation='disabled',
    usage=ResponseUsage(
        input_tokens=15,
        input_tokens_details=InputTokensDetails(cached_tokens=0),
        output_tokens=20,
        output_tokens_details=OutputTokensDetails(reasoning_tokens=0),
        total_tokens=35
    ),
    user=None,
    store=True
)

As we can see, the response is very similar to what we got using curl, and the API is quite straightforward to use.
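
If all we need is the generated text, we don’t have to dig through the output list by hand. As a quick sketch (the output_text convenience property is available on recent versions of the Python SDK; manual indexing also works, based on the structure we saw above):

# Convenience property on recent SDK versions: joins all output_text parts together
print(response.output_text)

# Equivalent manual extraction, based on the structure shown above
print(response.output[-1].content[0].text)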

Token probabilities

We can also ask for the top token probabilities for each token in the response. This can be useful for understanding how the model works and how confident it is in its responses. We’ve always heard that LLMs are just “guessing” the next word based on probabilities, so this should be interesting to see.

OpenAI makes this easy: we add the top_logprobs parameter to our request and ask for the logprobs in the include list:

response = client.responses.create(
    model="gpt-4",
    input="Tell me a silly joke for a kid in 5 words or less.",
    top_logprobs=5,  # return the 5 most likely alternatives at each position
    include=["message.output_text.logprobs"],  # include the logprobs in the output
)
print(response.output[0].content[0])

The response will include the top token probabilities for each token in the response:

ResponseOutputText(
    annotations=[],
    text='"Why don\'t skeletons fight? Spineless!"',
    type='output_text',
    logprobs=[
        Logprob(
            token='"Why',
            bytes=[34, 87, 104, 121],
            logprob=-2.385021,
            top_logprobs=[
                LogprobTopLogprob(token='Why', bytes=[87, 104, 121], logprob=-0.099057),
                LogprobTopLogprob(token='"Why', bytes=[34, 87, 104, 121], logprob=-2.385021),
                LogprobTopLogprob(token='"', bytes=[34], logprob=-6.253838),
                LogprobTopLogprob(token='"W', bytes=[34, 87], logprob=-9.471165),
                LogprobTopLogprob(token='What', bytes=[87, 104, 97, 116], logprob=-10.46451)
            ]
        ),
        Logprob(
            token=' don',
            bytes=[32, 100, 111, 110],
            logprob=-0.003614,
            top_logprobs=[
                LogprobTopLogprob(token=' don', bytes=[32, 100, 111, 110], logprob=-0.003614),
                LogprobTopLogprob(token=' did', bytes=[32, 100, 105, 100], logprob=-6.201395),
                LogprobTopLogprob(token=' was', bytes=[32, 119, 97, 115], logprob=-6.978532),
                LogprobTopLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-8.329646),
                LogprobTopLogprob(token=' didn', bytes=[32, 100, 105, 100, 110], logprob=-8.952241)
            ]
        ),
        Logprob(
            token="'t",
            bytes=[39, 116],
            logprob=-0.008196,
            top_logprobs=[
                LogprobTopLogprob(token="'t", bytes=[39, 116], logprob=-0.008196),
                LogprobTopLogprob(token='’t', bytes=[226, 128, 153, 116], logprob=-4.808213),
                LogprobTopLogprob(token="'", bytes=[39], logprob=-16.455168),
                LogprobTopLogprob(token='‘', bytes=[226, 128, 152], logprob=-16.503437),
                LogprobTopLogprob(token='´t', bytes=[194, 180, 116], logprob=-16.78155)
            ]
        ),
        Logprob(
            token=' skeletons',
            bytes=[32, 115, 107, 101, 108, 101, 116, 111, 110, 115],
            logprob=-0.848137,
            top_logprobs=[
                LogprobTopLogprob(
                    token=' skeletons',
                    bytes=[32, 115, 107, 101, 108, 101, 116, 111, 110, 115],
                    logprob=-0.848137
                ),
                LogprobTopLogprob(token=' eggs', bytes=[32, 101, 103, 103, 115], logprob=-1.243387),
                LogprobTopLogprob(
                    token=' elephants',
                    bytes=[32, 101, 108, 101, 112, 104, 97, 110, 116, 115],
                    logprob=-2.184046
                ),
                LogprobTopLogprob(
                    token=' scientists',
                    bytes=[32, 115, 99, 105, 101, 110, 116, 105, 115, 116, 115],
                    logprob=-2.718372
                ),
                LogprobTopLogprob(token=' cats', bytes=[32, 99, 97, 116, 115], logprob=-3.315445)
            ]
        ),
        Logprob(
            token=' fight',
            bytes=[32, 102, 105, 103, 104, 116],
            logprob=-1.9e-05,
            top_logprobs=[
                LogprobTopLogprob(token=' fight', bytes=[32, 102, 105, 103, 104, 116], logprob=-1.9e-05),
                LogprobTopLogprob(token=' pick', bytes=[32, 112, 105, 99, 107], logprob=-11.475693),
                LogprobTopLogprob(token=' start', bytes=[32, 115, 116, 97, 114, 116], logprob=-12.405285),
                LogprobTopLogprob(token=' battle', bytes=[32, 98, 97, 116, 116, 108, 101], logprob=-14.309147),
                LogprobTopLogprob(token=' get', bytes=[32, 103, 101, 116], logprob=-14.33321)
            ]
        ),
        Logprob(
            token='?',
            bytes=[63],
            logprob=-0.002157,
            top_logprobs=[
                LogprobTopLogprob(token='?', bytes=[63], logprob=-0.002157),
                LogprobTopLogprob(token=' each', bytes=[32, 101, 97, 99, 104], logprob=-6.623376),
                LogprobTopLogprob(token='?\n', bytes=[63, 10], logprob=-8.196818),
                LogprobTopLogprob(token='?"', bytes=[63, 34], logprob=-8.674199),
                LogprobTopLogprob(token='?"\n\n', bytes=[63, 34, 10, 10], logprob=-8.889853)
            ]
        ),
        Logprob(
            token=' Sp',
            bytes=[32, 83, 112],
            logprob=-6.169154,
            top_logprobs=[
                LogprobTopLogprob(token=' No', bytes=[32, 78, 111], logprob=-0.404456),
                LogprobTopLogprob(token=' They', bytes=[32, 84, 104, 101, 121], logprob=-1.38358),
                LogprobTopLogprob(token=' Bone', bytes=[32, 66, 111, 110, 101], logprob=-2.637419),
                LogprobTopLogprob(
                    token=' Because',
                    bytes=[32, 66, 101, 99, 97, 117, 115, 101],
                    logprob=-5.129202
                ),
                LogprobTopLogprob(token=' Sp', bytes=[32, 83, 112], logprob=-6.169154)
            ]
        ),
        Logprob(
            token='ine',
            bytes=[105, 110, 101],
            logprob=-0.000145,
            top_logprobs=[
                LogprobTopLogprob(token='ine', bytes=[105, 110, 101], logprob=-0.000145),
                LogprobTopLogprob(token='ooky', bytes=[111, 111, 107, 121], logprob=-8.955933),
                LogprobTopLogprob(token='ook', bytes=[111, 111, 107], logprob=-11.477196),
                LogprobTopLogprob(token='ines', bytes=[105, 110, 101, 115], logprob=-12.532326),
                LogprobTopLogprob(token='ooks', bytes=[111, 111, 107, 115], logprob=-13.523472)
            ]
        ),
        Logprob(
            token='less',
            bytes=[108, 101, 115, 115],
            logprob=-0.001359,
            top_logprobs=[
                LogprobTopLogprob(token='less', bytes=[108, 101, 115, 115], logprob=-0.001359),
                LogprobTopLogprob(token='-less', bytes=[45, 108, 101, 115, 115], logprob=-6.637013),
                LogprobTopLogprob(
                    token='lessness',
                    bytes=[108, 101, 115, 115, 110, 101, 115, 115],
                    logprob=-10.479989
                ),
                LogprobTopLogprob(token='-ch', bytes=[45, 99, 104], logprob=-11.556344),
                LogprobTopLogprob(token='-', bytes=[45], logprob=-12.172013)
            ]
        ),
        Logprob(
            token='!"',
            bytes=[33, 34],
            logprob=-0.069934,
            top_logprobs=[
                LogprobTopLogprob(token='!"', bytes=[33, 34], logprob=-0.069934),
                LogprobTopLogprob(token='."', bytes=[46, 34], logprob=-2.748925),
                LogprobTopLogprob(token='!"\n', bytes=[33, 34, 10], logprob=-5.926631),
                LogprobTopLogprob(token='"', bytes=[34], logprob=-7.845021),
                LogprobTopLogprob(token='."\n', bytes=[46, 34, 10], logprob=-8.335368)
            ]
        )
    ]
)

This is helpful: we can see the tokens that make up the response, along with the top tokens the model considered at each step and their associated log probabilities. Let’s take this a step further and put the data into a pandas DataFrame for easier analysis:

import pandas as pd

rows = []
# Loop through the response probabilities
for prob in response.output[0].content[0].logprobs:
    # Loop through each token's top logprobs
    for top_logprob in prob.top_logprobs:
        rows.append(
            {
                "token": prob.token,
                "top_logprob_token": top_logprob.token,
                "top_logprob": top_logprob.logprob,
            }
        )

pd.DataFrame(rows)

This will give us a nice table to analyze the token probabilities:

         token top_logprob_token  top_logprob
0         "Why               Why    -0.099057
1         "Why              "Why    -2.385021
2         "Why                 "    -6.253838
3         "Why                "W    -9.471165
4         "Why              What   -10.464510
5          don               don    -0.003614
6          don               did    -6.201395
7          don               was    -6.978532
8          don               can    -8.329646
9          don              didn    -8.952241
10          't                't    -0.008196
11          't                ’t    -4.808213
12          't                 '   -16.455168
13          't                 ‘   -16.503437
14          't                ´t   -16.781550
15   skeletons         skeletons    -0.848137
16   skeletons              eggs    -1.243387
17   skeletons         elephants    -2.184046
18   skeletons        scientists    -2.718372
19   skeletons              cats    -3.315445
20       fight             fight    -0.000019
21       fight              pick   -11.475693
22       fight             start   -12.405285
23       fight            battle   -14.309147
24       fight               get   -14.333210
25           ?                 ?    -0.002157
26           ?              each    -6.623376
27           ?               ?\n    -8.196818
28           ?                ?"    -8.674199
29           ?            ?"\n\n    -8.889853
30          Sp                No    -0.404456
31          Sp              They    -1.383580
32          Sp              Bone    -2.637419
33          Sp           Because    -5.129202
34          Sp                Sp    -6.169154
35         ine               ine    -0.000145
36         ine              ooky    -8.955933
37         ine               ook   -11.477196
38         ine              ines   -12.532326
39         ine              ooks   -13.523472
40        less              less    -0.001359
41        less             -less    -6.637013
42        less          lessness   -10.479989
43        less               -ch   -11.556344
44        less                 -   -12.172013
45          !"                !"    -0.069934
46          !"                ."    -2.748925
47          !"              !"\n    -5.926631
48          !"                 "    -7.845021
49          !"              ."\n    -8.335368

Here we can see the top tokens the model considered at each step. OpenAI’s cookbook provides a helpful guide on logprobs:

Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is log(p), where p = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about logprobs:

  • Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model’s confidence in its output or explore alternative responses the model considered.
  • Logprob can be any negative number or 0.0. 0.0 corresponds to 100% probability.

We found it helpful to see how the model is “thinking” as it generates a response. For some tokens, the model was very confident (e.g., “ fight”, at nearly 100%), while for others it was much less so (e.g., “ Sp”, where “ No” was actually the most likely next token). This gives us some insight into how the model generates its responses.
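
Since a logprob is just log(p), we can turn any of these numbers back into a probability with exp(logprob). A quick check against a few values from the table above:

import math

# p = exp(logprob): convert a few logprobs from the table into probabilities
print(math.exp(-0.000019))   # ' fight'     -> ~0.99998 (nearly certain)
print(math.exp(-0.848137))   # ' skeletons' -> ~0.43
print(math.exp(-6.169154))   # ' Sp'        -> ~0.002 (the model favored ' No' here)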

Next steps

In summary, it was helpful to see that, under the hood, we’re just sending JSON and receiving JSON. Seeing the raw responses from the model, along with the token probabilities, shows us a bit of how the model is “thinking” as it generates text.

Next up, we’ll explore running models locally using Ollama.