Generative Artificial intelligence

Deep Agents: What LangChain Quietly Built While Everyone Was Arguing About Prompts

Pritam Chakroborty — Fri, 17 Apr 2026 18:36:57 GMT

The "Shallow Agent" Problem Nobody Talks About

If you've been following my previous articles on Prompt Engineering and Context Engineering, you know I've been going deeper and deeper into how LLMs actually work under the hood.

This article is the natural next step. Because here's the thing —

We've all seen the basic agent architecture. An LLM running in a loop, calling tools, getting results, calling more tools. It's elegant. It's simple. And for basic tasks, it works great.

But try giving it a complex task — something like "research the top 5 competitors in the Indian EdTech space, compare their pricing models, and draft a strategy doc" — and watch it fall apart.

Why? Because the agent becomes shallow.

"Shallow" here means the agent can't plan over longer time horizons. It solves whatever is immediately in front of it rather than thinking about the bigger picture. After 15-20 tool calls, the original objective gets buried under thousands of tokens of intermediate results. The agent starts drifting, repeating work, or just... stopping early because it forgot there were more steps.

Sound familiar? It should. This is exactly the Context Distraction problem I wrote about in my Context Engineering article. The context grows so long that the model over-focuses on recent content and loses sight of the original goal.

So, what's the fix?

Enter Deep Agents

LangChain noticed something interesting. Applications like Claude Code, Deep Research, and Manus — the agents that actually work on complex, long-running tasks — they all share four characteristics:

A planning tool
Sub-agents
Access to a file system
A detailed system prompt

That's it. The core algorithm is the same — an LLM calling tools in a loop. The difference is infrastructure around the loop.
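
To make "an LLM calling tools in a loop" concrete, here is a minimal, framework-free sketch. `scripted_model` is a hypothetical stand-in for a real chat model that returns either a tool call or a final answer. Real harnesses such as LangGraph and deepagents add state management, streaming, and error handling on top, but the core loop has this shape.

```python
from typing import Callable


def scripted_model(messages: list) -> dict:
    # Hypothetical stand-in for an LLM: picks the next step from the history.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        # Nothing fetched yet: request a tool call.
        return {"type": "tool_call", "name": "get_weather", "args": {"city": "sf"}}
    # A tool result is now in context: answer with it.
    return {"type": "final", "content": f"Answer: {tool_results[-1]['content']}"}


def get_weather(city: str) -> str:
    return f"It's always sunny in {city}!"


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_agent(user_input: str, max_steps: int = 10) -> tuple:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        step = scripted_model(messages)
        if step["type"] == "final":
            messages.append({"role": "assistant", "content": step["content"]})
            return step["content"], messages
        # Run the requested tool and append both the call and its result to the context.
        result = TOOLS[step["name"]](**step["args"])
        messages.append({"role": "assistant", "content": f"call {step['name']}({step['args']})"})
        messages.append({"role": "tool", "content": result})
    return "Stopped: step limit reached", messages


answer, history = run_agent("what is the weather in sf")
print(answer)  # Answer: It's always sunny in sf!
```

Every tool result is appended to `messages` and nothing ever removes it, which is exactly how a long task ends up burying its original objective.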

LangChain packaged these four primitives into an open-source library called deepagents. They call it an "agent harness" — not a new framework, not a new reasoning paradigm, just an opinionated wrapper that gives your agent the equipment it needs to go deep instead of staying shallow.

Think of it this way:

LangGraph gives you an engine and a transmission. Deep Agents gives you a car.

Install it with one line:

pip install deepagents

And the simplest possible agent looks like this:

from deepagents import create_deep_agent

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

agent = create_deep_agent(
    model="openai:gpt-4o",
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)

One function. Under the hood, it handles the LangGraph graph, state management, streaming, and context window management — none of which you touched.

But the real magic is in those four primitives. Let's break each one down.

The Four Pillars of Deep Agents

🧠 Pillar 1: The Planning Tool (The Most Counterintuitive One)

Every deep agent automatically gets a write_todos tool.

Now here's the part that blew my mind when I first learned about it —

This tool is a no-op. It literally does nothing.

When the model calls write_todos(["research X", "compare Y and Z", "draft summary"]), no scheduler runs. No task queue gets populated. No database row is written. The tool just accepts the input and returns something like "todos updated."
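
Here is a minimal sketch of what such a no-op tool amounts to. It is written in plain Python, not taken from the deepagents source, so the real tool's internals and exact return string may differ. The only lasting effect of the call is that the plan ends up as text in the message history.

```python
import json


def write_todos(todos: list) -> str:
    # No scheduler, no queue, no database: the tool only acknowledges the call.
    return "todos updated"


messages = [{"role": "user", "content": "Research competitors and draft a strategy doc"}]

plan = ["research X", "compare Y and Z", "draft summary"]
# The model's tool call is recorded in the history as text...
messages.append({"role": "assistant", "content": f"write_todos({json.dumps(plan)})"})
# ...followed by the tool's trivial response.
messages.append({"role": "tool", "content": write_todos(plan)})

# Every later step is generated with the plan sitting in the context.
context_text = "\n".join(m["content"] for m in messages)
print("compare Y and Z" in context_text)  # True
```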

So why does it work?

Remember the "Attention Budget" concept from my Context Engineering article? LLMs have no hidden scratchpad, no persistent working memory between tool calls. The context window IS the model's memory. Whatever isn't written down effectively doesn't exist for the model on the next step.

When the agent calls write_todos, three things happen:

The plan gets serialized into tokens. The todo list now physically exists as text in the message history.
Future token predictions attend to it. Every subsequent word the model generates is conditioned on that plan being right there in recent context.
The act of writing forced decomposition. To emit the tool call, the model had to commit to a specific breakdown. That commitment is now anchored.

Without this, on a long task the model drifts. After 30 tool calls and 15,000 tokens of intermediate results, the original objective gets buried. The plan tool prevents this by keeping the goal in the "hot zone" of recent context.

💡 Harrison Chase (LangChain CEO) specifically credits Claude Code as the inspiration. Claude Code's Todo list tool is also a no-op; it's purely a context engineering strategy.

This connects directly to what I discussed in the Context Engineering article — every token in the context affects how the model behaves. The plan tool is deliberately injecting high-value tokens that anchor the model's behavior across dozens of turns.

The tool's functional uselessness is the feature, not a bug. A "real" scheduler would couple the agent to external infrastructure. A no-op just shapes the language model's own behavior — which is the only thing that actually matters.

📁 Pillar 2: The Virtual File System (Context Offloading in Action)

Deep agents get built-in tools: ls, read_file, write_file, edit_file.

Now, if you've read my Context Engineering article, this should immediately ring a bell. This is Context Offloading — the strategy where the agent stores information outside the LLM's context window and pulls it back in when needed.

Instead of stuffing 50,000 tokens of research results into the conversation history (hello, Context Distraction 👋), the agent writes intermediate results to files. When it needs that information later, it reads just the relevant file.

The file system is virtual by default — "files" live in agent state, not on your actual disk. But you can swap backends:

In-memory — for quick, ephemeral tasks
Local disk — for development
LangGraph Store — for cross-thread persistence
Sandboxes (Modal, Daytona, Deno) — for isolated code execution

When using a sandbox backend, agents also get an execute tool to run shell commands — tests, builds, git operations. That's how the CLI version works as a terminal coding agent comparable to Claude Code.

💡 The key insight: The file system isn't just storage. It's a context management strategy. Write large results to a file, keep a short summary in context, read the file back only when you need the details. This directly combats Context Confusion and Context Distraction.
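
The write-a-file, keep-a-summary pattern can be sketched with a plain dictionary standing in for the virtual file system in agent state. The function names mirror the built-in tool names above, but this is a hypothetical illustration, not deepagents' actual implementation or signatures.

```python
# A dict in "agent state" acts as the virtual file system: nothing touches disk.
state: dict = {"files": {}, "messages": []}


def write_file(path: str, content: str) -> str:
    state["files"][path] = content
    return f"Wrote {len(content)} characters to {path}"


def read_file(path: str) -> str:
    return state["files"].get(path, f"Error: {path} not found")


def ls() -> list:
    return sorted(state["files"])


def edit_file(path: str, old: str, new: str) -> str:
    # Replace the first occurrence of `old`, failing loudly if it is absent.
    content = state["files"].get(path)
    if content is None or old not in content:
        return f"Error: could not edit {path}"
    state["files"][path] = content.replace(old, new, 1)
    return f"Edited {path}"


# A large (hypothetical) tool result goes to a file...
raw_results = "pricing row\n" * 5000
write_file("/research/pricing_raw.txt", raw_results)
# ...while only a short pointer stays in the conversation.
state["messages"].append(
    {"role": "tool", "content": "Saved raw pricing data to /research/pricing_raw.txt (5000 rows)"}
)

print(ls())  # ['/research/pricing_raw.txt']
```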

🤖 Pillar 3: Sub-agents (Context Quarantine in Disguise)

A built-in task tool lets the main agent spawn specialized sub-agents. Each sub-agent gets its own clean context window, goes deep on a specific subtask, and returns only a condensed summary.

Again — this is exactly the Context Quarantine strategy from my previous article. Rather than one agent attempting to maintain state across an entire project, specialized sub-agents handle focused tasks with isolated contexts.

Here's how it works mechanically:

The supervisor agent calls task(description="research competitor pricing").
The runtime spins up a sub-agent with a fresh context window.
The sub-agent runs its full tool-calling loop — maybe 20-30 LLM calls, tens of thousands of tokens.
It returns only a condensed summary (1,000–2,000 tokens) to the supervisor.
The supervisor's context stays clean.

The sub-agent might explore extensively, but the supervisor only sees the distilled result. The main context never gets polluted with raw search results, API responses, or intermediate reasoning.
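
The steps above can be sketched without any framework. `run_subagent` is a hypothetical stand-in for the task tool: it builds its own message list, fills it with (simulated) bulky tool results, and hands back only a summary. The supervisor's history grows by a single short message, no matter how much work the sub-agent did.

```python
def run_subagent(description: str, steps: int = 25) -> str:
    # Fresh, isolated context window for this sub-agent only.
    sub_messages = [{"role": "user", "content": description}]
    for i in range(steps):
        # Simulated tool calls producing bulky intermediate results.
        sub_messages.append({"role": "tool", "content": f"raw result {i}: " + "x" * 400})
    # Only a condensed summary leaves the sub-agent; sub_messages is discarded.
    return f"Summary of '{description}': {len(sub_messages) - 1} sources reviewed."


supervisor_messages = [{"role": "user", "content": "Compare competitor pricing"}]
summary = run_subagent("research competitor pricing")
supervisor_messages.append({"role": "tool", "content": summary})

print(len(supervisor_messages))  # 2
print(summary)
```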

Inline vs. Async Sub-agents

Now, there's an important nuance here that the latest release (v0.5) addresses.

Inline sub-agents block the supervisor. When the supervisor calls task(), its entire execution loop freezes until the sub-agent finishes. For a sub-agent doing deep research (40 LLM calls at 2-10 seconds each), that is about 80-400 seconds, roughly 1.3 to 7 minutes, of model time alone, before any tool latency is added. During all of it the supervisor can't do anything: it can't respond to the user, work on other tasks, or spawn other sub-agents.

Think of it like a restaurant where the head chef personally goes to the farm to pick vegetables every time an order comes in. The entire kitchen stops.

Async sub-agents fix this. Instead of blocking, start_async_task() returns a task ID immediately. The actual work runs on a separate Agent Protocol server — different process, possibly different machine. The supervisor continues its loop, works on other things, and polls for results via check_async_task(task_id) when it's ready.

Same head chef analogy — now the chef calls the farm, places the order, and keeps cooking other dishes. When the delivery arrives, the chef incorporates the ingredients.
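
The difference between blocking and non-blocking delegation can be sketched with Python's asyncio. `start_task` and `check_task` are hypothetical helpers that only mimic the shape of start_async_task() and check_async_task(). In deepagents the real work runs on a separate Agent Protocol server, not inside the same event loop as here.

```python
import asyncio
import itertools
from typing import Optional

_tasks: dict = {}
_ids = itertools.count(1)


async def slow_research(topic: str) -> str:
    await asyncio.sleep(0.2)  # stands in for minutes of sub-agent work
    return f"findings on {topic}"


def start_task(topic: str) -> str:
    # Schedule the work and hand back an ID immediately instead of waiting.
    task_id = f"task-{next(_ids)}"
    _tasks[task_id] = asyncio.create_task(slow_research(topic))
    return task_id


def check_task(task_id: str) -> Optional[str]:
    # Non-blocking poll: the result if finished, otherwise None.
    task = _tasks[task_id]
    return task.result() if task.done() else None


async def supervisor() -> list:
    log = []
    task_id = start_task("competitor pricing")
    # The supervisor keeps doing other work while the task runs.
    while (result := check_task(task_id)) is None:
        log.append("handled other work")
        await asyncio.sleep(0.05)
    log.append(result)
    return log


log = asyncio.run(supervisor())
print(log[-1])  # findings on competitor pricing
```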

💡 Rule of thumb: Inline for sub-second to tens-of-seconds work. Async for minutes-plus. Short, focused tasks (classify this input, extract these fields) should stay inline. Long-running research and multi-step pipelines are where async pays off.

📝 Pillar 4: The Detailed System Prompt

This one might seem obvious, but it's more nuanced than you think.

Claude Code's system prompts are long. Really long. They contain:

Detailed instructions on how to use each tool
Few-shot examples for specific situations
Rules about when to plan vs. when to act
Guidelines for verifying work before reporting results

Without these prompts, the agents would not be nearly as deep. Prompting matters still!

Deep Agents ships with opinionated defaults inspired by Claude Code's prompt structure. These teach the model to:

Plan before acting
Verify work after completing it
Manage context proactively (write to files, summarize when needed)
Use sub-agents for context isolation

You can extend these with custom instructions or replace them entirely. But the defaults are strong — they encode hard-won lessons about what makes agents actually reliable.

How It All Connects to Context Engineering

If you've been reading my articles in order, you might be seeing a pattern emerging. Let me make it explicit:

Context Engineering Strategy	Deep Agents Implementation
Compaction / Summarization	Auto-summarization middleware compacts older messages when context grows long
Context Offloading	Virtual file system — write results to files, read back when needed
Context Quarantine	Sub-agents with isolated context windows
Tool Loadout	Skills system — reusable bundles of workflows and domain knowledge

Deep Agents is essentially Context Engineering, productized. Every pillar directly addresses one of the context failure modes I wrote about:

Context Poisoning → The plan tool lets the agent self-correct by checking todos against actual progress
Context Distraction → File system offloads intermediate results so they don't dilute attention
Context Confusion → Sub-agents quarantine irrelevant context from the main thread
Context Clash → Permission rules and structured tools prevent conflicting information from accumulating

This is why I got so excited about this library. It's not just another framework. It's context engineering principles turned into reusable infrastructure.

When to Use What — The LangChain Stack

LangChain now has three tiers. Choosing correctly matters:

LangChain (create_agent) — For simple agents and standardized team patterns. The tool-calling loop is enough. Think: a customer service bot that looks up order status.

LangGraph — The low-level runtime. For when you need full control over state, conditional edges, and custom graph topology. Think: a complex approval workflow with branching logic you need to define precisely.

Deep Agents — For complex, non-deterministic, long-running tasks where you want planning, filesystem, sub-agents, and context compaction out of the box. Think: a research agent that explores a topic for 30 minutes and produces a comprehensive report.

For simple Q&A or single-tool tasks, a basic agent is fine. Deep Agents shine when the task feels more like a project than a question.

Quick Start — Building Your First Deep Agent

Here's a minimal but real example using Tavily for web search:

import os
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from deepagents import create_deep_agent
from tavily import TavilyClient

# Replace with your own keys (or export them in your shell instead of hard-coding them)
os.environ["OPENAI_API_KEY"] = "your-key"
os.environ["TAVILY_API_KEY"] = "your-key"

tavily = TavilyClient()

@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    results = tavily.search(query, max_results=3)
    return "\n".join([r["content"] for r in results["results"]])

model = init_chat_model("openai:gpt-4o")

agent = create_deep_agent(
    model=model,
    tools=[web_search],
    system_prompt="You are a research assistant. Always plan before acting.",
)

result = agent.invoke({
    "messages": [
        {"role": "user", "content": "Research the current state of AI agents in 2025 and write a summary"}
    ]
})

# The final answer is the last message in the returned state
print(result["messages"][-1].content)

Depending on the model, the agent will typically:

Create a plan using write_todos
Search the web using your tool
Write intermediate findings to its virtual file system
Synthesize everything into a final summary
Check off todos as it goes

All of that behavior comes from the harness — you didn't code any of it.

Things I Wish Someone Told Me Earlier

After spending considerable time understanding this library, here are my honest takeaways:

1. The plan tool is psychological, not functional. Don't expect write_todos to do scheduling. It shapes the model's behavior by making planning explicit in context. That's it. And that's enough.

2. Sub-agents aren't free. Each spawn is another full LLM call stack. Use them for genuine context isolation, not just to make your architecture look fancy.

3. Model choice changes behavior significantly. Some models plan well but execute tool calls unreliably. Some are the opposite. Benchmark before committing. Deep Agents is provider-agnostic — try GPT-4o, Claude, Qwen, Llama and compare.

4. The file system is virtual by default. "Files" live in agent state unless you configure a durable backend. Don't assume writes persist across threads without explicit setup.

5. Prompting still matters. A lot. The defaults are strong but opinionated toward coding/research work. Domain-specific agents need prompt customization.

Managing context is often the toughest part of creating an agent. Deep Agents doesn't eliminate that challenge — but it gives you principled tools to handle it, so you can focus on building what actually matters.


#deep-agents #langchain #context-engineering #gen-ai #agents #langgraph

Beyond Prompts: The Rise of Context Engineering

Pritam Chakroborty — Mon, 24 Nov 2025 14:25:06 GMT

For the past few years, ‘Prompt Engineering’ has been the center of attention. The focus was on crafting the perfect sequence of words and instructions to get a desired response from the model in a single turn or a simple, short-term task.

However, as the industry matured and moved from simple chat interfaces to building complex, autonomous AI agents (systems that operate in a loop, use tools, and manage multi-step, long-horizon tasks), we faced a different engineering problem:

“What configuration of context is most likely to generate our model’s desired behavior?”

Put more simply: “How can we feed only relevant information into the context during LLM inference?”

To get the gist of the problem, we first have to understand a concept called the “Attention Budget”.

Despite their speed and ability to handle ever-larger volumes of data, LLMs, like humans, lose focus or become confused at a certain point. Context should therefore be treated as a limited resource with diminishing returns as more of it is used. Like humans, who have limited working memory capacity, LLMs have an “Attention Budget” that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount, increasing the need to carefully curate the tokens available to the LLM.

This attention scarcity is an architectural constraint of LLMs. It creates the need to use the limited context window deliberately, and that is where the term “Context Engineering” comes in. It also raises a new question: how is context engineering different from prompt engineering?

To answer that, we first have to understand what context engineering is.

As Karpathy explains -

Context Engineering VS Prompt Engineering

Prompt engineering refers to methods for writing and organizing LLM instructions for optimal outcomes.

Context engineering refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the other information that may land there outside of the prompts.

As the term implies, the main focus of prompt engineering is on writing effective prompts, especially system prompts. Prompts instruct the model on how to behave and what actions to take. However, to create an interactive system guided by these prompts, your system needs contextual awareness.

Therefore, context engineering is not meant to replace prompt engineering, nor is one better than the other. Both are essential parts of a fully functional agentic system.

The need for Context Engineering

Initially, many of us thought that a long enough context window plus access to tools would be enough to build a sustainable agentic architecture. As model context windows continue to grow, with many supporting up to 1 million tokens, it might seem like you can include everything you need in a prompt (tools, documents, instructions, and more) and let the model handle it.

However, in reality, longer contexts don't always lead to better responses. Overloading your context can cause your agents and applications to fail in unexpected ways. Contexts can become contaminated, distracting, confusing, or conflicting. This is especially problematic for agents, which rely on context to gather information, synthesize findings, and coordinate actions.

LLMs use the transformer architecture, which allows each token to connect with every other token in the entire context. This creates n² pairwise relationships for n tokens. As the context length increases, the model's ability to manage these relationships becomes strained, leading to a natural conflict between context size and attention focus.
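
A toy calculation makes both the n² growth and the "Attention Budget" idea concrete. It assumes a single softmax over made-up scores: one relevant token scoring 5.0 against a growing number of distractor tokens scoring 1.0 each. Real attention (many heads, many layers, learned scores) is far more complex, so treat this as an illustration of softmax dilution, not a measurement.

```python
import math


def relevant_weight(num_distractors: int, relevant_score: float = 5.0,
                    distractor_score: float = 1.0) -> float:
    # Softmax weight on one relevant token competing with num_distractors others.
    relevant = math.exp(relevant_score)
    return relevant / (relevant + num_distractors * math.exp(distractor_score))


for k in (10, 100, 1_000, 10_000):
    print(f"{k:>6} distractors -> weight on the relevant token: {relevant_weight(k):.3f}")

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {n * n:,} pairwise relationships")
```

Even though the relevant token's score never changes, its share of attention shrinks as irrelevant tokens pile up.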

Here are some of the ways context can get out of hand:

💡

Collected from this excellent article by Drew Breunig

Context Poisoning

Context Poisoning is when a hallucination or other error makes it into the context, where it is repeatedly referenced.

Context Distraction

Context Distraction is when a context grows so long that the model over-focuses on the context, neglecting what it learned during training.

Context Confusion

Context Confusion is when superfluous content in the context is used by the model to generate a low-quality response.

In other words: if you put something in the context, the model has to pay attention to it, whether it is relevant or not. Large models, especially reasoning models, are getting better than smaller models at ignoring or discarding superfluous context, but longer context still comes with its own downsides.

Context Clash

Context Clash is when you accrue new information and tools in your context that conflicts with other information in the context.

This is a more problematic version of Context Confusion: the bad context here isn’t irrelevant, it directly conflicts with other information in the prompt.

Here is what happens: when the details of a task arrive piece by piece across several turns, the model fills the gaps with assumptions and attempts answers early. Those early assumptions stay in the context and conflict with the details that arrive later. When the model gets the same question with all the details up front, it doesn't need to assume anything and simply answers from the information given.

Therefore, even with million-token context windows, throwing all the information into the context will not reliably produce the desired agentic behavior, no matter how well defined your prompts are, because of this architectural constraint.

These failures hit agents hardest because agents operate in exactly the scenarios where contexts balloon: gathering information from multiple sources, making sequential tool calls, engaging in multi-turn reasoning, and accumulating extensive histories.

Strategies for mitigating context failures

Fortunately, there are several strategies for tackling these context failures.

💡

These strategies are described in depth in the sources listed at the end of the article.

Compaction / Context Summarization

Context Summarization is the process of condensing a large amount of context into a brief summary.

Context Summarization was first used to manage smaller context windows. When your chat session was about to exceed the maximum context length, a summary would be created, and a new thread would start. The skill in compaction is choosing what to keep and what to discard, as being too aggressive can lead to losing subtle but important context that might be needed later.
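
Here is a minimal sketch of compaction. It assumes a crude word count as the token estimate and a hypothetical `summarize` function (in practice an LLM call) that condenses old messages. The most recent messages are kept verbatim because they hold the model's immediate working state; how many to keep is exactly the judgment call described above.

```python
def count_tokens(message: dict) -> int:
    # Crude stand-in for a real tokenizer.
    return len(message["content"].split())


def summarize(messages: list) -> str:
    # Hypothetical placeholder: a real system would ask an LLM for this summary.
    return f"Summary of {len(messages)} earlier messages."


def compact(messages: list, max_tokens: int, keep_recent: int = 4) -> list:
    if sum(count_tokens(m) for m in messages) <= max_tokens:
        return messages  # under budget: nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent


history = [{"role": "user", "content": f"message {i} " + "word " * 50} for i in range(20)]
compacted = compact(history, max_tokens=500)
print(len(compacted))           # 5
print(compacted[0]["content"])  # Summary of 16 earlier messages.
```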

Structured note-taking / Context Offloading

Context Offloading is the act of storing information outside the LLM’s context, usually via a tool that stores and manages the data. It is a technique where the agent regularly writes notes persisted to memory outside of the context window. These notes get pulled back into the context window at later times.

Context Quarantine / Sub-agent architectures

Context Quarantine is the act of isolating contexts in their own dedicated threads, each used separately by one or more LLMs.

We see better results when our contexts aren’t too long and don’t sport irrelevant content. One way to achieve this is to break our tasks up into smaller, isolated jobs – each with their own context. Rather than one agent attempting to maintain state across an entire project, specialized sub-agents can handle focused tasks with clean context windows. The main agent coordinates with a high-level plan while subagents perform deep technical work or use tools to find relevant information. Each subagent might explore extensively, using tens of thousands of tokens or more, but returns only a condensed, distilled summary of its work (often 1,000-2,000 tokens).

Tool Loadout

Tool Loadout is the act of selecting only relevant tool definitions to add to your context.

In this strategy, we use RAG to select the most relevant tools for the model to solve the user's query. By storing tool descriptions in a vector database, the system can retrieve the most relevant tools for a given input prompt and add only those to the context.
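
Here is a minimal sketch of the retrieval step. It assumes a bag-of-words cosine similarity as a stand-in for real embeddings and a vector database, and the tool names and descriptions are hypothetical. Only the top-k matches would be passed to the model.

```python
import math
import re
from collections import Counter

TOOLS = {
    "get_weather": "get the current weather forecast for a city",
    "search_flights": "search flights between two airports on a date",
    "convert_currency": "convert an amount of money between currencies",
    "read_calendar": "read events from the user's calendar",
}


def vectorize(text: str) -> Counter:
    # Bag-of-words "embedding": lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(query: str, k: int = 2) -> list:
    # Rank every tool description against the query and keep the top k.
    q = vectorize(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, vectorize(TOOLS[name])), reverse=True)
    return ranked[:k]


print(select_tools("What's the weather forecast in Paris?", k=1))  # ['get_weather']
```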

Before moving on, it helps to know when to use what. The choice between these approaches depends on task characteristics. For example:

Compaction maintains conversational flow for tasks requiring extensive back-and-forth;
Note-taking excels for iterative development with clear milestones;
Multi-agent architectures handle complex research and analysis where parallel exploration pays dividends.
Tool Loadout becomes important in an agentic system with many tools.

Managing context is often the toughest part of creating an agent.

The important point in all these strategies is that context isn't free. Every token in the context affects how the model behaves, for better or worse. The large context windows of modern LLMs are a strong feature, but they don't justify careless information management.

When you build a new agent or improve an existing one, ask yourself: Is everything in this context necessary? If not, you have these strategies to fix it.

Context engineering isn't just about fitting more information into a prompt—it's about understanding that every token is a vote of confidence. When you include irrelevant or conflicting information, you're essentially asking the model to guess which signals matter. That's a losing bet, especially for agents that need to reason reliably across multiple steps and tool calls.

The strategies outlined—compaction, offloading, quarantine, and tool loadout—are all manifestations of the same principle: intentionality over abundance. Modern LLMs have unprecedented context windows, but that's a capability, not a mandate. The future of effective AI systems won't belong to those who can cram the most context into a request, but to those who architect context strategically.

Our responsibility as builders: question every token. Measure context quality, not just context size. Build agents that fail gracefully when context is corrupted, and succeed reliably when context is clean.

As you build or refine your agents, remember: context management is context optimization. Clean context doesn't just reduce token waste—it dramatically improves model reasoning, reduces hallucination, and makes agents genuinely reliable at scale.

The next frontier isn't bigger context windows. It's smarter ones.

Sources

Prompting Techniques: Your Toolkit for Mastering AI Communication

Pritam Chakroborty — Sat, 26 Jul 2025 16:50:00 GMT

💡

I'm not an expert in this, but I'm an enthusiastic learner of generative artificial intelligence. All the knowledge I share is verified from reliable sources. Please feel free to point out areas of improvement and how I can do better. I'm definitely looking forward to your valuable feedback. Now go ahead….. 😊

Quick Recap: In Part 1, we looked at how LLMs create responses using settings like temperature, Top-K, and Top-P. Now, we're going to explore the details of prompting techniques that make your prompts work effectively. Ready to improve your skills?

Part 2: Fundamentals of prompting techniques that separate good prompts from great ones.

Remember this picture from Part 1?

"Well, I'm good with English. So does that mean I'm automatically good at prompt engineering?"

Let's find out, shall we?

Here's the thing - being fluent in English is like having a good voice. But knowing prompting techniques? That's like learning how to sing professionally. Both are English, but the results are very different.

To make this clearer, let’s imagine a situation:

Suppose you ask someone, "Can you help me?"

Or you could say, "I need guidance on my English project on topic X. I have Y pieces of information on this topic, and I need you to guide me on Z."

See the difference? Same language, completely different outcomes.

Therefore, it's not just about your English skills, but also about how you frame your question to get the best possible result.

Now let's take a deep dive into Prompting techniques.

The Foundation: Building Blocks of a Good Prompt

Before we jump into specific techniques, let's break down what makes a prompt effective. Every great prompt has these core components:

Context → What's the situation or story behind the question?
Task → What do you want the AI to do?
Format → How should the output look?
Constraints → What are the rules/limitations?
Examples → What does good look like?

Think of it like giving directions to a friend. You wouldn't just say "Go there."

You'd say "Take the main road, turn left at the coffee shop, look for the blue building, and park in the back."
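
The five building blocks can be turned into a simple reusable template. This is a minimal sketch: the function name, section labels, and sample values are hypothetical, and nothing requires a prompt to follow this exact layout.

```python
def build_prompt(context: str, task: str, output_format: str,
                 constraints: list, examples: list) -> str:
    # Assemble the five building blocks into labeled sections.
    sections = [
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {output_format}",
        "Constraints:\n" + "\n".join(f"- {c}" for c in constraints),
        "Examples:\n" + "\n".join(examples),
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    context="I am a student preparing an English project on climate change.",
    task="Suggest an outline for a 5-minute presentation.",
    output_format="A numbered list of at most 6 points.",
    constraints=["Use simple language", "Include one real-world example"],
    examples=["1. Introduction: what climate change means"],
)
print(prompt)
```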

Technique #1: Zero-Shot Prompting

The name “Zero-Shot” stands for “no examples”.

This is the simplest type of prompt. It only provides a description of a task and some text for the LLM to get started with, giving the AI a task without any examples.

For a task like classification, the model temperature should be set to a low number, since no creativity is needed.

When to use: Simple, straightforward tasks where the instructions are clear.

💡

These code blocks are Python snippets for illustration; they assume the OpenAI Python SDK is installed and OPENAI_API_KEY is set.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_review(review_text: str) -> str:
    # System prompt to define the task clearly
    system_prompt = """
    You are a movie review classifier. Your task is to classify movie reviews into exactly one of these three categories:
    - POSITIVE: Reviews that express overall satisfaction, praise, or recommendation
    - NEUTRAL: Reviews that are balanced, mixed, or indifferent
    - NEGATIVE: Reviews that express overall dissatisfaction, criticism, or disappointment

    Respond with only the classification label: POSITIVE, NEUTRAL, or NEGATIVE.
    """

    # User prompt with the specific review
    user_prompt = f"""
    Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.

    Review: "{review_text}"

    Sentiment:
    """

    try:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.1,  # low: classification needs no creativity
            max_tokens=5,     # the label only needs a few tokens
            top_p=1.0,
            frequency_penalty=0,
            presence_penalty=0,
        )
        # The reply is just the label, e.g. "POSITIVE"
        return response.choices[0].message.content.strip()

    except Exception as e:
        print(f"Error occurred: {e}")
        return "ERROR"

Expected Output: POSITIVE

The AI knows exactly what you want, how long it should be, what tone to use, and what to include.

Even in zero-shot, be as specific as possible - clear, direct, and detailed.

When zero-shot doesn’t work, you can provide demonstrations or examples in the prompt, which leads to “one-shot” and “few-shot” prompting.

Technique #2: Few-Shot Prompting

Give the AI examples of what you want, then ask for more of the same. Examples are especially useful when you want to steer the model to a certain output structure or pattern.

A one-shot prompt provides a single example, hence the name one-shot. The idea is that the model has an example it can imitate to best complete the task.

A few-shot prompt provides multiple examples to the model. This approach shows the model a pattern that it needs to follow. The idea is similar to one-shot, but multiple examples of the desired pattern increase the chance that the model follows it.

When to use: When you have a specific format, style, or pattern you want to replicate.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def parse_pizza_order(order_text: str) -> str:
    # System prompt to define the task clearly
    system_prompt = """
    You are a pizza order parser. Your task is to parse customer pizza orders into valid JSON format with this exact structure:
    - "size": can be "small", "medium", "large", or "extra large"
    - "type": can be "normal" (single topping set) or "half-half" (two different topping sets)
    - "ingredients": array of arrays containing ingredients for each half

    For normal pizzas: ingredients should be a single array inside the main array
    For half-half pizzas: ingredients should contain two separate arrays for each half

    Respond with only valid JSON in the specified format.
    """

    # User prompt with examples and the specific order.
    # Literal JSON braces are doubled ({{ }}) because this is an f-string.
    user_prompt = f"""
    Parse a customer's pizza order into valid JSON:

    EXAMPLE:
    I want a small pizza with cheese, tomato sauce, and pepperoni.
    JSON Response:
    {{ "size": "small", "type": "normal", "ingredients": [["cheese", "tomato sauce", "pepperoni"]] }}

    EXAMPLE:
    Can I get a large pizza with tomato sauce, basil and mozzarella
    JSON Response:
    {{
    "size": "large",
    "type": "normal",
    "ingredients": [["tomato sauce", "basil", "mozzarella"]]
    }}

    Now parse this order:
    {order_text}
    JSON Response:
    """

    try:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.1,  # low: we want the same structure every time
            max_tokens=250,   # enough room for the JSON object
            top_p=1.0,
        )
        # Return the raw JSON text (parse it with json.loads before using it)
        return response.choices[0].message.content.strip()

    except Exception as e:
        print(f"Error occurred: {e}")
        return "ERROR"

Input: 

"Now, I would like a large pizza, with the first half cheese and mozzarella. 
And the other tomato sauce, ham and pineapple."

Expected Output:

{
"size": "large",
"type": "half-half",
"ingredients": [["cheese", "mozzarella"], ["tomato sauce", "ham", "pineapple"]]
}
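Even with good examples, the model still returns plain text, so it's worth checking the reply before using it. Here's a minimal sketch, assuming the reply should follow the structure defined in the system prompt above; `validate_order` is a hypothetical helper, not part of any library.

```python
import json

# Allowed values, copied from the system prompt above.
ALLOWED_SIZES = ("small", "medium", "large", "extra large")
ALLOWED_TYPES = ("normal", "half-half")


def validate_order(raw: str) -> dict:
    """Parse the model's reply and check it against the expected structure.

    Raises ValueError if the reply is not valid JSON or breaks a rule.
    """
    try:
        order = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(order, dict):
        raise ValueError("Reply must be a JSON object")

    if order.get("size") not in ALLOWED_SIZES:
        raise ValueError(f"Unexpected size: {order.get('size')!r}")
    if order.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"Unexpected type: {order.get('type')!r}")

    # "normal" pizzas have one ingredient list, "half-half" pizzas have two.
    expected_halves = 1 if order["type"] == "normal" else 2
    ingredients = order.get("ingredients")
    if (not isinstance(ingredients, list)
            or len(ingredients) != expected_halves
            or not all(isinstance(half, list) for half in ingredients)):
        raise ValueError(f"'ingredients' must hold {expected_halves} list(s)")
    return order


reply = ('{"size": "large", "type": "half-half", "ingredients": '
         '[["cheese", "mozzarella"], ["tomato sauce", "ham", "pineapple"]]}')
order = validate_order(reply)
print(order["type"], len(order["ingredients"]))  # half-half 2
```

If validation fails, you can retry the call or log the reply; either way, a broken reply never reaches the rest of your code.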

When you choose examples for your prompt, use examples that are relevant to the task you want to perform. The examples should be diverse, high-quality, and well written. One small mistake can confuse the model and result in undesired output.

Technique #3: Chain-of-Thought (CoT) Prompting

Chain of Thought prompting is a technique for improving the reasoning capabilities of LLMs by generating intermediate reasoning steps. This helps the LLM generate more accurate answers.

You can also combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding, which can be a challenge for zero-shot chain of thought.

user_prompt = """
When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old.
How old is my partner?
"""

Output :  63 years old

Yikes. That’s obviously the wrong answer. As a matter of fact, LLMs often struggle with mathematical tasks and can provide incorrect answers – even for a task as simple as multiplying two numbers. This is because they are trained on large volumes of text and math may require a different approach.

This is where chain-of-thought prompting comes in: instead of jumping straight to an answer, we ask the model to work through intermediate steps.

Now let’s see if intermediate reasoning steps will improve the output.

Example of ‘zero-shot’ Chain of thought.

Prompt : 
""" When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. 
How old is my partner? Let's think step by step. """

Output : 
"""
1. Let’s assume my current age is “x” years.
2. When I was 3 years old, my age was 3 years.
3. My partner’s age at that time was 3 times my age, which means my partner’s
age was 3 * 3 = 9 years.
4. Now, I am 20 years old, which means my age has increased by 20 - 3 = 17
years since I was 3 years old.
5. Since my partner’s age was 9 years when I was 3 years old, and my age has
increased by 17 years, my partner’s age must have also increased by 17 years.
6. Therefore, my partner’s current age is 9 + 17 = 26 years.
So, my partner is 26 years old.
"""

Now the final answer is correct. That’s because we clearly instructed the LLM to explain each step instead of just returning an answer.

Now let’s help the model think a little more by combining it with the few-shot approach.

Prompt : 
"""
Q: When my brother was 2 years old, I was double his age. Now
I am 40 years old. How old is my brother? Let's think step
by step.

A: When my brother was 2 years, I was 2 * 2 = 4 years old.
That's an age difference of 2 years and I am older. Now I am 40
years old, so my brother is 40 - 2 = 38 years old. The answer
is 38.

Q: When I was 3 years old, my partner was 3 times my age. Now,
I am 20 years old. How old is my partner? 
Let's think step
by step.

A:
"""
Output :

""" When I was 3 years old, my partner was 3 * 3 = 9 years old. That’s an age
difference of 6 years and my partner is older. Now I am 20 years old, so my
partner is 20 + 6 = 26 years old. The answer is 26.
"""

Chain of thought can be useful for various use cases. Think of code generation: breaking the request down into a few steps and mapping those to specific lines of code. Or creating synthetic data when you have some kind of seed, like “The product is called XYZ; write a description, guiding the model through the assumptions you would make based on the given product title.” Generally, any task that can be solved by ‘talking it through’ is a good candidate for chain of thought. If you can explain the steps to solve the problem, try chain of thought.

With this process, you get both the answer AND the reasoning, making it easier to verify accuracy and learn from the process.
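The zero-shot and few-shot prompts above differ only in whether worked examples come first. Here's a small sketch of a hypothetical helper, `build_cot_prompt`, that assembles either version. The trigger phrase and the Q:/A: layout are copied from the examples above (the zero-shot example in this article doesn't use Q:/A:, so that part is my assumption).

```python
from typing import List, Optional, Tuple

# Trigger phrase used in the zero-shot and few-shot examples above.
COT_TRIGGER = "Let's think step by step."


def build_cot_prompt(question: str,
                     examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Return a chain-of-thought prompt.

    With no examples this is zero-shot CoT; with (question, worked answer)
    pairs it becomes few-shot CoT.
    """
    blocks = [f"Q: {q} {COT_TRIGGER}\n\nA: {a}" for q, a in (examples or [])]
    blocks.append(f"Q: {question} {COT_TRIGGER}\n\nA:")
    return "\n\n".join(blocks)


brother_q = ("When my brother was 2 years old, I was double his age. "
             "Now I am 40 years old. How old is my brother?")
brother_a = ("When my brother was 2 years, I was 2 * 2 = 4 years old. "
             "That's an age difference of 2 years and I am older. Now I am "
             "40 years old, so my brother is 40 - 2 = 38 years old. "
             "The answer is 38.")
partner_q = ("When I was 3 years old, my partner was 3 times my age. "
             "Now, I am 20 years old. How old is my partner?")

print(build_cot_prompt(partner_q))                            # zero-shot CoT
print(build_cot_prompt(partner_q, [(brother_q, brother_a)]))  # few-shot CoT
```

Keeping the worked examples as data makes it easy to swap them out per task instead of editing one long prompt string.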

Technique #4: Role-Based Prompting

Role prompting is a technique in prompt engineering that involves assigning a specific role to the gen AI model. This can help the model to generate more relevant and informative output, as the model can craft its responses to the specific role that it has been assigned.

Prompt : """ I want you to act as a travel guide. I will write to you about
my location and you will suggest 3 places to visit near me in
a humorous style.
My suggestion: "I am in Manhattan."
Travel Suggestions:
"""
Output : """
1. Behold the Empire State of Mind: Ascend to the dizzying heights of the Empire
State Building and bask in the glory of Manhattan's skyline. Prepare to feel like
King Kong atop the Big Apple, minus the giant ape-sized banana.

2. Get Artsy-Fartsy at MoMA: Unleash your inner art aficionado at the Museum
of Modern Art (MoMA). Gaze upon masterpieces that will boggle your
mind and make you question whether your stick-figure drawings have any
artistic merit.

3. Shop ‘Til You Drop on Fifth Avenue: Indulge in some retail therapy on the
iconic Fifth Avenue. Brace yourself for sticker shock as you window-shop (or
actually shop) at designer boutiques that will make your wallet cry. But hey,
"""

Here are some styles I find effective: Confrontational, Descriptive, Direct, Formal, Humorous, Influential, Informal, Inspirational, Persuasive.

With a role assigned, the model tends to draw on knowledge associated with that role and adopt the thinking style of that profession.
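In a chat-style API, the role usually goes into the system message, just like the pizza parser earlier in this article. Here's a minimal sketch, assuming the same `role`/`content` message format; `build_role_messages` and the style check are hypothetical, with the style list taken from the paragraph above.

```python
# Styles listed above; anything else is rejected to catch typos.
STYLES = {"confrontational", "descriptive", "direct", "formal", "humorous",
          "influential", "informal", "inspirational", "persuasive"}


def build_role_messages(role: str, style: str, user_input: str) -> list:
    """Put the persona and style in the system message, the request in the user message."""
    style = style.lower()
    if style not in STYLES:
        raise ValueError(f"Unknown style: {style!r}")
    system_prompt = f"I want you to act as {role}. Answer in a {style} style."
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]


messages = build_role_messages(
    role="a travel guide",
    style="Humorous",
    user_input="I am in Manhattan. Suggest 3 places to visit near me.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

Separating the persona from the request means you can reuse the same role across many user questions.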

Technique #5: Step-Back Prompting

Step-back prompting is a technique where, instead of asking the question directly, you first ask a broader, more general question to activate relevant background knowledge. Then you use that output as context to craft the final prompt. This extra “step back” gives the model more context and insight, often leading to higher-quality, more accurate responses.

This approach helps the model think more critically and, more importantly, can help it avoid biased or toxic output. It’s particularly useful for complex or abstract prompts, where diving in directly might produce generic or shallow output.

Making the model contextually align with your query is the main goal here.

Prompt : """ Based on popular first-person shooter action games, what are
5 fictional key settings that contribute to a challenging and
engaging level storyline in a first-person shooter video game?
"""
Output : """
1. **Abandoned Military Base**: A post-apocalyptic zone crawling with rogue robots and mutated enemies.
2. **Cyberpunk City**: A neon-lit urban maze filled with cybernetic foes and hacking puzzles.
3. **Alien Spaceship**: A stranded vessel featuring zero gravity combat and alien terror.
4. **Zombie-Infested Town**: Hordes of undead in a decaying town with tight quarters and puzzle challenges.
5. **Underwater Research Facility**: A sunken lab teeming with aquatic monsters and stealth missions.
"""

Now, instead of jumping into the storyline directly, we include this rich context from the broader step-back question:

Prompt : """Context: 5 engaging themes for a first-person shooter video game:
1. Abandoned Military Base
2. Cyberpunk City
3. Alien Spaceship
4. Zombie-Infested Town
5. Underwater Research Facility

Take one of the themes and write a one paragraph storyline for a new level of a first-person shooter video game that is challenging and engaging.
"""
Output : """
In the abyss of the Atlantic, an underwater research facility crumbles beneath the weight of forgotten ambition. The player, a battle-hardened marine armed with deep-sea weaponry and razor-sharp instincts, must infiltrate the station to recover lost data—before mutated sea creatures claim it first. With claustrophobic tunnels, power outages, and lurking horrors, survival demands more than just firepower—it demands wit, timing, and a heart unshaken by the deep’s darkest secrets.
"""

By asking a general question first, the model got aligned with relevant concepts and scenarios, making the final result more concise and better aligned with expectations. Step-back prompting helps the model draw on its internal knowledge more effectively, and anchoring it in core principles can even help reduce biased or irrelevant outputs.

It’s like telling your friend the whole breakup story before they comment or suggest some silly idea.

To put it simply, you are building the contextual (mental) frame before the model responds to your query.
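The two-step flow above can be sketched as a tiny pipeline. Here `ask` stands in for whatever function sends a prompt to your model (an assumption; swap in your own API call), and `fake_ask` is a hypothetical stub so the example runs offline.

```python
from typing import Callable


def step_back_answer(ask: Callable[[str], str],
                     step_back_question: str, task: str) -> str:
    """Two-call step-back prompting: broad question first, then the real task."""
    # Step 1: ask the broader question to surface background context.
    context = ask(step_back_question)
    # Step 2: put that context in front of the specific task.
    final_prompt = f"Context: {context}\n\n{task}"
    return ask(final_prompt)


# Offline stub standing in for a real model call (hypothetical).
sent_prompts = []


def fake_ask(prompt: str) -> str:
    sent_prompts.append(prompt)
    if prompt.startswith("Context:"):
        return "In the abyss of the Atlantic, an underwater research facility..."
    return "1. Abandoned Military Base\n2. Cyberpunk City\n3. Alien Spaceship"


story = step_back_answer(
    fake_ask,
    step_back_question=("Based on popular first-person shooter action games, "
                        "what are 5 fictional key settings that contribute to "
                        "a challenging and engaging level storyline?"),
    task=("Take one of the themes and write a one paragraph storyline for a "
          "new level of a first-person shooter video game."),
)
print(story)
```

The cost of step-back prompting is one extra model call per question, which is usually worth it for open-ended or abstract tasks.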

Technique #6: Re-Act Prompting (Reason + Act)

Re-Act (short for Reason and Act) is a powerful prompting framework that allows a language model to alternate between thinking and doing—just like humans do when they face unfamiliar tasks.

This technique breaks the process of solving a task into a few steps.

Those steps are :

*Steps*	*Description of the Process*
Thought	What is the model thinking about or trying to figure out?
Action	Which action is the model taking: Search[query], Database[query], or Calculate[Problem Statement]?
Observation	What did the model get back from performing that action?

This loop of Thought → Action → Observation → Thought allows the model to gather fresh information from the outside world and update its internal decision-making process dynamically. It’s particularly useful in agent-based tasks, data retrieval, or solving problems with multiple steps and dependencies.

Prompt : """
You are a helpful AI assistant with the ability to reason logically and use tools like 
web search, calculator, or database to find accurate and up-to-date answers.

Please answer the following query step-by-step while following the format given below.

Use the following structure repeatedly until you find the final answer:

Thought: [What are you thinking or trying to figure out?]  
Action: [What action are you taking: Search[query], Lookup[keyword], or Calculate[expression]]  
Observation: [What did you get from performing the action?]

Once you are confident in the answer, respond with:

Final Answer: [The answer to the user's original question]

---

Question: "Who won the FIFA World Cup in 2022 and what country will host the next tournament?"

Begin:

"""
Output : """
Question: "Who won the FIFA World Cup in 2022 and what country will host the next tournament?"

Thought: I need to find out which country won the FIFA World Cup in 2022, and also determine
which country will host the next (2026) World Cup. I will search for both pieces of 
information.

Action: Search["2022 FIFA World Cup winner"]

Observation: Argentina won the 2022 FIFA World Cup, defeating France in the final match.

Thought: Next, I need to find out which country is hosting the 2026 FIFA World Cup.

Action: Search["2026 FIFA World Cup host country"]

Observation: The 2026 FIFA World Cup will be jointly hosted by the United States, Canada, and 
Mexico.

Final Answer: Argentina won the 2022 FIFA World Cup. The 2026 FIFA World Cup will be jointly 
hosted by the United States, Canada, and Mexico.
"""

This style of prompting gives the model the ability to self-reflect, fetch external knowledge, and update its path dynamically, making it an ideal approach for real-world applications like virtual agents, data scraping, or tool-based reasoning.
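To make the Thought → Action → Observation loop concrete, here's a minimal sketch. The "model" is a scripted stub and `Search` is a fake lookup table, both hypothetical; in a real agent, the model call and the tools would hit an LLM and a real search API. The loop parses each `Action: Tool[input]` line, runs the tool, and appends an `Observation:` line to the transcript until a `Final Answer:` appears.

```python
import re
from typing import Callable, Dict

# "Action: Search[...]" and "Final Answer: ..." follow the format in the prompt above.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)


def react_loop(model: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    """Alternate model steps (Thought + Action) with tool Observations."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            raise ValueError("Step had neither an Action nor a Final Answer")
        tool_name, tool_input = action.group(1), action.group(2).strip().strip('"')
        transcript += f"Observation: {tools[tool_name](tool_input)}\n"
    raise RuntimeError("No Final Answer within max_steps")


# Scripted stand-in for the LLM (hypothetical): returns one step per call.
SCRIPT = iter([
    'Thought: I need the 2022 winner.\nAction: Search["2022 FIFA World Cup winner"]',
    'Thought: Now the next host.\nAction: Search["2026 FIFA World Cup host country"]',
    "Final Answer: Argentina won the 2022 FIFA World Cup. The 2026 World Cup "
    "will be jointly hosted by the United States, Canada, and Mexico.",
])


def scripted_model(transcript: str) -> str:
    return next(SCRIPT)  # ignores the transcript; a real LLM would read it


# Fake search tool backed by a lookup table (hypothetical).
FAKE_INDEX = {
    "2022 FIFA World Cup winner": "Argentina won the 2022 FIFA World Cup.",
    "2026 FIFA World Cup host country": "United States, Canada, and Mexico.",
}
searches = []


def fake_search(query: str) -> str:
    searches.append(query)
    return FAKE_INDEX.get(query, "No results.")


answer = react_loop(scripted_model, {"Search": fake_search},
                    "Who won the FIFA World Cup in 2022 and who hosts the next one?")
print(answer)
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop forever.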

And trust me, that’s not all. There are lots of other prompting methods as well.
I’ll mention some of them below:

Trees of Thought (ToT)
Contextual Prompting
Automatic Prompt Engineering
Self-consistency

In case you want more details on these techniques, I have mentioned my sources at the end of this article.

Best Practices

The most important best practice is to provide examples (one-shot or few-shot) within a prompt. This is highly effective because examples act as a powerful teaching tool.
Prompts should be concise, clear, and easy to understand for both you and the model. As a rule of thumb, if it’s confusing to you, it will likely be confusing to the model too. Try not to use complex language, and don’t provide unnecessary information.
Be specific about the desired output. A concise instruction might not guide the LLM enough or could be too generic.
Use Instructions over Constraints.
- An instruction provides explicit instructions on the desired format, style, or content of the response. It guides the model on what the model should do or produce.
- A constraint is a set of limitations or boundaries on the response. It limits what the model should not do or avoid.
  
  Like humans, LLMs tend to respond better to positive instructions than to constraints (telling them what not to do). A constraint leaves the model guessing about what else is allowed, which can lead to unexpected or hallucinated output.
Try using verbs that describe the action. Here’s a set of examples: Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.

Conclusion

Prompt Engineering is an evolving field, and it will take some time for us to get a hold of it.
So keep experimenting with your prompts. Another important suggestion I received is to document your ‘Prompt History’: your prompts and the responses you got from the model.
Keeping track of your prompts helps you refine them and steer the model’s responses toward the ones you want.
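If you'd rather keep your prompt history in code than in a document, here's one possible sketch: each attempt is appended as one JSON line to a file. The field names (`prompt`, `response`, `model`, `temperature`, `notes`) are my own assumptions, not a standard format, and the records in the usage lines are illustrative.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class PromptRecord:
    """One prompt attempt; field names are illustrative, not a standard."""
    prompt: str
    response: str
    model: str
    temperature: float
    notes: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def append_record(path: str, record: PromptRecord) -> None:
    # JSON Lines: one record per line; appending keeps earlier attempts intact.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_records(path: str) -> list:
    with open(path, encoding="utf-8") as f:
        return [PromptRecord(**json.loads(line)) for line in f if line.strip()]


history_path = os.path.join(tempfile.mkdtemp(), "prompt_history.jsonl")
append_record(history_path, PromptRecord(
    prompt="How old is my partner?", response="63 years old",
    model="example-model", temperature=0.1,
    notes="Wrong answer; try chain of thought."))
append_record(history_path, PromptRecord(
    prompt="How old is my partner? Let's think step by step.",
    response="26 years old", model="example-model", temperature=0.1))
for record in load_records(history_path):
    print(record.temperature, record.prompt, "->", record.response)
```

Recording the model and settings next to each prompt matters because the same prompt can behave differently under a different temperature.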

Here is one demo template :

That's all for this article, and this marks the end of my Prompt Engineering Series. In case you missed it, there is another article on this topic - Part One.

This is Part Two of the series, where I have tried to put together all my learnings on prompt engineering.

Here are some sources of my knowledge:

Prompt Engineering: The Art of Effective AI Communication

Pritam Chakroborty — Wed, 25 Jun 2025 10:05:49 GMT

💡

I'm not an expert in this, but I'm an enthusiastic learner of generative artificial intelligence. All the knowledge I share is verified from reliable sources. Please feel free to point out areas for improvement and how I can do better. I'm definitely looking forward to your valuable feedback. Now go ahead….. 😊

Why the most criticized AI skill might be the most valuable one you'll learn.

You must have seen pictures like this. It’s all over the internet.

Haha😉, the hate is real.
I also hated this term until I understood it better.

I'm not here to convince you to change your mind about "Prompt Engineering."

Instead, I'll leave that decision to you. But I do suggest that before making a final judgment, we should at least try to understand what “Prompt Engineering” is.

So, let's give it a try, shall we?

Does this tweet make any sense to you? If not, that’s completely fine because trust me, after reading this article, you will have a clear picture of why someone said so.

Before we jump into the topic, let me answer some of the hottest questions on the internet.

Is Prompt Engineering only a software developer thing?

I am a physician/doctor/graphic designer/painter/businessman/creative writer/architect … (list goes on). Should I learn prompt engineering?

My thoughts on these questions: No, Prompt Engineering is not only for software developers. It’s a universal skill for anyone willing to skill up in their respective domain using the advantage of LLMs. And let me tell you, using AI in your domain is no longer a big no-no. Whether you are a doctor looking for a precise medical answer, a graphic designer trying to get creative inspiration, or a singer looking for a tune that matches your rhythm, knowing how to communicate to get the best out of these language models amplifies your expertise.

Think of it as learning a new language that turns AI into a powerful assistant in your field. Investing time in learning prompt engineering benefits you in the long run, no matter your profession.

Now, with a positive attitude, let's move forward.

What is a prompt?

A prompt is simply an instruction or a question you give to an LLM to get it to do something for you. A prompt can be a single word, a sentence, or even a detailed paragraph, depending on what you're trying to accomplish.

Now, what is Prompt Engineering?

Prompt engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs.

When you write a prompt, you are attempting to set up the LLM to predict the right sequence of tokens.

You don’t need to be a data scientist or a machine learning engineer – everyone can write a prompt. However, writing a prompt that suits your requirements can be tricky, because it depends on multiple factors like model configuration, your requirements, word choice, tone, examples (if there are any), the structure of the question, the context of the question, etc.

An ambiguous question will result in an ambiguous answer. As simple as that. If a question already sounds confusing to you, it will most probably confuse the LLM too and can result in hallucination.

As I said, it’s more than the hate we see on the internet.

In a recent talk at YC AI Startup School, Andrej Karpathy discussed the psychology of LLMs. He characterizes LLMs as "fallible savants" with unique cognitive quirks: they can be superhuman at some tasks yet make mistakes that no human would make. He also compared LLMs to a patient with anterograde amnesia: they can't form new long-term memories, so nothing from a conversation sticks once it falls outside their context window.

So now it is fair to say that it is certainly not just another buzzword, but a universal skill for moving forward.

💡

In this article, we will discuss how to adjust and modify LLM outputs based on our needs.

Some Jargon to Define LLMs’ Output

Output Length

Although it seems straightforward, many people find it confusing. It simply refers to the maximum number of tokens the model is allowed to generate in its response. It caps the length; it doesn't guarantee the quality of the response.

Keep in mind: a long response means using many tokens, which means LLMs use more computing power, making the response more expensive.
A shorter response uses fewer tokens, so LLMs use less computing power, making the response less expensive.

Reducing the output length doesn’t make the LLM more stylistically or textually succinct. It just makes the LLM stop predicting tokens once the limit is reached, which can cut the response off mid-sentence.
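A toy decoding loop shows why: the limit doesn't change what the model "plans" to say, it only decides when to stop. This is a simplified sketch with a hard-coded token list standing in for a real model. The `"stop"` and `"length"` labels mirror the `finish_reason` values that OpenAI's Chat Completions API reports; everything else here is hypothetical.

```python
def generate(planned_tokens, max_tokens, stop_token="<eos>"):
    """Toy decoder: emit tokens until the stop token or max_tokens is hit.

    Returns the emitted tokens and a finish reason: "stop" if the model
    finished on its own, "length" if the limit cut it off.
    """
    output = []
    for token in planned_tokens:
        if token == stop_token:
            return output, "stop"
        if len(output) == max_tokens:
            return output, "length"
        output.append(token)
    return output, "stop"


# Hypothetical tokens a model "wants" to produce for one answer.
planned = ["Paris", " is", " the", " capital", " of", " France", ".", "<eos>"]

print(generate(planned, max_tokens=50))  # full sentence, finish reason "stop"
print(generate(planned, max_tokens=4))   # cut off mid-sentence, "length"
```

If you need shorter answers, ask for them in the prompt; use the token limit as a cost guard rather than a style control.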

Sampling Control (the settings that define how the next token is chosen)

Language models don’t directly predict a single token; they predict a probability for every token in their vocabulary being the next one. Given this list of probabilities, the sampling control settings determine how those probabilities are used to choose one output token.
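Here's a minimal sketch of that idea: the model produces a raw score (a logit) for each candidate token, softmax turns the scores into probabilities, and a sampler picks one token. The logit values below are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(probs, rng=random):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up logits for the next word after "The cat walked into the room and saw a".
logits = {"mouse": 2.1, "ball": 1.5, "person": 1.4, "toy": 1.0}
probs = softmax(logits)
print({tok: round(p, 2) for tok, p in probs.items()})
print(sample_token(probs, random.Random(0)))
```

Every setting below (temperature, Top-K, Top-P) changes either these probabilities or which tokens the sampler may pick from.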

Some of the most common sampling control configurations:

2a. Temperature: Creativity Control Room

Temperature (you may be thinking, “I know this one, I know this one, this is a classic.” Not quite: it’s not what you may have convinced yourself it is.) Temperature controls how much creativity, or risk, the LLM takes with its word choices: low values keep it on the most likely words, while high values give less likely words a better chance.

How It Works: The Pizza Restaurant Analogy

Imagine you're at a pizza restaurant and the waiter asks what you want. The menu shows popularity ratings:

Margherita Pizza: 45% of customers order this
Pepperoni Pizza: 30% of customers order this
Hawaiian Pizza: 15% of customers order this
Veggie Supreme: 8% of customers order this
Anchovy Special: 2% of customers order this

Temperature controls how much you stick to popular choices vs. trying something different.

Let’s take another example :

"The cat walked into the room and saw a..."

Now, let's use this sentence to see how different 'Sampling Control' settings affect the LLM's word choices.

💡

The example blocks below are for visual ease; they are not actual code.

The LLM’s internal probability distribution for the next word might look like this:

"mouse" (35%)
"ball" (20%)
"person" (18%)
"toy" (12%)
"shadow" (8%)
"ghost" (4%)
"unicorn" (2%)
"spaceship" (1%)

Now let's see how different temperature settings affect the output:

Temperature = 0 (Greedy Decoding)

Behavior: Always picks the highest probability word, no exceptions.

Output: "The cat walked into the room and saw a mouse." 
Every single time: "mouse" (because it has 35% probability - the highest)

Why use this:

Math problems: "2 + 2 = 4" (not "2 + 2 = purple")
Code generation: Variable names should be consistent
Factual questions: "Paris is the capital of France" (not "Paris is the capital of pizza")

Temperature = 0.2 (Very Low)

Behavior: Heavily favors high-probability words, but allows tiny variations.

Possible outputs:

"The cat walked into the room and saw a mouse." (90% of the time)
"The cat walked into the room and saw a ball." (8% of the time)
"The cat walked into the room and saw a person." (2% of the time)

Why use this:

Professional emails: Consistent tone, occasional word variety
Technical documentation: Clear but not robotic
Customer service responses: Reliable but slightly personalized

Temperature = 0.5 (Low-Medium)

Behavior: Still prefers likely words but gives reasonable alternatives a fair chance.

Possible outputs:

"The cat walked into the room and saw a mouse." (50% of the time)
"The cat walked into the room and saw a ball." (25% of the time)
"The cat walked into the room and saw a person." (20% of the time)
"The cat walked into the room and saw a toy." (5% of the time)

Why use this:

Blog writing: Natural variety without being weird
Conversational AI: Engaging but predictable
Content creation: Fresh but coherent

Temperature = 0.8 (Medium-High)

Behavior: Good balance - explores less likely options while staying sensible.

Possible outputs:

"The cat walked into the room and saw a mouse." (25% of the time)
"The cat walked into the room and saw a ball." (20% of the time)
"The cat walked into the room and saw a person." (20% of the time)
"The cat walked into the room and saw a toy." (15% of the time)
"The cat walked into the room and saw a shadow." (12% of the time)
"The cat walked into the room and saw a ghost." (8% of the time)

Why use this:

Creative writing: Interesting but logical
Story generation: Surprising but believable plot points
Dialogue writing: Natural human-like conversation

Temperature = 1.2 (High)

Behavior: Much more willing to pick unusual options, probability differences matter less.

Possible outputs:

"The cat walked into the room and saw a unicorn."
"The cat walked into the room and saw a spaceship."
"The cat walked into the room and saw a mouse wearing a tiny hat."
"The cat walked into the room and saw a portal to another dimension."
"The cat walked into the room and saw a philosophical debate."

Why use this:

Brainstorming: Need completely unexpected ideas
Surreal creative writing: Fantasy, sci-fi, experimental fiction
Idea generation: Breaking out of conventional thinking

Temperature = 2.0+ (Very High)

Behavior: Almost random - all words become nearly equally likely.

Possible outputs:

"The cat walked into the room and saw a refrigerator singing opera."
"The cat walked into the room and saw a mathematics of purple seventeen."
"The cat walked into the room and saw a democracy flavored with nostalgia."

Why use this:

Abstract art generation
Surreal poetry
Random idea sparks (though often nonsensical)
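The percentages above are illustrative. One common way temperature T reshapes a distribution is to raise each probability to the power 1/T and renormalize (equivalently, divide the logits by T before softmax). Here's a sketch applying that to the cat example, with T = 0 handled separately as greedy decoding. The exact numbers differ from the hand-drawn ones above (for example, T = 0.2 gives "mouse" about 91%), but the trend is the same.

```python
import math

# The illustrative next-word probabilities from the cat example above.
CAT_PROBS = {"mouse": 0.35, "ball": 0.20, "person": 0.18, "toy": 0.12,
             "shadow": 0.08, "ghost": 0.04, "unicorn": 0.02, "spaceship": 0.01}


def apply_temperature(probs, temperature):
    """Reshape a distribution: p ** (1 / T), renormalized. T = 0 is greedy."""
    if temperature == 0:
        best = max(probs, key=probs.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in probs}
    # Work in log space (log p / T) so small temperatures don't underflow.
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


for t in (0, 0.2, 0.5, 1.0, 2.0):
    reshaped = apply_temperature(CAT_PROBS, t)
    print(f"T={t}:", {tok: round(p, 3) for tok, p in reshaped.items()})
```

T = 1 leaves the distribution unchanged, values below 1 sharpen it toward "mouse", and values above 1 flatten it so "unicorn" and "spaceship" get more of a chance.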

2b. Top-K Sampling

What it does: Only considers the K most likely words, ignoring the rest.

Top-K = 1

Available choices: Only "mouse" (the single highest probability word)
Output: "The cat walked into the room and saw a mouse." (always exactly the same)

Why use this: When you need 100% predictable results. Same as setting temperature to 0.

Top-K = 3

Available choices: "mouse," "ball," "person" (top 3 most likely)

Possible outputs:
"The cat walked into the room and saw a mouse."
"The cat walked into the room and saw a ball."
"The cat walked into the room and saw a person."

Why use this: Very controlled creativity - safe options only.

Top-K = 6

Available choices: "mouse," "ball," "person," "toy," "shadow," "ghost" (top 6)

Possible outputs:
"The cat walked into the room and saw a shadow."
"The cat walked into the room and saw a ghost."
(Plus all previous options)

Why use this: Moderate creativity - includes some interesting but reasonable choices.

Top-K = 50

Available choices: All 8 words from our list, plus 42 other possible words like "butterfly," "mirror," "photograph," "rainbow," "telephone," etc.

Possible outputs:
"The cat walked into the room and saw a butterfly."
"The cat walked into the room and saw a mysterious photograph."
"The cat walked into the room and saw a ringing telephone."

Why use this: High creativity - many unexpected but sensible possibilities.
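Top-K is simple to sketch: sort by probability, keep the first K, and renormalize so the survivors sum to 1. This reuses the illustrative cat distribution; `top_k_filter` is a hypothetical name. Since our toy list only has eight words, K = 50 simply keeps all of them.

```python
# The illustrative next-word probabilities from the cat example above.
CAT_PROBS = {"mouse": 0.35, "ball": 0.20, "person": 0.18, "toy": 0.12,
             "shadow": 0.08, "ghost": 0.04, "unicorn": 0.02, "spaceship": 0.01}


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize them to sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


for k in (1, 3, 6, 50):
    shortlist = top_k_filter(CAT_PROBS, k)
    print(f"Top-K={k}:", list(shortlist))
```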

2c. Top-P (Nucleus Sampling): The Probability Threshold

What it does: Adds words, from most to least likely, until their combined probability reaches P, then stops.

Top-P = 0.5 (50% threshold)
Process: Keep adding words until we hit 50% total probability

"mouse" (35%) → Running total: 35%
"ball" (20%) → Running total: 55% ← STOP! We passed 50%

Available choices: Only "mouse" and "ball"

Possible outputs:
"The cat walked into the room and saw a mouse."
"The cat walked into the room and saw a ball."

Why use this: Very safe, predictable outputs. Good for professional writing where you cannot afford weird word choices.
---------------------------------------------------------------------------------------------------------
Top-P = 0.75 (75% threshold)
Process: Keep adding until 75%

"mouse" (35%) → 35%
"ball" (20%) → 55%
"person" (18%) → 73%
"toy" (12%) → 85% ← STOP! We passed 75%

Available choices: "mouse," "ball," "person," "toy"

Possible outputs:
"The cat walked into the room and saw a mouse."
"The cat walked into the room and saw a person."
"The cat walked into the room and saw a toy."

Why use this: Good balance - includes reasonable alternatives while filtering out the weird stuff.
---------------------------------------------------------------------------------------------------------
Top-P = 0.95 (95% threshold)
Process: Keep adding until 95%

"mouse" (35%) → 35%
"ball" (20%) → 55%
"person" (18%) → 73%
"toy" (12%) → 85%
"shadow" (8%) → 93%
"ghost" (4%) → 97% ← STOP! We passed 95%

Available choices: "mouse," "ball," "person," "toy," "shadow," "ghost"

Possible outputs:
"The cat walked into the room and saw a shadow."
"The cat walked into the room and saw a ghost."
(Plus all the previous options)

Why use this: Most common setting - gives creativity while avoiding the really bizarre choices like "unicorn" and "spaceship."
---------------------------------------------------------------------------------------------------------
Top-P = 1.0 (100% threshold)
Process: Include everything, no matter how unlikely
Available choices: Every single word, including "unicorn" and "spaceship"

Possible outputs:
"The cat walked into the room and saw a unicorn."
"The cat walked into the room and saw a spaceship."
"The cat walked into the room and saw a philosophical debate."

Why use this: Maximum creativity, but risk of nonsensical results.
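Here's a matching sketch of Top-P, again on the illustrative cat distribution. It keeps the smallest run of most-likely words whose cumulative probability reaches P, which reproduces the shortlists worked out above; `top_p_filter` is a hypothetical name.

```python
# The illustrative next-word probabilities from the cat example above.
CAT_PROBS = {"mouse": 0.35, "ball": 0.20, "person": 0.18, "toy": 0.12,
             "shadow": 0.08, "ghost": 0.04, "unicorn": 0.02, "spaceship": 0.01}


def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches p, then renormalize them to sum to 1."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


for p in (0.5, 0.75, 0.95, 1.0):
    print(f"Top-P={p}:", list(top_p_filter(CAT_PROBS, p)))
```

Unlike Top-K, the shortlist size adapts: when one word dominates, Top-P keeps very few candidates; when the distribution is flat, it keeps many.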

The best way to choose between top-K and top-P is to experiment with both methods (or both together) and see which one produces the results you are looking for.

Now that we have the whole picture of how LLMs choose the next token, it’s important to remember that we use all of these settings together, alongside the prompt itself, because the end goal is getting output that fits our requirements.

So, how do they work together?

The model applies these settings in sequence. A common way to picture it (the exact order varies between implementations):

First, apply Top-K and Top-P filters to create a shortlist
Then, use temperature to sample from that shortlist

Example:

Settings: Temperature=0.7, Top-K=10, Top-P=0.8

For "Write a story about a dragon who...":

Top-K limits the choices to the 10 most likely next words: "lived," "flew," "breathed," "loved," "feared," "discovered," "lost," "wanted," "dreamed," "fought"

Top-P then keeps the smallest group of those words whose combined probability reaches 80%: "lived," "flew," "breathed," "loved," "feared"

Temperature=0.7 samples from these 5 options with moderate randomness

Result: "Write a story about a dragon who feared the dark."
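Putting it together, here's a sketch of a sampler that follows the order described above: Top-K, then Top-P, then temperature. The dragon example doesn't list actual probabilities, so this reuses the illustrative cat distribution. Real libraries may apply the steps in a different order, so treat this as an illustration rather than any particular library's implementation; `sample_next_token` is a hypothetical name.

```python
import math
import random

# The illustrative next-word probabilities from the cat example above.
CAT_PROBS = {"mouse": 0.35, "ball": 0.20, "person": 0.18, "toy": 0.12,
             "shadow": 0.08, "ghost": 0.04, "unicorn": 0.02, "spaceship": 0.01}


def sample_next_token(probs, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Top-K, then Top-P, then temperature, then a weighted random pick."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]                 # Top-K shortlist
    total = sum(p for _, p in ranked)
    shortlist, cumulative = [], 0.0
    for tok, p in ranked:                       # Top-P on the renormalized shortlist
        shortlist.append((tok, p))
        cumulative += p / total
        if cumulative >= top_p:
            break
    if temperature == 0:
        return shortlist[0][0]                  # greedy: most likely survivor
    # Temperature: weights proportional to p ** (1 / T), computed in log space.
    logs = [math.log(p) for _, p in shortlist]
    best = max(logs)
    weights = [math.exp((lp - best) / temperature) for lp in logs]
    return rng.choices([tok for tok, _ in shortlist], weights=weights, k=1)[0]


rng = random.Random(7)
print(sample_next_token(CAT_PROBS, temperature=0.7, top_k=10, top_p=0.8, rng=rng))
```

With temperature=0, top_k=1, or top_p=0, the result is always "mouse", which matches the "extreme settings" behavior described in the next section.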

Common Problems and Solutions

The "Repetition Loop Bug"

Problem: AI gets stuck repeating the same phrases over and over.

Example of bad output: "The solution is simple, the solution is simple, the solution is simple, the solution is..."

Why does it happen?

Low temperature: AI becomes too rigid, follows the same probability path
High temperature: Random choices accidentally cycle back to the previous text

Solution: Adjust temperature to the 0.3-0.7 range and use Top-P around 0.9-0.95

Extreme Settings Cancel Each Other Out

Temperature = 0: Makes Top-K and Top-P irrelevant (always picks the most probable)

Top-K = 1: Makes temperature irrelevant (only one choice available)

Top-P = 0: Makes other settings irrelevant (only the most probable word allowed)

Quick Reference

Here is a table of different use case scenarios with various sampling control configurations.

Task Type	Temperature	Top-P	Top-K	Example Use
Math/Code	0	0.9	20	"What is 2+2?"
Technical Writing	0.1-0.2	0.9	20	API documentation
Business Writing	0.2-0.5	0.95	30	Professional emails
Creative Writing	0.7-0.9	0.99	40	Short stories
Brainstorming	0.8-1.0	0.99	50	Idea generation

Conclusion

So far, we have seen how to tweak an LLM’s output to your needs through sampling control configurations, and we discussed some common problems and their solutions. Understanding temperature, Top-K, and Top-P settings allows you to fine-tune creativity versus consistency for your specific needs.

Therefore, the question isn't whether prompt engineering is "real" or just hype. The question is: Are you ready to harness the full potential of AI tools in your field?

Haha 😒 Did you forget that at the start of this article I showed you a tweet saying:

The hottest new programming language is ‘English’.

- Andrej Karpathy

Now, in the whole article, have you seen anything related to any particular field? No, right. As I said, it is universal.

Isn't it amazing that we can communicate with these incredibly smart language models using just plain, structured English?

We can use English to communicate with LLMs to build applications and businesses, solve our problems, demystify ongoing challenges, and create exciting new opportunities. In the very near future (within the next 6 months), the whole industry will be driven by an agentic AI ecosystem, where your prompts become an intellectual asset of your business. Therefore, it is extremely important to know how to craft these intellectual assets.

And all of this is in simple English, strategically crafted for specific use cases.

I mean, it’s mind-boggling, right?

See this article to get more insights: Takeaways from the AI Engineer World's Fair: The startup playbook is being rewritten in real-time – GeekWire (It’s not about prompt engineering)

Now, you may assume that, well, I’m good with English. So does that mean —

Well, we will see……

In my next article, we will continue from here onwards, where I’ll write about various prompting techniques.

Feedback is appreciated.
