Vivek's Field Notes

AI Agents Notes

Workflows

Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks on any intermediate steps to ensure that the process is still on track.

When to use this workflow: This workflow is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks. The main goal is to trade off latency for higher accuracy, by making each LLM call an easier task.
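Prompt chaining with an intermediate gate can be sketched in a few lines. This is a minimal illustration, not a real implementation: `call_llm` is a hypothetical stand-in for whatever LLM client you use, and the three-step outline/draft/polish chain is an assumed example task.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM client call.
    return f"output for: {prompt}"

def chain(task: str) -> str:
    outline = call_llm(f"Write an outline for: {task}")
    # Gate: a cheap programmatic check before spending another LLM call.
    if len(outline.strip()) == 0:
        raise ValueError("gate failed: empty outline")
    draft = call_llm(f"Expand this outline into a draft:\n{outline}")
    return call_llm(f"Polish this draft:\n{draft}")
```

The gate here only checks for non-empty output; in practice it could validate structure, length, or run a regex over the intermediate result.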

Routing classifies an input and directs it to a specialized followup task. This workflow allows for separation of concerns, and building more specialized prompts. Without this workflow, optimizing for one kind of input can hurt performance on other inputs.

When to use this workflow: Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately, either by an LLM or a more traditional classification model/algorithm.
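A routing workflow reduces to a classifier in front of a table of specialized handlers. The sketch below assumes a toy keyword classifier and made-up category names; in practice the `classify` step would be an LLM call or a trained model.

```python
def classify(user_input: str) -> str:
    # Placeholder classifier: in practice an LLM call or a trained model.
    if "refund" in user_input.lower():
        return "billing"
    if "error" in user_input.lower():
        return "technical"
    return "general"

# Each category gets its own specialized prompt/handler.
HANDLERS = {
    "billing": lambda q: f"[billing prompt] {q}",
    "technical": lambda q: f"[technical prompt] {q}",
    "general": lambda q: f"[general prompt] {q}",
}

def route(user_input: str) -> str:
    return HANDLERS[classify(user_input)](user_input)
```

Because each handler owns its own prompt, tuning the billing prompt cannot regress behavior on technical questions.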

LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow, parallelization, manifests in two key variations:

When to use this workflow: Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher confidence results. For complex tasks with multiple considerations, LLMs generally perform better when each consideration is handled by a separate LLM call, allowing focused attention on each specific aspect.
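One of the two variations, voting, can be sketched with a thread pool: the same prompt is issued several times and the majority answer wins. `call_llm` is a hypothetical stub; a real client call would replace it.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return "yes"

def vote(prompt: str, n: int = 3) -> str:
    # Voting variant: run the same prompt n times in parallel,
    # then take the majority answer for higher confidence.
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(call_llm, [prompt] * n))
    return Counter(answers).most_common(1)[0][0]
```

The sectioning variant looks the same, except each parallel call gets a different subtask and the aggregation step concatenates rather than votes.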

In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.

When to use this workflow: This workflow is well-suited for complex tasks where you can't predict the subtasks needed (in coding, for example, the number of files that need to be changed and the nature of the change in each file likely depend on the task). While it is topologically similar to parallelization, the key difference is flexibility: subtasks aren't pre-defined but are determined by the orchestrator based on the specific input.
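The orchestrator-workers shape can be sketched as three phases: plan, fan out, synthesize. In this toy version the planning step is faked by splitting the task into words; in a real system the orchestrator is itself an LLM call that emits the subtask list, and `call_llm` is a hypothetical stub.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return prompt.upper()

def orchestrate(task: str) -> str:
    # Plan: the orchestrator decides subtasks at runtime (faked here).
    subtasks = [f"Handle part: {w}" for w in task.split()]
    # Fan out: each subtask goes to a worker LLM.
    results = [call_llm(s) for s in subtasks]
    # Synthesize: a final call merges the worker outputs.
    return call_llm("Combine:\n" + "\n".join(results))
```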

In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.

When to use this workflow: This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value. The two signs of good fit are, first, that LLM responses can be demonstrably improved when a human articulates their feedback; and second, that the LLM can provide such feedback. This is analogous to the iterative writing process a human writer might go through when producing a polished document.
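The generate/evaluate loop can be sketched as below. Both `generate` and `evaluate` are hypothetical stand-ins for LLM calls; the length check is an assumed toy evaluation criterion, and the loop is bounded so a stubborn evaluator can't spin forever.

```python
def generate(prompt: str, feedback: str = "") -> str:
    # Placeholder generator LLM: just appends the feedback.
    return (prompt + " " + feedback).strip()

def evaluate(response: str) -> tuple[bool, str]:
    # Placeholder evaluator LLM: pass if the response is "long enough".
    ok = len(response) > 20
    return ok, "" if ok else "add more detail"

def refine(prompt: str, max_rounds: int = 5) -> str:
    response = generate(prompt)
    for _ in range(max_rounds):
        ok, feedback = evaluate(response)
        if ok:
            break
        response = generate(prompt, feedback)
    return response
```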

Text to SQL in Pinterest

The user asks an analytical question, choosing the tables to be used.

The table schema acquired from the metadata store includes:

- Table name
- Table description
- Columns
  - Column name
  - Column type
  - Column description

Low-Cardinality Columns

Certain analytical queries, such as "how many active users are on the 'web' platform", may generate SQL queries that do not conform to the database's actual values if generated naively. For example, the where clause in the response might be where platform='web' as opposed to the correct where platform='WEB'. To address such issues, unique values of low-cardinality columns which would frequently be used for this kind of filtering are processed and incorporated into the table schema, so that the LLM can make use of this information to generate precise SQL queries.
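One way to fold those distinct values into the schema text is to append them as annotations. This is a sketch, not Pinterest's implementation; the table name, columns, and comment format are all made up for illustration.

```python
def enrich_schema(schema: str, low_card_values: dict[str, list[str]]) -> str:
    # Append the actual distinct values of low-cardinality columns to the
    # schema text so the LLM filters on real values (e.g. 'WEB', not 'web').
    lines = [schema]
    for column, values in low_card_values.items():
        lines.append(f"-- {column} values: {', '.join(sorted(values))}")
    return "\n".join(lines)

schema = "users(user_id BIGINT, platform VARCHAR, active BOOLEAN)"
enriched = enrich_schema(schema, {"platform": ["WEB", "IOS", "ANDROID"]})
```

With the values visible in the prompt, the model has no reason to guess a lowercase 'web'.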

Context Window Limit

Extremely large table schemas might exceed the typical context window limit. To address this problem, we employed a few techniques:

- Reduced version of the table schema: this includes only crucial elements such as the table name, column name, and type.
- Column pruning: columns are tagged in the metadata store, and we exclude certain ones from the table schema based on their tags.
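Tag-based column pruning can be sketched as a filter over column metadata. The tag names and the shape of the metadata records here are assumptions for illustration, not Pinterest's actual metadata store.

```python
def prune_columns(columns: list[dict], excluded_tags: set[str]) -> list[dict]:
    # Drop any column carrying at least one excluded tag.
    return [c for c in columns if not (set(c.get("tags", [])) & excluded_tags)]

columns = [
    {"name": "user_id", "type": "BIGINT", "tags": []},
    {"name": "debug_blob", "type": "STRING", "tags": ["deprecated"]},
]
kept = prune_columns(columns, {"deprecated"})  # keeps only user_id
```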

You are a {dialect} expert.

Please help to generate a {dialect} query to answer the question. Your response should ONLY be based on the given context and follow the response guidelines and format instructions.

===Tables
{table_schemas}

===Original Query
{original_query}

===Response Guidelines
1. If the provided context is sufficient, please generate a valid query without any explanations for the question. The query should start with a comment containing the question being asked.
2. If the provided context is insufficient, please explain why it can't be generated.
3. Please use the most relevant table(s).
4. Please format the query before responding.
5. Please always respond with a valid well-formed JSON object in the following format.

===Response Format
{{
    "query": "A generated SQL query when context is sufficient.",
    "explanation": "An explanation of failing to generate the query."
}}

===Question
{question}
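Since the prompt above asks for a JSON object with `query` and `explanation` keys, a thin wrapper can validate the model's response. The `llm_response` string below is a made-up stand-in for the model's raw output.

```python
import json

# Hypothetical raw model output matching the ===Response Format above.
llm_response = '{"query": "-- how many users\\nSELECT count(*) FROM users", "explanation": ""}'

parsed = json.loads(llm_response)
sql = parsed.get("query") or None
if sql is None:
    # Insufficient context: surface the model's explanation instead.
    print("generation failed:", parsed.get("explanation"))
```

Per guideline 1, the generated query starts with a comment containing the original question, which is easy to assert on programmatically.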

Spider dataset: https://arxiv.org/pdf/2204.00498

An offline job is employed to generate a vector index of tables' summaries and historical queries against them. If the user does not specify any tables:

1. Their question is transformed into embeddings, and a similarity search is conducted against the vector index to infer the top N suitable tables.
2. The top N tables, along with the table schema and analytical question, are compiled into a prompt for the LLM to select the top K most relevant tables.
3. The top K tables are returned to the user for validation or alteration.
4. The standard Text-to-SQL process resumes with the user-confirmed tables.
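The similarity-search step can be sketched with plain cosine similarity over an in-memory index. In production this is an OpenSearch k-NN query against real embeddings; the table names and 3-dimensional vectors below are made up.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index: table name -> embedding of its summary.
index = {
    "user_activity": [0.9, 0.1, 0.0],
    "ad_spend": [0.1, 0.9, 0.2],
}

def top_n_tables(question_embedding, n=1):
    scored = sorted(index.items(),
                    key=lambda kv: cosine(question_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:n]]
```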

Offline Vector Index Creation

Table Summarization

There is an ongoing table standardization effort at Pinterest to add tiering for the tables. We index only top-tier tables, promoting the use of these higher-quality datasets. The table summarization generation process involves the following steps:

1. Retrieve the table schema from the table metadata store.
2. Gather the most recent sample queries utilizing the table.
3. Based on the context window, incorporate as many sample queries as possible into the table summarization prompt, along with the table schema.
4. Forward the prompt to the LLM to create the summary.
5. Generate and store embeddings in the vector store.

The table summary includes a description of the table, the data it contains, as well as potential use scenarios. Here is the current prompt we are using for table summarization:

prompt_template = """
You are a data analyst that can help summarize SQL tables.

Summarize below table by the given context.

===Table Schema
{table_schema}

===Sample Queries
{sample_queries}

===Response guideline
 - You shall write the summary based only on provided information.
 - Note that above sampled queries are only small sample of queries and thus not all possible use of tables are represented, and only some columns in the table are used.
 - Do not use any adjective to describe the table. For example, the importance of the table, its comprehensiveness or if it is crucial, or who may be using it. For example, you can say that a table contains certain types of data, but you cannot say that the table contains a 'wealth' of data, or that it is 'comprehensive'.
 - Do not mention about the sampled query. Only talk objectively about the type of data the table contains and its possible utilities.
 - Please also include some potential usecases of the table, e.g. what kind of questions can be answered by the table, what kind of analysis can be done by the table, etc.
"""

Query Summarization

Besides their role in table summarization, sample queries associated with each table are also summarized individually, including details such as the query's purpose and utilized tables. Here is the prompt we are using:

prompt_template = """
You are a helpful assistant that can help document SQL queries.

Please document below SQL query by the given table schemas.

===SQL Query
{query}

===Table Schemas
{table_schemas}

===Response Guidelines
Please provide the following list of descriptions for the query:
- The selected columns and their description
- The input tables of the query and the join pattern
- The query's detailed transformation logic in plain English, and why these transformations are necessary
- The type of filters performed by the query, and why these filters are necessary
- Write very detailed purposes and motives of the query in detail
- Write possible business and functional purposes of the query
"""

NLP Table Search

When a user asks an analytical question, we convert it into embeddings using the same embedding model. Then we conduct a search against both the table and query vector indices. We're using OpenSearch as the vector store, leveraging its built-in similarity search ability.

Considering that multiple tables can be associated with a query, a single table could appear multiple times in the similarity search results. Currently, we utilize a simplified strategy to aggregate and score them. Table summaries carry more weight than query summaries, a scoring strategy that could be adjusted in the future.
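The aggregation step can be sketched as a weighted sum over hits from the two indices. The 2:1 weighting below is illustrative only; the post says table summaries carry more weight but does not give the actual ratio.

```python
# Assumed weights: table-summary hits count double vs query-summary hits.
TABLE_WEIGHT, QUERY_WEIGHT = 2.0, 1.0

def aggregate(table_hits: dict[str, float],
              query_hits: list[tuple[str, float]]) -> dict[str, float]:
    # A table may appear once via its own summary and several more times
    # via query summaries that reference it; sum the weighted similarities.
    scores: dict[str, float] = {}
    for table, sim in table_hits.items():
        scores[table] = scores.get(table, 0.0) + TABLE_WEIGHT * sim
    for table, sim in query_hits:
        scores[table] = scores.get(table, 0.0) + QUERY_WEIGHT * sim
    return scores

scores = aggregate({"users": 0.8}, [("users", 0.6), ("ads", 0.9)])
# users: 2.0*0.8 + 1.0*0.6 = 2.2; ads: 1.0*0.9 = 0.9
```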

Besides its use in Text-to-SQL, this NLP-based table search also powers the general table search in Querybook.

RAG

#Blog #LLM #Notes #Tips #Training