NaturalQL - Guardrailed Natural Language → SQL

11 September 2025 - 9 mins read time
Tags: LLMs NLP Data Engineering DuckDB Streamlit Prompt Engineering

Demo: natural-language → SQL with safety rails, small dataset, and a friendly UI.

Why this exists

Enter NaturalQL: a Streamlit app that converts natural-language questions into safe SQL, runs them on a small DuckDB database, and shows results - with schema-bounded prompting, static SQL checks, and a single repair pass if the first attempt fails.

View the code on GitHub

TL;DR - What I built

NL → SQL (DuckDB dialect) using OpenAI Chat Completions
Guardrails:
- SELECT-only, LIMIT enforced
- Schema-bounded generation (exact tables/columns passed to the model)
- Static validation with sqlglot (alias-aware, m.* supported)
- One repair loop using the parser error as feedback
Deterministic time phrases (“this summer”, “July 2025”) resolved against a fixed demo date
Streamlit UI with Query and About tabs, optional Show SQL and Explain SQL
Zero-ops DB: DuckDB + seeded “cinema” schema that’s simple yet query-rich

Architecture (at a glance)

flowchart LR
  U[User] --> UI[Streamlit UI]
  UI -->|NL prompt| LLM[OpenAI Chat Completions]
  UI --> SCH[Schema Text]
  LLM -->|SQL| GR[Guardrails: sanitize & sqlglot]
  GR -->|valid SELECT| DB[(DuckDB)]
  GR -. on error .-> RP[Repair Pass]
  RP --> LLM
  DB -->|DataFrame| UI

  subgraph Guardrails
    GR
    RP
  end
  subgraph Data
    SCH
    DB
  end

The schema (why queries are fun)

We want something anyone can reason about, but rich enough to form tricky questions (debut directors, festival ranks, overlapping date windows, “cast never acted in award-winning films”, etc.).

erDiagram
  CINEMAS ||--o{ SCREENINGS : has
  MOVIES  ||--o{ SCREENINGS : is_shown_at
  MOVIES  ||--o{ MOVIE_DIRECTORS : has
  PEOPLE  ||--o{ MOVIE_DIRECTORS : directs
  MOVIES  ||--o{ MOVIE_CAST : has
  PEOPLE  ||--o{ MOVIE_CAST : acts_in
  MOVIES  ||--o{ MOVIE_GENRES : categorized_as
  GENRES  ||--o{ MOVIE_GENRES : includes
  FESTIVALS ||--o{ FESTIVAL_ENTRIES : includes
  MOVIES  ||--o{ FESTIVAL_ENTRIES : submits
  AWARDS  ||--o{ MOVIE_AWARDS : grants
  MOVIES  ||--o{ MOVIE_AWARDS : receives

Tables: cinemas, movies, people, movie_directors, movie_cast, genres, movie_genres, festivals, festival_entries, awards, movie_awards, screenings.

Guardrails that keep things safe (and demo-friendly)

1) Sanitization

Reject non-SELECT statements
Strip semicolons; enforce LIMIT (configurable)

def sanitize_sql(sql: str, limit: int) -> str:
    s = sql.strip().rstrip(";")
    if not re.match(r"(?is)^\\s*select\\b|^\\s*with\\b", s):
        raise ValueError("Only SELECT / WITH queries are allowed.")
    if re.search(r"(?i)\\b(insert|update|delete|drop|alter|create|grant|revoke)\\b", s):
        raise ValueError("Only read queries are allowed.")
    if not re.search(r"(?i)\\blimit\\b", s):
        s += f" LIMIT {limit}"
    return s

2) Static validation (sqlglot)

Parse and walk the AST
Alias-aware table/column checks
Allow * and m.* but validate the table/alias exists

tree = parse_one(sql, read="duckdb")
# …build alias map from FROM/JOIN, validate tables…
# …for each Column node: resolve alias→base table, ensure column exists…

3) One repair pass

If parsing/validation fails, we call a second prompt that includes the error message and the same schema, often fixing things like rank vs festival_rank, missing GROUP BY columns, or date overlaps.

Prompting: tiny rules = big stability

We keep the system prompt short and specific (DuckDB dialect):

You convert natural language to DuckDB SQL.

HARD REQUIREMENTS:
1) Single SELECT (or WITH…SELECT). No DDL/DML. No comments.
2) Use ONLY these tables/columns (schema below). Explicit JOINs with ON.
3) Assume TODAY = 2025-09-10.
   Summer = Jun 1–Aug 31. Use date overlap: start <= end AND end >= start.
4) Always include LIMIT {result_limit}.
5) Grouping: GROUP BY all non-aggregated columns.
6) 'New releases' => screenings.is_new_release = TRUE.
7) Use festivals.festival_rank (not 'rank').
8) Interpret "cast never acted in an award-winning movie" as:
   none of that movie's cast appear in movie_cast for any movie where movie_awards.is_winner = TRUE.
   Use NOT EXISTS.

We then append the exact schema text and the user request.

Implementation highlights

Streamlit: stable connection, no reload surprises

Use st.session_state to hold the DuckDB connection and initialize the schema once per server process.

def get_conn(path: str):
    if "conn" not in st.session_state:
        conn = duckdb.connect(path)
        st.session_state["conn"] = conn
    if not st.session_state.get("db_initialized"):
        init_db(st.session_state["conn"], force_rebuild=False)
        st.session_state["db_initialized"] = True
    return st.session_state["conn"]

Aliases + qualified stars

We updated the validator to accept * and m.* while still enforcing valid tables/aliases.

Explain step

An optional “Explain SQL” button prompts the model to summarize the generated query in a couple of bullet points.

Try these queries (they work on the seed data)

Long multi-constraint example (robust) “Show all new Sci-Fi movies screened at Cinema Luna between 1 Jun and 31 Aug 2025, directed by debut directors, that never participated in A or S ranked festivals, share no cast member with any award-winning movie, have runtime ≥ 100 minutes, and were shown in 2D (not IMAX).”
“For each cinema, count how many new releases were screening in August 2025.”
“List movies released in 2025 with their directors and primary genre.”
“Find actors who worked with more than one director.”

“Return all movies from the most prolific director.” (Tie-aware reference SQL:)

WITH director_counts AS (
  SELECT person_id, COUNT(DISTINCT movie_id) AS film_count
  FROM movie_directors
  GROUP BY person_id
),
top_directors AS (
  SELECT person_id
  FROM director_counts
  WHERE film_count = (SELECT MAX(film_count) FROM director_counts)
)
SELECT m.title, p.name AS director, dc.film_count
FROM movies m
JOIN movie_directors md ON m.movie_id = md.movie_id
JOIN people p ON md.person_id = p.person_id
JOIN director_counts dc ON dc.person_id = md.person_id
WHERE md.person_id IN (SELECT person_id FROM top_directors)
ORDER BY dc.film_count DESC, p.name, m.release_date
LIMIT 50;

Tip: keep examples to one question per bullet. If you paste two sentences (e.g., after a “·”), the model may switch tasks mid-prompt.

Running it yourself

# Poetry workflow
poetry install
poetry run streamlit run src/naturalql/app.py

Create a .env in the project root:

OPENAI_API_KEY=sk-...
NQL_MODEL=gpt-4o-mini
NQL_DB_PATH=naturalql.duckdb
NQL_RESULT_LIMIT=50
NQL_TODAY=2025-09-10

On duckdb wheels: I recommend Python 3.11 with duckdb==0.10.3 for smooth sailing.

What went wrong (and how we fixed it)

Model “ate” multiple sentences → If a prompt had two tasks (“… and also count …”), it sometimes pivoted. Fix: keep examples to one intent; add a simple pre-processor to cut at the first bullet/period if you like.
Alias/m.* false positives → Our initial validator blocked alias.*. Fix: teach the checker to accept qualified stars while validating aliases.
Streamlit + native wheels → Some stacks trip over pybind reloads. Fix: keep the connection in session_state, turn off the file watcher, or use Python 3.11 + DuckDB 0.10.x.
Date windows → Be explicit: overlap logic (start <= end AND end >= start) beats naive BETWEEN.

Where I’d take it next

Self-verification pass: a second agent critiques and rewrites SQL before execution
RLS / column whitelist for real datasets
Postgres back end + connection pool
Telemetry: token/latency charts and a small query cache
Domain glossary: map synonyms (“premiere”, “opening weekend”) to schema terms

Professional takeaway

This tiny project lands a few interesting points:

Agentic NL→SQL doesn’t have to be risky - guardrails + schema bounding go a long way.
Determinism (fixed “today”, explicit date overlaps) makes demos robust.
Design over flash: a small, readable stack (Streamlit + DuckDB + sqlglot) lets you talk about LLM prompting, validation, and data plumbing without getting lost in infra.

If you want to adapt NaturalQL to your domain (e.g., logistics or finance), swap the seed schema + prompts, keep the guardrails, and you’ve got a focused demo in under an hour.