sqlgg: SQL Guided (code) Generator ygrek.org.ua / p / sqlgg

sqlgg: SQL Guided (code) Generator

SQL query parser and binding code generator for C#, C++, Java, OCaml.


Writing database layer code is usually tedious and error-prone, due to the mix of different languages. SQL queries constructed dynamically need to bind external data (from application), and the resulting rowset must be decomposed into application native data. The impedance mismatch between database layer and application code, as well as the absence of static (compile-time) checking, frequently leads to runtime failures (and sometimes even security holes). One can factor out all common database communication code, hide the database under some application-specific abstraction, but one always needs to manually specify correspondence between SQL query binding slots (or resulting rowset columns) and code variables. This mapping should be updated manually every time SQL query is modified.


sqlgg is an SQL query parser and code generator which ensures that application code and database queries are in sync. It analyzes SQL query and determines the set of input parameters (values for INSERT, run-time substitution parameters) and the set of resulting columns (for SELECT). Then it generates the code in host language, mapping query parameters on function arguments with corresponding native data types. So basically you provide an SQL query and generator creates a function which takes the set of typed parameters as required to fill slots in a query. Generated code binds provided parameters into query and executes it. The code for SELECT statements additionally invokes a strongly-typed callback function for every row of the resulting rowset. The most fruitful consequence of such approach is that the host language compiler will itself check that functions generated from SQL queries will be called correctly (i.e. all parameters bound with correct types). So if you modify the query and forget to update the code -- the compiler will point on erroneous parts.

Complete example

Suppose we have the database of some money relationships for the group of people. We use the following SQL tables and queries:

-- @create_person
-- @add_person
INSERT INTO person (name,surname) VALUES;

-- @create_money
-- @add_money

-- @calc_total
SELECT name || ' ' || surname AS fullname, SUM(amount) as total FROM person JOIN money ON src = id GROUP BY id;
-- @list_donors
SELECT DISTINCT surname FROM person JOIN money ON src = id AND dst = (SELECT id FROM person WHERE surname LIKE ?) LIMIT ?;


Generate the binding C++ code (boilerplate omitted, only function prototypes shown): (corresponding C# C++ Java OCaml XML)

// generated by sqlgg 0.2.2+ (f5b76ac7) on 2009-05-30T19:22Z
// visit http://ygrek.org.ua/p/sqlgg/

#pragma once

template <class Traits>
struct sqlgg
  struct create_person
    create_person(typename Traits::connection db);
    bool operator()();
  }; // struct create_person

  struct add_person
    add_person(typename Traits::connection db);
    bool operator()(typename Traits::Text const& name, typename Traits::Text const& surname);
  }; // struct add_person

  struct create_money
    create_money(typename Traits::connection db);
    bool operator()();
  }; // struct create_money

  struct add_money
    add_money(typename Traits::connection db);
    bool operator()(typename Traits::Int const& src, typename Traits::Int const& dst, typename Traits::Int const& amount);
  }; // struct add_money

  struct calc_total
    calc_total(typename Traits::connection db);
    template<class T>
    bool operator()(T result);
  }; // struct calc_total

  struct list_donors
    list_donors(typename Traits::connection db);
    template<class T>
    bool operator()(typename Traits::Text const& _0, typename Traits::Int const& _1, T result);
  }; // struct list_donors

  struct drop_person
    drop_person(typename Traits::connection db);
    bool operator()();
  }; // struct drop_person

  struct drop_money
    drop_money(typename Traits::connection db);
    bool operator()();
  }; // struct drop_money

  create_person create_person;
  add_person add_person;
  create_money create_money;
  add_money add_money;
  calc_total calc_total;
  list_donors list_donors;
  drop_person drop_person;
  drop_money drop_money;

  sqlgg(typename Traits::connection db) : create_person(db), add_person(db), create_money(db), add_money(db), calc_total(db), list_donors(db), drop_person(db), drop_money(db)
}; // struct sqlgg

Things to note above:

  1. Syntax. All queries are written in plain SQL. Every statement should be terminated with semicolon. There is no need to write (?,?) after VALUES in INSERT statement. The annotation [sqlgg] name=function_name (or simply @function_name) before the query specifies the name of the generated function (this annotation is optional but very convenient).
  2. The generated code is parametrized by database-specific class Traits. It specifies the correspondence between SQL and native types, provides types for database connection and implements actual code to execute statements. Traits should be implemented once for every specific database API.
  3. Each statement has a corresponding struct which takes connection object as a constructor argument. This allows to reuse prepared statements several times (if database supports this feature).
  4. add_money() function call operator takes three data parameters -- the values to INSERT into table (note the names and types).
  5. calc_total() returns data via result parameter, which is a callback that gets executed for each row of the resulting rowset, with statically typed parameters binding each column of a row. In this case the parameters are fullname of type Traits::Text and total of type Traits::Int.
  6. The types of parameters for list_donors were inferred correctly (limit is Int and surname is Text.
  7. Statements of arbitrary depth across many tables are supported.
  8. Statements are checked for correctness as far as generator is concerned, so it will detect syntax errors, non-existent columns in expressions, mismatched columns in compound statements, ambiguous column names etc.

Then manually-written C++ code boils down to (without error-checking): (corresponding C# C++ Java OCaml)

#include "../impl/sqlite3_traits.hpp" // sqlite3 traits
#include "demo_cxx_gen.hpp" // generated
#include <iostream>
#include <vector>

using namespace std;

typedef sqlgg<sqlite3_traits> gen_t;
typedef long long int64;

struct output_transfers
  void operator()(std::string const& fullname, int total)
     cout << fullname << " = " << total << endl;

struct output_donors
  void operator()(std::string const& surname)
     cout << surname << endl;

int main()
  sqlite3* db = NULL;
  sqlite3_open(":memory:", &db);

  gen_t gen(db);

  // create tables

  // add all person records
  int64 john = sqlite3_last_insert_rowid(db);
  int64 ivan = sqlite3_last_insert_rowid(db);
  int64 sancho = sqlite3_last_insert_rowid(db);

  // add money relations

  // summarize by person
  cout << "Total transfers:" << endl;

  // list donors
  cout << "Donors:" << endl;

  // properly close database

  return 0;

The code is straightforward and free of most database-specific details. More importantly it is statically checked by the compiler -- suppose we change the database schema and add one more field to money table (e.g. timestamp of transfer). Compilation rightfully fails:

demo_cxx.cpp: In function `int main()':
demo_cxx.cpp:29: error: no matching function for call to
`sqlgg<sqlite3_traits>::add_money(sqlite3*&, int64&, int64&, int)'
demo_cxx_gen.hpp:75: note: candidates are: static bool sqlgg<Traits>::add_money(typename
Traits::connection, const typename Traits::Int&, const typename Traits::Int&, const typename
Traits::Int&, const typename Traits::Int&) [with Traits = sqlite3_traits]


Distinguishing feature of sqlgg is that it starts off with actual SQL queries, not object models or SQL table descriptions. Ideally, it generates {meta,template,abstract,generic} code which is later specialized for the actual database/environment by the means of the target language. The generated code doesn't require any special runtime support and doesn't impose any restrictions on the application.

The main idea is that the generator should take care only of semantic binding between SQL and code sides, being as unobtrusive as possible. So the choice of the specific database and API is a programmer's choice. Similarly, queries to the database are expressed in plain SQL, so that the generator can be easily plugged in any existing project -- just move all SQL statements used in the code to a separate file and feed it to generator. The generated code can be easily inspected just as any other code in the project, and you can even edit it manually if such need arises (though it is highly not recommended). In order to keep code and database layer in sync the generator must be invoked every time SQL queries are modified.

For now the status of this project is works for me . It is evolving and there is plenty of room for improvement. The generated code and traits interfaces may change from time to time. I am interested in opinions on output code structure/naming best fitting the target language/platform.

This project was started when I found myself editing existing code with tons of C++ wrappers for SQL queries, each binding several parameters and decomposing results. I used it successfully for C++ desktop applications with sqlite3 engine and C#/Mono server-side component with mysql database.

Sqlgg understands the subset of the standard SQL language with arbitrary database-specific extensions. The goal is to cover as much SQL as possible in order to accept most of the real-world SQL statements unmodified.

Available output languages:


Latest release: sqlgg-0.3.0.tar.gz (sig), 14 Apr 2015. See changes history.

Get the source at github or repo.or.cz.

Install with OPAM.



ygrek at autistici dot org