2. Java SQL Transpiler Library

Sonatype Nexus (Snapshots) javadoc Gradle CI/QA Quality Coverage license Issues PRs Welcome

Transpile, Resolve, Lineage – A Database independent, stand-alone SQL Transpiler, Column Resolver and Lineage Tracer written in pure Java, translating various large RDBMS SQL Dialects into a few smaller RDBMS Dialects for Unit Testing.

JSQLTranspiler allows you to develop and test your Big Data SQL at no cost on a local DuckDB instance before deploying and running it in the cloud.

Internal Functions will be rewritten based on the actual meaning and purpose of the function, respecting different function arguments’ count, order and type. Rewrite of Window- and Aggregate-Functions as well.

Download

Java 11 Binaries

Flavor

File

Size

Java Library Stable

JSQLTranspiler-0.6.jar

(80 kb)

Java Library Snapshot

JSQLTranspiler-0.7-SNAPSHOT.jar

(80 kb)

CLI Fat JAR Stable

JSQLTranspilerCLI-0.4-all.jar

(1.3 MB)

CLI Fat JAR Snapshot

JSQLTranspilerCLI-0.5-SNAPSHOT-all.jar

(1.3 MB)

Like us on the JSQLTranspiler Git Repository

Cloning the repository
git clone https://github.com/starlake-ai/JSQLTranspiler.git
cd JSQLTranspiler
./gradlew build
<dependency>
    <groupId>com.starlake-ai.jsqltranspiler</groupId>
    <artifactId>jsqltranspiler</artifactId>
    <version>0.6</version>
</dependency>
<repositories>
    <repository>
        <id>jsqltranspiler-snapshots</id>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
        <url>https://s01.oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.starlake-ai.jsqltranspiler</groupId>
    <artifactId>jsqltranspiler</artifactId>
    <version>0.7-SNAPSHOT</version>
</dependency>
repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.starlake-ai.jsqltranspiler:jsqltranspiler:0.6'
}
repositories {
    maven {
        url = uri('https://s01.oss.sonatype.org/content/repositories/snapshots/')
    }
}

dependencies {
    implementation 'com.starlake-ai.jsqltranspiler:jsqltranspiler:0.7-SNAPSHOT'
}

Examples

Rewrite Google BigQuery to DuckDB and execute
-- Google BigQuery
SELECT
    DATE(2016, 12, 25) AS date_ymd,
    DATE(DATETIME '2016-12-25 23:59:59') AS date_dt,
    DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz;

-- Rewritten DuckDB compliant statement
SELECT
    MAKE_DATE(2016, 12, 25) AS date_ymd,
        CAST(DATETIME '2016-12-25 23:59:59' AS DATE) AS date_dt,
        CAST(TIMESTAMP '2016-12-25 05:30:00+07' AT TIME ZONE 'America/Los_Angeles' AS DATE) AS date_tstz;

-- Same Tally
1

-- Same Result
"date_ymd","date_dt","date_tstz"
"2016-12-15","2016-12-15","2016-12-15"
Resolve the Star Operator of a deeply nested Query with disguised physical columns
/* Schema:
Table a: Columns col1, col2, col3, colAA, colBA
Table b: Columns col1, col2, col3, colBA, colBB
*/

-- provided SELECT with STAR Operators
SELECT *
FROM (  (   SELECT *
            FROM b ) c
            INNER JOIN a
                ON c.col1 = a.col1 ) d
;

-- Resolved Columns via JSQLColumnResolver.rewrite(...)
-- Without needing an actual database connection
SELECT  d.col1
        , d.col2
        , d.col3
        , d.colBA
        , d.colBB
        , d.col1_1
        , d.col2_1
        , d.col3_1
        , d.colAA
        , d.colAB
FROM (  (   SELECT  b.col1
                    , b.col2
                    , b.col3
                    , b.colBA
                    , b.colBB
            FROM b ) c
            INNER JOIN a
                ON c.col1 = a.col1 ) d
;
Trace the actual physical columns
/* Schema:
Table a: Columns col1, col2, col3, colAA, colBA
Table b: Columns col1, col2, col3, colBA, colBB
*/

-- provided SELECT with STAR Operator
-- Without needing an actual database connection
SELECT  Sum( colBA + colBB ) AS total
        , ( SELECT col1 AS test
            FROM b ) col2
        , CURRENT_TIMESTAMP() AS col3
FROM a
    INNER JOIN (    SELECT *
                    FROM b ) c
        ON a.col1 = c.col1
;
Trace of the physical tables and columns
<?xml version="1.0" encoding="UTF-8"?>
<ColumnSet>
    <Column alias='total' name='Sum'>
        <ColumnSet>
            <Column name='Addition'>
                <ColumnSet>

                    <!-- scope points on the actual physical column b.colBA -->
                    <Column name='colBA' table='c' scope='b.colBA' dataType='java.sql.Types.OTHER' typeName='Other' columnSize='0' decimalDigits='0' nullable=''/>

                    <!-- scope points on the actual physical column b.colBB -->
                    <Column name='colBB' table='c' scope='b.colBB' dataType='java.sql.Types.OTHER' typeName='Other' columnSize='0' decimalDigits='0' nullable=''/>

                </ColumnSet>
            </Column>
        </ColumnSet>
    </Column>
    <Column alias='col2' name='col1'>
        <ColumnSet>
            <Column alias='test' name='col1' table='b' dataType='java.sql.Types.OTHER' typeName='Other' columnSize='0' decimalDigits='0' nullable=''/>
        </ColumnSet>
    </Column>
    <Column alias='col3' name='CURRENT_TIMESTAMP'/>
</ColumnSet>
Retrieve list of actual physical tables
String sqlStr = "SELECT * FROM (SELECT * FROM A) AS A \n" +
        "JOIN B ON A.a = B.a \n" +
        "JOIN C ON A.a = C.a;";
Set<String> tables = TablesNamesFinder.findTablesOrOtherSources(sqlStr);
assertThat(tables).containsExactlyInAnyOrder("A", "B", "C");

tables = TablesNamesFinder.findTables(sqlStr);
assertThat(tables).containsExactlyInAnyOrder("B", "C");

SQL Dialects

JSQLTranspiler currently understands the following Big RDBMS dialects:

  • Google BigQuery

  • Databricks

  • Snowflake

  • Amazon Redshift

and rewrites into to the following small RDBMS dialects:

  • DuckDB

  • planned: H2

  • planned: Postgres

Features

  • Comprehensive support for Query and DML statements (INSERT, DELETE, UPDATE, MERGE)

  • RDBMS specific Functions, Predicates and Operators

  • RDBMS specific Date and Number formatting parameters

  • Extensive ARRAY, ROW and STRUCT support

  • Deeply Nested Expressions such as correlated Sub-Selects, CTE’s and WITH clauses

  • Explicit and Implicit Cast expressions, e. g. DATE '2023-12-31', '2023-12-31'::Date and Cast('2023-12-31' AS Date)

  • SQL Named and Ordinal Parameters: ?, ?1 or :parameter

  • Lateral Table and Sub-Select Functions, e. g. UNNEST(), TABLE()

  • Window and Aggregate Functions

  • Columns Resolution for EXCEPT and REPLACE filters as well as for USING joins (left or right)