SQLでWITH句を使ってテーブルにデータをINSERTせずにqueryを確かめたい。あるいはSQLの復習。
主に手抜きのため。bigqueryのドキュメントの実行例で使われていてなるほどなーと思ったのでメモ。 ついでに色々なJOINの利用例を列挙してみる。
WITH句を使って仮想的なtableと見做す
WITH句自体は概ねどのような処理系でもサポートしているみたい。これを使うのが今回の肝。
- bigquery https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax?hl=ja#with_clause
- sqlite https://www.sqlite.org/lang_with.html
- postgresql https://www.postgresql.jp/document/9.0/html/queries-with.html
- mysql https://dev.mysql.com/doc/refman/8.0/en/with.html
例えば、以下はbigqueryでのコードの実行例。データとqueryが全て1つの文に入っているので状況がわかりやすい。
https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions?hl=ja#extract
WITH Timestamps AS ( SELECT TIMESTAMP("2005-01-03 12:34:56+00") AS timestamp_value UNION ALL SELECT TIMESTAMP("2007-12-31 12:00:00+00") UNION ALL SELECT TIMESTAMP("2009-01-01 12:00:00+00") UNION ALL SELECT TIMESTAMP("2009-12-31 12:00:00+00") UNION ALL SELECT TIMESTAMP("2017-01-02 12:00:00+00") UNION ALL SELECT TIMESTAMP("2017-05-26 12:00:00+00") ) SELECT timestamp_value, EXTRACT(ISOYEAR FROM timestamp_value) AS isoyear, EXTRACT(ISOWEEK FROM timestamp_value) AS isoweek, EXTRACT(YEAR FROM timestamp_value) AS year, EXTRACT(WEEK FROM timestamp_value) AS week FROM Timestamps ORDER BY timestamp_value; -- Results may differ, depending upon the environment and time zone where this query was executed. +-------------------------+---------+---------+------+------+ | timestamp_value | isoyear | isoweek | year | week | +-------------------------+---------+---------+------+------+ | 2005-01-03 12:34:56 UTC | 2005 | 1 | 2005 | 1 | | 2007-12-31 12:00:00 UTC | 2008 | 1 | 2007 | 52 | | 2009-01-01 12:00:00 UTC | 2009 | 1 | 2009 | 0 | | 2009-12-31 12:00:00 UTC | 2009 | 53 | 2009 | 52 | | 2017-01-02 12:00:00 UTC | 2017 | 1 | 2017 | 1 | | 2017-05-26 12:00:00 UTC | 2017 | 21 | 2017 | 21 | +-------------------------+---------+---------+------+------+
(以降は諸々の関係上sqliteでの実行結果になっている)
通常のCTEの使い方
ちなみに普通はどうやって使うかと言えば、RECURSIVE付きで再帰的に問い合わせたりして使う。
WITH RECURSIVE f(n, name) AS ( SELECT 1 as n, 'f' as name UNION ALL SELECT n+1 as n, 'f'|| substr('oooooooooooo', 1, n) as name FROM f LIMIT 10 ) SELECT * from f
    n = 1
 name = f
    n = 2
 name = fo
    n = 3
 name = foo
    n = 4
 name = fooo
    n = 5
 name = foooo
これらを悪用して仮想的にテーブルがあるかのようにSELECT文を作りqueryの練習をすることができる。
queryの実行例
team,userみたいなテーブルがあるとする。テキトーにjoinを試してみることにしよう。
joinなし
WITH teams AS ( SELECT 1 as id, 'x' as name UNION ALL SELECT 2, 'y' UNION ALL SELECT 3, 'z' ) SELECT id, name FROM teams
普通のquery。withを除けば既存のSELECT文の例に近い。
id = 1 name = x id = 2 name = y id = 3 name = z
INNER JOIN
WITH teams AS ( SELECT 1 as id, 'x' as name UNION ALL SELECT 2, 'y' UNION ALL SELECT 3, 'z' ), users as ( SELECT 10 as id, 'foo' as name, 1 as team_id UNION ALL SELECT 20, 'bar', 1 UNION ALL SELECT 30, 'boo', 3 ) SELECT t.id as team_id, t.name as team_name, u.id as user_id, u.name as user_name FROM teams as t INNER JOIN users as u ON t.id = u.team_id
よくある例。
team_id = 1 team_name = x user_id = 10 user_name = foo team_id = 1 team_name = x user_id = 2 user_name = bar team_id = 3 team_name = z user_id = 3 user_name = boo
teamごとのuser数なんかを数えたい場合。
WITH teams AS ( SELECT 1 as id, 'x' as name UNION ALL SELECT 2, 'y' UNION ALL SELECT 3, 'z' ), users as ( SELECT 10 as id, 'foo' as name, 1 as team_id UNION ALL SELECT 20, 'bar', 1 UNION ALL SELECT 30, 'boo', 3 ) SELECT t.id as team_id, t.name as team_name, count(*) as c FROM teams as t INNER JOIN users as u ON t.id = u.team_id GROUP BY t.id
  team_id = 1
team_name = x
        c = 2
  team_id = 3
team_name = z
        c = 1
LEFT OUTER JOIN
userを持たないteamの分も数えたいのでouter joinを使う。対応する行が存在しない側は値がNULLになるのでcountではなくsumで数える。
WITH teams AS ( SELECT 1 as id, 'x' as name UNION ALL SELECT 2, 'y' UNION ALL SELECT 3, 'z' ), users as ( SELECT 10 as id, 'foo' as name, 1 as team_id UNION ALL SELECT 20, 'bar', 1 UNION ALL SELECT 30, 'boo', 3 ) SELECT t.id as team_id, t.name as team_name, sum(CASE WHEN u.id is NULL then 0 ELSE 1 END) as c FROM teams as t LEFT OUTER JOIN users as u ON t.id = u.team_id GROUP BY t.id
  team_id = 1
team_name = x
        c = 2
  team_id = 2
team_name = y
        c = 0
  team_id = 3
team_name = z
        c = 1
many to many
多対多の関係の場合には中間テーブルが必要になる。
WITH teams AS ( SELECT 1 as id, 'x' as name UNION ALL SELECT 2, 'y' UNION ALL SELECT 3, 'z' ), users as ( SELECT 10 as id, 'foo' as name -- x,yに所属 UNION ALL SELECT 20, 'bar' -- x,zに所属 UNION ALL SELECT 30, 'boo' -- xに所属 ), teams2users as ( SELECT 1 as team_id, 10 as user_id UNION ALL SELECT 1, 20 UNION ALL SELECT 1, 30 UNION ALL SELECT 2, 10 UNION ALL SELECT 3, 20 ) SELECT t.id as team_id, t.name as team_name, u.id as user_id, u.name as user_name FROM teams as t INNER JOIN users as u INNER JOIN teams2users as xref ON t.id = xref.team_id AND xref.user_id = u.id
team_id = 1 team_name = x user_id = 10 user_name = foo team_id = 1 team_name = x user_id = 20 user_name = bar team_id = 1 team_name = x user_id = 30 user_name = boo team_id = 2 team_name = y user_id = 10 user_name = foo team_id = 3 team_name = z user_id = 20 user_name = bar
CROSS JOIN
tableの行数とqueryの行数が乖離したようなsummary的なものを1発で作りたい時。 例えば、teamに対する所属期間のような列があるとして、それを年次で集計したい場合など
そういう実用的な例の前にトリビアルな例を。基本的には直積なので3行のものと3行のものを組み合わせたら9行のものができる。
WITH ids AS ( (SELECT 1 as id UNION ALL SELECT 2 UNION ALL 3) ), ys AS ( (SELECT 'foo' as name UNION ALL SELECT 'bar' UNION ALL 'boo') ) SELECT * FROM ids CROSS JOIN ys
はい。
1 foo 1 bar 1 boo 2 foo 2 bar 2 boo 3 foo 3 bar 3 boo
これを使って以下をやってみる。
例えば、teamに対する所属期間のような列があるとして、それを年次で集計したい場合など
WITH teams AS ( SELECT 1 as id, 'x' as name UNION ALL SELECT 2, 'y' UNION ALL SELECT 3, 'z' ), users as ( SELECT 10 as id, 'foo' as name -- x,yに所属 UNION ALL SELECT 20, 'bar' -- x,zに所属 UNION ALL SELECT 30, 'boo' -- xに所属 ), teams2users as ( SELECT 1 as team_id, 10 as user_id, '2019' as start, '2020' as `end` UNION ALL SELECT 1, 20, '2020', '2020' UNION ALL SELECT 1, 30, '2020', NULL UNION ALL SELECT 2, 10, '2020', NULL UNION ALL SELECT 3, 20, '2021', '2021' ), years as ( SELECT '2019' as year UNION ALL SELECT '2020' UNION ALL SELECT '2021' UNION ALL SELECT '2022' ) SELECT y.year as year, t.name as team_name, sum(CASE WHEN u.id is NULL then 0 ELSE 1 END) as c, group_concat(u.name) as names FROM teams as t INNER JOIN teams2users as xref ON t.id = xref.team_id LEFT OUTER JOIN users as u ON xref.user_id = u.id CROSS JOIN years as y ON xref.start <= y.year AND (xref.`end` IS NULL OR xref.`end` >= y.year) GROUP BY y.year, t.id, t.name
     year = 2019
team_name = x
        c = 1
    names = foo
     year = 2020
team_name = x
        c = 3
    names = foo,bar,boo
     year = 2020
team_name = y
        c = 1
    names = foo
     year = 2021
team_name = x
        c = 1
    names = boo
     year = 2021
team_name = y
        c = 1
    names = foo
     year = 2021
team_name = z
        c = 1
    names = bar
     year = 2022
team_name = x
        c = 1
    names = boo
     year = 2022
team_name = y
        c = 1
    names = foo
misc
DISTINCT, having, ORDER BY, window関数とかは省略。
gist
追記: (bigqueryならではの別解)
後日、TobakuCptlsmさんにtwitterでbigqueryでArrayを使って表現する方法を教えてもらいました。こちらの方が他のプログラミング言語に似たような形式なので直感的かもしれません。
WITH points AS ( select * from UNNEST(ARRAY<STRUCT<id int, p STRUCT<x int64, y int64>>>[ (1, (0, 10)), (2, (0, 10)), (2, (0, 10)) ]) ) SELECT * FROM points
こちらのgist https://gist.github.com/TobCap/83d6d874f40e265b45cdb0c992317e00