Step 1. A sharded, replicated fact table
CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER `{cluster}` (
`Year` UInt16,
`Quarter` UInt8,
...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{shard}/{database}/ontime_shard',
'{replica}')
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, `Year`, `Month`, DepDel15)
What does ON CLUSTER do?
ON CLUSTER executes a command over a set of nodes
Step 2. A distributed table to find data
CREATE TABLE IF NOT EXISTS ontime ON CLUSTER '{cluster}'
AS ontime_shard
ENGINE = Distributed(
'{cluster}', currentDatabase(), ontime_shard, rand())
Step 3. A fully replicated dimension table
CREATE TABLE IF NOT EXISTS airports ON CLUSTER 'all-replicated' (
AirportID String,
Name String,
...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/all/{database}/airports',
'{replica}')
PARTITION BY tuple()
PRIMARY KEY AirportID
ORDER BY AirportID