https://clickhouse.com/presentations/meetup36/new_features/
CONSTRAINTs Prepared Statements Format ORC Format Template ORDER BY optimization WITH FILL text_log metric_log Query Profiler Globs for File/URL/HDFS RPM packages neighbor os_schedule_priority ALTER MODIFY SETTING SELECT FROM VALUES COLUMNS matcher LIVE VIEW Settings constraints Row Level Security Secondary Indices TTL expressions MySQL protocol Adaptive index granularity Advanced text search JSON functions Cascade MATERIALIZED VIEWs WITH ROLLUP/CUBE ASOF JOIN
Tiered Storage
— Robert Hodges already presented it.
campaigns_dictionary.xml:
<yandex>
<dictionary>
<name>campaigns</name>
<source>
<mysql>
<password>...</password>
<port>3306</port>
<user>metrika</user>
<replica>
<host>mtexample.metr.yandex.net</host>
<priority>1</priority>
</replica>
<db>dicts</db>
<table>campaign_names</table>
</mysql>
</source>
<lifetime>
<min>14400</min>
<max>18000</max>
</lifetime>
<layout>
<cache>
<size_in_cells>1048576</size_in_cells>
</cache>
</layout>
<structure>
<id>
<name>OrderID</name>
</id>
<attribute>
<name>cid</name>
<type>UInt32</type>
<null_value>0</null_value>
</attribute>
<attribute>
<name>name</name>
<type>String</type>
<null_value />
</attribute>
</structure>
</dictionary>
</yandex>
CREATE DICTIONARY db.dict
(
region_id Int64 DEFAULT -1,
name String DEFAULT 'unknown',
population UInt64
)
PRIMARY KEY region_id
SOURCE(MYSQL(
USER 'root' PASSWORD '...'
DB 'dictionaries' TABLE 'regions'
REPLICA(PRIORITY 1 HOST 'server1' PORT 3306)
REPLICA(PRIORITY 2 HOST 'server2' PORT 3306)))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
SELECT dictGetString('db.dict', 'name', toInt64(213));
Created dictionaries are also available as tables to read.
SELECT * FROM db.dict;
— available since 19.17.
Developers: — Alexander Sapin, Maxim Sabyanin
— available since 19.16.
Developers — Alexey Basmanov, Anton Popov
— available since 19.15.
Developer — Azat Khuzhin
clickhouse-local \
-S 'WatchID UInt64, JavaEnable UInt8, Title String, ...' \
--query "SELECT count() FROM table" < hits.tsv
Was: 46.2 sec., 192 096 rows/sec., 168 MB/sec.
Now: 4.5 sec., 1 935 419 rows/sec., 1.7 GB/sec.
Works for TSV, CSV, JSONEachRow.
Works in clickhouse-client, clickhouse-local,
HTTP interface; File, URL, HDFS, S3 tables.
Enabled with input_format_parallel_parsing setting.
And it's already enabled by default!
— available since 20.1.
Developers — Nikita Mikhailov, Oleg Ershov
For tables with File, URL, HDFS, S3 engines:
SELECT * FROM url('https://site.ru/data.gz',
TSV, 'phrase String, counter_id UInt32, ...')
INSERT INTO TABLE FUNCTION url(
'https://site.ru/data.gz',
TSV, 'phrase String, counter_id UInt32, ...')
Autodetect by file extension or explicit: gz/gzip, br/brotli, none, auto:
SELECT * FROM url('https://site.ru/data', TSV,
'phrase String, counter_id UInt32, ...', 'gzip')
Developer — Andrey Bodrov
Processing ~TB of data from GHArchive.
CREATE TABLE test.null (x UInt64, d DateTime) ENGINE = Null
clickhouse-client --query "INSERT INTO test.null VALUES" <...
Inserting 1 mln. records like (1, '2020-02-04 00:00:00'):
— 0.108 sec.
Inserting 1 mln. records like (1, now()):
— 40.4 sec. — 400 times slower.
Inserting 1 mln. records like (1, now()) with enabled setting
input_format_values_deduce_templates_of_expressions:
— 0.112 sec. — almost without performance regression.
— available since 19.16.
Developer — Alexander Tokmakov
[11825667, "закрепление журные людьми и"]
[7588965, "выпискусстановке фото 3д фирма"]
[30622428, "губка перерьер ванновгород главный"]
[15009727, "uhfvvf yfbck.n gtycb. ltym hj;ltyb 3"]
[31235604, "порно друзья дачные коллетение спит 66"]
[1036200, "oomph"]
[22446879, "ионистранны в туре 17 март 2014 смотреть"]
[792887, "пп оптики билей ремиум новый фасонвар"]
[315961, "рангику"]
[1151353, "дом-2 последние смотреть онлайн фигур"]
— available since 20.1.
Developer — Mikhail Korotov
Controlled by the settings:
format_custom_escaping_rule
format_custom_field_delimiter
format_custom_row_before_delimiter
format_custom_row_after_delimiter
format_custom_row_between_delimiter
format_custom_result_before_delimiter
format_custom_result_after_delimiter
Allow to define a template for data formatting/parsing.
A template contains substitutions and delimieters.
Each substitution specifies data escaping rule:
Quoted, Escaped, CSV, JSON, XML, Raw.
Website ${domain:Quoted} has ${count:Raw} pageviews.
You can specify a template for rows, a delimiter between rows
and a template to wrap resultset.
Example: to parse web access logs.
Example: to parse deeply nested JSON.
Example: generate HTML right in ClickHouse.
input_format_csv_unquoted_null_literal_as_null
— parse unquoted NULL in CSV as NULL (otherwise you have to \N).
input_format_null_as_default
— parse NULL as default value for non-Nullable columns.
input_format_tsv_empty_as_default
— parse empty values in TSV as default values.
input_format_with_names_use_header
— use headers in TSV and CSV to determine column order;
— fill missing columns with default values.
input_format_defaults_for_omitted_fields
— calculate complex default values for JSONEachRow, TSV, CSV.
Apache Avro — row-oriented schemaful binary format with embedded schema.
When data is exported in Avro format, the schema is derived from the table definition.
When data is imported in Avro format, we read it according to the schema and convert types.
Developer — Andrew Onyshchuk
— available since 20.2.
Schema is not embedded directly but loaded from schema registry.
The format is designed specifically to use for Kafka messages.
Do you remember that we also support Parquet, ORC, Protobuf?
Table function s3 and table engine S3:
SELECT * FROM s3(...)
INSERT INTO TABLE FUNCTION s3(...)
CREATE TABLE table ENGINE = S3(...)
Supports wildcards (globs) to read multiple files.
Supports _file and _path virtual columns.
Supports authentication, compressed files...
— full support in 20.2.
Developer — Vladimir Chebotarev, Olga Khvostikova and Pavel Kovalenko
Allows to execute JOIN even when right hand doesn't fit in memory.
SET partial_merge_join = 1;
Developer — Artem Zuikov
greatCircleDistance
— improved performance 3 times
with the code from Andrey Aksenov;
greatCircleAngle
— the distance on sphere in degrees;
— useful for astronomical data;
geoDistance
— calculates the distance on WGS-84 ellipsoid
(under certain assumptions);
— available since 20.1.
DateTime with arbitrary subsecond precision.
It's like DECIMAL for DateTime.
Developer — Vasily Nemkov
To compare different hardware and cloud providers.
(demo)
arraySplit multiMatchAllIndices multiFuzzyMatchAllIndices arrayCompact isValidJSON arrayFill arrayReverseFill JSONExtractArrayRaw FQDN
char repeat
Handler for Prometheus.
Aggregate function combiners: -OrNull and -OrDefault.
Bloom filter secondary index for arrays.
Virtual column _shard_num.
Skipping unknown settings in distributed queries.
Allow using FINAL and SAMPLE simultaneously.
Optimization of uniqCombined; uniqCombined64.
Instant count() in trivial queries.
RBAC
Polymorphic Data Parts
Lock-free ALTERs
Custom HTTP handlers
Query cache for IN subqueries
VFS for storage/compute decoupling
Pluggable authentication and access control
Adaptive merge JOIN algorithm
Workload management
https://clickhouse.com/docs/ru/extended_roadmap/
~ 500 tasks with detailed description — status, dependencies, assignees...
~ 50 are already done :)
Web site: https://clickhouse.com/
Maillist: [email protected]
YouTube: https://www.youtube.com/c/ClickHouseDB
Telegram chat: https://telegram.me/clickhouse_ru, clickhouse_en
GitHub: https://github.com/ClickHouse/ClickHouse/
Twitter: https://twitter.com/ClickHouseDB
Google groups: https://groups.google.com/forum/#!forum/clickhouse