New features of ClickHouse

New features of ClickHouse

A random selection
of features that I remember

CONSTRAINTs for INSERT queries

CREATE TABLE hits ( URL String, Domain String, CONSTRAINT c_valid_url CHECK isValidUTF8(URL), CONSTRAINT c_domain CHECK Domain = domain(URL) )

Checked on INSERT.

Developer — Gleb Novikov, HSE Moscow

CONSTRAINTs for INSERT queries

INSERT INTO hits VALUES ('https://yandex.ru/', 'google.com') Exception: Constraint `c_domain` for table `hits` is violated at row 1. Expression: (Domain = domain(URL)). Column values: URL = 'https://yandex.ru/', Domain = 'google.com'.

Don't checked on background merges.

Don't checked on ALTER UPDATE.

CONSTRAINTs for INSERT queries

ALTER TABLE hits ADD CONSTRAINT c_valid_url CHECK isValidUTF8(URL)

Existing data is not checked when CONSTRAINT is added.

ALTER TABLE hits DROP CONSTRAINT c_valid_url

Queries With Parameters

SELECT count() FROM test.hits WHERE CounterID = {id:UInt32} AND SearchPhrase = {phrase:String}

— substitutions have names and types;
— substituted on AST level.

TODO:
— substitutions for sets in IN operator;
— substitutions for table, column, database names.
— support for custom server URLs with predefined queries.

Developer — Alexander Tretyakov, HSE Moscow

Queries With Parameters

$ curl "http://localhost:8123/" --data-binary \ "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}" Exception: Query parameter `id` was not set. $ curl "http://localhost:8123/?param_id=34" --data-binary \ "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}" 1962 $ curl "http://localhost:8123/?param_id=34;%20DROP%20DATABASE%20test" ... "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}" Exception: Value 34; DROP DATABASE test cannot be parsed as UInt32 for query parameter 'id' $ curl "http://localhost:8123/?param_id=34;%20DROP%20DATABASE%20test" ... "SELECT count() FROM test.hits WHERE CounterID = {id:String}" Exception: Illegal types of arguments (UInt32, String) of function equals

Data Gaps Filling

SELECT EventDate, count() FROM table GROUP BY EventDate ORDER BY EventDate ┌──EventDate─┬─count()─┐ │ 2019-09-01 │ 5 │ │ 2019-09-02 │ 3 │ │ 2019-09-04 │ 4 │ │ 2019-09-05 │ 1 │ └────────────┴─────────┘

Data Gaps Filling

SELECT EventDate, count() FROM table GROUP BY EventDate ORDER BY EventDate WITH FILL ┌──EventDate─┬─count()─┐ │ 2019-09-01 │ 5 │ │ 2019-09-02 │ 3 │ │ 2019-09-03 │ 0 │ │ 2019-09-04 │ 4 │ │ 2019-09-05 │ 1 │ └────────────┴─────────┘

Data Gaps Filling

WITH FILL — a modifier for ORDER BY element;

WITH FILL FROM start

WITH FILL FROM start TO end

WITH FILL FROM start TO end STEP step

WITH FILL can be applied for any elements in ORDER BY:

ORDER BY EventDate WITH FILL, EventTime WITH FILL STEP 3600

— available in version 19.14.

Developers — Anton Popov, Yandex; Dmitri Utkin, HSE Moscow

TTL expressions

— for columns:

CREATE TABLE t ( date Date, ClientIP UInt32 TTL date + INTERVAL 3 MONTH

— for all table data:

CREATE TABLE t (date Date, ...) ENGINE = MergeTree ORDER BY ... TTL date + INTERVAL 3 MONTH

Tiered Storage

Example: store hot data on SSD and archive data on HDDs.

Multiple storage policies can be configured and used on per-table basis.

Tiered Storage

Step 1: configure available disks (storage paths):

<disks> <fast_disk> <!-- disk name --> <path>/mnt/fast_ssd/clickhouse</path> </fast_disk> <disk1> <path>/mnt/hdd1/clickhouse</path> <keep_free_space_bytes>10485760 </keep_free_space_bytes> </disk1> ...

Tiered Storage

Step 2: configure storage policies:

<policies> <ssd_and_hdd> <volumes> <hot> <disk>fast_ssd</disk> <max_data_part_size_bytes>1073741824 </max_data_part_size_bytes> </hot> <cold> <disk>disk1</disk> </cold> <move_factor>0.2</move_factor> </volumes> </ssd_and_hdd> ...

Tiered Storage

Step 3: use the configured policy for your table:

CREATE TABLE table ( ... ) ENGINE = MergeTree ORDER BY ... SETTINGS storage_policy = 'ssd_and_hdd'

Tiered Storage

The data will be moved between volumes automatically.

You can also do it manually:

ALTER TABLE table MOVE PART|PARTITION ... TO VOLUME|DISK ...

— available in 19.15.

New Formats

Protobuf

— efficient implementation, no excessive copies/allocations
(ClickHouse style);

— transparent type conversions between Proto's and ClickHouse types (UInt8, Int64, DateTime <-> sint64, uint64, sint32, uint32, String <-> bytes, string, etc.);

— support for Nested types via repeated Messages or parallel repeated fields;

format_schema setting must be specified.

New Formats

Parquet

— columnar format; naturally implemented without unpacking of columns;

— transparent type conversions also supported.

ORC

— since 19.14 (input only).

Template Format

Allow to define a template for data formatting/parsing.

A template contains substitutions and delimieters.

Each substitution specifies data escaping rule:
Quoted, Escaped, CSV, JSON, XML, Raw.

Website ${domain:Quoted} has ${count:Raw} pageviews.

You can specify a template for rows, a delimiter between rows
and a template to wrap resultset.

Example: to parse web access logs.
Example: to parse deeply nested JSON.
Example: generate HTML right in ClickHouse.

Table Functions

— url;

— file;

— cluster;

— mysql;

— odbc;

— hdfs;

— input (since 19.15).

Table Function 'input'

For data transformation at INSERT time:

INSERT INTO table SELECT *, domain(URL) FROM input(TSV, 'URL String, ...')

You pipe your data to this query as with usual INSERT.

Data is transformed on the fly by SELECT expression.

Examples:
— calculate data for columns;
— skip unneeded columns;
— do filtering, aggregations, joins.

— available in 19.15 testing.

Optimization of queries with ORDER BY

with respect to table sorting key

SELECT event_time, message FROM system.text_log ORDER BY event_date DESC, event_time DESC LIMIT 10 ┌──────────event_time─┬─message───────────────────────────────────────────┐ │ 2019-09-05 04:20:35 │ Renaming temporary part tmp_insert_201909_10_10_0 │ │ 2019-09-05 04:20:35 │ Flushing system log │ │ 2019-09-05 04:20:34 │ Flushing system log │ │ 2019-09-05 04:20:34 │ Renaming temporary part tmp_insert_201909_10_10_0 │ │ 2019-09-05 04:20:28 │ Renaming temporary part tmp_insert_201909_9_9_0 to│ │ 2019-09-05 04:20:28 │ Flushing system log │ │ 2019-09-05 04:20:27 │ Renaming temporary part tmp_insert_201909_9_9_0 to│ ... └─────────────────────┴───────────────────────────────────────────────────┘ 10 rows in set. Elapsed: 0.291 sec. Processed 42.21 million rows, 6.10 GB (145.13 million rows/s., 20.98 GB/s.)

Optimization of queries with ORDER BY

with respect to table sorting key

:) SET optimize_read_in_order = 1 :) SELECT event_time, message FROM system.text_log ORDER BY event_date DESC, event_time DESC LIMIT 10 ┌──────────event_time─┬─message───────────────────────────────────────────┐ │ 2019-09-05 04:28:59 │ Merged 6 parts: from 201909_1_96_19 to 201909_101_│ │ 2019-09-05 04:28:59 │ Renaming temporary part tmp_merge_201909_1_101_20 │ │ 2019-09-05 04:28:59 │ Merge sorted 761 rows, containing 202 columns (202│ │ 2019-09-05 04:28:59 │ Reading 2 marks from part 201909_101_101_0, total │ │ 2019-09-05 04:28:59 │ Reading 2 marks from part 201909_100_100_0, total │ ... └─────────────────────┴───────────────────────────────────────────────────┘ 10 rows in set. Elapsed: 0.012 sec. Processed 54.41 thousand rows, 7.42 MB (4.68 million rows/s., 637.76 MB/s.)

Optimization of queries with ORDER BY

with respect to table sorting key

— ORDER BY table sorting key;

— ASC / DESC / mixed:

ORDER BY event_date DESC, event_time ASC

— or by the prefix:

ORDER BY event_date DESC

— of by expressions with monotonic functions of them;

ORDER BY event_date DESC, -toUInt32(event_time)

— available from version 19.14;

Developers — Anton Popov, Yandex; Anastasiya Rodigina, HSE Moscow

Server Logs for Introspection

in system tables:

— system.query_log;

— system.query_thread_log;

— system.part_log;

— system.trace_log;

— system.text_log;

— system.metric_log;

Server Logs for Introspection

system.text_log

Now we write ClickHouse logs into ClickHouse!

DESCRIBE TABLE system.text_log ┌─name──────────┬─type───────────────────┐ │ event_date │ Date │ │ event_time │ DateTime │ │ microseconds │ UInt32 │ │ thread_name │ LowCardinality(String) │ │ thread_number │ UInt32 │ │ os_thread_id │ UInt32 │ │ level │ Enum8('Fatal' = 1, '...│ │ query_id │ String │ │ logger_name │ LowCardinality(String) │ │ message │ String │ │ revision │ UInt32 │ │ source_file │ LowCardinality(String) │ │ source_line │ UInt64 │ └───────────────┴────────────────────────┘

Server Logs for Introspection

system.metric_log

— for those who forgot to setup monitoring.

— record all the ClickHouse metrics each second (by default).

SELECT toStartOfMinute(event_time) AS h, sum(ProfileEvent_UserTimeMicroseconds) AS user_time, bar(user_time, 0, 60000000, 80) AS bar FROM system.metric_log WHERE event_date = today() GROUP BY h ORDER BY h

Server Logs for Introspection

SELECT toStartOfMinute(event_time) AS h, sum(ProfileEvent_UserTimeMicroseconds) AS user_time, bar(user_time, 0, 60000000, 80) AS bar FROM system.metric_log WHERE event_date = today() GROUP BY h ORDER BY h ┌───────────────────h─┬─user_time─┬─bar───────────────────────────────────────────────┐ │ 2019-09-05 04:12:00 │ 0 │ │ │ 2019-09-05 04:13:00 │ 0 │ │ │ 2019-09-05 04:14:00 │ 524000 │ ▋ │ │ 2019-09-05 04:15:00 │ 15880000 │ █████████████████████▏ │ │ 2019-09-05 04:19:00 │ 36724000 │ ████████████████████████████████████████████████▊ │ │ 2019-09-05 04:20:00 │ 17508000 │ ███████████████████████▎ │ │ 2019-09-05 04:21:00 │ 0 │ │ │ 2019-09-05 04:22:00 │ 0 │ │ │ 2019-09-05 04:23:00 │ 0 │ │ │ 2019-09-05 04:24:00 │ 0 │ │ │ 2019-09-05 04:25:00 │ 0 │ │ │ 2019-09-05 04:26:00 │ 0 │ │ │ 2019-09-05 04:27:00 │ 0 │ │ │ 2019-09-05 04:28:00 │ 0 │ │ │ 2019-09-05 04:29:00 │ 80000 │ │ │ 2019-09-05 04:30:00 │ 0 │ │ │ 2019-09-05 04:31:00 │ 0 │ │

Sampling Profiler on a Query Level

Record code locations where the query was executing
in every execution thread, at each moment of time with some period.

If the query is slow — where exactly (in code) it was slow?

— where the specific query spent time?

— where the time was spent for queries of some kind?

— where the time was spent for queries of some user?

— where the time was spent for queries, cluster wide?

Developer — Nikita Lapkov, HSE Moscow; et all.

Sampling Profiler on a Query Level

1. Turn on one of the following settings (or both):

SET query_profiler_cpu_time_period_ns = 1000000; SET query_profiler_real_time_period_ns = 1000000;

2. Run your queries.
Recorded samples will be saved into system.trace_log table.

event_date: 2019-09-05 event_time: 2019-09-05 05:47:44 revision: 54425 timer_type: CPU thread_number: 149 query_id: b1d8e7f9-48d8-4cb3-a768-0a6683f6f061 trace: [140171472847748,61781958,110943821,117594728,117595220,115654933, 120321783,63251928,111161800,120329436,120331356,120308294,120313436,120319113, 120143313,115666412,120146905,111013972,118237176,111013972,117990912,111013972, 110986070,110986938,61896391,61897898,61887509,156206624,140171472807643]

Sampling Profiler on a Query Level

trace — an array of addresses in machine code (stack trace);

Translate address to function name:
— demangle(addressToSymbol(trace[1]))
Translate address to source file name and line number:
— addressToLine(trace[1])

* don't forget to install clickhouse-common-static-dbg package

Example: functions top:

SELECT count(), demangle(addressToSymbol(trace[1] AS addr)) AS symbol FROM system.trace_log WHERE event_date = today() GROUP BY symbol ORDER BY count() DESC LIMIT 10

Sampling Profiler on a Query Level

Пример: топ функций:

┌─count()─┬─symbol──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ 517 │ void LZ4::(anonymous namespace)::decompressImpl<32ul, false>(char const*, char*, unsigned long) │ │ 480 │ void DB::deserializeBinarySSE2<4>(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long) │ │ 457 │ DB::VolnitskyBase<true, true, DB::StringSearcher<true, true> >::search(unsigned char const*, unsigned long) const │ │ 270 │ read │ │ 163 │ void LZ4::(anonymous namespace)::decompressImpl<16ul, true>(char const*, char*, unsigned long) │ │ 130 │ void LZ4::(anonymous namespace)::decompressImpl<16ul, false>(char const*, char*, unsigned long) │ │ 58 │ CityHash_v1_0_2::CityHash128WithSeed(char const*, unsigned long, std::pair<unsigned long, unsigned long>) │ │ 44 │ void DB::deserializeBinarySSE2<2>(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long) │ │ 37 │ void LZ4::(anonymous namespace)::decompressImpl<8ul, true>(char const*, char*, unsigned long) │ │ 32 │ memcpy │ └─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Sampling Profiler on a Query Level

Example: top of contexts (stacks) for a query:

SELECT count(), arrayStringConcat(arrayMap(x -> concat( demangle(addressToSymbol(x)), '\n ', addressToLine(x)), trace), '\n') AS sym FROM system.trace_log WHERE query_id = '1a1272b5-695a-4b17-966d-a1701b61b3eb' AND event_date = today() GROUP BY trace ORDER BY count() DESC LIMIT 10
count(): 154 sym: DB::VolnitskyBase<true, true, DB::StringSearcher<true, true> >::search(unsigned char const*, unsigned long) const /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse DB::MatchImpl<true, false>::vector_constant(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&) /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse DB::FunctionsStringSearch<DB::MatchImpl<true, false>, DB::NameLike>::executeImpl(DB::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse DB::PreparedFunctionImpl::execute(DB::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) /home/milovidov/ClickHouse/build_gcc9/../dbms/src/Functions/IFunction.cpp:464 DB::ExpressionAction::execute(DB::Block&, bool) const /usr/local/include/c++/9.1.0/bits/stl_vector.h:677 DB::ExpressionActions::execute(DB::Block&, bool) const /home/milovidov/ClickHouse/build_gcc9/../dbms/src/Interpreters/ExpressionActions.cpp:759 DB::FilterBlockInputStream::readImpl() /home/milovidov/ClickHouse/build_gcc9/../dbms/src/DataStreams/FilterBlockInputStream.cpp:84 DB::IBlockInputStream::read() /usr/local/include/c++/9.1.0/bits/stl_vector.h:108 DB::ExpressionBlockInputStream::readImpl() /home/milovidov/ClickHouse/build_gcc9/../dbms/src/DataStreams/ExpressionBlockInputStream.cpp:34 DB::IBlockInputStream::read() /usr/local/include/c++/9.1.0/bits/stl_vector.h:108 DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long) /usr/local/include/c++/9.1.0/bits/atomic_base.h:419 ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const /usr/local/include/c++/9.1.0/bits/shared_ptr_base.h:729 ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/local/include/c++/9.1.0/bits/atomic_base.h:551 execute_native_thread_routine /home/milovidov/ClickHouse/ci/workspace/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:81 start_thread /lib/x86_64-linux-gnu/libpthread-2.27.so clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Row-Level Security

<users> <user_name> ... <databases> <database_name> <table_name> <filter>IsYandex = 1</filter> <table_name> </database_name> </databases> </user_name> </users>
 

Adaptive Index Granularity

Automatically tune index granularity if you have fat columns.

Everything is working out of the box.

— available in 19.13.

Official RPM packages

CentOS, RedHat... GosLinux.

Supported versions starting from CentOS 5.

sudo yum install yum-utils sudo rpm --import \ https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG sudo yum-config-manager --add-repo \ https://repo.yandex.ru/clickhouse/rpm/stable/x86_64 sudo yum install clickhouse-server clickhouse-client

Upcoming

Autumn 2019

— Indexing by z-Order curve

— DDL queries for dictionaries

— S3 import/export

— Parallel parsing of data formats;

— Speedup of INSERT with VALUES with expressions;

— Aggregate functions for data clustering

— Optimization of GROUP BY with table's order key

October 2019

— Initial implementation of RBAC;

— Initial implementation of Merge JOIN;

Autumn-Winter 2019

— More than initial implementation of RBAC;

— More than initial implementation of Merge JOIN;

— Workload management;

 

.

Web site: https://clickhouse.yandex/

Maillist: [email protected]

YouTube: https://www.youtube.com/c/ClickHouseDB

Telegram chat: https://telegram.me/clickhouse_ru, clickhouse_en

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

Google groups: https://groups.google.com/forum/#!forum/clickhouse