ClickHouse Roadmap 2021

You don't need this presentation

ClickHouse Roadmap is publicly available on GitHub:

https://github.com/ClickHouse/ClickHouse/issues/17623

I will show you only some highlights and examples.

Main Tasks

Provide alternative for ZooKeeper
Nested and semistructured data
Limited support for transactions
Backups
Hedged requests
Window functions
Separation of storage and compute
Short-circuit evaluation
Projections
Lightweight DELETE/UPDATE
Workload management
User Defined Functions
Simplify replication
JOIN improvements
Embedded documentation
Pluggable auth with tokens

Support for Nested and Semistructured Data

Work in progress. Initial support in version 21.1.

Data types:

Tuple(T1, T2...)
Tuple(x1 T1, x2 T2...)
Map(T1, T2)
Nested(x1 T1, x2 T2...)

Support for subcolumns:

SELECT cart.id, cart.price FROM table
— only the queried subcolumns are read from the table.

Multiple nesting:

cart Nested(
    item_id UInt64,
    item_price Decimal(20, 5),
    features Nested(
        ...))

SELECT cart.item_id, cart.features.f1 FROM table

SELECT cart.* FROM table

Maps naturally to nested JSON and Protobuf.

Window Functions

Work in progress. Initial support in version 21.1.

SET allow_experimental_window_functions = 1

Already supported:
— OVER (PARTITION BY ... ORDER BY ...);
— aggregate functions over windows;
— WINDOW clause.

Upcoming:
— non-aggregate window functions (rank, etc.);
— frame specifications.

Projections

Multiple data representations inside a single table.

— different data order;
— subset of columns;
— subset of rows;
— aggregation.

Work in progress.

Differences from materialized views:

— projection data is always consistent;
— updated atomically with the table;
— replicated in the same way as the table;
— a projection can be used automatically for SELECT queries.

Alternative to ZooKeeper

Work in progress.

— ZooKeeper network protocol is implemented;
— Abstraction layer over ZooKeeper is used;
— ZooKeeper data model is implemented for testing;
— TestKeeperServer: a server with ZooKeeper data model for testing.

Benefits:
— less operational complexity;
— fix "zxid overflow" issue;
— fix the issue with max packet size;
— fix "session expired" due to GC pauses;
— improve memory usage;
— allow compressed snapshots;
— allow embedding into clickhouse-server.

Short-circuit Evaluation

SELECT IF(number = 0, 0, 123 % number) FROM numbers(10)

— division by zero.

SELECT * FROM numbers(10) WHERE number > 0 AND 10 % number > 0

— division by zero.

— currently, all arguments of IF, AND, OR are always evaluated.

SELECT * FROM
(
    SELECT * FROM numbers(10)
    WHERE number > 0
)
WHERE 10 % number > 0

— division by zero.
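
The failure mode above can be imitated outside ClickHouse. The following is a toy Python model of column-at-a-time evaluation, not ClickHouse code: the function names are hypothetical, and a plain list stands in for numbers(10). It contrasts computing every argument of IF for the whole column with computing the else-branch only for the rows that need it.

```python
# Toy model of column-at-a-time evaluation; not ClickHouse code.
numbers = list(range(10))  # stands in for numbers(10)

def if_eager(column):
    # Every argument is computed for the whole column first,
    # then rows are picked by the condition.
    cond = [n == 0 for n in column]
    then_branch = [0 for _ in column]
    else_branch = [123 % n for n in column]  # raises on n == 0
    return [t if c else e for c, t, e in zip(cond, then_branch, else_branch)]

def if_short_circuit(column):
    # The else-branch is computed only for rows where the condition is false.
    return [0 if n == 0 else 123 % n for n in column]

try:
    if_eager(numbers)
    eager_failed = False
except ZeroDivisionError:
    eager_failed = True

lazy_result = if_short_circuit(numbers)
```

In this sketch the eager variant fails on the row where number = 0, even though that row would never select the else-branch, while the short-circuit variant returns a value for every row.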

User Defined Functions

We are considering five ways to implement UDFs; two of them are mandatory:

1. UDF as SQL expressions.

CREATE FUNCTION f AS x -> x + 1

2. UDF as executable script.

Interaction via pipes, data is serialized using supported formats.
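
A minimal sketch of what the script side of the second option might look like, written in Python. Everything here is an assumption made for illustration: the input is taken to be one integer per line (as in a single-column TabSeparated stream), the function mirrors x -> x + 1 from the first option, and the name run is hypothetical. How the server would launch the script and which formats it would use are not specified on the slide.

```python
import io

def run(stdin, stdout):
    """Read one integer per line and write x + 1 per line."""
    for line in stdin:
        value = line.strip()
        if not value:
            continue  # skip blank lines
        stdout.write(f"{int(value) + 1}\n")
    stdout.flush()

# Demonstration with in-memory streams instead of real pipes.
fake_in = io.StringIO("1\n2\n41\n")
fake_out = io.StringIO()
run(fake_in, fake_out)
result = fake_out.getvalue()
```

A real script would presumably call run(sys.stdin, sys.stdout) so that the rows arrive and leave through the pipes.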

Hedged Requests

Send a distributed query to multiple replicas to mitigate tail latencies.

This is needed for distributed queries on large clusters (with large "fanout").

Work in progress.

* The largest ClickHouse cluster at Yandex has 630+ servers,
but there are many larger clusters at other companies.
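
The general idea of a hedged request can be sketched in a few lines of Python with asyncio. This illustrates the technique only and is not how ClickHouse implements it: query_replica, hedged_query, the replica names, and the delays are all hypothetical. The sketch starts with one replica, adds another whenever no answer arrives within hedge_delay, returns the first result, and cancels the rest.

```python
import asyncio

async def query_replica(name, latency):
    # Hypothetical stand-in for sending a query to one replica.
    await asyncio.sleep(latency)
    return name

async def hedged_query(replicas, hedge_delay):
    """Start replicas one by one, hedge_delay apart; return the first result."""
    tasks = []
    try:
        for name, latency in replicas:
            tasks.append(asyncio.create_task(query_replica(name, latency)))
            done, _ = await asyncio.wait(
                tasks, timeout=hedge_delay, return_when=asyncio.FIRST_COMPLETED
            )
            if done:
                return done.pop().result()
        # All replicas have been started; wait for whichever answers first.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for task in tasks:
            task.cancel()  # no effect on tasks that already finished

# The first replica is slow, so the hedge to the second one wins.
winner = asyncio.run(
    hedged_query([("replica-1", 0.5), ("replica-2", 0.01)], hedge_delay=0.05)
)
```

With a large fanout, the chance that at least one shard hits a slow replica grows, which is why hedging matters most on big clusters.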

Bonus

Native integration with PostgreSQL

— PostgreSQL table engine and table function;
— PostgreSQL dictionary source;
— PostgreSQL database engine as a view of all tables in a PostgreSQL database.

Available in version 21.2-testing.

In previous versions, it was available only via ODBC, with many complications.

?

Read the official roadmap and ask your questions:

https://github.com/ClickHouse/ClickHouse/issues/17623