Chproxy quick start
This quick start guide walks you through an example of chproxy usage. By the end, you will have seen how chproxy can be used to spread SELECT queries over a multi-node ClickHouse cluster.
To follow along with the quick start, clone chproxy from GitHub and navigate to the examples/quick-start folder.
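For example, assuming the repository lives in the ContentSquare GitHub organization:

```bash
git clone https://github.com/ContentSquare/chproxy.git
cd chproxy/examples/quick-start
```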
Running the Quick Start
A docker-compose file is provided that runs a multi-node ClickHouse setup together with chproxy.
Make sure your Docker Engine is provisioned with at least 4 GB of memory. To start the ClickHouse servers, run:
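From the quick-start folder:

```bash
# Starts the ClickHouse nodes and chproxy defined in the compose file.
docker compose up
```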
Optionally, to verify that the cluster is running properly, execute the following from a new terminal shell:
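For instance, a minimal check against a node's HTTP interface, assuming it is published on localhost:8123 as described below:

```bash
# /ping returns "Ok." once the server accepts connections.
curl http://localhost:8123/ping
```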
The ClickHouse servers expose an HTTP interface on port 8123. Chproxy uses the HTTP interface to submit queries to ClickHouse.
The ClickHouse servers are already set up with a distributed table on each node, as described in Spreading INSERTs.
The tables are set up as follows. A local table is created on each node, which provides the data:
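A sketch of the local table definition; the table and column names here are illustrative, and the exact schema lives in the example's resources:

```sql
-- Illustrative schema; each node gets its own local table.
-- GenerateRandom produces random rows on read, so nothing needs loading.
CREATE TABLE example_local
(
    id UInt64,
    value String
)
ENGINE = GenerateRandom(1, 10, 2);
```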
The GenerateRandom engine produces test data automatically, saving the need to manually load data for the tutorial.
Additionally, a distributed table is defined on each node, referencing the local table:
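A corresponding sketch, assuming a cluster named example_cluster is defined in the ClickHouse server configuration:

```sql
-- Illustrative; fans reads out over the local table on every node.
CREATE TABLE example_distributed AS example_local
ENGINE = Distributed('example_cluster', 'default', 'example_local', rand());
```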
To stop the ClickHouse servers and remove all data, run:
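For a compose-based setup this is typically:

```bash
# The -v flag also removes the named volumes, discarding all stored data.
docker compose down -v
```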
Querying the HTTP interface
We will use the following example query against the data:
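For instance (the query text here is illustrative; any SELECT against the distributed table will do):

```sql
SELECT * FROM example_distributed LIMIT 3;
```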
To pass this query to ClickHouse directly, execute the following command:
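Assuming the node's HTTP port is published on localhost:8123:

```bash
# Send the query body to ClickHouse's HTTP interface.
echo 'SELECT * FROM example_distributed LIMIT 3' | \
  curl 'http://localhost:8123/' --data-binary @-
```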
This will execute the query on the node exposing the HTTP interface. ClickHouse internally routes the query to the other nodes and gathers the results on the node being queried.
You can confirm this by looking at the logs in the docker-compose shell. If you repeat the command above, you will see that the query is executed on the same node every time.
This can lead to situations where a single node is overloaded, since it serves as the entry point for all queries on the distributed table. It also leaves several nodes underutilized while one is overutilized, which isn’t efficient.
Chproxy can be used as a proxy in front of ClickHouse to help balance the load. It sends the query to a different node each time the query is executed, which avoids the issues described above.
To pass this query through chproxy instead, execute the following command:
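Assuming chproxy listens on localhost:9090, as in the example configuration, and that the example's default user needs no password:

```bash
# Same query as before, but submitted to chproxy rather than ClickHouse.
echo 'SELECT * FROM example_distributed LIMIT 3' | \
  curl 'http://localhost:9090/' --data-binary @-
```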
Chproxy proxies the queries to the ClickHouse cluster, spreading the load evenly across the ClickHouse nodes.
You can confirm this by looking at the logs in the docker-compose shell. If you repeat the command above, you will see that the query is executed on a different node each time.
Introducing query caching
Let’s reduce the load on ClickHouse even further by caching queries. Applications that use ClickHouse often repeat the same query many times, even though the result rarely changes between runs. By caching the query response, we can limit the number of queries that ClickHouse actually executes.
To introduce a cache, we need to update the chproxy configuration and define which caches should be used. The YAML snippet below highlights the configuration changes: we enable a file system cache that stores query responses for 30s.
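A sketch of the relevant parts of the configuration; the cache name, size, and directory here are illustrative, and the exact values live in the prepared config file mentioned below:

```yaml
caches:
  - name: "shortterm"
    mode: "file_system"
    file_system:
      dir: "/data/cache"
      max_size: 150Mb
    # Cached responses are kept for 30 seconds.
    expire: 30s

users:
  - name: "default"
    to_cluster: "cluster"
    to_user: "default"
    # Enable the cache for this user by referencing it by name.
    cache: "shortterm"
```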
We have already prepared an example configuration in examples/quick-start/resources/chproxy/config/config_with_cache.yml. To run this example, edit the chproxy service in the docker-compose file and update the command:
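For instance (the mount path is an assumption; match it to the volume defined in the compose file):

```yaml
services:
  chproxy:
    # ...image, ports, and volumes unchanged...
    # chproxy reads its configuration from the path passed via -config;
    # this assumes the config directory is mounted at /config.
    command: -config /config/config_with_cache.yml
```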
Now run the docker compose stack again with docker compose up and execute the query from above. The first time, you will see it hit the ClickHouse cluster; every execution within the following 30s will be served from the cache instead. You can verify this by executing the command several times and checking the ClickHouse logs to see whether the query reached ClickHouse.
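A quick way to check, reusing the assumed chproxy endpoint from above:

```bash
# Fire the same query several times in quick succession. Only the first
# should appear in the ClickHouse logs; since the test data is random,
# identical responses also suggest they came from the cache.
for i in 1 2 3; do
  echo 'SELECT * FROM example_distributed LIMIT 3' | \
    curl -s 'http://localhost:9090/' --data-binary @-
done
```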