ClickHouse
This is a deep dive into ClickHouse configuration. Follow one of the deployment guides to get started.
ClickHouse is the main OLAP storage solution within Langfuse for our Trace, Observation, and Score entities. It is optimized for high write throughput and fast analytical queries. This guide covers how to configure ClickHouse within Langfuse and what to keep in mind when (optionally) bringing your own ClickHouse.
Langfuse supports ClickHouse versions >= 24.3.
Configuration
Langfuse accepts the following environment variables to fine-tune your ClickHouse usage. They need to be provided for the Langfuse Web and Langfuse Worker containers.
Variable | Required / Default | Description |
---|---|---|
CLICKHOUSE_MIGRATION_URL | Required | Migration URL (TCP protocol) for the ClickHouse instance. Pattern: clickhouse://<hostname>:(9000/9440) |
CLICKHOUSE_MIGRATION_SSL | false | Set to true to establish an SSL connection to ClickHouse for the database migration. |
CLICKHOUSE_URL | Required | Hostname of the ClickHouse instance. Pattern: http(s)://<hostname>:(8123/8443) |
CLICKHOUSE_USER | Required | Username of the ClickHouse database. Needs SELECT, ALTER, INSERT, CREATE, DELETE grants. |
CLICKHOUSE_PASSWORD | Required | Password of the ClickHouse user. |
CLICKHOUSE_DB | default | Name of the ClickHouse database to use. |
CLICKHOUSE_CLUSTER_ENABLED | true | Whether to run ClickHouse commands ON CLUSTER . Set to false for single-container setups. |
LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED | false | Whether to disable automatic ClickHouse migrations. |
Langfuse uses default
as the cluster name if CLICKHOUSE_CLUSTER_ENABLED is set to true
.
You can overwrite this by setting CLICKHOUSE_CLUSTER_NAME
to a different value.
In that case, the database migrations will not apply correctly as they cannot run dynamically for different clusters.
You must set LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED = false
and run ClickHouse migrations manually.
Clone the Langfuse repository, adjust the cluster name in ./packages/shared/clickhouse/migrations/clustered/*.sql
and run cd ./packages/shared && sh ./clickhouse/scripts/up.sh
to manually apply the migrations.
Deployment Options
This section covers different deployment options and provides example environment variables.
ClickHouse Cloud
ClickHouse Cloud is a scalable and fully managed deployment option for ClickHouse. You can provision it directly from ClickHouse or through one of the cloud provider marketplaces:
ClickHouse Cloud clusters will be provisioned outside your cloud environment and your VPC, but Clickhouse offers private links for AWS, GCP, and Azure.
Example Configuration
Set the following environment variables to connect to your ClickHouse instance:
CLICKHOUSE_URL=https://<identifier>.<region>.aws.clickhouse.cloud:8443
CLICKHOUSE_MIGRATION_URL=clickhouse://<identifier>.<region>.aws.clickhouse.cloud:9440
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=changeme
CLICKHOUSE_MIGRATION_SSL=true
Troubleshooting
- ‘error: driver: bad connection in line 0’ during migration: If you see the previous error message during startup of your web container, ensure that the
CLICKHOUSE_MIGRATION_SSL
flag is set and that Langfuse Web can access your ClickHouse environment. Review the IP whitelisting if applicable and whether the instance has access to the Private Link.
ClickHouse on Kubernetes (Helm)
The Bitnami ClickHouse Helm Chart provides a production ready deployment of ClickHouse using a given Kubernetes cluster. We use it as a dependency for Langfuse K8s. See Langfuse on Kubernetes (Helm) for more details on how to deploy Langfuse on Kubernetes.
Example Configuration
For a minimum production setup, we recommend to use the following values.yaml overwrites when deploying the Clickhouse Helm chart:
clickhouse:
deploy: true
shards: 1 # Fixed: Langfuse does not support multi-shard clusters
replicaCount: 3
resourcesPreset: large # or more
auth:
username: default
password: changeme
- shards: Shards are used for horizontally scaling ClickHouse. A single ClickHouse shard can handle multiple Terabytes of data. Today, Langfuse does not support a multi-shard cluster, i.e. this value must be set to 1. Please get in touch with us if you hit scaling limits of a single shard cluster.
- replicaCount: The number of replicas for each shard. ClickHouse counts the all instances towards the number of replicas, i.e. a replica count of 1 means no redundancy at all. We recommend a minimum of 3 replicas for production setups. The number of replicas cannot be increased at runtime without manual intervention or downtime.
- resourcesPreset: ClickHouse is CPU and memory intensive for analytical and highly concurrent requests. We recommend at least the
large
resourcesPreset and more for larger deployments. - auth: The username and password for the ClickHouse database. Overwrite those values according to your preferences, or mount them from a secret.
- disk: The ClickHouse Helm chart uses the default storage class to create volumes for each replica. Ensure that the storage class has
allowVolumeExpansion = true
as observability workloads tend to be very disk heavy. For cloud providers like AWS, GCP, and Azure this should be the default.
Langfuse assumes that certain parameters are set in the ClickHouse configurations. To perform our database migrations, the following values must be provided:
<!--
Substitutions for parameters of replicated tables.
Optional. If you don't use replicated tables, you could omit that.
See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables
-->
<!--
<macros>
<shard>01</shard>
<replica>example01-01-1</replica>
</macros>
-->
<!--
<default_replica_path>/clickhouse/tables/{database}/{table}</default_replica_path>
<default_replica_name>{replica}</default_replica_name>
-->
macros
and default_replica_*
configuration should be covered by the Helm chart without any further configuration.
Set the following environment variables to connect to your ClickHouse instance assuming that Langfuse runs within the same Cluster and Namespace:
CLICKHOUSE_URL=http://<chart-name>-clickhouse:8123
CLICKHOUSE_MIGRATION_URL=clickhouse://<chart-name>-clickhouse:9000
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=changeme
Docker
You can run ClickHouse in a single Docker container for development purposes. As there is no redundancy, this is not recommended for production workloads.
Example Configuration
Start the container with
docker run --name clickhouse-server \
-e CLICKHOUSE_DB=default \
-e CLICKHOUSE_USER=clickhouse \
-e CLICKHOUSE_PASSWORD=clickhouse \
-d --ulimit nofile=262144:262144 \
-p 8123:8123 \
-p 9000:9000 \
-p 9009:9009 \
clickhouse/clickhouse-server
Set the following environment variables to connect to your ClickHouse instance:
CLICKHOUSE_URL=http://localhost:8123
CLICKHOUSE_MIGRATION_URL=clickhouse://localhost:9000
CLICKHOUSE_USER=clickhouse
CLICKHOUSE_PASSWORD=clickhouse
CLICKHOUSE_CLUSTER_ENABLED=false
Blob Storage as Disk
ClickHouse supports blob storages (AWS S3, Azure Blob Storage, Google Cloud Storage) as disks. This is useful for auto-scaling storages that live outside the container orchestrator and increases availability und durability of the data. For a full overview of the feature, see the ClickHouse External Disks documentation.
Below, we give a config.xml example to use S3 and Azure Blob Storage as disks for ClickHouse Docker containers using Docker Compose. Keep in mind that metadata is still stored on local disk, i.e. you need to use a persistent volume for the ClickHouse container or risk loosing access to your tables.
S3 Example
Create a config.xml file with the following contents in your local working directory:
<clickhouse>
<merge_tree>
<storage_policy>s3</storage_policy>
</merge_tree>
<storage_configuration>
<disks>
<s3>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>local</metadata_type>
<endpoint>https://s3.eu-central-1.amazonaws.com/example-bucket-name/data/</endpoint>
<access_key_id>ACCESS_KEY</access_key_id>
<secret_access_key>ACCESS_KEY_SECRET</secret_access_key>
</s3>
</disks>
<policies>
<s3>
<volumes>
<main>
<disk>s3</disk>
</main>
</volumes>
</s3>
</policies>
</storage_configuration>
</clickhouse>
Replace the Access Key Id and Secret Access key with appropriate AWS credentials and change the bucket name within the endpoint
element.
Alternatively, you can replace the credentials with <use_environment_credentials>1</use_environment_credentials>
to automatically retrieve AWS credentials from environment variables.
Now, you can start ClickHouse with the following Docker Compose file:
services:
clickhouse:
image: clickhouse/clickhouse-server
user: "101:101"
container_name: clickhouse
hostname: clickhouse
environment:
CLICKHOUSE_DB: default
CLICKHOUSE_USER: clickhouse
CLICKHOUSE_PASSWORD: clickhouse
volumes:
- ./config.xml:/etc/clickhouse-server/config.d/s3disk.xml:ro
- langfuse_clickhouse_data:/var/lib/clickhouse
- langfuse_clickhouse_logs:/var/log/clickhouse-server
ports:
- "8123:8123"
- "9000:9000"
volumes:
langfuse_clickhouse_data:
driver: local
langfuse_clickhouse_logs:
driver: local
Azure Blob Storage Example
Create a config.xml file with the following contents in your local working directory. The credentials below are the default Azurite credentials and considered public.
<clickhouse>
<merge_tree>
<storage_policy>blob_storage_disk</storage_policy>
</merge_tree>
<storage_configuration>
<disks>
<blob_storage_disk>
<type>object_storage</type>
<object_storage_type>azure_blob_storage</object_storage_type>
<metadata_type>local</metadata_type>
<storage_account_url>http://azurite:10000/devstoreaccount1</storage_account_url>
<container_name>langfuse</container_name>
<account_name>devstoreaccount1</account_name>
<account_key>Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==</account_key>
</blob_storage_disk>
</disks>
<policies>
<blob_storage_disk>
<volumes>
<main>
<disk>blob_storage_disk</disk>
</main>
</volumes>
</blob_storage_disk>
</policies>
</storage_configuration>
</clickhouse>
You can start ClickHouse together with an Azurite service using the following Docker Compose file:
services:
clickhouse:
image: clickhouse/clickhouse-server
user: "101:101"
container_name: clickhouse
hostname: clickhouse
environment:
CLICKHOUSE_DB: default
CLICKHOUSE_USER: clickhouse
CLICKHOUSE_PASSWORD: clickhouse
volumes:
- ./config.xml:/etc/clickhouse-server/config.d/azuredisk.xml:ro
- langfuse_clickhouse_data:/var/lib/clickhouse
- langfuse_clickhouse_logs:/var/log/clickhouse-server
ports:
- "8123:8123"
- "9000:9000"
depends_on:
- azurite
azurite:
image: mcr.microsoft.com/azure-storage/azurite
container_name: azurite
command: azurite-blob --blobHost 0.0.0.0
ports:
- "10000:10000"
volumes:
- langfuse_azurite_data:/data
volumes:
langfuse_clickhouse_data:
driver: local
langfuse_clickhouse_logs:
driver: local
langfuse_azurite_data:
driver: local
This will store ClickHouse data within the Azurite bucket.
Backups
ClickHouse Cloud manages backups automatically for you. For self-hosted ClickHouse instances, you need to create backups on your own. We recommend following the ClickHouse backup guide for more details. In addition, the ClickHouse state can be restored based on the data in S3. While this is computationally expensive and may take a long time, it is an alternative safety measure to prevent data loss.