Ever wonder what’s really happening behind the scenes in your Databricks workspace? Who’s reading or writing data, what files are being accessed, or whether any unusual activity is taking place in your storage? You don’t need a complex monitoring setup to find out. With a little understanding of system tables and audit logs, you can uncover a detailed picture of how your Databricks storage is being used.
Think of system tables like the activity log on your phone or computer. They quietly keep track of everything happening in your Databricks environment: who did what, when, and how.
These logs include things like:
- Who read or wrote data, and when
- Which files and tables were accessed
- Cluster and workspace activity
And the best part? You can query these records just like any other table, using plain SQL.
Note: To access system tables, your workspace must be enabled for Unity Catalog.
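For example, pulling up recent activity from the audit log table takes a single query. This is a minimal sketch; the seven-day window and 100-row limit are arbitrary choices:

-- Most recent audit events: who did what, and when
SELECT
  event_time,
  user_identity.email AS user_email,
  service_name,
  action_name
FROM system.access.audit
WHERE event_date >= current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC
LIMIT 100;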
You can refer to the official Databricks documentation for a full list of the available system tables.
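You can also explore from inside the workspace: the system catalog has its own information schema, so a small query lists every system table you currently have access to. A minimal sketch:

-- List available system tables, grouped by schema
SELECT table_schema, table_name
FROM system.information_schema.tables
WHERE table_schema <> 'information_schema'
ORDER BY table_schema, table_name;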
Here are a few real-world reasons to use them:
- Cost optimization: spot idle or over-provisioned clusters before the bill arrives
- Performance: see how your compute resources are actually being used
- Security and compliance: audit who accessed what data, and when
Let’s dive into a simple scenario where system tables are especially useful.
You want to identify clusters that:
- Show low average CPU utilization (under 20%) over the last day
- Are provisioned with four or more workers
- Are still active (not deleted)
This helps in cost optimization by flagging clusters that might be over-provisioned or idle.
-- Average CPU utilization per cluster over the last 24 hours
WITH avg_cpu_usage AS (
  SELECT
    cluster_id,
    AVG(cpu_user_percent + cpu_system_percent) AS avg_cpu_utilization
  FROM system.compute.node_timeline
  WHERE end_time >= now() - INTERVAL 1 DAY
  GROUP BY cluster_id
)
SELECT
  c.cluster_id,
  c.cluster_name,
  c.owned_by,
  c.worker_count,
  c.driver_node_type,
  c.worker_node_type,
  c.create_time,
  a.avg_cpu_utilization
FROM system.compute.clusters c
JOIN avg_cpu_usage a
  ON c.cluster_id = a.cluster_id
WHERE a.avg_cpu_utilization < 20  -- low average CPU (percent)
  AND c.worker_count >= 4         -- sizeable clusters only
  AND c.delete_time IS NULL       -- still active
ORDER BY a.avg_cpu_utilization ASC;
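To put a rough cost figure on the clusters this query flags, you can cross-check them against the billing system table. A minimal sketch, assuming the system.billing schema is enabled for your metastore; the 30-day window is an arbitrary choice:

-- Total DBUs consumed per cluster over the last 30 days
SELECT
  u.usage_metadata.cluster_id,
  SUM(u.usage_quantity) AS dbus_last_30_days
FROM system.billing.usage u
WHERE u.usage_date >= current_date() - INTERVAL 30 DAYS
  AND u.usage_metadata.cluster_id IS NOT NULL
GROUP BY u.usage_metadata.cluster_id
ORDER BY dbus_last_30_days DESC;

Clusters that rank high on this list but low in the CPU-utilization query above are the strongest candidates for right-sizing.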
Databricks System Tables make it much easier to understand what’s happening in your data platform. Whether you’re a data engineer trying to improve performance or someone in charge of data security and compliance, these tables give you the clear, useful insights you need—all in one place.
Pallavi A