When managing a Databricks environment for multiple teams, having clear control over resource usage becomes crucial. One powerful tool for achieving this is compute policies. These policies help ensure that compute resources are allocated efficiently, reducing wastage and maintaining consistency across your workspace.
In this post, we’ll dive into what compute policies are and how you can use them to set boundaries, enforce best practices, and keep your Databricks environment running smoothly.
1. What Are Compute Policies?
A compute policy is a tool workspace admins can use to limit a user's or group's compute creation permissions based on a set of policy rules.
1.1 Benefits of using compute policies
- Prescribed settings: Limit users to creating clusters with prescribed settings.
- Cluster count: Limit users to creating a certain number of clusters.
- Cost Management: Control costs by limiting the maximum cost per cluster.
- Security: Policies can restrict which users or groups have access to certain features or resources, enhancing security by limiting permissions based on roles.
1.2 Policy families pre-defined in the workspace
When you create a policy, you can choose to use a policy family. Policy families are Azure Databricks-provided policy templates with pre-populated rules, designed to address common compute use cases.
- Job compute: This compute type is specifically dedicated to running jobs, which are automated tasks scheduled in Databricks.
  - These clusters are used only for job execution (e.g., ETL pipelines, machine learning model training, or batch jobs).
  - They can be auto-scaled based on the job’s workload requirements and are typically terminated automatically after the job completes to save costs.
- Legacy Shared Compute: This is an older type of shared cluster that was used across multiple users or workloads.
  - This compute option allows multiple users or jobs to share the same cluster, which can be useful for smaller teams or cost-efficient resource use.
  - It is not optimized for heavy or independent workloads, as resources are shared.
- Personal Compute: Personal Compute clusters are dedicated to a single user for personal development or testing purposes.
  - Unlike shared clusters, this ensures that the resources are not shared with other users, meaning there is no performance interference.
  - Ideal for isolated testing, development, or research.
  - Generally, these clusters are all-purpose, meaning they can handle interactive development, notebooks, and lightweight production tasks.
- Power User Compute: Designed for power users (e.g., data engineers, advanced data scientists) who need enhanced resources and performance.
  - These clusters are typically used by more advanced users for running resource-intensive workloads like complex data transformations or model training.
  - The compute resources in these clusters are higher in terms of CPU, memory, or GPU to support large-scale data engineering or machine learning tasks.
  - They are typically set up as all-purpose clusters, making them suitable for both interactive and batch jobs.
- Shared Compute: This compute type allows multiple users to share the same cluster, making it cost-efficient for teams.
  - It’s an all-purpose cluster, meaning it can run notebooks, interactive development, or lightweight jobs.
  - Resources like CPU and memory are shared across all users who are running tasks or queries on the cluster.
  - Shared compute is typically used for development, collaboration, or interactive exploration.
Next, let’s look at the policy definitions available in the Databricks workspace.
2. What are Policy Definitions?
A policy definition is a set of individual policy rules expressed in JSON: a structured collection of rules and constraints that governs how compute resources can be created, configured, and managed within a workspace.
Policy Types
There are two categories of policies: fixed policies and limiting policies.
2.1 Fixed Policies:
2.1.1 Fixed Policy: A fixed policy locks a specific attribute to a set value, meaning the user cannot change or modify that value.
{
  "spark_version": { "type": "fixed", "value": "auto:latest-lts", "hidden": true }
}
2.1.2 Forbidden Policy: A forbidden policy explicitly prevents users from configuring a particular attribute.
{
  "instance_pool_id": { "type": "forbidden" }
}
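To make these semantics concrete, here is a minimal Python sketch of how fixed and forbidden rules constrain a requested cluster configuration. This is an illustration with assumed function names, not Databricks' actual enforcement code:

```python
# Illustrative sketch of fixed/forbidden rule semantics (assumed helper
# name `apply_fixed_and_forbidden`, not part of any Databricks API).

def apply_fixed_and_forbidden(policy: dict, request: dict) -> dict:
    """Return the effective cluster config, raising on violations."""
    effective = dict(request)
    for attr, rule in policy.items():
        if rule["type"] == "fixed":
            # A fixed rule overrides whatever the user asked for.
            effective[attr] = rule["value"]
        elif rule["type"] == "forbidden":
            # A forbidden rule rejects any attempt to set the attribute.
            if attr in request:
                raise ValueError(f"{attr} may not be set under this policy")
    return effective

policy = {
    "spark_version": {"type": "fixed", "value": "auto:latest-lts", "hidden": True},
    "instance_pool_id": {"type": "forbidden"},
}
cfg = apply_fixed_and_forbidden(policy, {"spark_version": "7.3.x-scala2.12"})
print(cfg["spark_version"])  # the fixed value wins: auto:latest-lts
```

Note how the fixed rule silently replaces the user's requested Spark version, while the forbidden rule would reject the request outright if `instance_pool_id` were supplied.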
2.2 Limiting Policies:
Limiting policies control a user’s configuration options but do not entirely prevent changes.
2.2.1 Allowlist Policy: Users can only choose from a specific list of allowed values.
{
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "Standard_DS3_v2",
      "Standard_DS4_v2"
    ],
    "defaultValue": "Standard_DS3_v2"
  }
}
2.2.2 Blocklist Policy: Users can select any value except those specified in the blocklist.
{
  "spark_version": { "type": "blocklist", "values": [ "7.3.x-scala2.12" ] }
}
2.2.3 Regex Policy: You define a regular expression (regex) to restrict the format or pattern of the value users can configure.
{
  "spark_version": { "type": "regex", "pattern": "13\\.[3456].*" }
}
2.2.4 Range Policy: Users can configure values only within a specific numeric range.
{
  "num_workers": { "type": "range", "maxValue": 10 }
}
2.2.5 Unlimited Policy: This policy places no restrictions on a user’s configuration options.
{
  "autotermination_minutes": {
    "type": "unlimited",
    "defaultValue": 30,
    "isOptional": true
  }
}
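The limiting rule types above can be sketched as a small validator. The following Python sketch is illustrative only (the `check_limit` helper is an assumed name, not part of any Databricks API):

```python
# Minimal validator for the limiting rule types shown above
# (illustrative sketch, not Databricks' internal implementation).
import re

def check_limit(rule: dict, value) -> bool:
    """Return True if `value` satisfies a single limiting rule."""
    kind = rule["type"]
    if kind == "allowlist":
        return value in rule["values"]
    if kind == "blocklist":
        return value not in rule["values"]
    if kind == "regex":
        # Policy regexes must match the whole value.
        return re.fullmatch(rule["pattern"], str(value)) is not None
    if kind == "range":
        lo = rule.get("minValue", float("-inf"))
        hi = rule.get("maxValue", float("inf"))
        return lo <= value <= hi
    if kind == "unlimited":
        return True
    raise ValueError(f"unknown rule type: {kind}")

print(check_limit({"type": "allowlist", "values": ["Standard_DS3_v2"]}, "Standard_DS4_v2"))  # False
print(check_limit({"type": "range", "maxValue": 10}, 4))                                     # True
print(check_limit({"type": "regex", "pattern": "13\\.[3456].*"}, "13.3.x-scala2.12"))        # True
```

This mirrors the intent of each rule type: allowlist/blocklist gate membership, regex gates the pattern, range gates the numeric bounds, and unlimited accepts anything.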
3. Let’s create a sample policy in the Databricks Workspace.
Permissions on a policy determine who can use it: only the users or groups assigned to the policy can use it to create a cluster. Below is a policy named General Policy with nine rule definitions:
{
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "Standard_DS3_v2",
      "Standard_DS4_v2"
    ],
    "defaultValue": "Standard_DS3_v2"
  },
  "spark_version": {
    "type": "fixed",
    "value": "auto:latest-lts",
    "hidden": false
  },
  "runtime_engine": {
    "type": "fixed",
    "value": "STANDARD",
    "hidden": false
  },
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": false
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "all-purpose"
  },
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "ON_DEMAND_AZURE",
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": false
  },
  "autotermination_minutes": {
    "type": "unlimited",
    "defaultValue": 30,
    "isOptional": true
  }
}
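If you want to register a definition like this programmatically rather than through the UI, the Cluster Policies REST API expects the definition as a JSON string inside the request payload. Here is a hedged sketch using only Python's standard library; the endpoint path in the comment follows the Databricks Cluster Policies API, but verify it against your workspace's API docs:

```python
# Building the API payload for the policy above. The REST endpoint in
# the comment below is an assumption to verify against your workspace docs.
import json

definition = {
    "node_type_id": {
        "type": "allowlist",
        "values": ["Standard_DS3_v2", "Standard_DS4_v2"],
        "defaultValue": "Standard_DS3_v2",
    },
    "spark_version": {"type": "fixed", "value": "auto:latest-lts", "hidden": False},
    "num_workers": {"type": "fixed", "value": 0, "hidden": False},
    "autotermination_minutes": {"type": "unlimited", "defaultValue": 30, "isOptional": True},
}

# The API expects `definition` as a JSON *string*, not a nested object.
payload = {"name": "General Policy", "definition": json.dumps(definition)}

# POST this payload (with a bearer token) to the Cluster Policies API,
# e.g. /api/2.0/policies/clusters/create, via `requests` or the CLI.
print(json.loads(payload["definition"])["node_type_id"]["defaultValue"])  # Standard_DS3_v2
```

The round-trip through `json.dumps`/`json.loads` is the easy part to get wrong: passing the definition as a nested object instead of a string is a common cause of API errors.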
With policies like this in place, compute resources (clusters) are created in a way that optimizes usage, reduces cost, and enhances security.
Pallavi A