Exploring Data Connectivity: Direct Lake compared to Import and Direct Query

Blogs

Introduction to Retrieval-Augmented Generation
September 24, 2024
Database Recovery 101: Handling a Corrupted Log File
September 26, 2024

Exploring Data Connectivity: Direct Lake compared to Import and Direct Query

1. Import Mode – Import Mode is the traditional and most used method for connecting to data in Power BI.

Here’s how it works:
Data Loading: In Import Mode, the data from the source is fully loaded into the BI tool’s in-memory storage. This means that the entire dataset is imported, and subsequent analyses and visualizations are performed on this imported data.
Offline Availability: One of the significant advantages of Import Mode is that once the data is imported, it can be used offline. The reports will still work even if the original data source is unavailable, making it a reliable choice for scenarios where data connectivity might be an issue.
Limitations: The main drawback is that importing large datasets can be resource-intensive and time-consuming. Additionally, the data in the report becomes static, meaning you need to refresh the import to update the data, which may not always be real-time.
Performance: Since all the data is stored in memory, Import Mode provides excellent performance, especially for complex calculations and aggregations. The response times are usually fast because the tool doesn’t need to query the database every time a report is refreshed.
The following screenshot is from the performance analyzer, which was used to evaluate the performance of two tables—one large table and one small table—using the import connection mode.
The following screenshot shows the import mode has taken less than a sec for a card to load from a huge table each time the slicer value has changed.

2. Direct Query Mode – Direct Query Mode is a more dynamic approach compared to Import Mode. Instead of importing the data, it queries the data source directly.

• Data Loading: In Direct Query Mode, no data is loaded into memory. Instead, the tool sends queries directly to the database every time you interact with a report, such as filtering or drilling down.
• Scalability: Direct Query is well-suited for scenarios where data volume is too large to fit into memory or when the data needs to be analyzed in real-time without any delay.
Limitations: Direct Query Mode requires a stable and fast connection to the data source, as any lag can impact the user experience. Additionally, some advanced features available in Import Mode, like complex DAX functions in Power BI, might be limited or unavailable in Direct Query Mode.
• Performance: The performance of Direct Query Mode depends heavily on the underlying database’s performance. Since every interaction with the report triggers a query, the response times can be slower, especially with complex queries or large datasets. However, this method allows for real-time data analysis, as it always fetches the latest data from the source.
The following screenshot is from the performance analyzer, which was used to evaluate the performance of two tables—same tables used in import mode screenshot—using the direct query connection mode.
The following screenshot clearly shows the direct query has taken more than 5 secs for a card to load from a huge table each time the slicer value has changed.

3. Direct Lake Mode – Direct Lake is a newer approach that combines some of the best aspects of both Import and Direct Query modes, particularly suited for large-scale data lakes.

• Data Loading: Unlike Import, Direct Lake doesn’t load the entire dataset into memory. However, it also doesn’t rely solely on querying the data source like Direct Query. Instead, it enables on-demand access to data stored in a data lake (like Azure Data Lake) without requiring a full import or real-time query.
• Scalability: Direct Lake shines in scenarios where data is continuously being ingested and needs to be analyzed almost in real-time. It allows organizations to handle data at a scale that would be challenging with Import or Direct Query.
• Limitations: As Direct Lake is relatively new, it may not be as widely supported or as feature rich as the more established Import and Direct Query modes. Additionally, it requires a well-architected data lake setup, which might be more complex and costly to maintain.
• Performance: Direct Lake is optimized for large-scale datasets and is designed to provide near real-time performance while managing vast amounts of data efficiently. It uses advanced caching and indexing strategies to minimize query times, even when dealing with petabytes of data.
The following screenshot is from the performance analyzer, which was used to evaluate the performance of two tables—one large table and one small table—using the direct lake connection mode.
The following screenshot clearly shows DirectLake mode connection has taken less than a sec similar to import mode for a card to load from a huge table each time the slicer value has changed.

Choosing the Right Approach
The choice between Import, Direct Query, and Direct Lake depends largely on your specific use case:
• Import is ideal for smaller datasets where performance and offline availability are priorities.
• Direct Query suits scenarios that require real-time data access and where the dataset is too large to fit into memory.
• Direct Lake is the go-to option for organizations dealing with massive data lakes, needing near real-time analytics with high scalability.

Differences between Import, Direct Query and DataLake

Capability Import DirectQuery Direct Lake
Licensing Any Fabric or Power BI license (including Microsoft Fabric Free licenses) Any Fabric or Power BI license (including Microsoft Fabric Free licenses) Fabric capacity subscription (SKUs) only
Data source Any connector Any connector that supports DirectQuery mode Only lakehouse or warehouse tables (or views)
Connect to SQL analytics endpoint views Yes Yes Yes – but will automatically fall back to DirectQuery mode
Composite models Yes – can combine with DirectQuery or Dual storage mode tables Yes – can combine with Import or Dual storage mode tables No
Single sign-on (SSO) Not applicable Yes Yes
Calculated tables Yes No – calculated tables use Import storage mode even when they refer to other tables in DirectQuery mode No – except calculation groups, what-if parameters, and field parameters, which implicitly create calculated tables
Calculated columns Yes Yes No
Hybrid tables Yes Yes No
Model table partitions Yes – either automatically created by incremental refresh, or manually created by using the XMLA endpoint No No – however partitioning can be done at the Delta table level
User-defined aggregations Yes Yes No
SQL analytics endpoint object-level security or column-level security Yes – but must duplicate permissions with semantic model object-level security Yes – but queries might produce errors when permission is denied Yes – but queries might produce errors when permission is denied
SQL analytics endpoint row-level security (RLS) Yes – but must duplicate permissions with semantic model RLS Yes Yes – but queries will fall back to DirectQuery mode
Semantic model row-level security (RLS) Yes Yes Yes – but it’s strongly recommended to use a fixed identity cloud connection
Semantic model object-level security (OLS) Yes Yes Yes
Large data volumes without refresh requirement Less suited – a larger capacity size might be required for querying and refreshing Yes Yes
Reduce data latency No Yes Yes – when automatic updates is enabled, or programmatic reframing; however, data preparation must be done upstream first

Conclusion: By choosing the right data connectivity mode, can strike the perfect balance between performance, scalability, and real-time insights, enabling you to make data-driven decisions with confidence.

 


Priyanka P Amte

Leave a Reply

Your email address will not be published. Required fields are marked *