The Copy Job in Microsoft Fabric Data Factory is a streamlined feature designed for efficient data ingestion. It abstracts complex pipeline configurations by providing a simplified way to move data from a source system into a Lakehouse or Warehouse. Because it focuses on ingestion, it eliminates the need for extensive custom development, offering a user-friendly configuration wizard for quick setup. This makes it an ideal solution for handling incremental loads and syncing large datasets, letting users focus on data insights rather than pipeline complexities.
Benefits of Copy Job in Microsoft Fabric
• Simplicity: Data Factory in Fabric simplifies data integration with over 100 connectors and essential tools for data operations. The Copy activity handles data ingestion, Dataflow Gen2 manages data transformation, and the Data pipeline orchestrates the entire process. Although building a Data pipeline can be complex for new users, the Copy Job offers a more approachable alternative: with just a few clicks, users can transfer data between any supported source and destination quickly and easily.
• Efficiency: With Copy Job, changes in your data are captured automatically, minimizing manual intervention and keeping your information consistently up to date. This automation saves time and improves accuracy, allowing you to focus on strategic priorities while the system manages incremental updates. By optimizing resource usage and accelerating copy times, Copy Job reduces system strain and delivers faster access to fresh data.
• Flexibility: Copy Job offers full control over the data copying process, with configurations that can be customized to fit your needs. Specific tables and columns can be selected for copying, data can be mapped, and read/write settings can be configured. Flexible schedules allow for one-time or recurring transfers, giving complete control over data movement with precision and ease.
• High Performance: Copy Job is designed for fast, efficient data transfer and moves large volumes of data across different sources and destinations with ease. With its serverless setup and parallel processing, it uses network and storage resources effectively for better performance. Whether for bulk transfers or incremental updates, Copy Job delivers smooth, fast transfers at lower cost.
How to create a Copy Job
Step 1: Log in to Fabric, select the workspace, and choose Data Factory.
Step 2: On the Data Factory page, select the Copy Job component.
Step 3: A new pop-up window will appear; provide a name for the Copy Job and click “Create”.
In this example, the job is named copyjob_test.
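As an aside, for automation scenarios a Copy Job item can also be created programmatically through the Fabric REST API instead of the portal. The sketch below is a minimal Python example; it assumes the generic items endpoint accepts “CopyJob” as the item type (worth verifying against the current API documentation), and the workspace ID is a placeholder.

```python
# Minimal sketch: create a Copy Job item via the Fabric REST API.
# WORKSPACE_ID is a placeholder, and the "CopyJob" item type name is an
# assumption to verify against the current Fabric REST API docs.
import requests
from azure.identity import DefaultAzureCredential

WORKSPACE_ID = "<your-workspace-id>"

# Acquire a token for the Fabric API scope.
token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/items",
    headers={"Authorization": f"Bearer {token}"},
    json={"displayName": "copyjob_test", "type": "CopyJob"},
)
resp.raise_for_status()
print(resp.json())  # the new item's ID and metadata
```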
Step 4: Clicking “Create” opens a wizard to guide the job creation process.
The first step is to choose the data source.
Step 5: In this example, the source is an Azure SQL Database. Enter the connection details and click “Next” after completing the fields.
Step 6: Select the data to be transferred by choosing the required tables; either all tables or specific ones can be selected, and a preview of the selected table data is available. After making the selection, click “Next”.
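Outside the wizard, the same source can be inspected with a few lines of Python to confirm the connection details work, list the available tables, and preview rows, mirroring what the wizard shows. This is a sketch using pyodbc, assuming the ODBC Driver 18 for SQL Server is installed; the server, credentials, and the dbo.SalesOrders table are placeholders.

```python
# Sketch: confirm connectivity, list source tables, and preview rows.
# Server, database, credentials, and dbo.SalesOrders are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword;Encrypt=yes;"
)
cursor = conn.cursor()

# Tables the wizard would offer for selection.
cursor.execute(
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE' ORDER BY TABLE_SCHEMA, TABLE_NAME"
)
for schema, name in cursor.fetchall():
    print(f"{schema}.{name}")

# Preview a few rows of one candidate table.
cursor.execute("SELECT TOP 5 * FROM dbo.SalesOrders")
for row in cursor.fetchall():
    print(row)
conn.close()
```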
Step 7: Select the data destination. In this example, the destination is a Microsoft Fabric warehouse. After choosing the warehouse option, select the workspace and assign a name to the warehouse. Then, click Create and connect.
Step 8: Map to the destination. Here, destination table names can be edited, and the mapping, schema, and data types can be adjusted, either manually or by choosing from the available options.
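As an illustration of what such a mapping captures, the structure below is purely hypothetical: the wizard manages the mapping internally, and this is not a documented Copy Job schema, just a way to visualize column renames and type choices.

```python
# Hypothetical illustration of a source-to-destination column mapping;
# the wizard stores this internally, so names and types are examples only.
column_mapping = {
    "SalesOrderID": {"destination": "SalesOrderID", "type": "int"},
    "OrderDate":    {"destination": "OrderDate",    "type": "datetime2(6)"},
    "TotalDue":     {"destination": "TotalDue",     "type": "decimal(19, 4)"},
}

for source, target in column_mapping.items():
    print(f"{source} -> {target['destination']} ({target['type']})")
```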
Step 9: Copy Job Mode
Select the desired method for copying data. This mode will be applied each time the job is run, whether it’s a one-time or recurring task. Once the new copy job is created, it can be scheduled for regular runs, and its status can be monitored.
There are three options: Full Copy, Incremental Copy, and Stream Copy. In this example, a Full Copy is performed.
Note: For Incremental Copy, an incremental column (for example, a timestamp or an auto-incrementing ID) must be present in the table; Copy Job uses it to identify new and changed rows.
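To check in advance whether a table has a usable incremental column, a metadata query like the sketch below can help. It is only a heuristic (it simply lists timestamp-like and integer columns); the connection values and table name are placeholders.

```python
# Heuristic sketch: list timestamp-like and integer columns that could
# serve as an incremental column. Connection values and the table name
# are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword;Encrypt=yes;"
)
cursor = conn.cursor()
cursor.execute(
    "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_NAME = ? AND DATA_TYPE IN "
    "('datetime', 'datetime2', 'datetimeoffset', 'int', 'bigint')",
    "SalesOrders",
)
for name, data_type in cursor.fetchall():
    print(f"candidate incremental column: {name} ({data_type})")
conn.close()
```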
Step 10: The next step is to review and save, where the job can be saved and executed. Once the job runs, it will copy the tables into the warehouse and remain inactive until it is started manually again or scheduled to run.
The results will be displayed, including details such as the source, destination, status (successful or failed), rows read, rows written, run start time, and run end time.
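Run history can also be read programmatically. The sketch below uses the Fabric REST API job scheduler endpoint to list recent runs of an item; the endpoint shape follows the documented pattern for Fabric items, but the workspace and item IDs are placeholders and the response fields should be verified against the current documentation.

```python
# Sketch: list recent runs of the Copy Job item via the Fabric REST API
# job scheduler endpoint. Workspace and item IDs are placeholders.
import requests
from azure.identity import DefaultAzureCredential

WORKSPACE_ID = "<your-workspace-id>"
ITEM_ID = "<copy-job-item-id>"

token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

resp = requests.get(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{ITEM_ID}/jobs/instances",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for run in resp.json().get("value", []):
    print(run.get("status"), run.get("startTimeUtc"), run.get("endTimeUtc"))
```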
Conclusion
The Copy Job in Microsoft Fabric provides a simple, efficient, and high-performance solution for data integration, streamlining the transfer of data between sources and destinations. By offering flexible configuration, automated incremental updates, and a user-friendly interface, it enables faster, more accurate data movement and lets users focus on deriving insights from data rather than managing complex pipelines. Whether for one-time bulk transfers or scheduled recurring tasks, Copy Job delivers a seamless, cost-effective data integration experience that optimizes both performance and resource usage.
Chandana R L