An Azure Storage account is a Microsoft Azure resource that gives you a secure, scalable way to store and manage different types of data. A storage account contains all your Azure Storage data objects: blobs for unstructured data, files for shared access, queues for message handling, and tables for structured data. It gives your data a unique namespace that you can reach from anywhere in the world over HTTP or HTTPS. The account name forms part of the URL of every object, so it must be different from all other storage account names in Azure. Storage accounts are designed to be durable, highly available, and able to handle very large amounts of data.
Within a storage account, data is organized into four kinds of resources: containers, file shares, queues, and tables.
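To make the link between the account name and the URL concrete, here is a minimal sketch that builds the default public-cloud endpoints for an account. The account name `mystorageacct` and the helper names are hypothetical; the sketch assumes the standard `core.windows.net` suffix (sovereign clouds and custom domains differ) and the usual naming rule of 3 to 24 lowercase letters and digits.

```python
import re

# Services that get their own endpoint; "dfs" is the Data Lake Storage Gen2 endpoint.
SERVICES = ("blob", "file", "queue", "table", "dfs")

# Assumed naming rule: 3-24 characters, lowercase letters and digits only.
_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rule."""
    return _NAME_RE.fullmatch(name) is not None


def endpoint_url(account: str, service: str) -> str:
    """Build the default HTTPS endpoint for one service of a storage account."""
    if not is_valid_account_name(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return f"https://{account}.{service}.core.windows.net"


if __name__ == "__main__":
    # "mystorageacct" is a made-up account name used only for illustration.
    for svc in SERVICES:
        print(endpoint_url("mystorageacct", svc))
```

Because the account name sits in the host name, two accounts with the same name would collide in DNS, which is why the name has to be globally unique.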
Containers: These are used in Azure Blob Storage to organize blobs. Containers help you manage and secure your blobs by grouping them logically. You can think of containers as folders in a file system, where each container can hold multiple blobs.
File Shares: Azure Files lets you create file shares that can be accessed via the Server Message Block (SMB) protocol. File shares are useful when you need shared access to files across multiple machines or applications.
Queues: Azure Queue Storage provides a messaging queue for storing and retrieving messages between different application components. Queues help you manage asynchronous communication and decouple different parts of your application, ensuring that messages are stored reliably until processed.
Tables: Azure Table storage stores large amounts of structured data. It handles non-relational structured data (also known as structured NoSQL data) with a key/attribute store and a schemaless design.
Azure Blob storage
Azure Blob Storage is a cloud-based object storage solution for unstructured data such as audio, text, and Excel files, as well as user data from web apps and devices. Objects in Blob Storage can be accessed over HTTP or HTTPS.
Blob storage is designed for:
- Storing large amounts of unstructured data such as text, video, and audio
- Storing files for access across distributed systems
- Streaming video and audio content
- Writing to log files
Blob Storage also keeps multiple copies of your data to protect it from hardware failures and disasters, and it scales to handle growing data needs.
Types of blobs
- Block blobs: Store large amounts of unstructured data, such as text and binary data. Block blobs are made up of blocks of data that can be uploaded independently and then committed together.
- Append blobs: Optimized for append operations, making them suitable for scenarios where data needs to be continuously added, such as log files. Data is added to the end of the blob without modifying existing content.
- Page blobs: Designed for random read and write operations and used primarily for virtual machine disks and other high-performance scenarios.
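The "upload blocks independently, then commit them together" behavior of block blobs can be sketched with a small in-memory model. `ToyBlockBlob` is a hypothetical class written for this illustration, not part of any Azure SDK; its method names only loosely mirror the Put Block and Put Block List REST operations, and it leaves out details such as re-using already committed blocks.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBlockBlob:
    """In-memory stand-in for a block blob (illustration only)."""

    staged: dict[str, bytes] = field(default_factory=dict)  # uploaded, not yet visible
    committed: bytes = b""  # the content a reader would see

    def stage_block(self, block_id: str, data: bytes) -> None:
        """Upload one block; blocks can arrive in any order."""
        self.staged[block_id] = data

    def commit_block_list(self, block_ids: list[str]) -> None:
        """Assemble the blob from staged blocks in the order given."""
        missing = [b for b in block_ids if b not in self.staged]
        if missing:
            raise KeyError(f"blocks never staged: {missing}")
        self.committed = b"".join(self.staged[b] for b in block_ids)
        self.staged.clear()


if __name__ == "__main__":
    blob = ToyBlockBlob()
    blob.stage_block("block-2", b"world")   # second half arrives first
    blob.stage_block("block-1", b"hello ")
    blob.commit_block_list(["block-1", "block-2"])
    print(blob.committed)
```

The point of the design is that a failed upload of one block can be retried on its own, and nothing becomes visible until the commit step names the final order.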
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is a cloud storage service from Microsoft Azure that is specifically designed for big data analytics and processing. It is a hybrid solution that combines the advantages of object storage and file storage in a single system.
Data Lake Storage Gen2 offers a hierarchical namespace and better integration with analytics tools. It is a highly scalable and cost-effective way to manage large amounts of structured, unstructured, and semi-structured data. It is built on top of Azure Blob Storage and provides the features of Blob Storage while also supporting Hadoop Distributed File System (HDFS) compatible access, making it well suited to big data analytics.
Key features of Azure Data Lake Storage Gen2
- Hierarchical Namespace: The hierarchical namespace in Azure Data Lake Storage Gen2 organizes data into directories and subdirectories, much as a traditional file system does. Users can manage their data with folders and subfolders, which makes large volumes of files and datasets easier to handle.
- Scalability: Azure Data Lake Storage Gen2 provides virtually unlimited scalability. It supports the storage of petabytes of data, making it suitable for both large-scale data repositories and dynamically growing datasets. It provides high throughput and low latency performance, ensuring efficient data access and processing. This allows organizations to expand their storage capacity without compromising performance.
- Security: It provides robust security features to protect sensitive data. It includes encryption both at rest and in transit, auditing and access control, ensuring that data is secure while stored and during transmission.
- High Availability and Disaster Recovery: It keeps multiple copies of your data, and with a zone-redundant or geo-redundant option those copies sit in different locations. If one location experiences a problem, your data is still safe and can be served from another.
- Integration with Analytics Tools: It can easily connect with various data analytics tools, allowing organizations to use their preferred tools for data analysis and processing.
- Hadoop Distributed File System support: Compatible with HDFS, enabling seamless integration with big data frameworks and tools.
Azure Blob Storage and Azure Data Lake Storage Gen2 (ADLS Gen2): Similarities
ADLS Gen2 is Azure Blob Storage with the hierarchical namespace enabled on the storage account, so the two share most of their features. Both are designed to store unstructured data and manage large volumes of data effectively. Both offer data redundancy options such as Locally Redundant Storage and Geo-Redundant Storage to ensure data durability. Both also provide access tiers (Hot, Cool, and Archive) to optimize storage costs based on data access patterns.
Azure Blob Storage vs Azure Data Lake Storage Gen2 (ADLS Gen2)
Azure Blob storage and ADLS Gen2 are both used for storing unstructured data such as videos, photos, audio files, text files and Excel files. Since you are storing data in an unstructured format, you cannot directly query data in either service. You will need to leverage another service to begin querying or analyzing that data.
Purpose
- Azure Blob Storage: Serves as a general-purpose object store suitable for a wide range of storage needs, including big data analytics.
- ADLS Gen2: Builds on Azure Blob Storage’s capabilities and is specifically optimized for large-scale analytics workloads.
How Data is Organized
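- Azure Blob Storage: Data is organized within a storage account into containers, which hold blobs. The namespace is flat: folders are only a logical structure, simulated by prefixes in blob names. It includes soft delete.
Soft delete: Allows for the recovery of deleted blobs within a retention period, providing a safeguard against accidental or malicious data loss.
- ADLS Gen2: Utilizes a hierarchical namespace that supports containers, directories, and files, so directories are real objects that can be renamed or deleted in a single operation. Soft delete was not available when the service launched; current Azure documentation lists blob soft delete as supported on accounts with a hierarchical namespace, so check the feature support table for your account type.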
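The practical difference between the two layouts shows up when you rename a folder. The sketch below is a toy model under simplifying assumptions: a flat namespace is a dictionary keyed by full blob name, and a hierarchical namespace is a nested dictionary with top-level directories only. Both function names are hypothetical and neither calls Azure.

```python
def rename_prefix_flat(blobs: dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: a 'folder' is just a name prefix, so every blob is rewritten.

    Returns the number of blobs that had to be touched.
    """
    touched = 0
    # Snapshot the matching names first so the dict is not mutated mid-iteration.
    for name in [n for n in blobs if n.startswith(old + "/")]:
        blobs[new + name[len(old):]] = blobs.pop(name)
        touched += 1
    return touched


def rename_dir_hierarchical(tree: dict[str, dict[str, bytes]], old: str, new: str) -> int:
    """Hierarchical namespace: the directory is a real node, so one entry moves."""
    tree[new] = tree.pop(old)
    return 1


if __name__ == "__main__":
    flat = {"raw/a.csv": b"1", "raw/b.csv": b"2", "other/c.csv": b"3"}
    tree = {"raw": {"a.csv": b"1", "b.csv": b"2"}, "other": {"c.csv": b"3"}}
    print(rename_prefix_flat(flat, "raw", "staged"), sorted(flat))
    print(rename_dir_hierarchical(tree, "raw", "staged"), sorted(tree))
```

In the flat model the cost grows with the number of blobs under the prefix, while in the hierarchical model it stays constant, which is one reason analytics jobs that move whole directories benefit from ADLS Gen2.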
Redundancy
- Azure Blob Storage: Provides several redundancy options to ensure data durability and availability:
Locally Redundant Storage (LRS): Keeps three copies of data within a single datacenter in the primary region.
Zone-Redundant Storage (ZRS): Distributes data across multiple availability zones within a region for higher resilience.
Geo-Redundant Storage (GRS): Also replicates data to a secondary region for disaster recovery.
- ADLS Gen2: Because it is built on Blob Storage, ADLS Gen2 offers the same redundancy options, including LRS, ZRS, GRS, and GZRS.
Geo-Zone-Redundant Storage (GZRS): Combines zone redundancy in the primary region with geo-replication to a secondary region, protecting against both zonal and regional failures. It is available for Blob Storage as well.
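The options differ in how many copies exist and where they live. The lookup table below summarizes this as a sketch; the `Redundancy` class and its field names are hypothetical, and the copy counts follow the commonly documented layout (three copies in the primary region, plus three locally redundant copies in the secondary region for the geo options).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    """Where the copies of the data live for one redundancy option."""

    copies_primary: int     # copies kept in the primary region
    zone_redundant: bool    # primary copies spread across availability zones
    copies_secondary: int   # copies kept in the paired secondary region

    @property
    def total_copies(self) -> int:
        return self.copies_primary + self.copies_secondary

    @property
    def geo_redundant(self) -> bool:
        return self.copies_secondary > 0


OPTIONS = {
    "LRS": Redundancy(copies_primary=3, zone_redundant=False, copies_secondary=0),
    "ZRS": Redundancy(copies_primary=3, zone_redundant=True, copies_secondary=0),
    "GRS": Redundancy(copies_primary=3, zone_redundant=False, copies_secondary=3),
    "GZRS": Redundancy(copies_primary=3, zone_redundant=True, copies_secondary=3),
}

if __name__ == "__main__":
    for name, opt in OPTIONS.items():
        print(name, opt.total_copies, opt.zone_redundant, opt.geo_redundant)
```

Reading the table this way makes the naming clearer: only GRS and GZRS are geo-redundant, while LRS and ZRS keep all copies inside one region.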
Authentication
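- Azure Blob Storage: Supports account access keys and shared access signatures to control access, as well as Microsoft Entra ID (formerly Azure Active Directory) identities.
- ADLS Gen2: Supports authentication through Microsoft Entra ID (formerly Azure Active Directory), including managed identities and service principals, as well as account access keys and shared access signatures.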
Authorization
- Azure Blob Storage: Uses Azure role-based access control (Azure RBAC) to manage access to storage accounts and the data in them.
- ADLS Gen2: Uses Azure RBAC for management and coarse-grained access, and POSIX-like access control lists (ACLs), assigned to Microsoft Entra identities, for permissions on individual directories and files.
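ACL entries on ADLS Gen2 are commonly written in a short form such as `user::rwx,group::r-x,other::---`, where an empty qualifier means the owning user or owning group. The sketch below parses that form and answers a simple permission question. It is deliberately simplified and is an assumption-laden illustration, not the service's evaluation algorithm: it ignores group membership, the mask entry, default ACLs, and superuser and RBAC checks. The identifiers `owner-id` and `analyst-id` are made up.

```python
def parse_acl(acl: str) -> dict[tuple[str, str], str]:
    """Parse 'scope:qualifier:perms' entries into {(scope, qualifier): perms}."""
    entries: dict[tuple[str, str], str] = {}
    for entry in acl.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        entries[(scope, qualifier)] = perms
    return entries


def is_allowed(acl: str, principal: str, owner: str, wanted: str) -> bool:
    """Check whether `principal` holds every permission flag in `wanted`.

    Simplified order: owning user, then a named-user entry, then 'other'.
    Groups and the mask are intentionally not evaluated in this sketch.
    """
    entries = parse_acl(acl)
    if principal == owner:
        perms = entries[("user", "")]
    elif ("user", principal) in entries:
        perms = entries[("user", principal)]
    else:
        perms = entries[("other", "")]
    return all(flag in perms for flag in wanted)


if __name__ == "__main__":
    acl = "user::rwx,user:analyst-id:r-x,group::r-x,other::---"
    print(is_allowed(acl, "analyst-id", "owner-id", "r"))
    print(is_allowed(acl, "analyst-id", "owner-id", "w"))
```

The useful takeaway is the granularity: RBAC roles apply at the account or container scope, while an ACL like this one is attached to a single directory or file.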
Virtual Network (VNet) support
- Azure Blob Storage: Can be accessed securely from within an Azure Virtual Network through virtual network rules, service endpoints, and private endpoints. These let you restrict the storage account so that it can be reached only from resources within the VNet.
- ADLS Gen2: Supports service endpoints and private endpoints.
Service Endpoints: These enable you to connect to ADLS Gen2 from within your VNet. Service endpoints extend your VNet private address space to the Azure service, securing the traffic between your VNet and ADLS Gen2.
Private Endpoints: A private endpoint gives ADLS Gen2 a private IP address in your VNet, letting you connect to it over your private network. Your data does not travel over the public internet, which reduces its exposure.
Chandana R L