Optimize Your Data Storage with Microsoft OneLake

By Kristofer Schlieper | June 5, 2024

Microsoft is revolutionizing the way organizations store their data by introducing OneLake.  

Storing your data in a clean, efficient way is key to streamlining processes across your organization. The cleaner and more accessible your data is, the easier it is for your team members to use it to do their jobs effectively.

For example, a salesperson who has all the information they need about a potential client at their fingertips will be able to have a more personalized interaction, leading to greater potential for a sale. Similarly, a team member in your finance department will be able to better manage payments, invoices, and financial reporting if they can find the data they need compiled neatly in a single location. That is where Microsoft OneLake comes in.

What is Microsoft OneLake?

OneLake is a single, consolidated data lake designed to encompass all of the data in your system. It operates like OneDrive, as it automatically comes with every Microsoft Fabric tenant and serves as the central location for your data. It eliminates data silos and simplifies the ways you access, discover, analyze, and secure your data.  

Previously, organizations would have to create multiple “Data Lakes” for different business groups. While grouping data does have some benefits, it poses challenges for collaboration when you must pick data from various places and then centralize it.  

OneLake addresses this by coming with every Fabric tenant automatically, so there is a single shared data lake with nothing extra to set up or manage.

How Does Microsoft OneLake Work?

There are a lot of great features and functionalities that come with OneLake. While it would take more than one blog to highlight all of them, we will review some key areas that can help you optimize your data strategies.

What is a Microsoft OneLake Lakehouse?

To store data in OneLake, you can create a lakehouse in Microsoft Fabric. A lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data. It is flexible and scalable for processing and analyzing large data volumes and integrates well with other data management solutions. Some key features of lakehouses include:

  • Lakehouse SQL Analytics Endpoint: Automatically generates a read-only SQL analytics endpoint and a default semantic model.
  • Automatic Table Discovery and Registration: Data engineers and scientists can benefit from a fully managed file-to-table experience using PySpark.
  • Interacting with the Lakehouse Item:
    • Lakehouse Explorer: The main interaction page allows you to load data, explore it using the object explorer, and set MIP labels.
    • Notebooks: Data engineers can write code directly to Lakehouse tables or folders using notebooks.
    • Pipelines: Integration tools like the pipeline copy tool facilitate pulling data from other sources into the Lakehouse.
    • Apache Spark Job Definitions: Developers can orchestrate compiled Spark jobs in Java, Scala, or Python.
    • Dataflows Gen2: Handle data ingestion and preparation.
    • Multitasking efficiency: Lakehouse enhances multitasking by providing a browser tab design. Users can seamlessly switch between multiple items, manage data tasks efficiently, and preserve running operations (e.g., data uploads or load operations).
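
To make the notebook workflow above a little more concrete, here is a minimal sketch of loading a file into a lakehouse table with PySpark. It assumes the code runs in a Fabric notebook where a `spark` session already exists and a default lakehouse is attached. The function name, the file path `Files/raw/customers.csv`, and the table name `customers` are hypothetical.

```python
def load_csv_to_table(spark, csv_path, table_name):
    """Read a CSV file and save it as a Delta table.

    `spark` is an existing SparkSession (Fabric notebooks provide one).
    """
    df = (
        spark.read.format("csv")
        .option("header", "true")  # first row holds the column names
        .load(csv_path)
    )
    # "overwrite" keeps the example re-runnable; use "append" to add rows instead.
    df.write.mode("overwrite").format("delta").saveAsTable(table_name)
    return df


# In a notebook with a lakehouse attached (hypothetical path and table name):
# load_csv_to_table(spark, "Files/raw/customers.csv", "customers")
```

Once saved this way, the table should appear in the Lakehouse Explorer alongside your other tables.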

How You Can Connect to Microsoft OneLake

OneLake provides open access to all your Fabric items using existing Azure Data Lake Storage (ADLS) Gen2 APIs and SDKs. You can access your data in OneLake through any compatible API, SDK, or tool by using a OneLake URI. Whether uploading data via Azure Storage Explorer or reading a delta table via a shortcut from Azure Databricks, OneLake simplifies data management.
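
As a rough sketch of what a OneLake URI looks like, the helpers below build the two common forms: an `https://` URI for ADLS Gen2 tools and an `abfss://` URI for Spark-based engines. The helper names and the workspace and lakehouse names are hypothetical. The optional `list_onelake_paths` function assumes the `azure-storage-file-datalake` and `azure-identity` packages are installed and that you are signed in with an identity that can access the workspace.

```python
from urllib.parse import quote

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_https_uri(workspace, item, item_type, path=""):
    """Build an https:// OneLake URI: /<workspace>/<item>.<itemtype>/<path>."""
    parts = [workspace, f"{item}.{item_type}"] + [p for p in path.split("/") if p]
    # Percent-encode each segment so names containing spaces stay valid in a URL.
    return f"https://{ONELAKE_HOST}/" + "/".join(quote(p) for p in parts)


def onelake_abfss_uri(workspace, item, item_type, path=""):
    """Build the abfss:// form (no encoding applied; use names without spaces)."""
    tail = "/".join(p for p in path.split("/") if p)
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    return f"{base}/{tail}" if tail else base


def list_onelake_paths(workspace, directory):
    """List paths under a directory using the ADLS Gen2 Python SDK."""
    # Imported here so the URI helpers above work without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        f"https://{ONELAKE_HOST}", credential=DefaultAzureCredential()
    )
    # In OneLake, the workspace plays the role of the ADLS file system.
    file_system = service.get_file_system_client(workspace)
    return [p.name for p in file_system.get_paths(path=directory)]


# Hypothetical workspace "Sales" and lakehouse "Customers":
print(onelake_https_uri("Sales", "Customers", "lakehouse", "Files/raw/2024.csv"))
print(onelake_abfss_uri("Sales", "Customers", "lakehouse", "Tables/orders"))
```

For example, `list_onelake_paths("Sales", "Customers.lakehouse/Files")` would list the files in that hypothetical lakehouse.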

Because OneLake is a Software as a Service (SaaS) offering, certain operations (e.g., managing permissions or updating items) are performed through Fabric experiences rather than ADLS Gen2 APIs. For a comprehensive list of API changes specific to OneLake, refer to the OneLake API parity documentation.

Securing Your Data in Microsoft OneLake

Like many cloud-connected Microsoft services, Microsoft Fabric – and OneLake – come with strong security measures to keep your data safe. Here are some key security features:

  • Fabric Security Model: Fabric comes with multi-layer security that can be set at the workspace level, for individual items, or through granular permissions within each Fabric engine.
  • Data Security Based On Roles: Workspaces contain Fabric Items that require users to have specific sets of permissions to view and use them. Admins can customize access in two main areas:
    • Workspace permissions grant access to all items in a workspace
    • Fabric Item permissions that allow access to specific items
  • Engine-Specific Data Security: Many Fabric engines support fine-grained control while some compute engines have their own security models.
  • OneLake Data Access Roles: Users can create custom roles within a lakehouse and grant read permissions to specified folders. Roles can be assigned to users, security groups, or based on workspace roles.
  • Shortcut Security: OneLake Folder security applies to shortcuts based on roles defined in the lakehouse, simplifying data management.
  • Authentication and Encryption: OneLake uses Microsoft Entra ID for authentication and data stored there is encrypted using Microsoft-managed keys.
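
To picture how folder-level read roles behave, here is a small conceptual model. It is only an illustration and not a Fabric API: the `DataAccessRole` class, the `can_read` function, and the user and folder names are all hypothetical. The sketch also assumes that a grant on a folder covers everything beneath it.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class DataAccessRole:
    """Toy stand-in for a OneLake data access role (not a Fabric API type)."""
    name: str
    folders: list  # folders the role may read, e.g. "Files/sales"
    members: set = field(default_factory=set)


def can_read(user, path, roles):
    """Return True if any role that contains `user` covers `path`."""
    target = PurePosixPath(path)
    for role in roles:
        if user not in role.members:
            continue
        for folder in role.folders:
            granted = PurePosixPath(folder)
            # Assumption: a grant covers the folder itself and everything below it.
            if target == granted or granted in target.parents:
                return True
    return False


roles = [DataAccessRole("SalesReaders", ["Files/sales"], {"ana@contoso.com"})]
print(can_read("ana@contoso.com", "Files/sales/2024/q1.csv", roles))   # True
print(can_read("ana@contoso.com", "Files/finance/ledger.csv", roles))  # False
```

In Fabric itself, these roles are defined and enforced by the service. The model above only shows the idea of matching a user's roles against folder paths.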

Security is crucial for data protection, and OneLake's layered security measures help you control your environment and manage your data.

Unify Your Data with Shortcuts

OneLake shortcuts are objects that point to internal or external storage locations, also known as targets. Each shortcut involves two paths:

  1. The target path of a shortcut refers to the location it points to
  2. The shortcut path is where the shortcut appears within OneLake

Shortcuts function like symbolic links and are independent of the target. Deleting a shortcut does not affect the target. However, moving, renaming, or deleting a target path can break the shortcut.
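
Since shortcuts are compared to symbolic links, an ordinary symbolic link on a local file system shows the same behavior. This is only an analogy using Python's standard library, not OneLake itself, and the file names are made up. It assumes a system that allows creating symbolic links (on Windows this may require extra privileges).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target.csv")
    link = os.path.join(root, "shortcut.csv")

    with open(target, "w") as f:
        f.write("id,name\n1,Contoso\n")
    os.symlink(target, link)  # the "shortcut" points at the target

    # Deleting the link leaves the target untouched.
    os.remove(link)
    target_survives = os.path.exists(target)

    # Recreate the link, then rename the target: the link now dangles.
    os.symlink(target, link)
    os.rename(target, os.path.join(root, "renamed.csv"))
    link_still_listed = os.path.islink(link)  # the link object still exists
    link_resolves = os.path.exists(link)      # but it no longer resolves

print(target_survives, link_still_listed, link_resolves)  # True True False
```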

You can create shortcuts in lakehouses and Kusto Query Language (KQL) databases. Additionally, users who aren’t skilled programmers can use Fabric UI to create shortcuts interactively. Programmers can use the REST API to create shortcuts.
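
For the programmatic route, the sketch below prepares a request to create a shortcut that points at another OneLake location. It builds the request without sending it. The endpoint and body shape reflect our understanding of the Fabric REST API and should be checked against the official reference before use. The function name, the all-zero IDs, and the shortcut and table names are hypothetical placeholders.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_create_shortcut_request(token, workspace_id, item_id,
                                  shortcut_path, shortcut_name, target):
    """Prepare (but do not send) a POST request that creates a shortcut."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{item_id}/shortcuts"
    body = {
        "path": shortcut_path,  # shortcut path: where the shortcut appears
        "name": shortcut_name,
        "target": target,       # target path: the location it points to
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder IDs; a real call needs a valid Microsoft Entra access token.
request = build_create_shortcut_request(
    token="ACCESS_TOKEN",
    workspace_id="00000000-0000-0000-0000-000000000001",
    item_id="00000000-0000-0000-0000-000000000002",
    shortcut_path="Files/shared",
    shortcut_name="sales_2024",
    target={"oneLake": {
        "workspaceId": "00000000-0000-0000-0000-000000000003",
        "itemId": "00000000-0000-0000-0000-000000000004",
        "path": "Tables/sales_2024",
    }},
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send the request.
```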

Find and Manage Files Through the OneLake File Explorer

This feature integrates OneLake with Windows File Explorer. It syncs the OneLake items you have access to into Windows File Explorer. Here, syncing means pulling up-to-date metadata on files and folders and sending changes made locally to the OneLake service.

When you create, update, or delete a file via Windows File Explorer, it automatically syncs the changes to the OneLake service. Updates to your item made outside of your File Explorer aren't automatically synced. To pull these updates, you need to right-click on the item or subfolder in Windows File Explorer and select Sync from OneLake.

Starting in version 1.0.13, the OneLake file explorer app notifies you when an update is available. You’ll receive a Windows notification, and the OneLake icon will change.

Operate Centrally in the OneLake Data Hub

The OneLake data hub makes it easy to find, explore, and use data within your organization. It shows information about each data item and offers entry points for working with it.

The data hub also provides:

  • A filterable list of all the data items you can access
  • Recommended data items in a gallery format
  • The ability to find data items by workspace
  • A way to display only the data items of a selected domain
  • An options menu of things you can do with the data item

Integration with Microsoft Azure Services

Integrations between Azure and OneLake include:

  • Azure Synapse Analytics
  • Azure Storage Explorer
  • Azure Databricks
  • Databricks Unity Catalog
  • Azure HDInsight
  • PowerShell

Looking Ahead to the Future of Data Management with Microsoft OneLake

OneLake is a vital part of the Microsoft Fabric ecosystem that empowers you to maximize your data storage and integrates very well with other solutions in the Microsoft suite. Hopefully, this blog gives you a sense of how you can use OneLake to enhance processes across your organization.

Want to Learn More About Optimizing Data Storage?

Get in touch with the Stoneridge team! Our data experts can help you organize and consolidate your data so you and your team can use it to its full potential.

Under the terms of this license, you are authorized to share and redistribute the content across various mediums, subject to adherence to the specified conditions: you must provide proper attribution to Stoneridge as the original creator in a manner that does not imply their endorsement of your use, the material is to be utilized solely for non-commercial purposes, and alterations, modifications, or derivative works based on the original material are strictly prohibited.

Responsibility rests with the licensee to ensure that their use of the material does not violate any other rights.
