Google's open data framework for enterprise AI: The base for enterprise artificial intelligence data operations

Data integration for strengthening AI applications previously carried potential drawbacks, according to Google, but that's no longer the case.

Google's open data platform for AI, referred to as 'lakehouse', serves as the base for enterprise-level AI data operations.

Google Cloud has unveiled its Open Lakehouse architecture, a platform intended to change how businesses manage and use their data, particularly for artificial intelligence (AI).

At its core, the Open Lakehouse unifies structured and unstructured data, bridging the gap between traditional data warehouses and data lakes. Built on Google's BigLake storage engine, this scalable, open data lakehouse platform supports both data types seamlessly under one system. Open table formats like Apache Iceberg are employed to bring data warehouse reliability to data lakes, ensuring consistent management of diverse datasets without duplication or movement.
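One way to picture how an open table format brings warehouse-style consistency to files in a lake is a snapshot model: data files are never rewritten, and each commit publishes a new snapshot that lists the files belonging to that table version. The sketch below is a toy illustration of that idea only. `ToyTable`, `Snapshot`, and the `gs://example-bucket/...` paths are hypothetical names, not the Apache Iceberg or BigLake API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable list of data files that make up one table version."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for an open-format table (not the Iceberg API)."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        # An empty table is represented by snapshot 0 with no files.
        return self.snapshots[-1] if self.snapshots else Snapshot(0, ())

    def append_files(self, new_files: List[str]) -> Snapshot:
        # A commit never rewrites existing files; it publishes a new
        # snapshot that references the old files plus the new ones.
        base = self.current()
        snap = Snapshot(base.snapshot_id + 1, base.data_files + tuple(new_files))
        self.snapshots.append(snap)
        return snap


table = ToyTable()
table.append_files(["gs://example-bucket/orders/part-0.parquet"])
reader_view = table.current()  # a reader pins this table version
table.append_files(["gs://example-bucket/orders/part-1.parquet"])

print(len(reader_view.data_files))       # the pinned reader still sees 1 file
print(len(table.current().data_files))   # a new reader sees 2 files
```

Because a reader holds a reference to one snapshot, a concurrent write does not change what that reader sees, and no data file is copied or moved to create the new version.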

For developer flexibility, the platform supports multiple interoperable query and processing engines, such as BigQuery for SQL analytics and serverless Spark for large-scale data processing and machine learning. Developers can therefore use their preferred tools and languages against the same data, which prevents silos and simplifies data access and management. The modular architecture includes a metadata layer, in the style of catalogs such as Dataplex Universal Catalog, that provides schema enforcement, data governance, and automatic discovery.
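The role of the shared metadata layer can be sketched in a few lines: a single catalog holds the schema and the data for each table, rejects writes that violate the schema, and serves the same rows to every engine. This is a toy model under stated assumptions. `ToyCatalog`, `sql_like_total`, and `dataframe_like_filter` are hypothetical names standing in for a catalog, a SQL engine, and a batch engine; they are not BigQuery, Spark, or Dataplex APIs.

```python
from typing import Dict, List


class ToyCatalog:
    """Hypothetical metadata layer: one schema and one set of rows per table."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Dict[str, type]] = {}
        self._rows: Dict[str, List[dict]] = {}

    def create_table(self, name: str, schema: Dict[str, type]) -> None:
        self._schemas[name] = schema
        self._rows[name] = []

    def write(self, name: str, row: dict) -> None:
        # Schema enforcement: reject rows with wrong columns or wrong types.
        schema = self._schemas[name]
        if set(row) != set(schema):
            raise ValueError(f"columns {sorted(row)} do not match {sorted(schema)}")
        for column, expected in schema.items():
            if not isinstance(row[column], expected):
                raise TypeError(f"{column} must be {expected.__name__}")
        self._rows[name].append(row)

    def scan(self, name: str) -> List[dict]:
        # Every engine reads the same stored rows; nothing is copied per engine.
        return self._rows[name]


def sql_like_total(catalog: ToyCatalog, table: str, column: str) -> float:
    """Stand-in for a SQL engine: SELECT SUM(column) FROM table."""
    return sum(row[column] for row in catalog.scan(table))


def dataframe_like_filter(catalog: ToyCatalog, table: str, column: str,
                          minimum: float) -> List[dict]:
    """Stand-in for a batch engine: keep rows where column >= minimum."""
    return [row for row in catalog.scan(table) if row[column] >= minimum]


catalog = ToyCatalog()
catalog.create_table("orders", {"order_id": int, "amount": float})
catalog.write("orders", {"order_id": 1, "amount": 20.0})
catalog.write("orders", {"order_id": 2, "amount": 5.5})

total = sql_like_total(catalog, "orders", "amount")
large = dataframe_like_filter(catalog, "orders", "amount", 10.0)

try:
    catalog.write("orders", {"order_id": "3", "amount": 1.0})  # wrong type
    rejected = False
except TypeError:
    rejected = True

print(total, large, rejected)
```

Both functions go through `scan`, so neither keeps a private copy of the table, and the malformed row is rejected at write time rather than surfacing later in one engine's results.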

The Open Lakehouse also promises to enhance AI deployments at scale. By directing AI workloads to all relevant data, the platform enables AI models to leverage structured and unstructured data across multiple modalities (e.g., speech, audio, images) for richer insights and more comprehensive training data. The elimination of data silos and integrated metadata management makes it easier to orchestrate data pipelines that feed AI systems effectively, accelerating AI readiness at enterprise scale. The platform's ability to handle petabyte-scale data and provide fine-grained access controls optimises AI performance while ensuring compliance and security.

In essence, the Open Lakehouse represents a planetary-scale, open, governed, and multimodal data platform that underpins enterprise AI by eliminating data silos and providing a 360-degree perspective for all data sources at scale. With this platform, businesses can answer questions like "What should I do next?" by unlocking the value in structured and unstructured data while minimising complexity.

The Open Lakehouse is presented as the best of both worlds, combining the capabilities of a data warehouse and a data lake in one solution. As AI becomes an integral part of business strategy, the platform is positioned to support productivity growth and AI capabilities. McKinsey predicts that AI could increase US productivity growth by 1.5 percent annually over a ten-year period.

References:

  1. Google Cloud. (2021). What is BigLake? Retrieved from https://cloud.google.com/biglake/docs/overview
  2. Google Cloud. (2021). Apache Iceberg. Retrieved from https://cloud.google.com/bigquery/docs/iceberg
  3. Google Cloud. (2021). Dataplex Universal Catalog. Retrieved from https://cloud.google.com/dataplex/docs/concepts/universal-catalog
  4. Google Cloud. (2021). Vertex AI. Retrieved from https://cloud.google.com/vertex-ai
  5. McKinsey & Company. (2018). Artificial Intelligence: The next frontier for growth. Retrieved from https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/artificial-intelligence-the-next-frontier-for-growth

  1. The Open Lakehouse platform, unveiled by Google Cloud, aims to change how businesses manage and use their data, particularly for AI.
  2. Built on Google's BigLake storage engine, the Open Lakehouse is an open, scalable data lakehouse that supports both structured and unstructured data, bridging the gap between traditional data warehouses and data lakes.
  3. The platform employs open table formats like Apache Iceberg to bring data warehouse reliability to data lakes, enabling diverse datasets to be managed consistently without duplication or movement.
  4. For developer flexibility, the Open Lakehouse supports multiple interoperable query and processing engines such as BigQuery for SQL analytics and serverless Spark for large-scale data processing and machine learning, allowing developers to use their preferred tools and languages on the same unified data platform.
  5. By eliminating data silos and providing a 360-degree perspective for all data sources at scale, the Open Lakehouse is expected to support productivity growth and AI capabilities. McKinsey predicts that AI could increase US productivity growth by 1.5 percent annually over a ten-year period.
