BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Composer, Cloud Datalab, Data Studio, Cloud Dataprep, Cloud Pub/Sub

Analytics Data Warehouse. 

Google offers a proven, integrated, end-to-end Big Data solution, based on years of innovation, that lets you capture, process, store, and analyze your data within a single platform. With Google Cloud Platform, you can focus on finding insights rather than managing infrastructure, and you can combine cloud-native services with open source tools as needed, in both batch and stream mode.

Google BigQuery is Google's fully managed, low-cost analytics data warehouse. BigQuery is serverless: there is no infrastructure to manage, no need to guess capacity or overprovision, and no need for a database administrator. You can focus on analyzing data to find meaningful insights, use familiar SQL, and take advantage of Google’s pay-as-you-go model.

Benefits

  • Get up and running fast

Google BigQuery runs blazing-fast SQL queries on gigabytes to petabytes of data and makes it easy to join public or commercial datasets with your own data. Eliminate the time-consuming work of provisioning infrastructure and reduce downtime with a serverless infrastructure that handles all ongoing maintenance, including patches and upgrades.

  • Scale seamlessly

Remove the headache of planning for data warehouse capacity: capacity scales elastically as your data grows. Google BigQuery meets the challenges of real-time analytics by leveraging Google’s serverless infrastructure, which uses automatic scaling and high-performance streaming ingestion to load data.

  • Accelerate your insights with powerful analysis

Google BigQuery gives you a full view of all your data by querying data stored in BigQuery’s managed columnar storage, Google Cloud Storage, Google Cloud Bigtable, Google Sheets, and Google Drive. BigQuery integrates with existing ETL tools like Informatica and Talend to enrich the data you already use, and it supports popular BI tools like Tableau, MicroStrategy, Looker, and Google Data Studio out of the box, so anyone can easily create stunning reports and dashboards.

  • Protect your business data and investments

Google BigQuery eliminates the data operations burden by providing automatic data replication for disaster recovery and high availability of processing at no additional charge. BigQuery makes it easy to maintain strong security with fine-grained identity and access management control. BigQuery data is always encrypted, at rest and in transit.

Features

  • Serverless data warehousing gives you the resources you need, when you need them. With BigQuery, you can focus on your data and analysis, rather than operating and sizing computing resources.
  • BigQuery’s high-speed streaming insertion API provides a powerful foundation for real-time analytics.
  • Free data and compute replication in multiple locations means your data remains available for query even in extreme failure modes.
  • With Cloud Dataproc and Cloud Dataflow, BigQuery integrates with the Apache Big Data ecosystem, allowing existing Hadoop/Spark and Beam workloads to read data from and write data to BigQuery directly.
  • BigQuery makes it easy to maintain strong security with fine-grained identity and access management with Google Cloud IAM, and your data is always encrypted at rest and in transit.
  • BigQuery gives you the option of geographic data control, without the headaches of setting up and managing clusters and other computing resources in-region.
  • BigQuery provides a flexible, powerful foundation for Machine Learning and Artificial Intelligence.
  • BigQuery provides a REST API for easy programmatic access and application integration.
  • BigQuery provides rich monitoring, logging, and alerting through Stackdriver Audit Logs.
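The streaming insertion API mentioned above is a REST call. The following is a minimal sketch, assuming the request shape of BigQuery's `tabledata.insertAll` method (a `rows` array whose entries carry an `insertId` and a `json` object). It batches records and derives a stable `insertId` from hypothetical key fields, so that BigQuery can de-duplicate a retried request on a best-effort basis. The helper names and the batch size of 500 are illustrative assumptions, not official limits, and the code only builds request bodies; it does not send them.

```python
import hashlib
import json
from typing import Any, Iterable, Iterator


def make_insert_id(record: dict[str, Any], key_fields: tuple[str, ...]) -> str:
    """Derive a deterministic insertId from the record's (hypothetical) key fields."""
    key = json.dumps([record[f] for f in key_fields], default=str)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_insert_all_bodies(
    records: Iterable[dict[str, Any]],
    key_fields: tuple[str, ...],
    batch_size: int = 500,  # illustrative batch size, not an official limit
) -> Iterator[dict[str, Any]]:
    """Yield insertAll-style request bodies holding at most batch_size rows each."""
    batch: list[dict[str, Any]] = []
    for record in records:
        # Same key fields -> same insertId, so a retried row can be recognized.
        batch.append({"insertId": make_insert_id(record, key_fields), "json": record})
        if len(batch) == batch_size:
            yield {"rows": batch}
            batch = []
    if batch:  # flush the final, partially filled batch
        yield {"rows": batch}


if __name__ == "__main__":
    events = [{"user_id": i, "event": "click"} for i in range(1200)]
    bodies = list(build_insert_all_bodies(events, key_fields=("user_id",)))
    print([len(b["rows"]) for b in bodies])  # [500, 500, 200]
```

Deriving the ID from business keys rather than generating a random UUID per attempt is the design choice that matters here: a retry of the same row produces the same `insertId`.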

Batch and Stream Data Processing.

Google Cloud Dataflow offers a unified programming model and a managed service for executing a wide range of data processing patterns including streaming analytics, ETL, and batch computation. Cloud Dataflow frees you from operational tasks like capacity planning, resource management and performance optimization.

Benefits

  • Faster development, easier management

Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness, so no complex workarounds or compromises are needed. And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use. Cloud Dataflow unlocks transformational use cases across industries, including:

● Clickstream, Point-of-Sale, and segmentation analysis in retail

● Fraud detection in financial services

● Personalized user experience in gaming

● IoT analytics in manufacturing, healthcare, and logistics



  • Accelerate development for batch & streaming

Cloud Dataflow supports fast, simplified pipeline development via expressive Java and Python APIs in the Apache Beam SDK, which provides a rich set of windowing and session analysis primitives as well as an ecosystem of source and sink connectors. Plus, Beam’s unique, unified development model lets you reuse more code across streaming and batch pipelines.
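To make "windowing and session analysis primitives" concrete, here is a plain-Python sketch of the two ideas; it is not the Beam API. Fixed windows assign each timestamp to an equal-sized bucket, and session windows group one key's events that arrive closer together than a gap. Beam's own `FixedWindows` and `Sessions` also handle watermarks, triggers, and late data, which this sketch ignores; the function names are hypothetical.

```python
from collections import defaultdict


def fixed_window_counts(timestamps: list[int], size: int) -> dict[int, int]:
    """Count events per fixed (tumbling) window, keyed by window start time."""
    counts: dict[int, int] = defaultdict(int)
    for ts in timestamps:
        counts[ts - ts % size] += 1  # start of the window containing ts
    return dict(counts)


def sessionize(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Group one key's events into sessions; events closer than `gap` share a session.

    Each event opens an interval [ts, ts + gap); overlapping intervals are merged,
    so a session ends `gap` after its last event.
    """
    sessions: list[tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            sessions[-1] = (sessions[-1][0], ts + gap)  # extend the open session
        else:
            sessions.append((ts, ts + gap))  # inactivity exceeded gap: new session
    return sessions


if __name__ == "__main__":
    clicks = [0, 5, 12, 100, 103]  # event times in seconds for one user
    print(fixed_window_counts(clicks, size=60))  # {0: 3, 60: 2}
    print(sessionize(clicks, gap=30))            # [(0, 42), (100, 133)]
```

The same input yields two different groupings: fixed windows depend only on the clock, while sessions depend on gaps in the user's activity.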

  • Simplify operations & management

GCP’s serverless approach removes operational overhead: performance, scaling, availability, security, and compliance are handled automatically, so users can focus on programming instead of managing server clusters. Integration with Stackdriver, GCP’s unified logging and monitoring solution, lets you monitor and troubleshoot your pipelines as they are running. Rich visualization, logging, and advanced alerting help you identify and respond to potential issues.

  • Build on a foundation for machine learning

Use Cloud Dataflow as a convenient integration point to bring predictive analytics to fraud detection, real-time personalization and similar use cases by adding TensorFlow-based Cloud Machine Learning models and APIs to your data processing pipelines.


  • Use your favorite and familiar tools

Cloud Dataflow seamlessly integrates with GCP services for streaming event ingestion (Cloud Pub/Sub), data warehousing (BigQuery), machine learning (Cloud Machine Learning), and more. Its Beam-based SDK also lets developers build custom extensions and even choose alternative execution engines, such as Apache Spark via Cloud Dataproc or on-premises. For Apache Kafka users, a Cloud Dataflow connector makes integration with GCP easy.

Features

  • Automated Resource Management - Cloud Dataflow automates provisioning and management of processing resources to minimize latency and maximize utilization; no more spinning up instances by hand or reserving them.
  • Dynamic Work Rebalancing - Automated and optimized work partitioning dynamically rebalances lagging work. No need to chase down “hot keys” or pre-process your input data.
  • Reliable & Consistent Exactly-once Processing - Provides built-in support for fault-tolerant execution that is consistent and correct regardless of data size, cluster size, processing pattern or pipeline complexity.
  • Horizontal Auto-scaling - Horizontal auto-scaling of worker resources for optimum throughput results in better overall price-to-performance.
  • Unified Programming Model - Apache Beam SDK offers equally rich MapReduce-like operations, powerful data windowing, and fine-grained correctness control for streaming and batch data alike.
  • Community-driven Innovation - Developers wishing to extend the Cloud Dataflow programming model can fork and/or contribute to Apache Beam.
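Horizontal auto-scaling ultimately means choosing a worker count from observed load. The function below is a hypothetical policy, not Dataflow's actual algorithm (which is internal and weighs additional signals such as CPU utilization and available parallelism). It sizes the worker pool so that it keeps up with incoming data and drains the current backlog within a target time, clamped between a minimum and a maximum; all parameter names and defaults are assumptions.

```python
import math


def target_workers(
    backlog_msgs: int,
    per_worker_rate: float,  # messages/second one worker can process (assumed measured)
    drain_seconds: float,    # how quickly the existing backlog should be cleared
    incoming_rate: float,    # messages/second still arriving
    min_workers: int = 1,
    max_workers: int = 100,
) -> int:
    """Workers needed to absorb arrivals and drain the backlog in drain_seconds."""
    required_rate = incoming_rate + backlog_msgs / drain_seconds
    needed = math.ceil(required_rate / per_worker_rate)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # 60,000 queued messages, 1,000 msg/s arriving, 500 msg/s per worker, drain in 60 s
    print(target_workers(60_000, 500, 60, 1_000))  # 4
```

Clamping to a maximum caps cost, and the minimum keeps a worker available so that a sudden burst does not start from zero.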

Managed Hadoop & Spark.

Use Google Cloud Dataproc, a managed Spark and Hadoop service, to easily process big datasets using the powerful and open tools in the Apache Big Data ecosystem. Control your costs by creating managed clusters of any size in about a minute and turning them off when you're done, so you pay for what you use rather than for idle clusters. Cloud Dataproc integrates with storage, compute, and monitoring services across Cloud Platform products, giving you a powerful and complete data processing platform.

Benefits

  • Cloud-native Hadoop & Spark

Cloud Dataproc is a fast, easy-to-use, fully-managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way. Operations that used to take hours or days take seconds or minutes instead, and you pay only for the resources you use (with per-second billing). Cloud Dataproc also easily integrates with other Google Cloud Platform (GCP) services, giving you a powerful and complete platform for data processing, analytics and machine learning. 

  • Fast & Scalable Data Processing

Create Cloud Dataproc clusters quickly and resize them at any time—from three to hundreds of nodes—so you don't have to worry about your data pipelines outgrowing your clusters. With each cluster action taking less than 90 seconds on average, you have more time to focus on insights, with less time lost to infrastructure. 

  • Affordable Pricing

Following Google Cloud Platform pricing principles, Cloud Dataproc has low-cost, easy-to-understand pricing based on actual use, measured by the second. Cloud Dataproc clusters can also include lower-cost preemptible instances, giving you powerful clusters at an even lower total cost. 
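Per-second billing means cost is prorated by actual runtime rather than rounded up to whole hours, and preemptible nodes lower the blended rate. The worked sketch below uses made-up hourly rates purely for illustration. Real Dataproc pricing combines a Dataproc fee with the underlying Compute Engine charges and may apply a minimum billing period, so treat the function and its numbers as placeholders.

```python
def prorated_cost(
    runtime_seconds: int,
    standard_nodes: int,
    preemptible_nodes: int,
    standard_rate_per_hour: float,     # hypothetical rate, not a published price
    preemptible_rate_per_hour: float,  # hypothetical rate, not a published price
) -> float:
    """Prorate hourly node rates to the second and sum across node types."""
    hours = runtime_seconds / 3600
    hourly_total = (
        standard_nodes * standard_rate_per_hour
        + preemptible_nodes * preemptible_rate_per_hour
    )
    return hours * hourly_total


if __name__ == "__main__":
    job = prorated_cost(25 * 60, standard_nodes=2, preemptible_nodes=4,
                        standard_rate_per_hour=0.40, preemptible_rate_per_hour=0.10)
    full_hour = prorated_cost(3600, 2, 4, 0.40, 0.10)
    print(f"{job:.2f} vs {full_hour:.2f}")  # 0.50 vs 1.20
```

Under these assumed rates, a 25-minute job costs 0.50 instead of the 1.20 a round-up-to-the-hour model would charge.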

  • Open Source Ecosystem

The Spark and Hadoop ecosystem provides tools, libraries, and documentation that you can leverage with Cloud Dataproc. Because Cloud Dataproc offers frequently updated, native versions of Spark, Hadoop, Pig, and Hive, you can get started without learning new tools or APIs, and you can move existing projects or ETL pipelines without redevelopment.

Features

  • Automated Cluster Management - Managed deployment, logging, and monitoring let you focus on your data, not on your cluster. Your clusters will be stable, scalable, and speedy.
  • Resizable Clusters - Clusters can be created and scaled quickly with a variety of virtual machine types, disk sizes, number of nodes, and networking options.
  • Integrated - Built-in integration with Cloud Storage, BigQuery, Bigtable, Stackdriver Logging, and Stackdriver Monitoring, giving you a complete and robust data platform.
  • Versioning - Image versioning allows you to switch between different versions of Apache Spark, Apache Hadoop, and other tools.
  • Highly available - Run clusters with multiple master nodes and set jobs to restart on failure to ensure your clusters and jobs are highly available.
  • Developer Tools - Multiple ways to manage a cluster, including an easy-to-use Web UI, the Google Cloud SDK, RESTful APIs, and SSH access.
  • Initialization Actions - Run initialization actions to install or customize the settings and libraries you need when your cluster is created.
  • Automatic or Manual Configuration - Cloud Dataproc automatically configures hardware and software on clusters for you while also allowing for manual control.
  • Flexible Virtual Machines - Clusters can use custom machine types and preemptible virtual machines so they are the perfect size for your needs.
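The "set jobs to restart on failure" option can be pictured as a bounded retry loop. The sketch below is a hypothetical client-side illustration of that idea, not how Dataproc implements restartable jobs: it reruns a job callable until it succeeds or a restart budget is used up, then surfaces the last error. The names `run_with_restarts` and `JobFailedError` are invented for this example.

```python
from typing import Callable


class JobFailedError(RuntimeError):
    """Raised when a job still fails after its restart budget is spent."""


def run_with_restarts(job: Callable[[], str], max_restarts: int) -> tuple[str, int]:
    """Run job(); on any exception, restart it up to max_restarts times.

    Returns the job's result and the number of restarts that were needed.
    """
    if max_restarts < 0:
        raise ValueError("max_restarts must be non-negative")
    for attempt in range(max_restarts + 1):
        try:
            return job(), attempt
        except Exception as exc:  # a real driver would inspect the failure type
            last_error = exc
    raise JobFailedError(f"gave up after {max_restarts} restarts") from last_error


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_job() -> str:
        calls["n"] += 1
        if calls["n"] < 3:  # fail twice, as a transient error might
            raise RuntimeError("transient failure")
        return "done"

    print(run_with_restarts(flaky_job, max_restarts=5))  # ('done', 2)
```

Bounding the number of restarts matters: a job that fails deterministically should stop and report, not loop forever.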

Powerful Data Exploration.

Google Cloud Datalab is an interactive notebook (based on Jupyter) to explore, collaborate, analyze and visualize data. It is integrated with BigQuery and Google Cloud Machine Learning to give you easy access to key data processing services.

Benefits

  • Powerful Data Exploration

Cloud Datalab is a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services easily so you can focus on your data science tasks. 

  • Integrated & Open Source

Cloud Datalab is built on Jupyter (formerly IPython), which boasts a thriving ecosystem of modules and a robust knowledge base. Cloud Datalab enables analysis of your data on Google BigQuery, Cloud Machine Learning Engine, Google Compute Engine, and Google Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions). 

  • Scalable

Whether you're analyzing megabytes or terabytes, Cloud Datalab has you covered. Query terabytes of data in BigQuery, run local analysis on sampled data and run training jobs on terabytes of data in Cloud Machine Learning Engine seamlessly. 

  • Data Management & Visualization

Use Cloud Datalab to gain insight from your data. Interactively explore, transform, analyze, and visualize your data using BigQuery, Cloud Storage and Python. 

  • Machine Learning with Lifecycle Support

Go from data to deployed machine-learning (ML) models ready for prediction. Explore data, build, evaluate and optimize Machine Learning models using TensorFlow or Cloud Machine Learning Engine.

Features

  • Integrated - Cloud Datalab simplifies data processing with BigQuery, Cloud Machine Learning Engine, Cloud Storage, and Stackdriver Monitoring. Authentication, cloud computation, and source control are taken care of out of the box.
  • Multi-Language Support - Cloud Datalab currently supports Python, SQL, and JavaScript (for BigQuery user-defined functions).
  • Notebook Format - Cloud Datalab combines code, documentation, results, and visualizations together in an intuitive notebook format.
  • Pay-per-use Pricing - Only pay for the cloud resources you use: Google Compute Engine VMs, BigQuery, and any additional resources you decide to use, such as Cloud Storage.
  • Interactive Data Visualization - Use Google Charting or matplotlib for easy visualizations.
  • Machine Learning - Supports TensorFlow-based deep ML models in addition to scikit-learn. Scales training and prediction via specialized libraries for Cloud Machine Learning Engine.
  • IPython Support - Datalab is based on Jupyter (formerly IPython) so you can use a large number of existing packages for statistics, machine learning etc. Learn from published notebooks and swap tips with a vibrant IPython community.
  • Open Source - Developers wishing to extend Datalab can fork the GitHub-hosted project and/or submit pull requests to it.

 

Tell great data stories to support better business decisions.

Make your data easily accessible, readily available, and most importantly, useful to your business with the Google Cloud BI solution — a comprehensive suite of data integration, transformation, analysis, visualization, and reporting tools from Google and our technology partners. The Google Cloud BI solution is centered around Google BigQuery, our fully-managed cloud data warehouse, so your BI can effortlessly scale on demand.

Benefits

  • Beautiful reports start here

Google Data Studio turns your data into informative dashboards and reports that are easy to read, easy to share, and fully customizable. Dashboarding allows you to tell great data stories to support better business decisions. 

  • Put all your data to work

Easily access all the data sources you need to understand your business and make better decisions. 

  • Transform your data

Transform your raw data into the dimensions, metrics, and calculations you need — no code or queries required. 

  • Build engaging visualizations

Data Studio gives you the ability to create beautiful charts and graphs that bring your data to life. 

  • Leverage teamwork that works

Harness the collective wisdom of your team. Share and collaborate in real time. Work together quickly, from anywhere.

Features

  • Data Connections

Connect reports to data from databases like Google BigQuery, Google Cloud SQL, and MySQL. You can also connect data from Google Sheets, Google Analytics and Analytics 360, AdWords, DoubleClick, and YouTube channels. 

  • Data Transformation

Create dimensions, metrics, and calculations to clean and transform your data without having to update your raw data. Dozens of mathematical, string, date, and other functions let you transform your data into more useful values and metrics. 

  • Data Visualization

Choose from a broad array of charts, graphs, and visualizations to bring your data to life. These include time series, bar charts, pie charts, tables, heat maps, geo maps, scorecards, scatter charts, bullet charts, and area charts. Each visualization has built-in comparison functions that make it easy to see period-over-period changes in the data. 
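A period-over-period comparison is, at its core, a percent change between matching periods. The sketch below shows that arithmetic; the function name and the choice to return `None` for a zero baseline are illustrative assumptions, not Data Studio's documented behavior.

```python
from typing import Optional


def period_over_period(current: float, previous: float) -> Optional[float]:
    """Percent change from the previous period to the current one.

    Returns None when the previous period is zero, since the change is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


if __name__ == "__main__":
    monthly_sessions = {"Jan": 1000, "Feb": 1200, "Mar": 900}
    months = list(monthly_sessions)
    for prev, cur in zip(months, months[1:]):
        change = period_over_period(monthly_sessions[cur], monthly_sessions[prev])
        print(f"{cur} vs {prev}: {change:+.1f}%")  # Feb vs Jan: +20.0%, Mar vs Feb: -25.0%
```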

  • Report Customization

Data Studio allows you to customize every aspect of your reports and dashboards to make them your own. Add logos and icons, change the background, fill, line, and text colors, and choose from an array of fonts, line styles, and object properties to make your data come to life. Insert dynamic controls to allow viewers to interact with and explore data in real time. 

  • Sharing & Collaboration

Data Studio is built with the same technology that underlies popular G Suite products like Docs, Sheets, and Slides, which means you decide who gets access to your reports. Grant individuals or groups, inside and outside your company, the right to edit or view with just a few clicks. Real-time collaboration allows multiple teammates to edit a single report at the same time. 

  • Report Templates

With a library of report templates to choose from, you can be up and running in minutes. Simply connect your data sources and customize the design and style to match your needs. 

  • User Administration

You decide who gets access to Data Studio. Leveraging Google Drive technology, you can easily manage all of your users and their level of access — grant individuals or groups the right to create, edit, or view.

Intelligent Data Preparation.

Google Cloud Dataprep is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Cloud Dataprep is serverless and works at any scale, with no infrastructure to deploy or manage. Data preparation takes just a few clicks and no code.

Benefits

      • Visual Interactivity, Ease of Use

Understand data instantly with visual data distributions. With each gesture in the UI, Cloud Dataprep suggests and predicts your next ideal data transformation, so you don’t have to write code. 

      • Fast Data Preparation

Cloud Dataprep automatically detects schemas, datatypes, possible joins, and anomalies such as missing values, outliers, and duplicates, so you can skip the time-consuming work of profiling your data and go straight to analysis. 
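Cloud Dataprep's profiling logic is proprietary, but the checks this paragraph lists (datatype detection, missing values, duplicates, outliers) can be approximated in a few lines of Python. The sketch below uses simple rules chosen for illustration only: a column counts as numeric if every non-empty value parses as a float, and outliers are flagged with the common 1.5 × IQR rule. These heuristics are assumptions, not Dataprep's documented behavior.

```python
import statistics
from typing import Any


def profile_column(values: list[str]) -> dict[str, Any]:
    """Profile one column of raw string values: missing, duplicates, type, outliers."""
    present = [v for v in values if v.strip() != ""]
    profile: dict[str, Any] = {
        "missing": len(values) - len(present),
        "duplicates": len(present) - len(set(present)),
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:  # at least one value is not numeric
        profile["type"] = "string"
        return profile
    profile["type"] = "number"
    if len(numbers) >= 4:
        q1, _, q3 = statistics.quantiles(numbers, n=4)  # quartile cut points
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        profile["outliers"] = [x for x in numbers if x < low or x > high]
    else:
        profile["outliers"] = []  # too few values for a meaningful IQR
    return profile


if __name__ == "__main__":
    order_totals = ["10", "12", "11", "", "13", "12", "250"]
    print(profile_column(order_totals))
    # {'missing': 1, 'duplicates': 1, 'type': 'number', 'outliers': [250.0]}
```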

      • Fully Managed and Powerful

Cloud Dataprep is an integrated partner service that is operated by another company, Trifacta. Google works closely with Trifacta to provide a seamless user experience that removes the need for upfront software installation, separate licensing costs, or ongoing operational overhead. The service scales on demand to meet your growing data preparation needs so that you can stay focused on analysis.

Features

      • Instant Data Exploration

Visually explore and interact with data in seconds. Instantly understand data distribution and patterns. You don't need to write code. You can prepare data with a few clicks. 

      • Intelligent Data Cleansing

Cloud Dataprep automatically identifies data anomalies and helps you to take corrective actions fast. Get data transformation suggestions based on your usage pattern. Standardize, structure, and join datasets easily with a guided approach. 

      • Serverless

Cloud Dataprep is a serverless service, so you do not need to create or manage infrastructure. This helps you to keep your focus on the data preparation and analysis. 

      • Seriously Powerful

Cloud Dataprep is built on top of the powerful Google Cloud Dataflow service. Cloud Dataprep is auto-scalable and can easily handle processing massive data sets. 

      • Supports Common Data Sources of Any Size

Process diverse datasets - structured and unstructured. Transform data stored in CSV, JSON, or relational table formats. Prepare datasets of any size, megabytes to terabytes, with equal ease. 

      • Integrated with Google Cloud Platform

Easily process data stored in Google Cloud Storage, Google BigQuery or from your desktop. Export clean data directly into BigQuery for further analysis. Seamlessly manage user access and data security with Google Cloud Identity and Access Management.

Scalable Event Ingestion and Messaging Middleware.

Google Cloud Pub/Sub is a serverless, large-scale, reliable, real-time messaging service that allows you to send and receive messages between independent applications. You can leverage Cloud Pub/Sub’s flexibility to decouple systems and components hosted on Cloud Platform or elsewhere on the Internet. Built on the same technology Google uses, Cloud Pub/Sub is designed to provide “at least once” delivery at low latency with on-demand scaling to tens of millions of messages per second.

Benefits

  • Deliver event data wherever you need it

    Cloud Pub/Sub is a simple, reliable, scalable foundation for stream analytics and event-driven computing systems. As part of Google Cloud’s stream analytics solution, the service ingests event streams and delivers them to Cloud Dataflow for processing and to BigQuery for analysis and data warehousing. Relying on the Cloud Pub/Sub service for delivery of event data frees you to focus on transforming your business and data systems with applications such as:

    ● Real-time personalization in gaming

    ● Fast reporting, targeting and optimization in advertising and media

    ● Processing device data for healthcare, manufacturing, oil and gas, and logistics

    ● Syndicating market-related data streams for financial services

  • Build multi-cloud and hybrid applications on open architecture

Syndicate data across projects and applications running on other clouds, or between cloud and on-premises apps. Cloud Pub/Sub easily fits in your existing environment via efficient client libraries for multiple languages, open REST/HTTP and gRPC service APIs, and an open source Apache Kafka connector.

 

  • Scale responsively and automatically

Scale to hundreds of millions of messages per second and pay only for the resources you use. There are no partitions or local instances to manage, reducing operational overhead. Data is automatically and intelligently distributed across data centers over our unique, high-speed private network.

 

  • Bring reliability and security tools to real-time apps

Use Cloud Pub/Sub to simplify scalable, distributed systems. All published data is synchronously replicated across availability zones to ensure that messages are available to consumers for processing as soon as they are ready. Fine-grained access controls allow for sophisticated cross-team and organizational data sharing. And end-to-end encryption adds security to your pipelines.

Features

      • At-least-once delivery

Synchronous, cross-zone message replication and per-message receipt tracking ensure at-least-once delivery at any scale. 
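Per-message receipt tracking is what turns "delivered" into "delivered at least once": a message stays outstanding until the subscriber acknowledges it, and it is redelivered if no acknowledgment arrives before its deadline. The in-memory class below is a hypothetical model of that contract, not the Pub/Sub client library. It shows why a consumer can see the same message more than once.

```python
from dataclasses import dataclass, field


@dataclass
class ToySubscription:
    """In-memory model of ack-deadline redelivery (not the real Pub/Sub API)."""

    ack_deadline: float
    pending: dict[str, str] = field(default_factory=dict)         # message_id -> data
    leased_until: dict[str, float] = field(default_factory=dict)  # message_id -> deadline

    def publish(self, message_id: str, data: str) -> None:
        self.pending[message_id] = data

    def pull(self, now: float) -> list[tuple[str, str]]:
        """Deliver every unacknowledged message whose lease is absent or expired."""
        delivered = []
        for message_id, data in self.pending.items():
            if self.leased_until.get(message_id, float("-inf")) <= now:
                self.leased_until[message_id] = now + self.ack_deadline
                delivered.append((message_id, data))
        return delivered

    def ack(self, message_id: str) -> None:
        """Acknowledge a message so it is never delivered again."""
        self.pending.pop(message_id, None)
        self.leased_until.pop(message_id, None)


if __name__ == "__main__":
    sub = ToySubscription(ack_deadline=10)
    sub.publish("m1", "order created")
    sub.publish("m2", "order paid")
    print(sub.pull(now=0))   # both messages delivered
    sub.ack("m1")            # m2 is processed but its ack is lost
    print(sub.pull(now=5))   # [] because m2 is still leased
    print(sub.pull(now=11))  # [('m2', 'order paid')] redelivered after the deadline
```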

      • Exactly-once processing

Cloud Dataflow supports reliable, expressive, exactly-once processing of Cloud Pub/Sub streams. 
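Because delivery is at-least-once, exactly-once processing depends on the consumer discarding duplicates. Cloud Dataflow handles this internally; the sketch below only illustrates the general idea with a hypothetical idempotent handler that remembers processed message IDs. A production system would bound that memory (for example, to a time window) and store it durably, which this toy version does not.

```python
class DedupingHandler:
    """Apply each message's effect once, even if the message is delivered again."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.total = 0  # example side effect: a running sum of message values

    def handle(self, message_id: str, value: int) -> bool:
        """Process the message; return False if it was a duplicate and was skipped."""
        if message_id in self.seen:
            return False
        self.total += value
        self.seen.add(message_id)  # record only after the effect is applied
        return True


if __name__ == "__main__":
    handler = DedupingHandler()
    deliveries = [("a", 5), ("b", 7), ("a", 5)]  # "a" is redelivered
    print([handler.handle(mid, v) for mid, v in deliveries])  # [True, True, False]
    print(handler.total)  # 12
```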

      • No provisioning, auto-everything

Cloud Pub/Sub does not have shards or partitions. Just set your quota, publish and consume. 

      • Integrated

Take advantage of integrations with multiple services, such as Cloud Storage and Gmail update events, and Cloud Functions for serverless event-driven computing. 

      • Open

Open APIs and client libraries in seven languages support cross-cloud and hybrid deployments. 

      • Global by default

Publish from anywhere in the world and consume from anywhere, with consistent latency. No replication necessary. 

      • Compliance & security

Cloud Pub/Sub is a HIPAA-compliant service, offering fine-grained access controls and end-to-end encryption.

Workflow Orchestration.

Google Cloud Composer is a fully managed workflow orchestration service that lets you author, schedule, and monitor pipelines spanning clouds and on-premises data centers. Built on the popular Apache Airflow open source project and operated using the Python programming language, Cloud Composer is free from lock-in and easy to use.

Benefits

  • Making Orchestration Easy
    Cloud Composer pipelines are configured as directed acyclic graphs (DAGs) using Python, making it easy for users of any experience level to author and schedule a workflow. One-click deployment yields instant access to a rich library of connectors and multiple graphical representations of your workflow in action, increasing pipeline reliability by making troubleshooting easy. Automatic synchronization of your directed acyclic graphs ensures your jobs stay on schedule.
  • End-to-End Integration for GCP Workloads 
    Cloud Composer is deeply integrated within the Google Cloud Platform, giving users the ability to orchestrate their full pipeline. Cloud Composer has robust, built-in integration with many products, including Google BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Datastore, Cloud Storage, Cloud Pub/Sub, and Cloud ML Engine.
  • Create a Hybrid & Multi-Cloud Environment 
    Cloud Composer gives you the ability to connect your pipeline through a single orchestration tool whether your workflow lives on-premises, in multiple clouds, or fully within GCP. The ability to author, schedule, and monitor your workflows in a unified manner means you can break down the silos in your environment and focus less on infrastructure.
  • Open Source at its Core 
    Cloud Composer is built on Apache Airflow, the popular open source orchestration tool. This open source project, to which Google contributes, gives customers freedom from lock-in as well as integration with a broad range of platforms, which will only expand as the Airflow community grows.
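Cloud Composer pipelines are Airflow DAG files written in Python, but the scheduling idea underneath is a topological sort: a task runs only after all of its upstream tasks have finished. The sketch below uses the standard library's `graphlib` (Python 3.9+) rather than Airflow, and the task names are hypothetical. It shows how declared dependencies determine which tasks can run in parallel and how a cycle is rejected.

```python
from graphlib import TopologicalSorter

# Each (hypothetical) task maps to the set of upstream tasks it depends on.
pipeline = {
    "extract_gcs": set(),
    "extract_pubsub": set(),
    "transform_dataflow": {"extract_gcs", "extract_pubsub"},
    "load_bigquery": {"transform_dataflow"},
    "refresh_report": {"load_bigquery"},
}


def execution_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; every task within a stage can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose upstreams are done
        stages.append(ready)
        sorter.done(*ready)  # mark them finished to unlock downstream tasks
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(pipeline), start=1):
        print(number, stage)
```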

Features

  • Multi-cloud

Create workflows that connect data, processing, and services across clouds, giving you a unified data environment.

  • Hybrid

Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud.

  • Python Programming Language

Leverage existing Python skills to dynamically author and schedule workflows within Cloud Composer.

  • Fully Managed

Cloud Composer's managed nature allows you to focus on authoring, scheduling, and monitoring your workflows as opposed to provisioning resources.

  • Open Source

Cloud Composer is built upon Apache Airflow, giving users freedom from lock-in and portability.

  • Integrated

Built-in integration with BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, Cloud ML Engine, and more, giving you the ability to orchestrate end-to-end GCP workloads.

  • Reliability

Increase reliability of your workflows through easy-to-use charts for monitoring and troubleshooting the root cause of an issue.
