Analytics Data Warehouse.
Google offers a proven, integrated, end-to-end Big Data solution, based on years of innovation, that lets you capture, process, store and analyze your data within a single platform. With Google Cloud Platform you can focus on finding insights rather than managing your infrastructure, and you can combine cloud-native services with open source tools as needed, in both batch and stream mode.
Google BigQuery is Google's fully managed, low-cost analytics data warehouse. BigQuery is serverless: there is no infrastructure to manage, no need to guess the needed capacity or overprovision, and no need for a database administrator. You can focus on analyzing data to find meaningful insights, use familiar SQL, and take advantage of Google’s pay-as-you-go model.
Google BigQuery runs blazing-fast SQL queries on gigabytes to petabytes of data and makes it easy to join public or commercial datasets with your data. Eliminate the time-consuming work of provisioning infrastructure and reduce your downtime with a serverless infrastructure that handles all ongoing maintenance, including patches and upgrades.
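To give a feel for the "familiar SQL" and the dataset joins mentioned above, here is a minimal sketch that joins a private table with a reference table and totals the result. It uses Python's built-in sqlite3 module purely as a local stand-in and does not call BigQuery. The table names, columns and rows (`orders`, `regions`) are invented for illustration, and a query of this shape would still need to be adapted to BigQuery's SQL dialect and dataset naming.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region_code TEXT, amount REAL)")
    conn.execute("CREATE TABLE regions (region_code TEXT, region_name TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 30.0)],
    )
    conn.executemany(
        "INSERT INTO regions VALUES (?, ?)",
        [("EU", "Europe"), ("US", "United States")],
    )
    return conn


def revenue_by_region(conn):
    """Join our own orders with the reference table and total revenue per region."""
    query = """
        SELECT r.region_name, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN regions AS r ON r.region_code = o.region_code
        GROUP BY r.region_name
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for region_name, revenue in revenue_by_region(build_demo_database()):
        print(region_name, revenue)
```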
Remove the headache of planning for data warehouse capacity: BigQuery scales storage and compute elastically as your data and workloads grow. Google BigQuery meets the challenges of real-time analytics by leveraging Google’s serverless infrastructure, which combines automatic scaling with high-performance streaming ingestion to load data.
Google BigQuery gives you a full view of all your data by seamlessly querying data stored in BigQuery’s managed columnar storage, Google Cloud Storage, Google Cloud Bigtable, Google Sheets and Google Drive. BigQuery integrates with existing ETL tools like Informatica and Talend to enrich the data you already use, and it supports popular BI tools like Tableau, MicroStrategy, Looker and Google Data Studio out of the box, so anyone can easily create stunning reports and dashboards.
Google BigQuery eliminates the data operations burden by providing automatic data replication for disaster recovery and highly available processing at no additional charge. BigQuery makes it easy to maintain strong security with fine-grained identity and access management control. BigQuery data is always encrypted, at rest and in transit.
Batch and Stream Data Processing.
Google Cloud Dataflow offers a unified programming model and a managed service for executing a wide range of data processing patterns including streaming analytics, ETL, and batch computation. Cloud Dataflow frees you from operational tasks like capacity planning, resource management and performance optimization.
Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness, with no need for complex workarounds or compromises. And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use. Cloud Dataflow unlocks transformational use cases across industries, including:
● Clickstream, Point-of-Sale, and segmentation analysis in retail
● Fraud detection in financial services
● Personalized user experience in gaming
● IoT analytics in manufacturing, healthcare, and logistics
Cloud Dataflow supports fast, simplified pipeline development via expressive Java and Python APIs in the Apache Beam SDK, which provides a rich set of windowing and session analysis primitives as well as an ecosystem of source and sink connectors. Plus, Beam’s unique, unified development model lets you reuse more code across streaming and batch pipelines.
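As a rough illustration of what windowing and session analysis mean, the sketch below groups timestamped events into fixed windows and into sessions separated by a gap of inactivity. It is plain Python and deliberately does not use the Apache Beam SDK; the function names, the `(key, timestamp)` event shape and the window sizes are assumptions made for this example only.

```python
from collections import defaultdict


def fixed_window_counts(events, window_seconds):
    """Count events per (key, window start).

    events is an iterable of (key, timestamp_seconds) pairs.
    """
    counts = defaultdict(int)
    for key, timestamp in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(key, window_start)] += 1
    return dict(counts)


def session_windows(timestamps, gap_seconds):
    """Split one key's timestamps into sessions.

    A new session starts whenever the gap since the previous event
    is larger than gap_seconds.
    """
    sessions = []
    for timestamp in sorted(timestamps):
        if sessions and timestamp - sessions[-1][-1] <= gap_seconds:
            sessions[-1].append(timestamp)
        else:
            sessions.append([timestamp])
    return sessions


if __name__ == "__main__":
    clicks = [("alice", 5), ("alice", 42), ("alice", 65), ("bob", 61)]
    print(fixed_window_counts(clicks, window_seconds=60))
    print(session_windows([5, 42, 65, 300], gap_seconds=30))
```

The same grouping logic applies whether the events come from a stored file or arrive continuously, which is the idea behind reusing code across batch and streaming pipelines.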
GCP’s serverless approach removes operational overhead with performance, scaling, availability, security and compliance handled automatically so users can focus on programming instead of managing server clusters. Integration with Stackdriver, GCP’s unified logging and monitoring solution, lets you monitor and troubleshoot your pipelines as they are running. Rich visualization, logging, and advanced alerting help you identify and respond to potential issues.
Use Cloud Dataflow as a convenient integration point to bring predictive analytics to fraud detection, real-time personalization and similar use cases by adding TensorFlow-based Cloud Machine Learning models and APIs to your data processing pipelines.
Cloud Dataflow seamlessly integrates with GCP services for streaming event ingestion (Cloud Pub/Sub), data warehousing (BigQuery), machine learning (Cloud Machine Learning), and more. Its Beam-based SDK also lets developers build custom extensions and even choose alternative execution engines, such as Apache Spark via Cloud Dataproc or on-premises. For Apache Kafka users, a Cloud Dataflow connector makes integration with GCP easy.
Managed Hadoop & Spark.
Use Google Cloud Dataproc, a managed Spark and Hadoop service, to easily process big datasets using the powerful and open tools in the Apache Big Data ecosystem. Control your costs by creating managed clusters of any size in about a minute, and turning them off when you're done, paying for what you use, not idle clusters. Cloud Dataproc integrates with storage, compute, and monitoring services across Cloud Platform products, giving you a powerful and complete data processing platform.
Cloud Dataproc is a fast, easy-to-use, fully-managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way. Operations that used to take hours or days take seconds or minutes instead, and you pay only for the resources you use (with per-second billing). Cloud Dataproc also easily integrates with other Google Cloud Platform (GCP) services, giving you a powerful and complete platform for data processing, analytics and machine learning.
Create Cloud Dataproc clusters quickly and resize them at any time—from three to hundreds of nodes—so you don't have to worry about your data pipelines outgrowing your clusters. With each cluster action taking less than 90 seconds on average, you have more time to focus on insights, with less time lost to infrastructure.
Adopting Google Cloud Platform pricing principles, Cloud Dataproc has a low-cost, easy-to-understand price structure based on actual use, measured by the second. Cloud Dataproc clusters can also include lower-cost preemptible instances, giving you powerful clusters at an even lower total cost.
The Spark and Hadoop ecosystem provides tools, libraries, and documentation that you can leverage with Cloud Dataproc. Because Cloud Dataproc offers frequently updated, native versions of Spark, Hadoop, Pig, and Hive, you can get started without needing to learn new tools or APIs, and you can move existing projects or ETL pipelines without redevelopment.
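To make the effect of per-second billing concrete, the following sketch compares it with a scheme that rounds every run up to a whole hour. The hourly rate, node count and runtime are hypothetical numbers chosen for the arithmetic, not actual Cloud Dataproc prices, and the calculation ignores any minimum charge and the cost of the underlying virtual machines.

```python
import math


def cost_per_second_billing(nodes, runtime_seconds, hourly_rate_per_node):
    """Cost when every node is charged for exactly the seconds it ran."""
    return nodes * runtime_seconds * hourly_rate_per_node / 3600


def cost_hourly_rounded(nodes, runtime_seconds, hourly_rate_per_node):
    """Cost when each run is rounded up to the next full hour (for comparison)."""
    billed_hours = math.ceil(runtime_seconds / 3600)
    return nodes * billed_hours * hourly_rate_per_node


if __name__ == "__main__":
    # Hypothetical job: 10 nodes for 15 minutes at 0.36 per node-hour.
    nodes, runtime_seconds, rate = 10, 15 * 60, 0.36
    print("per-second:", round(cost_per_second_billing(nodes, runtime_seconds, rate), 2))
    print("hourly-rounded:", round(cost_hourly_rounded(nodes, runtime_seconds, rate), 2))
```

Under these assumed numbers the short job costs a quarter of what it would with hourly rounding, which is why per-second billing matters most for clusters that are created for a single job and then deleted.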
Powerful Data Exploration.
Google Cloud Datalab is an interactive notebook (based on Jupyter) for exploring, analyzing and visualizing data and collaborating on the results. It is integrated with BigQuery and Google Cloud Machine Learning to give you easy access to key data processing services.
Cloud Datalab is a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services easily so you can focus on your data science tasks.
Cloud Datalab is built on Jupyter (formerly IPython), which boasts a thriving ecosystem of modules and a robust knowledge base. Cloud Datalab enables analysis of your data on Google BigQuery, Cloud Machine Learning Engine, Google Compute Engine, and Google Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions).
Whether you're analyzing megabytes or terabytes, Cloud Datalab has you covered. Query terabytes of data in BigQuery, run local analysis on sampled data and run training jobs on terabytes of data in Cloud Machine Learning Engine seamlessly.
Use Cloud Datalab to gain insight from your data. Interactively explore, transform, analyze, and visualize your data using BigQuery, Cloud Storage and Python.
Go from data to deployed machine learning (ML) models ready for prediction. Explore data, then build, evaluate and optimize ML models using TensorFlow or Cloud Machine Learning Engine.
Tell great data stories to support better business decisions.
Make your data easily accessible, readily available, and most importantly, useful to your business with the Google Cloud BI solution — a comprehensive suite of data integration, transformation, analysis, visualization, and reporting tools from Google and our technology partners. The Google Cloud BI solution is centered around Google BigQuery, our fully-managed cloud data warehouse, so your BI can effortlessly scale on demand.
Google Data Studio turns your data into informative dashboards and reports that are easy to read, easy to share, and fully customizable. Dashboarding allows you to tell great data stories to support better business decisions.
Easily access all the data sources you need to understand your business and make better decisions.
Transform your raw data into the dimensions, metrics, and calculations you need — no code or queries required.
Data Studio gives you the ability to create beautiful charts and graphs that bring your data to life.
Harness the collective wisdom of your team. Share and collaborate in real time. Work together quickly, from anywhere.
Connect data to reports from databases like Google BigQuery, Google Cloud SQL, and MySQL. Additionally connect data from Google Sheets, Google Analytics and Analytics 360, AdWords, DoubleClick, and YouTube channels.
Create dimensions, metrics, and calculations to clean and transform your data without having to update your raw data. Dozens of mathematical, string, date, and other functions let you transform your data into more useful values and metrics.
Choose from a broad array of charts, graphs, and visualizations available to bring your data to life. These include time series, bar charts, pie charts, tables, heat maps, geo maps, scorecards, scatter charts, bullet charts, and area charts. Each visualization has built-in comparison functions making it easy to see the changes in the data period over period.
Data Studio allows you to customize every aspect of your reports and dashboards to make them your own. Add logos and icons, change the background, fill, line, and text colors, and choose from an array of fonts, line styles, and object properties to make your data come to life. Insert dynamic controls to allow viewers to interact with and explore data in real time.
Data Studio is built with the same technology that underlies popular G Suite products like Docs, Sheets, and Slides, which means you get to decide who gets access to your reports. Grant individuals or groups, inside and outside your company, the right to edit or view with just a few clicks. Real-time collaboration allows multiple teammates to edit a single report at the same time.
With a library of report templates to choose from, you can be up and running in minutes. Simply connect your data sources and customize the design and style to match your needs.
You decide who gets access to Data Studio. Leveraging Google Drive technology, you can easily manage all of your users and their level of access — grant individuals or groups the right to create, edit, or view.
Intelligent Data Preparation.
Google Cloud Dataprep is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Cloud Dataprep is serverless and works at any scale, so there is no infrastructure to deploy or manage. You prepare data with clicks rather than code.
Understand data instantly with visual data distributions. With each gesture in the UI, Dataprep suggests and predicts your next ideal data transformation so you don’t have to write code.
Cloud Dataprep automatically detects schemas, data types, possible joins and anomalies such as missing values, outliers, and duplicates, so you can skip the time-consuming work of profiling your data and go right to the data analysis.
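For readers curious what this kind of profiling involves, here is a minimal sketch that flags missing values, duplicates and outliers in a single numeric column. It is not how Cloud Dataprep works internally; the function name `profile_column`, the median-absolute-deviation rule and the threshold of 3 are assumptions picked for illustration.

```python
from collections import Counter
from statistics import median


def profile_column(values, outlier_factor=3.0):
    """Report missing values, duplicates and outliers in one numeric column.

    Assumes at least one non-missing value; None and "" count as missing.
    """
    missing_rows = [i for i, v in enumerate(values) if v is None or v == ""]
    present = [v for v in values if v is not None and v != ""]

    # A value is a duplicate if it occurs more than once.
    duplicates = sorted(v for v, n in Counter(present).items() if n > 1)

    # Median absolute deviation (MAD): a spread estimate that a single
    # extreme value cannot distort as easily as the standard deviation.
    center = median(present)
    mad = median([abs(v - center) for v in present])
    outliers = [v for v in present if mad and abs(v - center) > outlier_factor * mad]

    return {"missing_rows": missing_rows, "duplicates": duplicates, "outliers": outliers}


if __name__ == "__main__":
    print(profile_column([10, 12, None, 11, 12, 500, ""]))
```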
Cloud Dataprep is an integrated partner service that is operated by another company, Trifacta. Google works closely with Trifacta to provide a seamless user experience that removes the need for upfront software installation, separate licensing costs, or ongoing operational overhead. The service scales on demand to meet your growing data preparation needs so that you can stay focused on analysis.
Visually explore and interact with data in seconds. Instantly understand data distribution and patterns. You don't need to write code. You can prepare data with a few clicks.
Cloud Dataprep automatically identifies data anomalies and helps you to take corrective actions fast. Get data transformation suggestions based on your usage pattern. Standardize, structure, and join datasets easily with a guided approach.
Cloud Dataprep is a serverless service, so you do not need to create or manage infrastructure. This helps you to keep your focus on the data preparation and analysis.
Cloud Dataprep is built on top of the powerful Google Cloud Dataflow service. Cloud Dataprep is auto-scalable and can easily handle processing massive data sets.
Process diverse datasets - structured and unstructured. Transform data stored in CSV, JSON, or relational table formats. Prepare datasets of any size, megabytes to terabytes, with equal ease.
Easily process data stored in Google Cloud Storage, Google BigQuery or from your desktop. Export clean data directly into BigQuery for further analysis. Seamlessly manage user access and data security with Google Cloud Identity and Access Management.
Scalable Event Ingestion and Messaging Middleware.
Google Cloud Pub/Sub is a serverless, large scale, reliable, real-time messaging service that allows you to send and receive messages between independent applications. You can leverage Cloud Pub/Sub’s flexibility to decouple systems and components hosted on Cloud Platform or elsewhere on the Internet. By building on the same technology Google uses, Cloud Pub/Sub is designed to provide “at least once” delivery at low latency with on-demand scaling to tens of millions of messages per second.
Cloud Pub/Sub is a simple, reliable, scalable foundation for stream analytics and event-driven computing systems. As part of Google Cloud’s stream analytics solution, the service ingests event streams and delivers them to Cloud Dataflow for processing and BigQuery for analysis as a data warehousing solution. Relying on the Cloud Pub/Sub service for delivery of event data frees you to focus on transforming your business and data systems with applications such as:
● Real-time personalization in gaming
● Fast reporting, targeting and optimization in advertising and media
● Processing device data for healthcare, manufacturing, oil and gas, and logistics
● Syndicating market-related data streams for financial services
Syndicate data across projects and applications running on other clouds, or between cloud and on-premises apps. Cloud Pub/Sub easily fits in your existing environment via efficient client libraries for multiple languages, open REST/HTTP and gRPC service APIs, and an open source Apache Kafka connector.
Scale to hundreds of millions of messages per second and pay only for the resources you use. There are no partitions or local instances to manage, reducing operational overhead. Data is automatically and intelligently distributed across data centers over our unique, high-speed private network.
Use Cloud Pub/Sub to simplify scalable, distributed systems. All published data is synchronously replicated across availability zones to ensure that messages are available to consumers for processing as soon as they are ready. Fine-grained access controls allow for sophisticated cross-team and organizational data sharing. And end-to-end encryption adds security to your pipelines.
Synchronous, cross-zone message replication and per-message receipt tracking ensure at-least-once delivery at any scale.
Cloud Dataflow supports reliable, expressive, exactly-once processing of Cloud Pub/Sub streams.
Cloud Pub/Sub does not have shards or partitions. Just set your quota, publish and consume.
Take advantage of integrations with multiple services, such as update events from Cloud Storage and Gmail, and Cloud Functions for serverless event-driven computing.
Open APIs and client libraries in seven languages support cross-cloud and hybrid deployments.
Publish from anywhere in the world and consume from anywhere, with consistent latency. No replication necessary.
Cloud Pub/Sub is a HIPAA-compliant service, offering fine-grained access controls and end-to-end encryption.
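The delivery guarantees described in this section (at-least-once delivery through per-message receipt tracking, with exactly-once processing handled downstream) can be illustrated with a small in-memory model. This is a toy sketch, not the Cloud Pub/Sub client library or its API: the classes `ToySubscription` and `DeduplicatingConsumer` are hypothetical, and real systems add acknowledgement deadlines, persistence and replication that are left out here.

```python
class ToySubscription:
    """In-memory stand-in for a subscription with per-message acknowledgement."""

    def __init__(self):
        self._unacked = {}  # message_id -> payload, kept until acknowledged
        self._next_id = 0

    def publish(self, payload):
        message_id = str(self._next_id)
        self._next_id += 1
        self._unacked[message_id] = payload
        return message_id

    def pull(self):
        """Return every unacknowledged message, so a message may arrive twice."""
        return list(self._unacked.items())

    def ack(self, message_id):
        """Mark a message as received; it will not be delivered again."""
        self._unacked.pop(message_id, None)


class DeduplicatingConsumer:
    """Applies each message once, even if it is delivered more than once."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def handle(self, message_id, payload):
        if message_id in self.seen_ids:
            return False  # duplicate delivery: skip the side effect
        self.seen_ids.add(message_id)
        self.total += payload
        return True


if __name__ == "__main__":
    subscription = ToySubscription()
    consumer = DeduplicatingConsumer()
    subscription.publish(5)
    subscription.publish(7)
    for message_id, payload in subscription.pull():
        consumer.handle(message_id, payload)  # processed, but never acked
    for message_id, payload in subscription.pull():  # both are redelivered
        consumer.handle(message_id, payload)
        subscription.ack(message_id)
    print(consumer.total)
```

Keeping unacknowledged messages available for redelivery is what gives at-least-once semantics, and tracking message IDs on the consumer side is one simple way to turn that into effectively-once processing.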
Workflow Orchestration.
Google Cloud Composer is a fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers. Built on the popular Apache Airflow open source project and operated using the Python programming language, Cloud Composer is free from lock-in and easy to use.
Create workflows that connect data, processing, and services across clouds, giving you a unified data environment.
Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud.
Leverage existing Python skills to dynamically author and schedule workflows within Cloud Composer.
Cloud Composer's managed nature allows you to focus on authoring, scheduling, and monitoring your workflows as opposed to provisioning resources.
Cloud Composer is built upon Apache Airflow, giving users portability and freedom from lock-in.
Built-in integration with BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, Cloud ML Engine, and more lets you orchestrate end-to-end GCP workloads.
Increase reliability of your workflows through easy-to-use charts for monitoring and troubleshooting the root cause of an issue.
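Apache Airflow, on which Cloud Composer is built, models a workflow as a directed acyclic graph of tasks written in Python. The sketch below shows only that underlying idea, using the standard library's graphlib module (Python 3.9 or later) rather than Airflow itself; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return task names in an order that respects every dependency.

    dependencies maps each task to the set of tasks that must finish first.
    Raises graphlib.CycleError if the tasks depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: two extracts feed one transform, which feeds a load
# step, which in turn feeds a report refresh.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "refresh_report": {"load_warehouse"},
}


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid workflow")
```

Declaring dependencies as data, instead of hard-coding a sequence of calls, is what lets a scheduler decide which tasks can run in parallel and which must wait.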
© 2017 - OneGlobe LLC.