During the Google Cloud Next conference in March in San Francisco, United States, Google introduced Cloud Dataprep, a serverless data preparation service that even supports the use of machine learning training models.
For the following six months, the company offered the beta only to a select group of interested parties, but it has now announced that the tool is available in public beta to anyone who wants to use it.
Some reports and surveys indicate that analysts and data scientists can spend up to 80 percent of their time preparing raw data for analysis. Google Cloud Dataprep aims to automate this work by detecting data types, schemas, joins, and anomalies, which is a significant help to professionals and companies that handle large amounts of data. The tool integrates natively with other Google Cloud Platform (GCP) services, such as Cloud Storage and BigQuery.
Thanks to machine learning, the process improves as the service is used more widely: it suggests different ways of cleaning the data, which makes the work faster and less prone to errors.
Another highlight of Google Cloud Dataprep is its interface, which makes it easier for people who are not data engineers to modify or add datasets.
“Cloud Dataprep also has built-in intelligence to automatically understand and operationalize its specific usage patterns, making data preparation even faster and less prone to user error,” said Eric Anderson, a product manager at Google. “The overall result is more productive, efficient and powerful,” he added. Cloud Dataprep is an embedded version of Trifacta’s Wrangler application, which performs the same data preparation tasks.
Last year, Amazon, one of Google’s top competitors in this area, had already released its own serverless data preparation tool, AWS Glue.