Engineer Data in Google Cloud: Challenge Lab

In this article, we will go through the lab to Engineer Data in Google Cloud.

In this lab, you are expected to import some historical data to a working BigQuery dataset and build a basic model that predicts fares based on information available when a new ride starts. Leadership is interested in building an app and estimating for users how much a ride will cost. The source data will be provided in your project.

The challenge contains 3 required tasks

  1. Clean your training data.
  2. Create a BQML model called taxirides.fare_model.
  3. Perform a batch prediction on new data.
Image for post
Image for post

The first step is to Clean your training data using BigQuery.

  • In the GCP Console go to Navigation Menu > BigQuery.
  • Then Click on More > Query settings under the Query Editor.
Image for post
Image for post
  • In the Query Setting write Set a destination table for query results under Destination; and taxi_training_data as the Table name.
  • Click on Save.
Image for post
Image for post
  • Run the following SQL query.
Select
pickup_datetime,
pickup_longitude AS pickuplon,
pickup_latitude AS pickuplat,
dropoff_longitude AS dropofflon,
dropoff_latitude AS dropofflat,
passenger_count AS passengers,
( tolls_amount + fare_amount ) AS fare_amount
FROM
`taxirides.historical_taxi_rides_raw`
WHERE
trip_distance > 0
AND fare_amount >= 2.5
AND pickup_longitude > -75
AND pickup_longitude < -73
AND dropoff_longitude > -75
AND dropoff_longitude < -73
AND pickup_latitude > 40
AND pickup_latitude < 42
AND dropoff_latitude > 40
AND dropoff_latitude < 42
AND passenger_count > 0
AND RAND() < 0.001
Image for post
Image for post

In this task, you need to create a model called taxirides.fare_model and Train the model with an RMSE < 10.

  • Compose a new query with the given ST_distance() and ST_GeogPoint() functions in the Query Editor.
  • The SQL query to create the BQML mode
CREATE or REPLACE MODEL
taxirides.fare_model OPTIONS (model_type='linear_reg',
labels=['fare_amount']) AS
WITH
taxitrips AS (
SELECT
*,
ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
FROM
`taxirides.taxi_training_data` )
SELECT
*
FROM
taxitrips
  • Click on Run.
Image for post
Image for post

In this task, you need to use the BQML model to predict the taxi fares of the data given in the taxirides.report_prediction_data table the data they’ve collected in 2015.

  • In the Query, Setting writes Set a destination table for query results under Destination and 2015_fare_amount_predictions as the Table name.
  • Click on Save.
Image for post
Image for post
  • Run the following SQL query.
CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions 
AS
SELECT * FROM ML.PREDICT(MODEL taxirides.fare_model,(
SELECT * FROM taxirides.report_prediction_data)
)​
Image for post
Image for post

Congratulations! Done with the challenge lab.

Stay tuned till the next blog

If you Want to Connect with Me:

Linkedin: https://www.linkedin.com/in/mayank-chourasia-38421a134/

Twitter: https://twitter.com/ChourasiaMayank.

Thank you stay safe, stay healthy.

Google Developers Google Cloud Qwiklabs .Cloud Opinion Mayank Chourasia

Written by

I am Google Cloud Certified Associate Cloud Engineer and MS Azure Certifed. I have been working on different fields including Data Science, DevOps and Anthos.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store