Engineer Data in Google Cloud: Challenge Lab

Mayank Chourasia
3 min readNov 16, 2020

In this article, we will go through the lab to Engineer Data in Google Cloud.

In this lab, you are expected to import some historical data to a working BigQuery dataset and build a basic model that predicts fares based on information available when a new ride starts. Leadership is interested in building an app and estimating for users how much a ride will cost. The source data will be provided in your project.

The challenge contains 3 required tasks

  1. Clean your training data.
  2. Create a BQML model called taxirides.fare_model.
  3. Perform a batch prediction on new data.

1.Clean your training data

The first step is to Clean your training data using BigQuery.

  • In the GCP Console go to Navigation Menu > BigQuery.
  • Then Click on More > Query settings under the Query Editor.
  • In the Query Setting write Set a destination table for query results under Destination; and taxi_training_data as the Table name.
  • Click on Save.
  • Run the following SQL query.
Select
pickup_datetime,
pickup_longitude AS pickuplon,
pickup_latitude AS pickuplat,
dropoff_longitude AS dropofflon,
dropoff_latitude AS dropofflat,
passenger_count AS passengers,
( tolls_amount + fare_amount ) AS fare_amount
FROM
`taxirides.historical_taxi_rides_raw`
WHERE
trip_distance > 0
AND fare_amount >= 2.5
AND pickup_longitude > -75
AND pickup_longitude < -73
AND dropoff_longitude > -75
AND dropoff_longitude < -73
AND pickup_latitude > 40
AND pickup_latitude < 42
AND dropoff_latitude > 40
AND dropoff_latitude < 42
AND passenger_count > 0
AND RAND() < 0.001

2.Create a BQML model called taxirides.fare_model

In this task, you need to create a model called taxirides.fare_model and Train the model with an RMSE < 10.

  • Compose a new query with the given ST_distance() and ST_GeogPoint() functions in the Query Editor.
  • The SQL query to create the BQML mode
CREATE or REPLACE MODEL
taxirides.fare_model OPTIONS (model_type='linear_reg',
labels=['fare_amount']) AS
WITH
taxitrips AS (
SELECT
*,
ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
FROM
`taxirides.taxi_training_data` )
SELECT
*
FROM
taxitrips
  • Click on Run.

3: Perform a batch prediction on new data

In this task, you need to use the BQML model to predict the taxi fares of the data given in the taxirides.report_prediction_data table the data they’ve collected in 2015.

  • In the Query, Setting writes Set a destination table for query results under Destination and 2015_fare_amount_predictions as the Table name.
  • Click on Save.
  • Run the following SQL query.
CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions 
AS
SELECT * FROM ML.PREDICT(MODEL taxirides.fare_model,(
SELECT * FROM taxirides.report_prediction_data)
)​

Congratulations! Done with the challenge lab.

Stay tuned till the next blog

If you Want to Connect with Me:

Linkedin: https://www.linkedin.com/in/mayank-chourasia-38421a134/

Twitter: https://twitter.com/ChourasiaMayank.

Thank you stay safe, stay healthy.

Google Developers Google Cloud Qwiklabs .Cloud Opinion Mayank Chourasia

--

--

Mayank Chourasia

Hey, My name is Mayank Chourasia. Currently I am working on SAP Utilities as a SAP ABAP Developer. I had written a blogs on SAP ISU, SAP ABAP, Google Cloud .