Engineer Data in Google Cloud: Challenge Lab

Mayank Chourasia
3 min readNov 16, 2020

--

In this article, we will go through the lab to Engineer Data in Google Cloud.

In this lab, you are expected to import some historical data to a working BigQuery dataset and build a basic model that predicts fares based on information available when a new ride starts. Leadership is interested in building an app and estimating for users how much a ride will cost. The source data will be provided in your project.

The challenge contains 3 required tasks

  1. Clean your training data.
  2. Create a BQML model called taxirides.fare_model.
  3. Perform a batch prediction on new data.

1.Clean your training data

The first step is to Clean your training data using BigQuery.

  • In the GCP Console go to Navigation Menu > BigQuery.
  • Then Click on More > Query settings under the Query Editor.
  • In the Query Setting write Set a destination table for query results under Destination; and taxi_training_data as the Table name.
  • Click on Save.
  • Run the following SQL query.
Select
pickup_datetime,
pickup_longitude AS pickuplon,
pickup_latitude AS pickuplat,
dropoff_longitude AS dropofflon,
dropoff_latitude AS dropofflat,
passenger_count AS passengers,
( tolls_amount + fare_amount ) AS fare_amount
FROM
`taxirides.historical_taxi_rides_raw`
WHERE
trip_distance > 0
AND fare_amount >= 2.5
AND pickup_longitude > -75
AND pickup_longitude < -73
AND dropoff_longitude > -75
AND dropoff_longitude < -73
AND pickup_latitude > 40
AND pickup_latitude < 42
AND dropoff_latitude > 40
AND dropoff_latitude < 42
AND passenger_count > 0
AND RAND() < 0.001

2.Create a BQML model called taxirides.fare_model

In this task, you need to create a model called taxirides.fare_model and Train the model with an RMSE < 10.

  • Compose a new query with the given ST_distance() and ST_GeogPoint() functions in the Query Editor.
  • The SQL query to create the BQML mode
CREATE or REPLACE MODEL
taxirides.fare_model OPTIONS (model_type='linear_reg',
labels=['fare_amount']) AS
WITH
taxitrips AS (
SELECT
*,
ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
FROM
`taxirides.taxi_training_data` )
SELECT
*
FROM
taxitrips
  • Click on Run.

3: Perform a batch prediction on new data

In this task, you need to use the BQML model to predict the taxi fares of the data given in the taxirides.report_prediction_data table the data they’ve collected in 2015.

  • In the Query, Setting writes Set a destination table for query results under Destination and 2015_fare_amount_predictions as the Table name.
  • Click on Save.
  • Run the following SQL query.
CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions 
AS
SELECT * FROM ML.PREDICT(MODEL taxirides.fare_model,(
SELECT * FROM taxirides.report_prediction_data)
)​

Congratulations! Done with the challenge lab.

Stay tuned till the next blog

If you Want to Connect with Me:

Linkedin: https://www.linkedin.com/in/mayank-chourasia-38421a134/

Twitter: https://twitter.com/ChourasiaMayank.

Thank you stay safe, stay healthy.

Google Developers Google Cloud Qwiklabs .Cloud Opinion Mayank Chourasia

--

--

Mayank Chourasia
Mayank Chourasia

Written by Mayank Chourasia

Hey, My name is Mayank Chourasia. Currently I am working on SAP Utilities as a SAP ABAP Developer. I had written a blogs on SAP ISU, SAP ABAP, Google Cloud .

No responses yet