Engineer Data in Google Cloud: Challenge Lab
In this article, we will go through the lab to Engineer Data in Google Cloud.
In this lab, you are expected to import some historical data to a working BigQuery dataset and build a basic model that predicts fares based on information available when a new ride starts. Leadership is interested in building an app and estimating for users how much a ride will cost. The source data will be provided in your project.
The challenge contains 3 required tasks
- Clean your training data.
- Create a BQML model called taxirides.fare_model.
- Perform a batch prediction on new data.
1.Clean your training data
The first step is to Clean your training data using BigQuery.
- In the GCP Console go to Navigation Menu > BigQuery.
- Then Click on More > Query settings under the Query Editor.
- In the Query Setting write Set a destination table for query results under Destination; and
taxi_training_data
as the Table name. - Click on Save.
- Run the following SQL query.
Select
pickup_datetime,
pickup_longitude AS pickuplon,
pickup_latitude AS pickuplat,
dropoff_longitude AS dropofflon,
dropoff_latitude AS dropofflat,
passenger_count AS passengers,
( tolls_amount + fare_amount ) AS fare_amount
FROM
`taxirides.historical_taxi_rides_raw`
WHERE
trip_distance > 0
AND fare_amount >= 2.5
AND pickup_longitude > -75
AND pickup_longitude < -73
AND dropoff_longitude > -75
AND dropoff_longitude < -73
AND pickup_latitude > 40
AND pickup_latitude < 42
AND dropoff_latitude > 40
AND dropoff_latitude < 42
AND passenger_count > 0
AND RAND() < 0.001
2.Create a BQML model called taxirides.fare_model
In this task, you need to create a model called taxirides.fare_model
and Train the model with an RMSE < 10.
- Compose a new query with the given
ST_distance()
andST_GeogPoint()
functions in the Query Editor. - The SQL query to create the BQML mode
CREATE or REPLACE MODEL
taxirides.fare_model OPTIONS (model_type='linear_reg',
labels=['fare_amount']) AS
WITH
taxitrips AS (
SELECT
*,
ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
FROM
`taxirides.taxi_training_data` )
SELECT
*
FROM
taxitrips
- Click on Run.
3: Perform a batch prediction on new data
In this task, you need to use the BQML model to predict the taxi fares of the data given in the taxirides.report_prediction_data
table the data they’ve collected in 2015.
- In the Query, Setting writes Set a destination table for query results under Destination and
2015_fare_amount_predictions
as the Table name. - Click on Save.
- Run the following SQL query.
CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions
AS
SELECT * FROM ML.PREDICT(MODEL taxirides.fare_model,(
SELECT * FROM taxirides.report_prediction_data)
)
Congratulations! Done with the challenge lab.
Stay tuned till the next blog
If you Want to Connect with Me:
Linkedin: https://www.linkedin.com/in/mayank-chourasia-38421a134/
Twitter: https://twitter.com/ChourasiaMayank.
Thank you stay safe, stay healthy.
Google Developers Google Cloud Qwiklabs .Cloud Opinion Mayank Chourasia