How to Train a Model using Vertex AI & Python SDK

Vertex AI is a fully managed platform provided by Google Cloud for machine learning and data science. It allows you to create, train, deploy, and manage machine learning models using Google Cloud’s robust infrastructure and services.

The Vertex AI SDK for Python is a high-level library that simplifies the process of automating data intake, model training, and making predictions on Vertex AI. Essentially, it lets you automate from Python many tasks that you’d otherwise perform in the Google Cloud console or with the gcloud CLI. In this guide, we’ll walk you through how to use the Vertex AI SDK for Python to train a model on Vertex AI.

Here’s what we will cover:

  1. Main Components and Concepts of Vertex AI
  2. Installing and Importing the Vertex AI SDK for Python
  3. Creating a Dataset and Uploading Data to Vertex AI
  4. Defining a Custom Training Job and Running It on Vertex AI
  5. Deploying the Trained Model and Getting Predictions on Vertex AI

Understanding the Main Components of Vertex AI

Before diving into using Vertex AI, it’s important to familiarize yourself with its key components:

  1. Project: A Google Cloud project is a container that holds all your settings and resources. To use Vertex AI, you first need to create a project and enable the Vertex AI API.
  2. Dataset: A dataset is a collection of data used for training or making predictions. Vertex AI supports various types of datasets, such as tabular, image, text, video, and custom datasets. You can import data from different sources like local files, BigQuery, and Google Cloud Storage.
  3. Training Job: This is a process where your dataset is used to train a machine learning model. Vertex AI offers different types of training jobs, including custom training, hyperparameter tuning, and AutoML. You can specify various parameters for your training job, such as machine type, region, scale tier, and budget.
  4. Model: A model is the output of a training job. It represents the patterns and rules learned from your data. You can use a model to make predictions or evaluate new data.
  5. Endpoint: An endpoint is a service that hosts one or more models for making predictions. You can deploy your models to an endpoint and send queries to get predictions. Vertex AI also provides tools to manage and monitor your endpoints.
  6. Prediction: A prediction is the result of applying a model to input data. Vertex AI supports both online and offline predictions. Online predictions are real-time responses to queries sent to an endpoint, while offline predictions involve processing large datasets in batches and storing the results.
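Each of these concepts maps to a class in the Vertex AI SDK for Python. As a quick orientation, here is a sketch of that mapping (the class names are from the `google.cloud.aiplatform` package used throughout this guide; the dataset class shown is the tabular one we use later):

```python
# Map the Vertex AI concepts above to the SDK classes used in this guide
VERTEX_COMPONENTS = {
    "Dataset":      "aiplatform.TabularDataset",
    "Training Job": "aiplatform.AutoMLTabularTrainingJob",
    "Model":        "aiplatform.Model",
    "Endpoint":     "aiplatform.Endpoint",
}

for component, sdk_class in VERTEX_COMPONENTS.items():
    print(f"{component:13s} -> {sdk_class}")
```

Keeping this mapping in mind makes the rest of the code easier to follow: each step below instantiates or calls one of these classes.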

Step 1: Install the Vertex AI SDK for Python

To use the Vertex AI SDK for Python, you need to install the google-cloud-aiplatform package, which includes both the high-level Vertex AI SDK and the lower-level Vertex AI Python client library. The client library gives you finer-grained control over Vertex AI API calls, and you can use the SDK and the client library together if needed.

To install the package in your virtual environment, run the following command:

pip install google-cloud-aiplatform

(Note: This step is optional if you’re using the Vertex AI Workbench notebook.)

# Setup your dependencies
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

USER_FLAG = ""
# Google Cloud Notebook requires dependencies to be installed with '--user'
if IS_GOOGLE_CLOUD_NOTEBOOK:
	USER_FLAG = "--user"

Next, install (or upgrade to) the latest version of the Vertex AI SDK for Python by running the following command in your virtual environment:

! pip install google-cloud-aiplatform
# if package already installed in your system or notebook run below commands
# Upgrade the specified package to the newest available version
# ! pip install {USER_FLAG} --upgrade google-cloud-aiplatform 
# Upgrade the specified package to the newest available version
# ! pip install {USER_FLAG} --upgrade google-cloud-storage

Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
	# Automatically restart kernel after installs
	import IPython

	app = IPython.Application.instance()
	app.kernel.do_shutdown(True)

Step 2: Setting Up the Environment

Before diving into the code, we must set up our GCP project, create a Cloud Storage bucket, and install the necessary Python libraries. You must have the Google Cloud SDK (gcloud) installed and configured with your GCP project.

If you don’t know your project ID, you may be able to get your project ID using gcloud.

import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
	shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
	PROJECT_ID = shell_output[0]
	print("Project ID: ", PROJECT_ID)

When you run the code above, it prints your project ID; the output looks like this:

Project ID: owillabs-gcp-04-c945c60XXXX

If PROJECT_ID is still empty, copy your project ID from the console and set it manually here:

if PROJECT_ID == "" or PROJECT_ID is None:
	PROJECT_ID = "owillabs-gcp-04-c945c60***" # @param {type:"string"}

Now, let’s create a timestamp to keep resource names unique:

# Import necessary libraries
from datetime import datetime

# Use a timestamp to ensure unique resources
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

Create a Cloud Storage bucket:

! gsutil mb -l $REGION $BUCKET_NAME

Replace REGION and BUCKET_NAME with values appropriate for your project.
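For reference, here is one way to define these variables before running the gsutil command (a sketch with placeholder values; substitute your real project ID and a region that supports Vertex AI):

```python
from datetime import datetime

PROJECT_ID = "your-project-id"  # placeholder: use your real project ID
REGION = "us-central1"          # choose a region that supports Vertex AI
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

# Bucket names must be globally unique; appending the timestamp helps ensure that
BUCKET_NAME = f"gs://{PROJECT_ID}-aip-{TIMESTAMP}"
print(BUCKET_NAME)
```

In a notebook, the `$REGION` and `$BUCKET_NAME` references in the gsutil command pick up these Python variables automatically.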

Output :

Creating gs://owillabs-gcp-06-c945b****aip-20210826051667/…

Finally, validate access to your Cloud Storage bucket by examining its contents:

! gsutil ls -al $BUCKET_NAME

Step 3: Copying Dataset into Cloud Storage

In this step, we’ll copy the dataset from its source location into our Cloud Storage bucket. The BUCKET_NAME variable defined earlier determines the destination.

IMPORT_FILE = "petfinder-tabular-classification_toy.csv"
! gsutil cp gs://cloud-training/mlongcp/v3.0_MLonGC/pdtrust_toy_datasets/{IMPORT_FILE} {BUCKET_NAME}/data/

gcs_source = f"{BUCKET_NAME}/data/{IMPORT_FILE}"

Step 4: Importing the Vertex SDK for Python

We need to import the Vertex SDK and initialize it using our project ID and location:

# Import necessary libraries
import os

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

Step 5: Creating a Managed Tabular Dataset

To create a dataset from a CSV file stored in Cloud Storage, use the Vertex SDK:

ds = aiplatform.TabularDataset.create(
	display_name="petfinder-tabular-dataset",
	gcs_source=gcs_source,
)

ds.resource_name

This will create a dataset from a CSV file stored on your GCS bucket.

Step 6: Launching a Training Job
Now, we are ready to create and train our AutoML tabular model:

# Construct an AutoML Tabular training job
job = aiplatform.AutoMLTabularTrainingJob(
	display_name="train-petfinder-automl-1",
	optimization_prediction_type="classification",
	column_transformations=[
		{"categorical": {"column_name": "Type"}},
		{"numeric": {"column_name": "Age"}},
		{"categorical": {"column_name": "Breed1"}},
		{"categorical": {"column_name": "Color1"}},
		{"categorical": {"column_name": "Color2"}},
		{"categorical": {"column_name": "MaturitySize"}},
		{"categorical": {"column_name": "FurLength"}},
		{"categorical": {"column_name": "Vaccinated"}},
		{"categorical": {"column_name": "Sterilized"}},
		{"categorical": {"column_name": "Health"}},
		{"numeric": {"column_name": "Fee"}},
		{"numeric": {"column_name": "PhotoAmt"}},
	],
)


# Create and train the model object
# This will take around two and a half hours to run
model = job.run(
	dataset=ds,
	target_column="Adopted",
	# Define training, validation and test fraction for training
	training_fraction_split=0.8,
	validation_fraction_split=0.1,
	test_fraction_split=0.1,
	model_display_name="adopted-prediction-model",
	disable_early_stopping=False,
)

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:16: DeprecationWarning: consider using column_specs instead. column_transformations will be deprecated in the future.
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1715908841423503360?project=1075205415941
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
(the current-state line repeats periodically until the pipeline finishes)

Be patient: training can take more than two hours to complete.
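One detail worth noting about the `job.run` call above: the three split fractions together partition the dataset, so they must sum to 1.0. A quick sanity check in plain Python:

```python
# The fractions passed to job.run above: 80% train, 10% validation, 10% test
splits = {"training": 0.8, "validation": 0.1, "test": 0.1}

total = sum(splits.values())
assert abs(total - 1.0) < 1e-9, f"Split fractions must sum to 1.0, got {total}"
print("Split fractions are valid:", splits)
```

If the fractions don’t sum to 1, the training pipeline will reject the job, so it’s cheaper to catch the mistake before launching a multi-hour run.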

Step 7: Deploying the Model

Before making predictions, we need to deploy the model to an endpoint:

# Deploy the model resource to the serving endpoint resource 
endpoint = model.deploy(
	machine_type="e2-standard-4",
)

Step 8: Making Predictions

With the model deployed, you can now make predictions. Here’s an example of how to send data for prediction. This sample instance is taken from an observation in which Adopted = Yes.

Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your AutoMLTabularTrainingJob tell Vertex AI how to convert the inputs to their declared types.

# Make a prediction using the sample values 
prediction = endpoint.predict(
	[
		{
			"Type": "Cat",
			"Age": "3",
			"Breed1": "Tabby",
			"Gender": "Male",
			"Color1": "Black",
			"Color2": "White",
			"MaturitySize": "Small",
			"FurLength": "Short",
			"Vaccinated": "No",
			"Sterilized": "No",
			"Health": "Healthy",
			"Fee": "100",
			"PhotoAmt": "2",
		}
	]
)

print(prediction)

Output:

Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.527707576751709, 0.4722923934459686]}], deployed_model_id='3521401492231684096', explanations=None)
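The `predictions` field pairs each class label with a score. To extract the top class from a payload shaped like the output above, you can do something like the following (a small parsing sketch working on the structure shown; in practice you would read `prediction.predictions` from the endpoint response rather than a literal):

```python
# Payload copied from the sample output above
predictions = [
    {"classes": ["Yes", "No"],
     "scores": [0.527707576751709, 0.4722923934459686]}
]

result = predictions[0]
# Pair each class with its score and take the highest-scoring one
label, score = max(zip(result["classes"], result["scores"]),
                   key=lambda pair: pair[1])
print(f"Predicted: {label} (score {score:.3f})")
```

Here the model leans slightly toward "Yes" (about 0.53 vs. 0.47), matching the observation the sample instance was taken from.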

Conclusion

In this article, we explored how to train a model using Vertex AI and the Python SDK. We covered the essential components of Vertex AI, from creating a dataset to deploying a model and making predictions. Vertex AI provides a robust and scalable platform for machine learning, making it easier to manage the entire ML lifecycle. By leveraging the Vertex AI SDK for Python, you can automate and streamline your workflow, allowing you to focus on building and refining your models.

Whether you’re dealing with large datasets or complex training jobs, Vertex AI simplifies the process with its powerful tools and integrations. As you continue to work with Vertex AI, you’ll find it an invaluable resource for accelerating your machine learning projects, improving productivity, and achieving better results. Start exploring the possibilities of Vertex AI and see how it can transform your data science and machine learning endeavors.
