Google announced the general availability of the AI Prediction service, a key component of its AI Platform. The service hosts models trained in popular machine learning frameworks, including TensorFlow, XGBoost and Scikit-Learn.
The AI Prediction service acts as the last stage of the machine learning pipeline. It hosts trained machine learning models in the cloud to infer target values for new data. Trained models deployed in AI Prediction service are exposed as REST endpoints that can be invoked from any standard client that supports HTTP.
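Because a deployed model is just an HTTP endpoint, a client needs little more than a JSON body and an access token. The following is a minimal sketch, not an official client: the `https://ml.googleapis.com/v1` root, the `:predict` path, and the `instances` request field are assumptions based on the public AI Platform REST interface rather than anything stated in the announcement, and the project name, model name, and token are hypothetical placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed global API root for AI Platform Prediction (not stated in the article).
API_ROOT = "https://ml.googleapis.com/v1"


def build_predict_request(project, model, instances, access_token, version=None):
    """Build (but do not send) an online prediction request for a deployed model."""
    name = f"projects/{project}/models/{model}"
    if version is not None:
        # Target a specific model version instead of the model's default version.
        name += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/{name}:predict",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


# Hypothetical project, model, and token values, for illustration only.
req = build_predict_request("my-project", "my-model", [[1.0, 2.0, 3.0]], "TOKEN")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; any other HTTP client could issue the same request.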
The AI Platform Prediction service is built on a Google Kubernetes Engine (GKE) backend, which is designed to improve reliability and to add flexibility through new hardware options such as Google Compute Engine machine types and NVIDIA GPUs.
Though the service is based on Google Kubernetes Engine, AI Prediction service hides the complexity of provisioning, managing, and scaling the clusters. Data scientists and engineers can focus on business problems instead of managing the infrastructure.
With general availability, AI Prediction service supports XGBoost and Scikit-Learn models on high-memory and high-CPU machine types. Behind the scenes, the service automatically scales the infrastructure up and down in response to request traffic.
The service is tightly integrated with Google Cloud Console and Stackdriver to track and visualize resource metrics. Metrics on a model's GPU, CPU, RAM, and network utilization provide insight into its performance characteristics.
Customers can choose to deploy machine learning models in specific regions through the AI Prediction service. Google introduced new endpoints in three regions (us-central1, europe-west4, and asia-east1) with regional isolation for improved reliability. Models deployed on the regional endpoints stay within the specified region, offering data locality and sovereignty to customers.
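Selecting a regional endpoint typically comes down to calling a region-specific hostname. The sketch below assumes the `REGION-ml.googleapis.com` hostname pattern, which is not given in the announcement; the helper name `regional_api_root` is hypothetical. Only the three regions named above are accepted.

```python
# Regions listed in the announcement as having regional endpoints.
REGIONAL_ENDPOINT_REGIONS = ("us-central1", "europe-west4", "asia-east1")


def regional_api_root(region):
    """Return the assumed API root for a regional endpoint.

    The hostname pattern is an assumption, not taken from the article.
    """
    if region not in REGIONAL_ENDPOINT_REGIONS:
        raise ValueError(
            f"{region!r} is not one of {', '.join(REGIONAL_ENDPOINT_REGIONS)}"
        )
    return f"https://{region}-ml.googleapis.com/v1"


print(regional_api_root("europe-west4"))
```

Rejecting unknown regions up front keeps a typo from silently routing requests, and the data they carry, somewhere other than the intended region.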
With added support for VPC controls, customers can define a security perimeter and deploy online prediction models that have access only to resources and services within the perimeter, or within another bridged perimeter. Since the prediction service endpoints are private to the VPC, data remains within the private network without having to traverse the public internet.
Models deployed and exposed through AI Prediction service support online and batch inference. Online prediction is optimized to minimize the latency of serving predictions while batch prediction is optimized to handle a high volume of instances in a job. Unlike online prediction where the results are sent immediately, batch predictions write the inference output to a file stored in a Google Cloud Storage bucket.
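The difference between the two modes shows up in what the caller submits: an online request carries the instances inline, while a batch job points at input files and names an output location in Cloud Storage. The following sketch builds a batch job description as a plain dictionary. The field names (`jobId`, `predictionInput`, `modelName`, `dataFormat`, `inputPaths`, `outputPath`, `region`) are assumptions modeled on the public AI Platform jobs interface, and the bucket, project, and model names are hypothetical.

```python
def build_batch_prediction_job(job_id, model_name, input_paths, output_path,
                               region, data_format="JSON"):
    """Describe a batch prediction job; field names are assumed, not confirmed."""
    # Batch prediction reads from and writes to Cloud Storage, so every path
    # is expected to be a gs:// URI.
    for path in [*input_paths, output_path]:
        if not path.startswith("gs://"):
            raise ValueError(f"expected a gs:// path, got {path!r}")
    return {
        "jobId": job_id,
        "predictionInput": {
            "modelName": model_name,
            "dataFormat": data_format,
            "inputPaths": list(input_paths),
            "outputPath": output_path,
            "region": region,
        },
    }


# Hypothetical names, for illustration only.
job = build_batch_prediction_job(
    job_id="nightly_scoring_001",
    model_name="projects/my-project/models/my-model",
    input_paths=["gs://my-bucket/input/*.json"],
    output_path="gs://my-bucket/output/",
    region="us-central1",
)
print(job["predictionInput"]["outputPath"])
```

The results are then read from the output path once the job finishes, rather than from the response to the submitting call.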
Google has been investing heavily in the AI Platform as a Service (PaaS) offering. It consolidated and augmented various services including Cloud ML Engine. With tight integration with GKE and Kubeflow, the service has evolved into an end-to-end platform that supports data preparation, transformation, training, model management, deployment, and inference.