
Serving ML Models with FastAPI | Video 7 | CampusX

CampusX · 5 min read

Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Train a model in a notebook, export it (model.pkl), and load it once when the FastAPI app starts.

Briefing

FastAPI is used to turn a trained machine-learning model into a working prediction service, then wrap that service with a simple Streamlit front end so users can submit inputs and receive an insurance-premium category (low/medium/high). The core workflow runs end-to-end: a model is trained in a notebook, exported as a file, loaded by a FastAPI app, exposed via a POST endpoint, and finally called from a UI that sends raw user attributes and displays the JSON response.

The project centers on an “insurance premium prediction” problem. Given user details—age, weight, height, annual income (LPA), smoker status, city, and occupation—the model outputs a premium category: high, medium, or low. The value of this setup is twofold: insurance companies can segment customers into pricing tiers, while individual users can estimate which tier they fall into and adjust lifestyle choices accordingly.

To make the model practical, the notebook performs feature engineering before training. Age is converted into an "age group" (young/adult/middle-aged/senior) rather than used as a raw number. Weight and height are combined into BMI. A "lifestyle risk" feature is derived from BMI and smoker status, producing a risk level (high/medium/low) based on simple thresholds. Cities are bucketed into tiers using predefined city lists: tier one, tier two, and tier three for everything else. After these transformations, the training dataset drops the original raw columns (age, weight, height, smoker, city) in favor of the engineered features.
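The transformations above can be sketched as plain Python helpers. The specific thresholds and city lists here are illustrative assumptions, not the video's exact values:

```python
# Illustrative feature-engineering helpers; the thresholds and city
# lists are assumptions, not the video's verbatim cutoffs.

TIER_1 = {"Mumbai", "Delhi", "Bangalore", "Chennai", "Kolkata"}
TIER_2 = {"Pune", "Jaipur", "Lucknow", "Indore"}

def age_group(age: int) -> str:
    if age < 25:
        return "young"
    if age < 45:
        return "adult"
    if age < 60:
        return "middle_aged"
    return "senior"

def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / (height_m ** 2)

def lifestyle_risk(is_smoker: bool, bmi_value: float) -> str:
    if is_smoker and bmi_value > 30:
        return "high"
    if is_smoker or bmi_value > 27:
        return "medium"
    return "low"

def city_tier(city: str) -> int:
    if city in TIER_1:
        return 1
    if city in TIER_2:
        return 2
    return 3

# Example: a 30-year-old smoker from Pune, 85 kg, 1.7 m
print(age_group(30), round(bmi(85, 1.7), 1),
      lifestyle_risk(True, bmi(85, 1.7)), city_tier("Pune"))
# → adult 29.4 medium 2
```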

For modeling, the pipeline uses scikit-learn: categorical features are one-hot encoded, numerical features pass through unchanged, and a RandomForestClassifier is trained inside a Pipeline. A train/test split is performed, and the resulting accuracy is reported around 90%—with an explicit caveat that the dataset is a toy dataset, so real-world reliability shouldn’t be assumed. The trained model is exported to a model.pkl file.
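A minimal sketch of that training pipeline follows; the column names, hyperparameters, and the tiny synthetic dataset are assumptions for illustration, not taken from the video:

```python
# Sketch of the scikit-learn training pipeline described above.
# Column names, hyperparameters, and data are illustrative assumptions.
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny synthetic frame with the engineered features only
df = pd.DataFrame({
    "bmi": [22.0, 31.5, 27.2, 19.8, 29.0, 24.3],
    "age_group": ["young", "adult", "senior", "young", "middle_aged", "adult"],
    "lifestyle_risk": ["low", "high", "medium", "low", "high", "medium"],
    "city_tier": [1, 3, 2, 1, 3, 2],
    "income_lpa": [12.0, 5.5, 8.0, 15.0, 4.0, 9.5],
    "occupation": ["private", "freelancer", "government",
                   "private", "retired", "business"],
    "premium_category": ["low", "high", "medium", "low", "high", "medium"],
})

categorical = ["age_group", "lifestyle_risk", "occupation"]

pipeline = Pipeline([
    # One-hot encode categoricals; numeric columns pass through unchanged
    ("prep", ColumnTransformer(
        [("ohe", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=42)),
])

X = df.drop(columns=["premium_category"])
y = df["premium_category"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
pipeline.fit(X_train, y_train)

# Export the whole fitted pipeline for the FastAPI app to load
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)
```

Pickling the entire Pipeline (not just the classifier) means the one-hot encoding travels with the model, so the serving code only has to supply the engineered columns.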

Serving comes next. A FastAPI app loads model.pkl at startup, then defines a Pydantic model (UserInput) to validate incoming request data: age, weight, height, income, smoker (boolean), city (string), and occupation (restricted to a fixed set of allowed values). FastAPI also computes derived features server-side using Pydantic computed fields: BMI, age group, lifestyle risk, and city tier. This design choice avoids forcing users to calculate BMI or risk tiers themselves; the client sends raw attributes, and the API handles the transformations.
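The validation-plus-derived-features schema might look like the following sketch (Pydantic v2 style); the field constraints, occupation list, thresholds, and city lists are assumptions, not the video's exact values:

```python
# Illustrative Pydantic schema with server-side computed features.
# Constraints, thresholds, and city lists are assumptions.
from typing import Literal

from pydantic import BaseModel, Field, computed_field

TIER_1 = {"Mumbai", "Delhi", "Bangalore"}
TIER_2 = {"Pune", "Jaipur", "Lucknow"}

class UserInput(BaseModel):
    age: int = Field(..., gt=0, lt=120)
    weight: float = Field(..., gt=0)           # kg
    height: float = Field(..., gt=0, lt=2.5)   # metres
    income_lpa: float = Field(..., ge=0)
    smoker: bool
    city: str
    occupation: Literal["private", "government", "business",
                        "freelancer", "student", "retired"]

    @computed_field
    @property
    def bmi(self) -> float:
        return self.weight / (self.height ** 2)

    @computed_field
    @property
    def age_group(self) -> str:
        if self.age < 25:
            return "young"
        if self.age < 45:
            return "adult"
        if self.age < 60:
            return "middle_aged"
        return "senior"

    @computed_field
    @property
    def lifestyle_risk(self) -> str:
        if self.smoker and self.bmi > 30:
            return "high"
        if self.smoker or self.bmi > 27:
            return "medium"
        return "low"

    @computed_field
    @property
    def city_tier(self) -> int:
        if self.city in TIER_1:
            return 1
        if self.city in TIER_2:
            return 2
        return 3
```

Because the derived values are `computed_field`s, they are calculated from the validated raw inputs on access, and a request with an out-of-range age or an unknown occupation is rejected before any feature engineering runs.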

The prediction endpoint is exposed at /predict-premium using the HTTP POST method. The endpoint accepts the validated input, converts it into a single-row pandas DataFrame in the exact feature format expected by the RandomForest model, runs model.predict, and returns the predicted category in a JSON response with HTTP status code 200.

Finally, Streamlit is used to build a lightweight web UI. The app collects the same seven raw inputs, sends them to the FastAPI endpoint via the requests library, and renders the returned premium category. The result is a complete pattern for deploying ML (or deep learning) models: train and export, serve with FastAPI using Pydantic validation + computed features, and consume via a front end that talks to the API.

Cornell Notes

The workflow builds an insurance-premium prediction model and serves it through FastAPI. A RandomForestClassifier is trained on engineered features derived from raw user inputs: BMI (from weight/height), age group, lifestyle risk (from BMI and smoker), and city tier. The trained model is exported to model.pkl, loaded by FastAPI, and exposed via a POST endpoint at /predict-premium. Pydantic validates incoming data and computes the derived features inside the API, so clients send raw attributes instead of pre-calculated features. A Streamlit front end then collects user inputs, calls the API, and displays the JSON prediction (low/medium/high).

Why does the API use HTTP POST for predictions, even though it doesn’t create a new resource?

POST is used because the client is sending data for server-side processing. In this setup, the client submits raw user attributes (age, weight, height, income, smoker, city, occupation), the server validates and transforms them into the model’s expected features, runs model.predict, and returns the prediction. The “create” aspect is less important than the fact that the request body carries inputs that must be processed.

What derived features does the API compute, and why compute them server-side?

The API computes BMI, age group, lifestyle risk, and city tier using Pydantic computed fields. BMI comes from weight and height; age group is derived from age ranges; lifestyle risk combines smoker status with BMI thresholds; city tier maps cities into tier one or tier two via predefined lists, with all remaining cities falling into tier three. Computing these inside the API keeps the client simple and prevents users from needing to replicate the same feature-engineering logic.

How does the training pipeline prepare data for the RandomForest model?

Categorical features are one-hot encoded while numerical features pass through unchanged. A scikit-learn Pipeline chains a column transformer (one-hot for categorical, passthrough for numerical) with a RandomForestClassifier. The dataset is split into training and test sets, then the pipeline is fit on the training data.

What does the /predict-premium endpoint actually do from request to response?

It accepts a request body that matches the Pydantic UserInput schema, validates types and constraints, computes derived features, and converts the result into a single-row pandas DataFrame. It then calls model.predict on that DataFrame, extracts the first prediction, and returns it as JSON with status code 200 under a key like “predicted_category”.
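The request-to-DataFrame step in the answer above can be shown with plain pandas (the feature names and values are illustrative):

```python
import pandas as pd

# Features computed server-side from a validated request (illustrative values)
features = {
    "bmi": 29.4,
    "age_group": "adult",
    "lifestyle_risk": "medium",
    "city_tier": 2,
    "income_lpa": 10.0,
    "occupation": "private",
}

# Wrapping the dict in a list yields a single-row DataFrame whose
# column names match the pipeline's training layout
row = pd.DataFrame([features])
print(row.shape)  # (1, 6)

# With the loaded pipeline: prediction = model.predict(row)[0]
# (predict returns an array; [0] extracts the single prediction)
```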

How does the Streamlit front end interact with the FastAPI service?

Streamlit renders a form for the seven raw inputs. When the user clicks the predict button, Streamlit builds a dictionary payload and sends it to the FastAPI endpoint using requests. After receiving the JSON response, it displays the predicted premium category. Changing inputs (e.g., increasing age and setting smoker to true) changes the returned category.

Review Questions

  1. What feature-engineering steps are required to convert raw inputs into the model’s expected features, and where are those steps implemented?
  2. How do Pydantic validation constraints (like age range and occupation options) affect the API’s behavior when a client sends invalid data?
  3. Why is a single-row pandas DataFrame created inside the prediction endpoint instead of passing raw JSON directly to the model?

Key Points

  1. Train a model in a notebook, export it (model.pkl), and load it once when the FastAPI app starts.
  2. Use Pydantic to validate request payloads and restrict fields like occupation to an allowed set.
  3. Compute derived ML features (BMI, age group, lifestyle risk, city tier) inside the API using Pydantic computed fields.
  4. Expose predictions via a POST endpoint (e.g., /predict-premium) that accepts raw user attributes in the request body.
  5. Convert validated/computed inputs into a single-row pandas DataFrame that matches the model's training feature format.
  6. Return predictions as JSON with a clear key and HTTP status code 200.
  7. Build a Streamlit UI that collects inputs, calls the FastAPI endpoint with requests, and renders the JSON prediction.

Highlights

The API design deliberately shifts feature engineering to the server: clients send raw attributes, while Pydantic computed fields generate BMI, age group, lifestyle risk, and city tier.
A single /predict-premium POST endpoint turns a trained RandomForest model into a usable prediction service by returning JSON predictions.
Streamlit serves as a practical consumer: it collects inputs, calls the API, and displays low/medium/high premium categories.
The model’s reported ~90% accuracy comes from a toy dataset, so the serving pattern matters more than real-world performance claims.
