Hyperparameter Tuning using Optuna | Bayesian Optimization using Optuna
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Optuna’s Bayesian optimization uses results from earlier trials to choose the next hyperparameters, reducing wasted computation compared with grid search and random search.
Briefing
Hyperparameter tuning stops being a brute-force chore when Optuna replaces exhaustive search with Bayesian optimization that learns where accuracy is likely to improve. Instead of trying every combination (grid search) or a small random subset (random search), Optuna builds a probabilistic model of the relationship between hyperparameters and the objective metric, then uses that model to choose the next hyperparameter settings to evaluate—so it reaches strong results with far fewer trials.
The walkthrough starts with a concrete setup: predicting whether students will get placements using a Random Forest classifier trained on features like CGPA and IQ. Two Random Forest hyperparameters become the tuning targets—max_depth (tree depth) and n_estimators (number of trees). The goal is to maximize classification accuracy, but the “best” values aren’t known ahead of time. That uncertainty motivates hyperparameter tuning: define a search space, train the model repeatedly for different hyperparameter combinations, and measure accuracy on validation data.
Grid search is presented as the baseline method: it enumerates all combinations in the search grid and trains a model for each one. The method is straightforward but quickly becomes computationally expensive as the number of hyperparameters and candidate values grows—especially in deep learning where each training run can be costly. Random search reduces compute by sampling only a limited number of combinations, but it can miss high-performing regions because it doesn’t use information from earlier trials.
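The combinatorial growth described above is easy to see by counting the grid. A minimal sketch with hypothetical candidate values (the third hyperparameter and all counts are illustrative, not from the transcript):

```python
from itertools import product

# Hypothetical candidate values for three hyperparameters.
max_depth_values = [3, 5, 10, 20]                # 4 candidates
n_estimators_values = [50, 100, 150, 200, 250]   # 5 candidates
min_samples_split_values = [2, 5, 10]            # 3 candidates

# Grid search trains one model per combination: 4 * 5 * 3 = 60 runs.
grid = list(product(max_depth_values, n_estimators_values, min_samples_split_values))
print(len(grid))  # 60
```

Adding one more hyperparameter with k candidates multiplies the total by k, which is why each new tuning dimension makes exhaustive search dramatically more expensive.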
Optuna’s core advantage is Bayesian optimization. The transcript frames it as learning a hidden mathematical relationship between hyperparameters (max_depth and n_estimators) and accuracy. As trials accumulate, Optuna treats each evaluated setting as a data point on an implicit multi-dimensional accuracy surface, then uses that information to infer promising regions. The next trial is selected intelligently using a sampler (by default, TPE—Tree-structured Parzen Estimator). Crucially, Optuna reuses past trial outcomes to guide future sampling, unlike grid or random search.
A practical code workflow is then described around five key Optuna concepts: Study (the optimization session), Trial (one hyperparameter evaluation run), Trial Parameters (the specific hyperparameter values for that run), Objective Function (the function that trains the model and returns accuracy), and Sampler (the component that decides the next hyperparameters to try). In the example, missing values in the dataset are handled by replacing zeros with NaN and imputing with the mean. The objective function defines the search ranges (n_estimators between 50 and 200; max_depth between 3 and 20), trains a Random Forest using cross-validation, and returns the mean accuracy.
After running 50 trials, Optuna reports the best accuracy and best hyperparameters (example values given: n_estimators=115 and max_depth=18). The model is retrained with those parameters and evaluated on the test set, yielding an accuracy of around 75%—described as a solid starting point that could improve with better preprocessing.
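The retraining step can be sketched like this, using the example values from the walkthrough; the dataset here is synthetic, and in practice you would unpack `study.best_params` from your own finished study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the placement dataset.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Example best values from the transcript; with your own study,
# use best = study.best_params instead.
best = {"n_estimators": 115, "max_depth": 18}

final_model = RandomForestClassifier(**best, random_state=42)
final_model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
print(round(test_accuracy, 3))
```

Evaluating on a held-out test set, rather than reusing the cross-validation score, gives an unbiased estimate of how the tuned model generalizes.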
The transcript also highlights Optuna’s flexibility: the sampler can be swapped to run random search or grid search behavior while keeping the same objective-function structure. Visualization tools are emphasized next, including optimization history (trial vs. accuracy), parallel coordinate plots (how hyperparameters relate to accuracy), contour plots (accuracy density over the hyperparameter grid), and importance plots (which hyperparameters matter most). Finally, Optuna’s “define-by-run” capability is showcased: the algorithm choice itself can be treated as a tunable hyperparameter, enabling dynamic search spaces across SVM, Random Forest, XGBoost/gradient boosting, and Logistic Regression. This lets Optuna find not only the best hyperparameters but also the best model family, using conditional logic to switch search spaces during optimization.
Cornell Notes
Optuna improves hyperparameter tuning by using Bayesian optimization: it learns from previous trials to decide which hyperparameters to evaluate next, aiming to maximize an objective metric (here, accuracy). The example tunes a Random Forest using two parameters—max_depth and n_estimators—by defining a search space, training models inside an objective function, and returning cross-validated mean accuracy. Optuna’s workflow is built around a Study (overall optimization), Trials (individual evaluations), Trial Parameters (chosen hyperparameters), an Objective Function (train + score), and a Sampler (e.g., TPE) that selects the next hyperparameters based on past results. After a limited number of trials (e.g., 50), Optuna yields best hyperparameters and a strong test accuracy, and it can further visualize and interpret the optimization process.
Why do grid search and random search become inefficient as the number of hyperparameters grows?
What makes Optuna’s Bayesian optimization different from grid or random search?
What are the five Optuna concepts needed to understand and read the tuning code?
How does the objective function work in the Random Forest example?
How can Optuna switch between Bayesian optimization and random/grid-style search?
What does “define-by-run” enable that typical tuning setups don’t?
Review Questions
- In grid search, how does the total number of model trainings scale when you add another hyperparameter with multiple candidate values?
- In Optuna, what role does the Sampler (e.g., TPE) play in choosing the next Trial’s hyperparameters?
- How does the objective function’s returned metric (accuracy vs. loss) affect whether Optuna should maximize or minimize during Study creation?
Key Points
1. Optuna’s Bayesian optimization uses results from earlier trials to choose the next hyperparameters, reducing wasted computation compared with grid search and random search.
2. Grid search becomes impractical as hyperparameter candidate counts multiply, while random search can miss high-performing regions because it doesn’t learn from prior outcomes.
3. Optuna’s tuning loop is built from a Study (session), Trials (evaluations), an Objective Function (train + return metric), Trial Parameters (sampled hyperparameters), and a Sampler (e.g., TPE) that drives the search.
4. In the Random Forest example, the objective function samples n_estimators and max_depth, trains the model, evaluates with cross-validation, and returns mean accuracy for Optuna to optimize.
5. Optuna can swap samplers to emulate random search or grid search behavior without changing the overall objective-function structure.
6. Optuna’s visualization tools (history, parallel coordinates, contour, importance) help identify when accuracy plateaus and which hyperparameters matter most.
7. “Define-by-run” supports dynamic/conditional search spaces, enabling joint selection of the best algorithm family and its best hyperparameters in one optimization run.