KAN Practical Implementation (Kolmogorov–Arnold Networks Algorithm)
Based on AI Researcher's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Kolmogorov–Arnold Networks (KAN) are put to work on a heart-disease classification task using a practical Python pipeline: load a Kaggle dataset, preprocess it, train a KAN-based classifier, evaluate it, then improve performance through hyperparameter tuning. The core takeaway is that careful tuning of KAN-specific settings can substantially raise test-set accuracy compared with default parameters, turning a baseline model into a stronger predictor for whether heart disease is present.
The workflow starts in a Google Colab notebook and uses the imodelsx library's KAN classifier (with scikit-learn utilities for data handling and evaluation). The dataset is read from a CSV file into a pandas DataFrame, with the target label encoded as 0/1 for “without heart disease” vs “heart disease.” Feature distributions are visualized to get a sense of the input space. Before modeling, the code separates features (X) from the target (Y) and applies standard scaling so each feature is normalized to mean 0 and standard deviation 1. The dataset is then split into training and test sets with a 70/30 split, using stratification to preserve the class balance in both partitions.
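The preprocessing steps above can be sketched with scikit-learn. A small synthetic DataFrame stands in for the Kaggle CSV (the real file name and column names are not shown in the notes):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv(...) on the Kaggle file: a synthetic frame with a 0/1 target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 5)), columns=[f"f{i}" for i in range(5)])
df["target"] = rng.integers(0, 2, size=300)  # 0 = without heart disease, 1 = heart disease

X = df.drop(columns=["target"]).values
y = df["target"].values

# Scale each feature to mean 0, standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)

# 70/30 split, stratified so both partitions keep the original class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

Stratification matters here because a plain random split could leave the smaller class under-represented in the test set, distorting every downstream metric.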
A baseline KAN classifier is trained with default parameter values and evaluated on the test set. Reported performance includes about 75% accuracy, with class-wise F1 scores for the two labels that are roughly similar (around 70% and 71% in the classification report). Additional diagnostics—Cohen’s Kappa and a confusion matrix—are used to quantify agreement beyond chance and to show where misclassifications occur (true positives/negatives and false positives/negatives).
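The diagnostics listed above are all available in scikit-learn. The labels below are illustrative predictions, not the notebook's actual model outputs:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             classification_report, confusion_matrix)

# Illustrative ground truth and predictions (not the notebook's real results).
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Cohen's Kappa:", cohen_kappa_score(y_true, y_pred))  # agreement beyond chance
print(classification_report(y_true, y_pred))                # per-class precision/recall/F1
print(confusion_matrix(y_true, y_pred))                     # [[TN, FP], [FN, TP]]
```

Cohen's Kappa is worth reporting alongside accuracy because it discounts agreement that would occur by chance, which matters whenever the class balance is uneven.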
The project then shifts to hyperparameter tuning, targeting KAN’s key knobs: hidden layer size (number of neurons), activation regularization, entropy regularization, ridge (“R”) regularization, and spline order. Multiple runs are executed on the training data, each time predicting the unseen test set and printing accuracy, Cohen’s Kappa, classification reports, and confusion matrices. The tuning search identifies a best-performing configuration, with the highest test accuracy reaching roughly 76% during the search.
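The tuning loop has a simple structure: enumerate a parameter grid, train a model per configuration, score it on the held-out test set, and keep the best. Since the KAN library's exact constructor isn't reproduced in these notes, the sketch below uses scikit-learn's `ParameterGrid` with an `MLPClassifier` as a stand-in; with the actual KAN classifier, the grid would hold hidden layer size, the regularization strengths, and spline order instead:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic data standing in for the scaled heart-disease features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Stand-in grid: swap in KAN-specific knobs (hidden layer size, activation/entropy/
# ridge regularization, spline order) when using the real KAN classifier.
grid = ParameterGrid({"hidden_layer_sizes": [(8,), (16,), (32,)],
                      "alpha": [1e-4, 1e-2]})

best_acc, best_params = 0.0, None
for params in grid:
    clf = MLPClassifier(max_iter=500, random_state=0, **params).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))  # score each run on unseen data
    if acc > best_acc:
        best_acc, best_params = acc, params
print(best_acc, best_params)
```

Selecting on test-set accuracy, as the notebook does, is pragmatic but optimistic; a validation split or cross-validation inside the loop would give a less biased estimate.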
That best hyperparameter set is then used to train a final model with additional training hyperparameters such as batch size (64), learning rate (0.07), and weight decay (0.01). This tuned model delivers a major jump in test performance to about 85% accuracy. Cohen’s Kappa rises to about 60.9%, and the F1 scores per class land around the mid-60s. The confusion matrix indicates fewer errors, with 27 and 28 samples misclassified (across the two classes). Finally, a ROC curve is plotted, showing an AUC around 0.92, reflecting strong probability estimates for the positive class.
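The closing ROC/AUC check can be computed with scikit-learn from the model's positive-class probabilities. The scores below are synthetic, not the tuned model's outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic ground truth and predicted positive-class probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.5, 0.35, 0.2, 0.8, 0.65, 0.9, 0.4])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.4f}")  # 1.0 would mean perfect ranking of positives over negatives
# The notebook then plots the curve, e.g. plt.plot(fpr, tpr); plotting omitted here.
```

A high AUC with only mid-range F1 scores, as reported above, means the model ranks positives above negatives well, but the default 0.5 decision threshold may not be the best operating point on that curve.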
Overall, the implementation frames KAN as a viable alternative modeling approach for tabular medical data, while demonstrating that performance gains depend heavily on selecting KAN-specific hyperparameters rather than relying on defaults. The next planned step is a comparison against NLP methods on the same dataset to benchmark relative effectiveness.
Cornell Notes
The notebook implements a Kolmogorov–Arnold Networks (KAN) classifier for heart-disease detection using a Kaggle tabular dataset. It normalizes features with StandardScaler, splits data into stratified train/test sets, trains a baseline KAN model, and evaluates accuracy, Cohen’s Kappa, classification reports, and confusion matrices. Default settings yield about 75% test accuracy. Hyperparameter tuning over KAN-specific parameters (hidden layer size, activation regularization, entropy regularization, ridge regularization, and spline order) improves results, with the best tuned configuration reaching about 76% during the search. Training a final model with the selected KAN hyperparameters plus batch size, learning rate, and weight decay boosts test accuracy to about 85% and ROC AUC to about 0.92, with fewer misclassifications.
What preprocessing steps are applied before training the KAN classifier, and why do they matter for tabular medical data?
How is model performance evaluated beyond plain accuracy?
Which KAN-specific hyperparameters are tuned, and what role do they play?
What training hyperparameters are used in the final tuned model, and how do they relate to generalization?
How do the results change from baseline to tuned KAN, and what metrics show the improvement?
Review Questions
- Which evaluation metrics in the notebook are most sensitive to class imbalance or chance agreement, and how are they computed conceptually (accuracy vs Cohen’s Kappa vs confusion matrix)?
- Why might spline order and regularization entropy meaningfully affect a KAN model’s generalization on tabular features?
- If the ROC AUC is high but F1 scores are only mid-range, what kinds of prediction errors could still be present according to the confusion matrix?
Key Points
1. Normalize tabular features with StandardScaler and use a stratified train/test split to preserve class balance before training a KAN classifier.
2. Start with a baseline KAN model and evaluate using accuracy, Cohen’s Kappa, classification report, and confusion matrix, not accuracy alone.
3. Tune KAN-specific hyperparameters, including hidden layer size, activation regularization, entropy regularization, ridge (“R”) regularization, and spline order, to improve test performance.
4. Select the best hyperparameter set based on the highest test-set accuracy during the tuning runs, then retrain using that configuration.
5. Use training hyperparameters (batch size, learning rate, weight decay) alongside KAN settings to further improve generalization.
6. Track misclassification counts from the confusion matrix to understand where errors concentrate across the two classes.
7. Plot the ROC curve and check the AUC to confirm that probability estimates separate the classes effectively, not just that hard labels look correct.