Time Series Prediction with LSTMs using TensorFlow 2 and Keras in Python
Based on Venelin Valkov's video on YouTube. If you find this content useful, support the original creator by watching, liking, and subscribing.
Briefing
Time series forecasting with LSTMs hinges on treating past observations as a sequence, not as independent data points—and the practical payoff is a working pipeline that predicts future bike-share demand from historical hourly patterns. The core idea is that time series data is typically recorded at regular intervals (often hourly or daily) and exhibits temporal structure that shapes modeling choices: whether the series is stationary (stable mean and variance), whether it trends, and especially whether it is seasonal (repeating cycles). For bike-share demand, those cycles show up clearly: monthly totals rise in summer, and hourly demand spikes in the morning commute window and again in the evening.
To model those temporal dependencies, the workflow builds on recurrent neural networks, with Long Short-Term Memory (LSTM) networks singled out as a practical choice for sequence learning. LSTMs are presented as a response to training difficulties common in vanilla recurrent networks—particularly vanishing/exploding gradients—handled through gated memory that can retain relevant history while discarding noise. Training uses backpropagation through time (implemented via an unrolled recurrent structure), enabling the network to learn how earlier time steps influence the next value.
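As a toy illustration of the unrolled structure that backpropagation through time differentiates through, the loop below implements a plain tanh recurrence (deliberately not an LSTM; the layer sizes and step count are made up for the example):

```python
import numpy as np

# A plain tanh recurrence unrolled over 24 steps: h_t = tanh(Wx @ x_t + Wh @ h_prev).
# BPTT differentiates through every iteration of this loop; repeated multiplication
# by Wh is what makes gradients vanish or explode, which LSTM gating mitigates.
rng = np.random.default_rng(0)
Wx = rng.normal(size=(8, 3))          # input-to-hidden weights (8 units, 3 features)
Wh = rng.normal(size=(8, 8))          # hidden-to-hidden weights carried across steps
h = np.zeros(8)                       # initial hidden state
for x_t in rng.normal(size=(24, 3)):  # 24 hourly inputs with 3 features each
    h = np.tanh(Wx @ x_t + Wh @ h)    # each step mixes new input with history
```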
The demonstration uses the Bike Sharing dataset (sourced from Hristo Mavrodiev's dataset on Kaggle) and runs in Python with TensorFlow 2 and Keras inside a Google Colab notebook. After installing dependencies and enabling the GPU runtime, the dataset is loaded into a Pandas DataFrame with timestamps parsed and set as the index. Feature engineering then adds time-derived predictors: hour of day, day of week, day of month, and month. The target variable is the bike-share count for each one-hour interval, while additional inputs include weather-related numeric features (e.g., temperature, humidity, wind speed) and categorical/encoded signals such as weather condition codes, holiday flags, and season labels.
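A minimal sketch of that loading and feature-engineering step (the file and column names follow the Kaggle CSV and are assumptions here):

```python
import pandas as pd

# Assumed file/column names from the Kaggle bike-sharing CSV.
df = pd.read_csv("london_merged.csv",
                 parse_dates=["timestamp"],
                 index_col="timestamp")

# Time-derived predictors pulled from the DatetimeIndex.
df["hour"] = df.index.hour
df["day_of_week"] = df.index.dayofweek
df["day_of_month"] = df.index.day
df["month"] = df.index.month
```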
Before training, the data is split chronologically: 90% for training and 10% for testing, preserving temporal order (no shuffling). Scaling is handled carefully with RobustScaler from scikit-learn—fitted on the training set only—to improve learning stability. Separate scaling is applied to feature columns (temperatures, humidity, wind speed) and to the target count, which later enables an inverse transform so predictions can be interpreted in real bike-share units.
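Sketched out, the split-then-scale step looks roughly like this (the column names t1, t2, hum, wind_speed, and cnt are assumed from the Kaggle CSV):

```python
from sklearn.preprocessing import RobustScaler

# Chronological 90/10 split: the test set is the most recent data, never shuffled.
train_size = int(len(df) * 0.9)
train = df.iloc[:train_size].copy()
test = df.iloc[train_size:].copy()

# Fit scalers on the training set only to avoid leaking test statistics.
f_columns = ["t1", "t2", "hum", "wind_speed"]         # numeric weather features
f_transformer = RobustScaler().fit(train[f_columns])
cnt_transformer = RobustScaler().fit(train[["cnt"]])  # target scaled separately

for frame in (train, test):
    frame[f_columns] = f_transformer.transform(frame[f_columns])
    frame["cnt"] = cnt_transformer.transform(frame[["cnt"]]).ravel()
```

Keeping cnt_transformer separate is what makes the later inverse transform possible: predictions come out in scaled units and can be mapped back to real counts with the same fitted scaler.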
The sequence preparation step converts the time series into supervised learning examples. A custom create_dataset function slices the data into rolling windows: for each sample, it uses the previous 24 hours of features to predict the next hour’s bike-share count. With sequences shaped as (samples, time_steps, features), a bidirectional LSTM model is built in Keras: a Bidirectional wrapper around an LSTM layer, followed by Dropout for regularization and a Dense output neuron for regression. The model is compiled with the Adam optimizer and mean squared error loss.
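A sketch of the windowing function and model under those shapes (the layer sizes are illustrative; the source only specifies the Bidirectional LSTM, Dropout, and single Dense output structure):

```python
import numpy as np
import tensorflow as tf

def create_dataset(X, y, time_steps=1):
    """Rolling windows: the previous `time_steps` rows of features
    predict the target at the following time step."""
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X.iloc[i:(i + time_steps)].values)
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)

TIME_STEPS = 24  # previous 24 hours -> next hour's count
X_train, y_train = create_dataset(train, train["cnt"], TIME_STEPS)
X_test, y_test = create_dataset(test, test["cnt"], TIME_STEPS)

# (samples, time_steps, features) in, one scaled count out.
model = tf.keras.Sequential([
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128),
        input_shape=(X_train.shape[1], X_train.shape[2])),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")
```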
Training runs for 30 epochs with a 10% validation split and no shuffling. Validation loss lands around 0.0231, and the learning curves suggest the model reaches a good fit within roughly 10–15 epochs. Predictions on the test set are inverse-scaled back to counts and plotted against true values. The model tracks typical demand levels closely, though it underestimates or misses some extreme peaks—an expected limitation for a relatively simple architecture and feature set. The result is a clear, end-to-end template for LSTM-based time series forecasting that can be extended with richer preprocessing (e.g., better encoding for categorical variables) or more advanced modeling choices.
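The training and inverse-scaling steps, sketched with an assumed batch size (the source does not state one):

```python
# shuffle=False keeps the validation slice chronological (the most recent 10%).
history = model.fit(
    X_train, y_train,
    epochs=30,
    batch_size=32,        # assumed; not specified in the source
    validation_split=0.1,
    shuffle=False,
)

# Map predictions and ground truth back to real bike-share counts.
y_pred = model.predict(X_test)
y_pred_inv = cnt_transformer.inverse_transform(y_pred)
y_true_inv = cnt_transformer.inverse_transform(y_test.reshape(-1, 1))
```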
Cornell Notes
The pipeline treats time series forecasting as sequence learning: past observations must be fed to an LSTM as ordered windows, because time points are not independent. Using the Bike Sharing dataset, the workflow engineers time features (hour, day of week, day of month, month) and uses weather and calendar signals as inputs, while the target is the hourly bike-share count. Data is split chronologically (90% train, 10% test), scaled with RobustScaler fitted only on training data, and converted into supervised samples using rolling windows of 24 hours to predict the next hour. A bidirectional LSTM with dropout is trained in Keras using Adam and mean squared error, achieving a validation loss around 0.0231. Predictions are inverse-transformed back to real counts and compared to true values, with good performance on typical ranges and weaker handling of extremes.
Why does time series forecasting require sequence models instead of treating rows as independent samples?
What do stationarity and seasonality mean in practical forecasting terms?
How does the notebook prepare data for an LSTM to predict the next hour?
Why is scaling done with RobustScaler, and why is the target scaled separately?
What does the bidirectional LSTM add compared with a standard LSTM in this setup?
What performance behavior should be expected when the model struggles with extremes?
Review Questions
- In what exact way does the create_dataset function define the relationship between X and y (which time step is predicted)?
- Why must the train/test split preserve chronological order, and what goes wrong if data is shuffled?
- How does inverse scaling of the target count enable an apples-to-apples comparison between predictions and ground truth?
Key Points
1. Time series forecasting requires sequence-aware modeling because time points influence each other over time, especially through seasonality and recurring cycles.
2. Stationarity (stable mean/variance) and seasonality (repeating patterns) are practical properties that guide feature design and model choice.
3. For LSTMs, convert hourly rows into rolling windows: use the previous 24 hours of features to predict the next hour’s bike-share count.
4. Scale features and the target using RobustScaler fitted only on the training set to avoid leakage and to stabilize training.
5. Use a chronological split (90% train, 10% test) and avoid shuffling during training to respect temporal dependence.
6. A bidirectional LSTM with dropout and a single regression output neuron can produce strong baseline forecasts, though extremes may remain challenging.
7. Inverse-transform predictions back to real count units so evaluation plots reflect actual bike-share demand rather than scaled values.