Time Series Forecasting for Business Applications
A practical guide to time series forecasting—covering classical methods, modern ML approaches, and production deployment for demand forecasting, capacity planning, and financial projections
Forecasting is one of the highest-ROI applications of data science—directly impacting inventory costs, staffing decisions, and financial planning. This guide covers the spectrum from classical statistical methods to modern ML approaches, with emphasis on practical deployment considerations that determine real-world success.
Prerequisites
- Basic understanding of statistics (mean, variance, correlation)
- Familiarity with Python and pandas
- Understanding of machine learning fundamentals

Why Forecasting Matters
Accurate forecasting drives billions of dollars in business value:
- Inventory optimization: Reducing stockouts while minimizing carrying costs
- Capacity planning: Scaling infrastructure ahead of demand
- Financial planning: Projecting revenue, expenses, and cash flow
- Workforce management: Aligning staffing with expected demand
- Supply chain: Coordinating with suppliers and logistics partners
Yet forecasting is hard. The future is inherently uncertain, patterns change, and rare events (pandemics, economic shocks) invalidate historical relationships.
Time Series Fundamentals
Components of Time Series
Every time series can be decomposed into trend, seasonal, and residual components:
TIME SERIES DECOMPOSITION

  Y(t) = Trend(t) + Seasonality(t) + Residual(t)   [Additive]
  Y(t) = Trend(t) × Seasonality(t) × Residual(t)   [Multiplicative]

Trend: long-term direction of the series
- Growth, decline, or stable
- Can be linear, exponential, or logistic
- Example: e-commerce sales growing 15% YoY

Seasonality: repeating patterns at fixed intervals
- Daily: traffic peaks during commute hours
- Weekly: restaurants busy on weekends
- Annual: retail spikes in Q4 holidays
- A series can have multiple seasonalities (daily + weekly + yearly)

Residual / noise: random variation unexplained by trend or seasonality
- Should be stationary with zero mean
- Captures unusual events and measurement error
- If the residuals show patterns, the model is missing something
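A quick way to choose between the two forms: if the seasonal swing grows with the level of the series, the pattern is multiplicative; if it stays roughly constant, additive. A small numpy sketch with synthetic data:

```python
import numpy as np

t = np.arange(24)
trend = 100 + 5 * t
factors = np.tile([1.2, 1.0, 0.8, 1.0], 6)  # period-4 seasonal factors

y_mult = trend * factors                # swing grows with the level
y_add = trend + 20 * (factors - 1.0)    # swing stays constant

# Heuristic: compare peak-to-trough swing early vs late in the series
early = y_mult[:4].max() - y_mult[:4].min()
late = y_mult[-4:].max() - y_mult[-4:].min()
print(late > early)  # True: the swing widens, so the pattern is multiplicative
```

A common trick: taking the log of a multiplicative series turns it into an additive one, which can then be modeled with additive methods.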
Stationarity
Stationarity is a key concept: a stationary series has statistical properties (mean, variance, autocorrelation) that do not change over time. Many classical models assume stationarity, or require transforming the series to achieve it.
# time_series_fundamentals.py
"""
Fundamental time series analysis functions.
"""
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from typing import Any, Dict, Tuple
def check_stationarity(series: pd.Series, alpha: float = 0.05) -> Dict[str, Any]:
    """
    Test for stationarity using ADF and KPSS tests.
    ADF: Null hypothesis is non-stationary (unit root present)
    KPSS: Null hypothesis is stationary
    Interpretation:
    - ADF rejects, KPSS doesn't reject: Stationary
    - ADF doesn't reject, KPSS rejects: Non-stationary
    - Both reject: Difference-stationary (difference to remove the unit root)
    - Neither rejects: Trend-stationary (remove the deterministic trend)
    """
    # Augmented Dickey-Fuller test
    adf_result = adfuller(series.dropna(), autolag='AIC')
    adf_stationary = adf_result[1] < alpha
    # KPSS test
    kpss_result = kpss(series.dropna(), regression='c', nlags='auto')
    kpss_stationary = kpss_result[1] > alpha
    return {
        'adf_statistic': adf_result[0],
        'adf_pvalue': adf_result[1],
        'adf_stationary': adf_stationary,
        'kpss_statistic': kpss_result[0],
        'kpss_pvalue': kpss_result[1],
        'kpss_stationary': kpss_stationary,
        'conclusion': _interpret_stationarity(adf_stationary, kpss_stationary)
    }
def _interpret_stationarity(adf: bool, kpss: bool) -> str:
    if adf and kpss:
        return "Stationary"
    elif not adf and not kpss:
        return "Non-stationary"
    elif adf and not kpss:
        return "Difference-stationary (difference to remove the unit root)"
    else:
        return "Trend-stationary (remove the deterministic trend)"
def decompose_series(
series: pd.Series,
period: int = None,
model: str = 'additive'
) -> Dict[str, pd.Series]:
"""
Decompose time series into trend, seasonal, and residual components.
"""
# Auto-detect period if not specified
if period is None:
period = detect_seasonality_period(series)
decomposition = seasonal_decompose(
series,
model=model,
period=period,
extrapolate_trend='freq'
)
return {
'observed': decomposition.observed,
'trend': decomposition.trend,
'seasonal': decomposition.seasonal,
'residual': decomposition.resid,
'period': period
}
def detect_seasonality_period(series: pd.Series, max_period: int = 365) -> int:
"""
Detect dominant seasonality period using autocorrelation.
"""
from statsmodels.tsa.stattools import acf
# Compute autocorrelation
autocorr = acf(series.dropna(), nlags=min(len(series) // 2, max_period))
# Find peaks in autocorrelation
peaks = []
for i in range(2, len(autocorr) - 1):
if autocorr[i] > autocorr[i-1] and autocorr[i] > autocorr[i+1]:
if autocorr[i] > 0.1: # Significance threshold
peaks.append((i, autocorr[i]))
if not peaks:
return 7 # Default to weekly
# Return period of highest peak
return max(peaks, key=lambda x: x[1])[0]
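When the tests above flag a unit root, first differencing is the standard remedy. A minimal pandas sketch on a synthetic random walk:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
walk = pd.Series(rng.normal(size=500).cumsum())  # random walk: unit root, non-stationary

diffed = walk.diff().dropna()  # first difference recovers the i.i.d. increments

# Differencing is invertible, so no information is lost:
reconstructed = diffed.cumsum() + walk.iloc[0]
print(bool(np.allclose(reconstructed, walk.iloc[1:])))  # True
```

In ARIMA terms this is exactly what the d parameter does: d=1 fits the model on the differenced series and integrates the forecasts back to the original scale.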
Forecasting Methods
Method 1: Classical Statistical Models
# classical_models.py
"""
Classical statistical forecasting models: ARIMA, Exponential Smoothing.
"""
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX
from typing import Tuple, Dict, Optional
from dataclasses import dataclass
@dataclass
class ForecastResult:
"""Result from a forecasting model."""
point_forecast: pd.Series
lower_bound: pd.Series
upper_bound: pd.Series
model_name: str
fit_metrics: Dict[str, float]
class ARIMAForecaster:
"""
ARIMA (AutoRegressive Integrated Moving Average) forecaster.
Good for:
- Short to medium-term forecasting
- Series without strong seasonality
- When interpretability of parameters matters
"""
def __init__(self, order: Tuple[int, int, int] = None):
"""
Initialize ARIMA model.
Args:
order: (p, d, q) where:
p = AR order (number of lagged observations)
d = Differencing order (for stationarity)
q = MA order (number of lagged forecast errors)
"""
self.order = order
self.model = None
self.fitted = None
def fit(self, series: pd.Series) -> 'ARIMAForecaster':
"""Fit ARIMA model to data."""
if self.order is None:
self.order = self._auto_select_order(series)
self.model = ARIMA(series, order=self.order)
self.fitted = self.model.fit()
return self
def forecast(
self,
steps: int,
confidence_level: float = 0.95
) -> ForecastResult:
"""Generate forecast with prediction intervals."""
forecast_obj = self.fitted.get_forecast(steps=steps)
point_forecast = forecast_obj.predicted_mean
conf_int = forecast_obj.conf_int(alpha=1-confidence_level)
return ForecastResult(
point_forecast=point_forecast,
lower_bound=conf_int.iloc[:, 0],
upper_bound=conf_int.iloc[:, 1],
model_name=f"ARIMA{self.order}",
fit_metrics={
'aic': self.fitted.aic,
'bic': self.fitted.bic,
'mape': self._compute_mape()
}
)
    def _compute_mape(self) -> float:
        """In-sample MAPE from the fitted model's one-step residuals."""
        actual = np.asarray(self.model.endog, dtype=float).ravel()
        resid = np.asarray(self.fitted.resid, dtype=float)
        mask = actual != 0
        return float(np.mean(np.abs(resid[mask] / actual[mask])) * 100)
def _auto_select_order(self, series: pd.Series) -> Tuple[int, int, int]:
"""Auto-select ARIMA order using AIC."""
import pmdarima as pm
auto_model = pm.auto_arima(
series,
start_p=0, max_p=5,
start_q=0, max_q=5,
d=None, # Auto-select differencing
seasonal=False,
stepwise=True,
suppress_warnings=True,
error_action='ignore'
)
return auto_model.order
class SARIMAXForecaster:
"""
SARIMAX: Seasonal ARIMA with exogenous variables.
Good for:
- Series with seasonality
- Incorporating external predictors (holidays, promotions)
"""
def __init__(
self,
order: Tuple[int, int, int] = (1, 1, 1),
seasonal_order: Tuple[int, int, int, int] = (1, 1, 1, 12)
):
self.order = order
self.seasonal_order = seasonal_order
self.model = None
self.fitted = None
def fit(
self,
series: pd.Series,
exog: pd.DataFrame = None
) -> 'SARIMAXForecaster':
"""Fit SARIMAX model."""
self.model = SARIMAX(
series,
exog=exog,
order=self.order,
seasonal_order=self.seasonal_order,
enforce_stationarity=False,
enforce_invertibility=False
)
self.fitted = self.model.fit(disp=False)
return self
def forecast(
self,
steps: int,
exog_future: pd.DataFrame = None,
confidence_level: float = 0.95
) -> ForecastResult:
"""Generate forecast."""
forecast_obj = self.fitted.get_forecast(
steps=steps,
exog=exog_future
)
point_forecast = forecast_obj.predicted_mean
conf_int = forecast_obj.conf_int(alpha=1-confidence_level)
return ForecastResult(
point_forecast=point_forecast,
lower_bound=conf_int.iloc[:, 0],
upper_bound=conf_int.iloc[:, 1],
model_name=f"SARIMAX{self.order}x{self.seasonal_order}",
fit_metrics={
'aic': self.fitted.aic,
'bic': self.fitted.bic
}
)
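The exog argument is just a date-indexed DataFrame of numeric regressors aligned with the series. A sketch of building a hypothetical promotion dummy (the column name is illustrative):

```python
import pandas as pd

idx = pd.date_range("2024-11-25", periods=7, freq="D")
exog = pd.DataFrame(index=idx)

# Hypothetical promo dummy: 1 on Black Friday, 0 otherwise
exog["is_black_friday"] = (exog.index == pd.Timestamp("2024-11-29")).astype(int)

print(int(exog["is_black_friday"].sum()))  # 1
```

At forecast time, the same columns must be supplied for the future dates via exog_future, otherwise the model has no values to multiply its coefficients against.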
class ExponentialSmoothingForecaster:
"""
Holt-Winters Exponential Smoothing.
Good for:
- Series with trend and/or seasonality
- Simple, robust forecasts
- When computational speed matters
"""
def __init__(
self,
trend: str = 'add', # 'add', 'mul', or None
seasonal: str = 'add', # 'add', 'mul', or None
seasonal_periods: int = 12
):
self.trend = trend
self.seasonal = seasonal
self.seasonal_periods = seasonal_periods
self.model = None
self.fitted = None
def fit(self, series: pd.Series) -> 'ExponentialSmoothingForecaster':
"""Fit Holt-Winters model."""
self.model = ExponentialSmoothing(
series,
trend=self.trend,
seasonal=self.seasonal,
seasonal_periods=self.seasonal_periods
)
self.fitted = self.model.fit()
return self
def forecast(
self,
steps: int,
confidence_level: float = 0.95
) -> ForecastResult:
"""Generate forecast with simulation-based intervals."""
point_forecast = self.fitted.forecast(steps)
# Simulation-based prediction intervals
simulations = self.fitted.simulate(
nsimulations=steps,
repetitions=1000,
anchor="end"
)
alpha = 1 - confidence_level
lower = simulations.quantile(alpha/2, axis=1)
upper = simulations.quantile(1-alpha/2, axis=1)
return ForecastResult(
point_forecast=point_forecast,
lower_bound=lower,
upper_bound=upper,
model_name="HoltWinters",
fit_metrics={
'aic': self.fitted.aic,
'sse': self.fitted.sse
}
)
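Under the hood, all exponential smoothing variants are built from one recursion: the level is an exponentially weighted average of past observations. A minimal numpy sketch of the simple (no trend, no seasonality) case:

```python
import numpy as np

def simple_exponential_smoothing(y, alpha=0.3):
    """Level recursion: level_t = alpha * y_t + (1 - alpha) * level_{t-1}."""
    level = float(y[0])
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level  # the h-step forecast is flat at the final level

y = np.array([10.0, 12.0, 11.0, 13.0])
print(simple_exponential_smoothing(y, alpha=0.5))  # 12.0
```

Holt-Winters adds analogous recursions for the trend and seasonal terms; statsmodels estimates the smoothing parameters by maximizing the likelihood rather than fixing alpha by hand.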
Method 2: Prophet
Prophet is Facebook's library designed for business forecasting.
# prophet_forecaster.py
"""
Prophet forecaster for business time series.
"""
from prophet import Prophet
import pandas as pd
import numpy as np
from typing import List, Dict, Optional
from dataclasses import dataclass
@dataclass
class ProphetConfig:
"""Configuration for Prophet model."""
growth: str = 'linear' # 'linear' or 'logistic'
yearly_seasonality: bool = True
weekly_seasonality: bool = True
daily_seasonality: bool = False
holidays: pd.DataFrame = None
changepoint_prior_scale: float = 0.05 # Flexibility of trend
seasonality_prior_scale: float = 10.0
holidays_prior_scale: float = 10.0
interval_width: float = 0.95
class ProphetForecaster:
"""
Prophet forecaster with business-specific enhancements.
Advantages:
- Handles missing data and outliers robustly
- Easy to add holidays and special events
- Interpretable components (trend, seasonality)
- Automatic changepoint detection
"""
def __init__(self, config: ProphetConfig = None):
self.config = config or ProphetConfig()
self.model = None
        self.forecast_df = None  # named to avoid shadowing the forecast() method
    def fit(
        self,
        df: pd.DataFrame,
        regressors: List[str] = None
    ) -> 'ProphetForecaster':
        """
        Fit Prophet model.
        Args:
            df: DataFrame with 'ds' (date) and 'y' (value) columns
            regressors: List of additional regressor column names
        """
        self.model = Prophet(
            growth=self.config.growth,
            yearly_seasonality=self.config.yearly_seasonality,
            weekly_seasonality=self.config.weekly_seasonality,
            daily_seasonality=self.config.daily_seasonality,
            holidays=self.config.holidays,
            changepoint_prior_scale=self.config.changepoint_prior_scale,
            seasonality_prior_scale=self.config.seasonality_prior_scale,
            holidays_prior_scale=self.config.holidays_prior_scale,
            interval_width=self.config.interval_width
        )
        # Add custom regressors
        if regressors:
            for regressor in regressors:
                self.model.add_regressor(regressor)
        self.model.fit(df)
        return self
    def forecast(
        self,
        periods: int,
        freq: str = 'D',
        future_regressors: pd.DataFrame = None
    ) -> pd.DataFrame:
        """Generate forecast."""
        future = self.model.make_future_dataframe(periods=periods, freq=freq)
        # Add future regressor values if provided
        if future_regressors is not None:
            for col in future_regressors.columns:
                future[col] = future_regressors[col]
        self.forecast_df = self.model.predict(future)
        return self.forecast_df
    def get_components(self) -> Dict[str, pd.Series]:
        """Extract forecast components."""
        return {
            'trend': self.forecast_df['trend'],
            'weekly': self.forecast_df.get('weekly', pd.Series()),
            'yearly': self.forecast_df.get('yearly', pd.Series()),
            'holidays': self.forecast_df.get('holidays', pd.Series()),
        }
    def plot_components(self):
        """Plot decomposed components."""
        from prophet.plot import plot_components
        return plot_components(self.model, self.forecast_df)
def create_holiday_dataframe(
country: str = 'US',
years: List[int] = None
) -> pd.DataFrame:
"""
Create holiday dataframe for Prophet.
"""
import holidays
if years is None:
years = list(range(2015, 2030))
country_holidays = holidays.country_holidays(country, years=years)
holiday_df = pd.DataFrame({
'ds': list(country_holidays.keys()),
'holiday': list(country_holidays.values())
})
# Add custom business events
custom_events = [
{'ds': '2024-11-29', 'holiday': 'Black Friday'},
{'ds': '2024-12-02', 'holiday': 'Cyber Monday'},
{'ds': '2024-01-15', 'holiday': 'MLK Day Sales'},
]
    custom_df = pd.DataFrame(custom_events)
    holiday_df = pd.concat([holiday_df, custom_df], ignore_index=True)
    holiday_df['ds'] = pd.to_datetime(holiday_df['ds'])  # unify date objects and strings
    return holiday_df
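ProphetForecaster.fit assumes the two-column frame Prophet requires. A pandas sketch of reshaping a raw sales table into that format (the raw column names here are hypothetical):

```python
import pandas as pd

raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "units_sold": [120, 135, 128],
})

# Prophet expects exactly two columns: 'ds' (datetime) and 'y' (numeric)
df = (
    raw.rename(columns={"date": "ds", "units_sold": "y"})
       .assign(ds=lambda d: pd.to_datetime(d["ds"]))
       .sort_values("ds")
       .reset_index(drop=True)
)

print(list(df.columns))  # ['ds', 'y']
```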
Method 3: Machine Learning Approaches
# ml_forecasting.py
"""
Machine learning approaches for time series forecasting.
"""
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
import lightgbm as lgb
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
@dataclass
class MLForecastConfig:
    """Configuration for ML-based forecasting."""
    lag_features: Optional[List[int]] = None     # e.g. [1, 7, 14, 28]
    rolling_windows: Optional[List[int]] = None  # e.g. [7, 14, 28]
date_features: bool = True
target_encoding: bool = False
class MLTimeSeriesForecaster:
"""
Machine learning approach to time series forecasting.
Advantages:
- Can incorporate many features (weather, promotions, competitors)
- Handles non-linear relationships
- Scales to large datasets
Disadvantages:
- Doesn't naturally produce prediction intervals
- Requires more feature engineering
- May overfit without careful validation
"""
def __init__(self, config: MLForecastConfig = None):
self.config = config or MLForecastConfig(
lag_features=[1, 7, 14, 28],
rolling_windows=[7, 14, 28]
)
self.model = None
self.feature_names = None
def create_features(
self,
df: pd.DataFrame,
target_col: str = 'y',
date_col: str = 'ds'
) -> pd.DataFrame:
"""Create features for ML forecasting."""
df = df.copy()
df[date_col] = pd.to_datetime(df[date_col])
df = df.sort_values(date_col)
# Lag features
for lag in self.config.lag_features:
df[f'lag_{lag}'] = df[target_col].shift(lag)
# Rolling statistics
for window in self.config.rolling_windows:
df[f'rolling_mean_{window}'] = df[target_col].shift(1).rolling(window).mean()
df[f'rolling_std_{window}'] = df[target_col].shift(1).rolling(window).std()
df[f'rolling_min_{window}'] = df[target_col].shift(1).rolling(window).min()
df[f'rolling_max_{window}'] = df[target_col].shift(1).rolling(window).max()
# Date features
if self.config.date_features:
df['day_of_week'] = df[date_col].dt.dayofweek
df['day_of_month'] = df[date_col].dt.day
df['week_of_year'] = df[date_col].dt.isocalendar().week
df['month'] = df[date_col].dt.month
df['quarter'] = df[date_col].dt.quarter
df['year'] = df[date_col].dt.year
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
df['is_month_start'] = df[date_col].dt.is_month_start.astype(int)
df['is_month_end'] = df[date_col].dt.is_month_end.astype(int)
# Drop rows with NaN from lag features
max_lag = max(self.config.lag_features + self.config.rolling_windows)
df = df.iloc[max_lag:]
return df
def fit(
self,
df: pd.DataFrame,
target_col: str = 'y',
date_col: str = 'ds'
) -> 'MLTimeSeriesForecaster':
"""Fit ML model."""
# Create features
df_features = self.create_features(df, target_col, date_col)
# Separate features and target
feature_cols = [c for c in df_features.columns if c not in [target_col, date_col]]
self.feature_names = feature_cols
X = df_features[feature_cols]
y = df_features[target_col]
# Train LightGBM
self.model = lgb.LGBMRegressor(
n_estimators=500,
learning_rate=0.05,
max_depth=6,
num_leaves=31,
min_child_samples=20,
subsample=0.8,
colsample_bytree=0.8,
random_state=42
)
        # Hold out the final 10% as a validation tail; early stopping against
        # the training set itself would never trigger
        split = int(len(X) * 0.9)
        self.model.fit(
            X.iloc[:split], y.iloc[:split],
            eval_set=[(X.iloc[split:], y.iloc[split:])],
            callbacks=[lgb.early_stopping(50, verbose=False)]
        )
return self
def forecast(
self,
df: pd.DataFrame,
periods: int,
target_col: str = 'y',
date_col: str = 'ds'
) -> pd.DataFrame:
"""
Generate multi-step forecast.
Uses recursive forecasting: predict one step, use prediction
as input for next step.
"""
df = df.copy()
predictions = []
last_date = df[date_col].max()
for i in range(periods):
# Create features for next prediction
df_features = self.create_features(df, target_col, date_col)
X = df_features[self.feature_names].iloc[[-1]]
# Predict
pred = self.model.predict(X)[0]
predictions.append(pred)
# Add prediction to history for next step
next_date = last_date + pd.Timedelta(days=i+1)
new_row = pd.DataFrame({
date_col: [next_date],
target_col: [pred]
})
df = pd.concat([df, new_row], ignore_index=True)
# Create forecast dataframe
forecast_dates = pd.date_range(
start=last_date + pd.Timedelta(days=1),
periods=periods,
freq='D'
)
return pd.DataFrame({
'ds': forecast_dates,
'yhat': predictions
})
def get_feature_importance(self) -> pd.DataFrame:
"""Get feature importance from trained model."""
importance = pd.DataFrame({
'feature': self.feature_names,
'importance': self.model.feature_importances_
}).sort_values('importance', ascending=False)
return importance
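The shift(1) before every rolling window in create_features is what keeps the current target out of its own features. A tiny demonstration of the leakage it prevents:

```python
import pandas as pd

df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["lag_1"] = df["y"].shift(1)

# shift(1) BEFORE rolling keeps the current value out of the window;
# rolling directly on y would include the target being predicted
df["rolling_mean_2"] = df["y"].shift(1).rolling(2).mean()
leaky = df["y"].rolling(2).mean()  # includes y[t] itself: leakage

print(df.loc[4, "rolling_mean_2"], leaky.loc[4])  # 3.5 4.5
```

A leaky feature makes validation metrics look excellent and production forecasts fail, because the "feature" is unavailable at prediction time.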
Forecast Evaluation
# forecast_evaluation.py
"""
Comprehensive forecast evaluation metrics and methods.
"""
import numpy as np
import pandas as pd
from typing import Dict, List
from sklearn.metrics import mean_absolute_error, mean_squared_error
def compute_forecast_metrics(
actual: np.ndarray,
predicted: np.ndarray
) -> Dict[str, float]:
"""
Compute comprehensive forecast accuracy metrics.
"""
# Remove any NaN values
mask = ~(np.isnan(actual) | np.isnan(predicted))
actual = actual[mask]
predicted = predicted[mask]
n = len(actual)
# Mean Absolute Error
mae = mean_absolute_error(actual, predicted)
# Root Mean Squared Error
rmse = np.sqrt(mean_squared_error(actual, predicted))
# Mean Absolute Percentage Error (avoiding division by zero)
with np.errstate(divide='ignore', invalid='ignore'):
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
mape = np.nan_to_num(mape, nan=0.0, posinf=0.0, neginf=0.0)
# Symmetric MAPE (handles zeros better)
smape = np.mean(2 * np.abs(predicted - actual) / (np.abs(actual) + np.abs(predicted) + 1e-8)) * 100
# Mean Absolute Scaled Error (relative to naive forecast)
naive_errors = np.abs(np.diff(actual))
if len(naive_errors) > 0 and np.mean(naive_errors) > 0:
mase = mae / np.mean(naive_errors)
else:
mase = np.nan
# Bias (systematic over/under prediction)
bias = np.mean(predicted - actual)
bias_pct = (np.sum(predicted) / np.sum(actual) - 1) * 100
return {
'mae': mae,
'rmse': rmse,
'mape': mape,
'smape': smape,
'mase': mase,
'bias': bias,
'bias_pct': bias_pct,
'n_observations': n
}
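A worked example of why the function reports both MAPE and sMAPE: with a zero in the actuals, MAPE is undefined while sMAPE stays bounded. Toy numbers, illustrative only:

```python
import numpy as np

actual = np.array([100.0, 0.0, 50.0])
predicted = np.array([110.0, 5.0, 45.0])

# MAPE is undefined when actual contains zeros
with np.errstate(divide="ignore", invalid="ignore"):
    ape = np.abs((actual - predicted) / actual)

# sMAPE stays bounded because both values appear in the denominator
smape = np.mean(
    2 * np.abs(predicted - actual) / (np.abs(actual) + np.abs(predicted) + 1e-8)
) * 100

print(bool(np.isinf(ape).any()))  # True: the zero actual blows up MAPE
```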
def time_series_cross_validation(
series: pd.Series,
model_class,
model_params: Dict,
initial_train_size: int,
horizon: int,
step_size: int = 1
) -> pd.DataFrame:
"""
Time series cross-validation (rolling origin).
Unlike regular CV, respects temporal ordering:
- Train on data up to time t
- Predict t+1 to t+horizon
- Move forward by step_size and repeat
"""
results = []
for i in range(initial_train_size, len(series) - horizon, step_size):
# Training data: all data up to current point
train = series.iloc[:i]
# Test data: next 'horizon' points
test = series.iloc[i:i+horizon]
# Fit model
model = model_class(**model_params)
model.fit(train)
# Forecast
forecast = model.forecast(steps=horizon)
# Compute metrics
metrics = compute_forecast_metrics(
test.values,
forecast.point_forecast.values
)
metrics['fold'] = i
metrics['train_end'] = series.index[i-1]
results.append(metrics)
return pd.DataFrame(results)
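The fold boundaries the loop above produces, traced by hand for a toy 12-point series using the same range arithmetic:

```python
# Rolling-origin folds: training data always strictly precedes test data
n, initial_train_size, horizon, step_size = 12, 6, 2, 2

folds = [
    (list(range(i)), list(range(i, i + horizon)))
    for i in range(initial_train_size, n - horizon, step_size)
]

print(folds[0][1], folds[1][1])  # [6, 7] [8, 9]
```

Unlike shuffled k-fold CV, every fold trains only on the past, so the metric estimates how the model will actually be used.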
def create_naive_forecasts(series: pd.Series, horizon: int) -> Dict[str, np.ndarray]:
    """
    Create naive baseline forecasts for comparison.
    Always compare your model against these baselines!
    """
    last_value = series.iloc[-1]
    # Seasonal naive assumes daily data with a yearly cycle
    has_full_year = len(series) > 365 and horizon < 365
    last_year_values = series.iloc[-365:-365 + horizon].values if has_full_year else None
    # Drift: extrapolate the average historical slope, not just the last step
    slope = (series.iloc[-1] - series.iloc[0]) / (len(series) - 1)
    return {
        'naive': np.full(horizon, last_value),
        'seasonal_naive': last_year_values,
        'mean': np.full(horizon, series.mean()),
        'drift': last_value + np.arange(1, horizon + 1) * slope
    }
Production Deployment
# production_forecasting.py
"""
Production forecasting service with monitoring and retraining.
"""
from typing import Dict, List, Optional
from datetime import datetime, timedelta
from dataclasses import dataclass
import pandas as pd
import numpy as np
import mlflow
from forecast_evaluation import compute_forecast_metrics  # defined above
@dataclass
class ForecastJob:
"""Configuration for a forecasting job."""
job_id: str
series_id: str
horizon_days: int
model_type: str
retrain_frequency_days: int
last_trained: Optional[datetime] = None
last_forecast: Optional[datetime] = None
class ProductionForecastingService:
"""
Production service for automated forecasting.
Features:
- Automatic model retraining
- Forecast accuracy monitoring
- Alert on degradation
- Multiple models per series
"""
def __init__(self, mlflow_uri: str, database_conn):
mlflow.set_tracking_uri(mlflow_uri)
self.db = database_conn
self.jobs: Dict[str, ForecastJob] = {}
def register_job(self, job: ForecastJob) -> None:
"""Register a new forecasting job."""
self.jobs[job.job_id] = job
def run_forecasting_pipeline(self, job_id: str) -> Dict:
"""Execute full forecasting pipeline for a job."""
job = self.jobs[job_id]
# Step 1: Check if retraining is needed
needs_retrain = self._check_retrain_needed(job)
if needs_retrain:
self._retrain_model(job)
# Step 2: Generate forecast
forecast = self._generate_forecast(job)
# Step 3: Store forecast
self._store_forecast(job, forecast)
# Step 4: Update job metadata
job.last_forecast = datetime.utcnow()
self._update_job_metadata(job)
return {
'job_id': job_id,
'retrained': needs_retrain,
'forecast_generated': True,
'horizon': job.horizon_days
}
def _check_retrain_needed(self, job: ForecastJob) -> bool:
"""Check if model needs retraining."""
if job.last_trained is None:
return True
days_since_train = (datetime.utcnow() - job.last_trained).days
if days_since_train >= job.retrain_frequency_days:
return True
# Check forecast accuracy degradation
accuracy = self._get_recent_accuracy(job)
if accuracy and accuracy['mape'] > 20: # Threshold
return True
return False
def _retrain_model(self, job: ForecastJob) -> None:
"""Retrain model with latest data."""
# Get training data
data = self._get_training_data(job.series_id)
with mlflow.start_run(run_name=f"retrain_{job.job_id}"):
# Log parameters
mlflow.log_param("series_id", job.series_id)
mlflow.log_param("model_type", job.model_type)
mlflow.log_param("training_samples", len(data))
# Train model based on type
if job.model_type == 'prophet':
model = self._train_prophet(data)
elif job.model_type == 'arima':
model = self._train_arima(data)
else:
model = self._train_ml(data)
# Evaluate on holdout
metrics = self._evaluate_model(model, data)
mlflow.log_metrics(metrics)
# Log model
mlflow.sklearn.log_model(
model,
artifact_path="model",
registered_model_name=f"forecast_{job.series_id}"
)
job.last_trained = datetime.utcnow()
def _generate_forecast(self, job: ForecastJob) -> pd.DataFrame:
"""Generate forecast using latest model."""
# Load model from registry
model_uri = f"models:/forecast_{job.series_id}/Production"
model = mlflow.sklearn.load_model(model_uri)
# Get recent data for context
data = self._get_training_data(job.series_id, days=365)
# Generate forecast
forecast = model.forecast(steps=job.horizon_days)
return forecast
def monitor_forecast_accuracy(self, job_id: str) -> Dict:
"""Monitor accuracy of past forecasts."""
job = self.jobs[job_id]
# Get actual values
actuals = self._get_actual_values(job.series_id, days=30)
# Get corresponding forecasts
forecasts = self._get_past_forecasts(job.series_id, days=30)
if actuals.empty or forecasts.empty:
return {'status': 'insufficient_data'}
# Compute accuracy
merged = actuals.merge(forecasts, on='date')
metrics = compute_forecast_metrics(
merged['actual'].values,
merged['forecast'].values
)
# Check for degradation
if metrics['mape'] > 15:
self._send_alert(job, metrics)
return metrics
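The retraining decision combines staleness and accuracy degradation. The same logic as a standalone sketch (the 20% MAPE threshold mirrors the service above; in practice it should be tuned per series):

```python
from datetime import datetime, timedelta

def needs_retrain(last_trained, retrain_frequency_days, recent_mape, mape_threshold=20.0):
    """Retrain when the model is missing, stale, or degraded."""
    if last_trained is None:
        return True  # never trained
    if (datetime.utcnow() - last_trained).days >= retrain_frequency_days:
        return True  # stale on schedule
    return recent_mape is not None and recent_mape > mape_threshold  # degraded

# Fresh model (3 days old, weekly schedule) but accuracy has slipped
print(needs_retrain(datetime.utcnow() - timedelta(days=3), 7, recent_mape=25.0))  # True
```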
Key Takeaways
- Start with baselines: Naive forecasts (last value, seasonal naive) often perform surprisingly well. Always compare.
- Decomposition is fundamental: Understanding trend, seasonality, and residuals guides model selection.
- Prophet for business: Prophet handles holidays, missing data, and changepoints—common in business data.
- ML for complexity: When you have many features or non-linear relationships, gradient boosting excels.
- Quantify uncertainty: Point forecasts alone are insufficient for decision-making. Always provide prediction intervals.
- Monitor continuously: Forecast accuracy degrades as patterns change. Automate monitoring and retraining.
References
- Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts.
- Taylor, S.J., & Letham, B. (2018). "Forecasting at Scale." The American Statistician.
- Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2022). "M5 Accuracy Competition: Results, Findings, and Conclusions." International Journal of Forecasting.
- Prophet Documentation: https://facebook.github.io/prophet/
- Statsmodels Time Series: https://www.statsmodels.org/stable/tsa.html
- Kaggle: "Store Sales - Time Series Forecasting" Competition Insights