TimeGPT: Revolutionizing Time Series Forecasting

Satyajit Chaudhuri 19 Mar, 2024 • 13 min read

Introduction

The buzz surrounding Generative Pre-Trained Transformers (GPTs) has grown steadily since their introduction into the world of AI. GPTs are versatile: they can work with text, images, videos, presentations, and much more. Time series forecasting, however, is one area where GPTs had not made much of a breakthrough, until now.

In this article, we introduce one of the most recent developments in the time series forecasting domain: TimeGPT.

As the name suggests, it is a generative pre-trained model for time series analysis. Its authors describe it as the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training.

The article first walks you through the architecture of the model, the training dataset, and what makes the approach different. We then take a weekly sales dataset from Walmart, generate forecasts with several statistical and machine learning models, and compare them with the TimeGPT forecasts.


Learning Objectives

Understand the significance of Generative Pre-Trained Transformers (GPTs) in AI and their versatility across various data types.
Explore the challenges faced in Time Series Forecasting and the traditional methods used, highlighting the limitations faced by deep learning models.
Learn about the development of TimeGPT as a foundational model for Time Series Analysis, its architecture, training philosophy, and its unique approach to forecasting.
Gain hands-on experience in implementing TimeGPT for time series forecasting, comparing its performance with traditional statistical methods and machine learning algorithms, and evaluating its accuracy using various metrics.

This article was published as a part of the Data Science Blogathon.

Need of Generative Solution for Time Series Forecasting
The Advent of TimeGPT
The Architecture of the Model
The Versatile Data Used to Train the Transformer
Hands-on Code Implementation of TimeGPT
Comparing TimeGPT with Other Methods
TimeGPT Use Cases
How Does TimeGPT Handle Missing Data in Time Series Forecasting?
TimeGPT for Anomaly Detection in Time Series Data
Challenges While Using TimeGPT for Multivariate Time Series Data

Need of Generative Solution for Time Series Forecasting

Time series data is an essential component in various sectors, including finance, healthcare, meteorology, and social sciences. Whether it’s tracking ocean tides or monitoring the daily closing value of the Dow Jones, time series data plays an indispensable role in forecasting future values and informing decision-making processes.

Traditionally, analysts have relied on statistical methods like ARIMA, ETS, MSTL, Theta, and CES, as well as machine learning models like XGBoost and LightGBM, to analyze time series data. These tools have proven reliable over the years. However, deep learning models, which have seen remarkable success in natural language processing (NLP) and computer vision (CV), face skepticism in the time series analysis field. While models like LSTM and GRU, and libraries such as FBProphet, show promise, challenges such as limited explainability hinder their adoption.

In time series analysis, standardized large-scale datasets tailored for deep learning methods are lacking, unlike in computer vision. Overcoming these challenges is crucial to fully utilizing deep learning in time series analysis, and a foundation model such as TimeGPT is a first step.


The Advent of TimeGPT

Azul Garza and Max Mergenthaler-Canseco from Nixtla in San Francisco, CA, USA, outline the architecture, training, and evaluation of TimeGPT-1 in their paper. They showcase its remarkable performance across diverse time series datasets. What sets TimeGPT apart is its user-friendly, low-code approach to time series forecasting. Users can simply upload their time series data and generate forecasts for desired time steps with just a single line of code.

The TimeGPT model “reads” time series data much like the way humans read a sentence – from left to right. It looks at windows of past data, which we can think of as “tokens”, and predicts what comes next. This prediction is based on patterns the model identifies in past data and extrapolates into the future. The API provides an interface to TimeGPT, allowing users to leverage its forecasting capabilities to predict future events. TimeGPT also supports other time series-related tasks, including what-if scenarios, anomaly detection, and more.
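
To make the idea of "windows of past data" concrete, here is a minimal sketch of how a series can be cut into pairs of past and future windows. It only illustrates the windowing idea; the helper name `make_windows` and the sizes used are assumptions for this example, not TimeGPT's actual tokenization.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, input_size, horizon):
    """Split a 1-D series into (past window, future window) pairs."""
    values = np.asarray(series, dtype=float)
    # Every contiguous slice of length input_size + horizon
    windows = sliding_window_view(values, input_size + horizon)
    past = windows[:, :input_size]      # what the model "reads"
    future = windows[:, input_size:]    # what it is asked to predict
    return past, future


series = np.arange(10)                  # toy series: 0, 1, ..., 9
past, future = make_windows(series, input_size=4, horizon=2)
print(past[0], future[0])               # [0. 1. 2. 3.] [4. 5.]
print(past.shape, future.shape)         # (5, 4) (5, 2)
```

Each row of `past` plays the role of the context, and the matching row of `future` is the target that follows it.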

In comparison to established statistical, machine learning, and deep learning methods, TimeGPT stands out in terms of performance, efficiency, and simplicity through its zero-shot inference capability.

The Architecture of the Model

Self-attention, the concept introduced in the paper “Attention Is All You Need”, is the basis of this foundation model. TimeGPT is not based on any existing large language model (LLM). It is trained independently on a vast time series dataset as a large transformer model and is designed to minimize forecasting error.

The model takes a window of past values as input and forecasts the future. The input is enriched with local positional encoding. The model employs an encoder-decoder architecture with multiple layers, incorporating residual connections and layer normalization. The decoder’s output is mapped to the forecasting window dimension through a linear layer. The underlying idea is that attention-based mechanisms effectively capture the diversity of past events, enabling accurate extrapolation of potential future distributions.
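
The building blocks named above can be sketched in a few lines of NumPy. This is a toy illustration under loud assumptions: a single attention head, random weights, the sinusoidal positional encoding from “Attention Is All You Need”, and no residual connections or layer normalization. The actual TimeGPT layers and sizes are not public in this form, so every name and dimension below is hypothetical.

```python
import numpy as np


def positional_encoding(length, d_model):
    """Sinusoidal positional encoding (d_model must be even)."""
    positions = np.arange(length)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, dims / d_model)
    encoding = np.zeros((length, d_model))
    encoding[:, 0::2] = np.sin(angles)
    encoding[:, 1::2] = np.cos(angles)
    return encoding


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
window, d_model, horizon = 8, 4, 3     # hypothetical sizes

# Hypothetical embedding: project each scalar observation to d_model features
series = rng.normal(size=(window, 1))
w_embed = rng.normal(size=(1, d_model))
x = series @ w_embed + positional_encoding(window, d_model)

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
context, weights = self_attention(x, w_q, w_k, w_v)

# Linear layer mapping the flattened representation to the forecast horizon
w_out = rng.normal(size=(window * d_model, horizon))
forecast = context.reshape(-1) @ w_out
print(weights.shape, forecast.shape)   # (8, 8) (3,)
```

Each row of `weights` sums to one and tells us how much every past time step contributes to the representation of a given step.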

The Versatile Data Used to Train the Transformer

The training dataset was carefully curated in order to develop a robust foundation model. The model is trained on what its authors describe as the largest collection of publicly available time series datasets.

The training data addresses the major pitfalls of time series forecasting in the following ways:

Time Series Characteristic Variations: The datasets were selected from domains including finance, economics, demographics, healthcare, weather, IoT sensor data, energy, web traffic, sales, transport, and banking. This breadth of domains ensures that the model encounters a wide range of time series characteristics.
Temporal Patterns: The training datasets consist of series with multiple seasonalities, cyclical patterns, and various types of trends. These familiarize the model with the temporal patterns found across domains.
Noise and Anomalous Patterns: Some datasets contain clean, regular patterns, while others contain significant noise and unexpected events, providing a broad spectrum of scenarios for the model to learn from.

Training TimeGPT on this comprehensive dataset enhances its adaptability and capacity to handle diverse scenarios, thereby improving its resilience and ability to make accurate predictions on previously unseen time series. Consequently, TimeGPT demonstrates robust forecasting capabilities without necessitating individualized model training or optimization efforts.

TimeGPT was trained over multiple days on a cluster of NVIDIA A10G GPUs. Extensive hyperparameter tuning showed that a larger batch size and a smaller learning rate were beneficial. Implemented in PyTorch, TimeGPT was trained using the Adam optimizer with a learning rate decay strategy that reduced the rate to 12% of its initial value.
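
To see what "decaying to 12% of the initial value" can look like, here is a small sketch. The shape of the schedule is an assumption: an exponential decay is used here purely for illustration, and the starting rate and number of steps are made-up values. Only the 12% end point comes from the description above.

```python
def decayed_learning_rate(step, total_steps, initial_lr, final_fraction=0.12):
    """Exponentially decay the learning rate to final_fraction * initial_lr."""
    progress = step / total_steps
    return initial_lr * final_fraction ** progress


initial_lr = 1e-3      # hypothetical starting value
total_steps = 1000     # hypothetical training length
for step in (0, 500, 1000):
    print(step, decayed_learning_rate(step, total_steps, initial_lr))
```

At the last step the rate equals `0.12 * initial_lr`; a linear or stepwise schedule would reach the same end point along a different path.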

Hands-on Code Implementation of TimeGPT

Now let’s move on to a practical application. We will train and forecast the same data with multiple models:

Statistical Algorithm – AutoARIMA
Machine Learning Algorithm – Linear Regression, LGBM Regression, XGB Regression and Random Forest Regression.
Transformer Based – TimeGPT.

In all of these cases we will use the statsforecast, mlforecast, and nixtlats packages, all of which are developed by Nixtla.

Install Libraries

#Install statsforecast
!pip install statsforecast

#Install mlforecast. 
#This will also install collected packages: window-ops, utilsforecast, mlforecast
!pip install mlforecast

#Install nixtlats
# Quote the requirement so the shell does not treat ">" as an output redirect
!pip install "nixtlats>=0.1.0"

Importing Necessary Libraries

import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean
import lightgbm as lgb
import xgboost as xgb
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from nixtlats import TimeGPT

Data Reading and Processing

The data used in this study can be found here.

file_path = "walmart.csv"
df = pd.read_csv(file_path)

#Convert Date into Datetime Format
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

# Set "Date" column as index
df.set_index('Date', inplace=True)

# Resample data into weekly frequency. The data is already weekly, but keeping this step makes it easy to switch to monthly ('MS') if needed.
df_resampled = df.resample('W').sum()

df_resampled.reset_index(inplace=True)

df_resampled = df_resampled[["Date","Weekly_Sales"]]

print(df_resampled.head())

Data Visualization

sns.set(style="darkgrid")
plt.figure(figsize=(10, 6))
sns.lineplot(x="Date", y='Weekly_Sales', data=df_resampled, color='green')
plt.title('Weekly Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Weekly Sales')
plt.show()

Seasonal Decomposition of the Data

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

df_resampled.set_index('Date', inplace=True)
result = seasonal_decompose(df_resampled['Weekly_Sales'], model='additive')

fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(10, 12))

result.observed.plot(ax=ax1, color='green')
ax1.set_ylabel('Observed')

result.trend.plot(ax=ax2, color='green')
ax2.set_ylabel('Trend')

result.seasonal.plot(ax=ax3, color='green')
ax3.set_ylabel('Seasonal')

result.resid.plot(ax=ax4, color='green')
ax4.set_ylabel('Residual')

plt.tight_layout()
plt.show()

df_resampled.reset_index(inplace=True)
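
To build intuition for what an additive decomposition does, here is a hand-rolled sketch on synthetic data. It follows the classical moving-average recipe (observed = trend + seasonal + residual), which is the idea behind `seasonal_decompose`. The series, the period of 7, and all variable names are assumptions made for this toy example.

```python
import numpy as np

period = 7
n = 10 * period
t = np.arange(n)

# Synthetic series with a known linear trend and a zero-mean seasonal pattern
true_seasonal = np.array([3.0, 1.0, -2.0, -4.0, 0.0, 1.0, 1.0])
observed = 0.5 * t + np.tile(true_seasonal, n // period)

# 1. Trend: centred moving average over one full period
half = period // 2
trend = np.full(n, np.nan)
trend[half:n - half] = np.convolve(observed, np.ones(period) / period, mode="valid")

# 2. Seasonal: average the detrended values at each position in the cycle
detrended = observed - trend
seasonal_pattern = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
seasonal_pattern -= seasonal_pattern.mean()
seasonal = np.tile(seasonal_pattern, n // period)

# 3. Residual: whatever is left after removing trend and seasonality
residual = observed - trend - seasonal
print(np.round(seasonal_pattern, 6))
```

Because the toy series contains no noise, the recovered seasonal pattern matches the one we put in and the residual is essentially zero wherever the trend is defined.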

Now let’s proceed to model building.

Train & Test Split

train_size = int(len(df_resampled) * 0.8)
train, test = df_resampled.iloc[:train_size], df_resampled.iloc[train_size:]

print(f'Train set size: {len(train)}')
print(f'Test set size: {len(test)}')

Output:

Train set size: 114

Test set size: 29

Forecasting with StatsForecast – AutoARIMA

train_ = pd.DataFrame({'unique_id':[1]*len(train),'ds': train["Date"], "y":train["Weekly_Sales"]})
test_ = pd.DataFrame({'unique_id':[1]*len(test),'ds': test["Date"], "y":test["Weekly_Sales"]})

sf = StatsForecast(models = [AutoARIMA(season_length = 52)],freq = 'W')

sf.fit(train_)
sf_prediction = sf.predict(h=len(test))
sf_prediction.rename(columns={'ds': 'Date'}, inplace=True)

The forecasts are saved in the sf_prediction dataframe; calling sf_prediction.head() displays the top 5 rows.

Forecasting with MLForecast – Linear Regression, LGBM Regression, XGB Regression and Random Forest Regression

models = [LinearRegression(),
    lgb.LGBMRegressor(verbosity=-1),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]
@njit
def rolling_mean_7(x):
    return rolling_mean(x, window_size=7)
@njit
def rolling_mean_14(x):
    return rolling_mean(x, window_size=14)
    
#Defining the Model Parameters
fcst = MLForecast(models=models,freq='W',lags=[7,14,28],
    lag_transforms={
        1: [expanding_mean],
        7: [rolling_mean_7, rolling_mean_14],
        14: [rolling_mean_7, rolling_mean_14]
    },
    date_features=['year', 'month', 'day', 'dayofweek', 'quarter', 'week'],
    target_transforms=[Differences([7])])
    
#Fitting the Model and Generating Forecasts
fcst.fit(train_)
ml_prediction = fcst.predict(len(test_))
ml_prediction.rename(columns={'ds': 'Date'}, inplace=True)
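
To see what the lag-based features configured above roughly correspond to, here is a plain pandas sketch on a toy series. It is an approximation for intuition only: the column names are made up, and the lags here are computed on the raw series, whereas MLForecast (as far as we understand it) builds them on the target after the differencing transform.

```python
import pandas as pd

y = pd.Series(range(1, 31), dtype=float)   # toy series: 1, 2, ..., 30

features = pd.DataFrame({
    "y": y,
    # Analogous to Differences([7]): change versus 7 steps earlier
    "diff_7": y.diff(7),
    # Analogous to lags=[7]: the value observed 7 steps earlier
    "lag_7": y.shift(7),
    # Analogous to a rolling mean applied on lag 7:
    # mean of the 7 values ending 7 steps earlier
    "rolling_mean_7_lag_7": y.shift(7).rolling(7).mean(),
})
print(features.tail(3))
```

The first rows of every lagged column are NaN because there is no history to look back on yet, which is why such rows are typically dropped before training.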

The forecasts are saved in the ml_prediction dataframe; calling ml_prediction.head() displays the top 5 rows.

TimeGPT Model

#Installing the Library
!pip install nixtlats

#Importing TimeGPT
from nixtlats import TimeGPT

To use TimeGPT, you need an API token from Nixtla. For privacy reasons, I have replaced my own token with “Your Token” in the code below.

os.environ['TIMEGPT_TOKEN'] = "Your Token"
timegpt = TimeGPT(token=os.environ['TIMEGPT_TOKEN'])
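
If you would rather not paste the token into a notebook at all, one option is to set it outside the code and only read it at runtime. The helper below is a hypothetical convenience function, not part of nixtlats; the token value assigned in the demonstration line is a placeholder.

```python
import os


def read_api_token(env_var="TIMEGPT_TOKEN"):
    """Return the token stored in env_var, or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token or token == "Your Token":
        raise RuntimeError(f"Set the {env_var} environment variable to your API token.")
    return token


# Demonstration only: normally the variable is set in the shell, not in code
os.environ["TIMEGPT_TOKEN"] = "example-token"
print(read_api_token())
```

The returned value can then be passed to the client in the same way as above, for example TimeGPT(token=read_api_token()).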

Now you can use TimeGPT to plot the data, and it’s absolutely effortless.

timegpt.plot(df_resampled, time_col='Date', target_col='Weekly_Sales')

Now we will use TimeGPT to generate forecasts, and this part is really interesting. As you might have seen, the libraries used in this article are considerably more streamlined than traditional ones such as pmdarima. Still, a significant number of lines were needed to fit and predict on the data. Now let’s see how TimeGPT does it.

timegpt_fcst = timegpt.forecast(df=train, h=len(test), time_col='Date', target_col='Weekly_Sales', freq='W')
print(timegpt_fcst.head())

And this is the best part: with just one line of code you can generate forecasts for a variety of data. But how accurate are they?

This is what we are going to test in the next step. We will use six key metrics: MAE, RMSE, MAPE, SMAPE, MdAPE, and GMRAE. You can read more about these metrics here.
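
Before applying the metrics to real forecasts, a tiny worked example helps to see what each one measures. The three actual and predicted values below are made up, and the sketch covers the first five metrics using only the standard library.

```python
import math
import statistics

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

# Absolute errors and absolute percentage errors (as fractions)
abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
ape = [e / abs(a) for e, a in zip(abs_err, actual)]

mae = statistics.mean(abs_err)                             # average absolute error
rmse = math.sqrt(statistics.mean([e ** 2 for e in abs_err]))  # penalizes large errors more
mape = 100 * statistics.mean(ape)                          # average percentage error
smape = 100 * statistics.mean(
    [2 * e / (abs(a) + abs(p)) for e, a, p in zip(abs_err, actual, predicted)]
)                                                          # symmetric variant of MAPE
mdape = 100 * statistics.median(ape)                       # median percentage error

print(round(mae, 2), round(rmse, 2), round(mape, 2), round(smape, 2), round(mdape, 2))
# 10.0 12.91 6.67 6.68 10.0
```

Note how RMSE is larger than MAE because the single error of 20 is squared before averaging, while MdAPE ignores how small the best forecast error is.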

First, we create a result dataframe in which the actuals and all the relevant forecasts are concatenated.

result = test.copy()
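# Drop a leftover "index" column if one exists; errors="ignore" avoids a KeyError when it does not
result.drop(columns="index", errors="ignore", inplace=True)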
result.set_index("Date", inplace=True)
result["AutoARIMA_fcst"]=sf_prediction["AutoARIMA"].values
result["LinearRegression_fcst"]=ml_prediction["LinearRegression"].values
result["LGBM_fcst"]=ml_prediction["LGBMRegressor"].values
result["XGB_fcst"]=ml_prediction["XGBRegressor"].values
result["RandomForest_fcst"]=ml_prediction["RandomForestRegressor"].values
result["TimeGPT_fcst"]=timegpt_fcst["TimeGPT"].values
print(result.head())

Then we go for Accuracy Assessment as follows:

def calculate_error_metrics(actual_values, predicted_values):
    actual_values = np.array(actual_values)
    predicted_values = np.array(predicted_values)

    metrics_dict = {
        'MAE': np.mean(np.abs(actual_values - predicted_values)),
        'RMSE': np.sqrt(np.mean((actual_values - predicted_values)**2)),
        'MAPE': np.mean(np.abs((actual_values - predicted_values) / actual_values)) * 100,
        'SMAPE': 100 * np.mean(2 * np.abs(predicted_values - actual_values) / (np.abs(predicted_values) + np.abs(actual_values))),
        'MdAPE': np.median(np.abs((actual_values - predicted_values) / actual_values)) * 100,
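        # Computed here as the geometric mean of absolute percentage errors
        # (relative to the actuals, not to a benchmark forecast)
        'GMRAE': np.exp(np.mean(np.log(np.abs(actual_values - predicted_values) / actual_values)))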
    }

    result_df = pd.DataFrame(list(metrics_dict.items()), columns=['Metric', 'Value'])
    return result_df

# Extract 'Weekly_Sales' as actuals
actuals = result['Weekly_Sales']


error_metrics_dict = {}


for col in result.columns[1:]:  # Exclude 'Weekly_Sales'
    predicted_values = result[col]
    error_metrics_dict[col] = calculate_error_metrics(actuals, predicted_values)['Value'].values  # Extracting 'Value' column


error_metrics_df = pd.DataFrame(error_metrics_dict)
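# Metric names in the same order as in calculate_error_metrics
# (avoids calling the function on identical inputs, which takes log(0))
error_metrics_df.insert(0, 'Metric', ['MAE', 'RMSE', 'MAPE', 'SMAPE', 'MdAPE', 'GMRAE'])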


print(error_metrics_df)

Let’s visualize the difference between actual and predicted values for each method.

# Set seaborn style to darkgrid
sns.set(style="darkgrid")

# Create subplots for each prediction column
num_cols = len(result.columns[1:])
fig, axes = plt.subplots(nrows=num_cols, ncols=1, figsize=(10, 6 * num_cols))

# Loop through each prediction column and plot on separate subplot
for i, col in enumerate(result.columns[1:]):  # Exclude 'Weekly_Sales'
    axes[i].plot(result.index, result['Weekly_Sales'], label='Weekly Sales', color='green')
    axes[i].plot(result.index, result[col], label=col, color='blue')
    axes[i].set_title(f'Weekly Sales vs {col}')
    axes[i].set_xlabel('Date')
    axes[i].set_ylabel('Values')
    axes[i].legend()

# Adjust layout for better spacing
plt.tight_layout()

# Show the plots
plt.show()

Comparing TimeGPT with Other Methods

The TimeGPT authors evaluated the model against a broad spectrum of baseline, statistical, machine learning, and neural forecasting models and report that its zero-shot inference excels in performance, efficiency, and simplicity.

TimeGPT Use Cases

Large-scale time series models like TimeGPT have the potential to revolutionize various industries and fields due to their ability to generate accurate predictions for diverse datasets. Some potential applications include:

Finance: TimeGPT can be used for forecasting stock prices, currency exchange rates, and other financial indicators.
Web Traffic Analysis: It can help in predicting website traffic patterns and optimizing server allocation.
Internet of Things (IoT): TimeGPT can be applied to forecast sensor data, network traffic, and device performance.
Weather Forecasting: It has the potential to improve weather prediction models by analyzing historical data and making accurate forecasts.
Demand Forecasting: TimeGPT can assist in predicting demand for products and services, optimizing inventory management and supply chain operations.
Electricity Consumption: It can be utilized to forecast electricity consumption patterns, aiding in energy production and distribution planning.

These applications demonstrate the wide-ranging impact of large-scale time series models like TimeGPT across diverse domains, offering opportunities for improved decision-making and resource allocation.

How Does TimeGPT Handle Missing Data in Time Series Forecasting?

TimeGPT handles missing data in time series forecasting by leveraging its underlying transformer-based architecture and the principles of training on a diverse set of time series data. The implications for its accuracy and performance are significant:

1. Imputation Mechanisms: TimeGPT is designed to handle missing data through its ability to learn complex patterns and relationships within time series. The model can implicitly capture and impute missing values based on the context of the available data, thereby reducing the impact of missing data on forecasting accuracy.
2. Robustness to Missing Data: By training on a diverse set of time series with varying characteristics, including noise, outliers, and missing values, TimeGPT is equipped to handle the challenges posed by missing data. This robustness contributes to the model’s ability to maintain accuracy and performance even in the presence of incomplete time series data.
3. Forecasting Reliability: The model’s adaptability to missing data enhances its reliability in generating accurate forecasts, as it can effectively incorporate available information to make predictions while accounting for the absence of certain data points.
4. Implications for Real-World Applications: In practical scenarios where missing data is common, such as sensor measurements or irregularly sampled time series, TimeGPT’s capability to handle missing data can lead to more reliable and actionable forecasts, thereby improving decision-making processes.

Overall, TimeGPT’s handling of missing data in time series forecasting contributes to its robustness, reliability, and potential for real-world applications, as it enables the model to produce accurate predictions even in the presence of incomplete data.
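
Whichever forecasting tool you use, it is worth checking a series for gaps yourself rather than relying on implicit handling. The sketch below shows one common preprocessing approach with pandas: make the gaps explicit on a complete weekly calendar, then fill them. The toy data and the choice of linear interpolation are assumptions for illustration, and nothing here is specific to TimeGPT.

```python
import pandas as pd

# Toy weekly series with one missing week (2012-01-15 is absent)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2012-01-01", "2012-01-08", "2012-01-22", "2012-01-29"]),
    "Weekly_Sales": [100.0, 110.0, 130.0, 140.0],
})

# Make the gap explicit by reindexing onto a complete weekly calendar
full_index = pd.date_range(df["Date"].min(), df["Date"].max(), freq="7D")
filled = df.set_index("Date").reindex(full_index)

# Count the gaps, then fill them by linear interpolation
n_missing = int(filled["Weekly_Sales"].isna().sum())
filled["Weekly_Sales"] = filled["Weekly_Sales"].interpolate(method="linear")
filled = filled.rename_axis("Date").reset_index()

print(n_missing)
print(filled)
```

Linear interpolation is only one option; forward filling or a seasonal fill may suit sales data better, depending on why the values are missing.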

TimeGPT for Anomaly Detection in Time Series Data

TimeGPT serves as a tool for anomaly detection in time series data, demonstrating promising results in this task. Here’s how TimeGPT can be utilized for anomaly detection and its comparison with other methods:

Unsupervised Anomaly Detection

Utilize TimeGPT for unsupervised anomaly detection by training the model on a large set of normal time series data. Then, use it to identify deviations from learned patterns, proving effective across domains like finance, healthcare, and IoT.
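
The idea of "deviations from learned patterns" can be illustrated without any particular model: compare what was observed with what was expected and flag unusually large residuals. In the sketch below, `expected` stands in for any model's forecast or fitted values, and the function name, the robust z-score rule, and the threshold of 3 are assumptions chosen for this example rather than TimeGPT's own procedure.

```python
import numpy as np


def flag_anomalies(actual, expected, z_threshold=3.0):
    """Flag points whose residual is far from the typical residual."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(expected, dtype=float)
    # Median and MAD are used so that anomalies do not inflate the scale estimate
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))
    # 1.4826 rescales MAD to match the standard deviation of normal data;
    # fall back to 1.0 when all residuals are identical
    scale = 1.4826 * mad if mad > 0 else 1.0
    robust_z = (residuals - median) / scale
    return np.abs(robust_z) > z_threshold


expected = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0])
actual = expected + np.array([0.2, -0.1, 0.1, 9.0, -0.2, 0.0, 0.1, -0.1])
print(np.flatnonzero(flag_anomalies(actual, expected)))   # [3]
```

Only the fourth point, which is 9 units away from its expected value, is flagged; the small residuals around it are treated as ordinary noise.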

Semi-Supervised Anomaly Detection

Additionally, TimeGPT facilitates semi-supervised anomaly detection by training the model on a combination of normal and anomalous time series data. This method effectively detects rare and complex anomalies not present in normal data.

Comparison to Other Methods

TimeGPT demonstrates superior performance compared to traditional statistical methods like ARIMA and exponential smoothing in anomaly detection for time series data. It also competes effectively with other deep learning-based approaches like LSTM and Autoencoder. Moreover, it offers the advantage of being a pre-trained foundation model, which can be fine-tuned for specific anomaly detection tasks.

Implications for Real-World Applications

Anomaly detection is a critical task in various domains, including cybersecurity, fraud detection, and predictive maintenance. TimeGPT’s ability to detect anomalies in time series data with high accuracy and efficiency can lead to improved decision-making processes and reduced costs associated with false positives and negatives.

Overall, TimeGPT has shown to be a promising method for anomaly detection in time series data, offering advantages over traditional statistical methods and being competitive with other deep learning-based methods. Its potential for real-world applications in various domains makes it an exciting area of research for future developments.

Challenges While Using TimeGPT for Multivariate Time Series Data

Adapting TimeGPT to handle multivariate time series data involves addressing several challenges while leveraging the model’s strengths. Here are some considerations for adapting TimeGPT to multivariate time series data and the associated challenges:

Input Representation

Multivariate time series data consists of multiple variables or features evolving over time. Adapting TimeGPT to handle this data requires an appropriate input representation that captures the interdependencies and temporal dynamics among the variables. This may involve encoding the multivariate time series data in a format suitable for the model’s transformer-based architecture.
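
One simple input representation is the long format already used earlier in this article (unique_id, ds, y), where each variable becomes its own series. The sketch below reshapes a small wide table into that layout. The Temperature column and its values are hypothetical, and this layout on its own does not model dependencies between the variables; it only organizes the data.

```python
import pandas as pd

# Wide format: one column per variable (values are made up)
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2012-01-01", "2012-01-08", "2012-01-15"]),
    "Weekly_Sales": [100.0, 110.0, 120.0],
    "Temperature": [42.3, 38.5, 39.9],
})

# Long format: one row per (variable, timestamp) pair
long = wide.melt(id_vars="Date", var_name="unique_id", value_name="y")
long = long.rename(columns={"Date": "ds"})[["unique_id", "ds", "y"]]
long = long.sort_values(["unique_id", "ds"]).reset_index(drop=True)
print(long)
```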

Temporal Relationships

Capturing the temporal relationships between different variables in a multivariate time series is crucial for accurate forecasting and anomaly detection. TimeGPT needs to effectively learn and model the complex interactions and dependencies among the variables over time, which may require modifications to the model’s attention mechanisms and input embeddings.

Dimensionality and Scale

Multivariate time series data often introduces higher dimensionality and scale compared to univariate data. Adapting TimeGPT to handle the increased dimensionality while maintaining computational efficiency is a significant challenge. Efficiently processing and learning from high-dimensional multivariate time series data without overwhelming the model’s capacity is a key consideration.

Training and Fine-Tuning

Training TimeGPT on multivariate time series data involves considerations for model convergence, regularization, and fine-tuning to effectively capture the relationships between the variables. Balancing the learning of inter-variable dependencies with the model’s ability to generalize across diverse multivariate time series is a non-trivial task.

Evaluation and Interpretability

Assessing the performance of TimeGPT on multivariate time series data and interpreting its predictions pose challenges related to model evaluation, uncertainty quantification, and explainability. Ensuring that the model’s forecasts and anomaly detection capabilities align with the complex dynamics of multivariate time series data is essential.

Addressing these challenges while adapting TimeGPT to handle multivariate time series data holds the potential to enhance the model’s applicability across a wide range of domains, including finance, healthcare, and industrial processes. By effectively capturing the interdependencies and temporal dynamics within multivariate time series, TimeGPT can offer valuable insights and accurate predictions for complex real-world scenarios.

Conclusion

The article introduces TimeGPT, a foundational model for Time Series Forecasting, discussing its architecture, training philosophy, and comparing its forecasts with traditional models.

In our experiment on the Walmart weekly sales data, TimeGPT achieved a MAPE of 5.98, with MAPE taken as the main accuracy metric. It performed nearly on par with the other models on the remaining accuracy metrics, which makes it suitable for various business use cases.

Thus, TimeGPT offers an efficient solution to time series problems and is presented by its authors as the first foundation model trained specifically for time series data. In this experiment it performed well on a supply chain business case, and its one-line, zero-shot workflow can speed up forecasting while remaining accurate.

Key Takeaways from TimeGPT

TimeGPT is the first pre-trained foundation model for time series forecasting that can produce accurate predictions across diverse domains without additional training.
The model is adaptable to different input sizes and forecasting horizons due to its transformer-based architecture.
TimeGPT is specialized in handling time series data and trained to minimize forecasting error.
It simplifies the forecasting process by reducing pipelines to the inference step, reducing complexity and time investment.
TimeGPT democratizes the advantages of large transformer models, impacting the forecasting field and redefining current practices.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
