Time series analysis is a set of statistical techniques for analysing time-ordered data. It is widely used in fields such as finance, economics, and weather forecasting. Understanding its components and methods can improve the accuracy of forecasts and the insights drawn from data.
Use Cases:
Let’s walk through a practical example using Python, showcasing the essential steps and techniques for analysing and modelling time series data. We will use a dataset of monthly passengers.
Importing Libraries
Explanation: We import necessary libraries. pandas is for data manipulation, numpy for numerical operations, matplotlib for plotting, and warnings to ignore any warnings for cleaner output.
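    import warnings
    from datetime import datetime

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib import rcParams

    # Jupyter/IPython magic: render plots inline (remove when running as a plain script)
    %matplotlib inline

    # Suppress warnings for cleaner notebook output
    warnings.filterwarnings('ignore')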
Loading the Data
Explanation: We load the dataset and display the first few rows to understand its structure. The data contains monthly passenger numbers.
    data = pd.read_csv(r'D:Passengers.csv')
    data.head()
Month | Passengers |
1949-01 | 112 |
1949-02 | 118 |
1949-03 | 132 |
1949-04 | 129 |
1949-05 | 121 |
Preprocessing the Data
Explanation: We convert the ‘Month’ column to a datetime format and set it as the index for easier time series manipulation.
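    # Parse the 'YYYY-MM' strings into timestamps.
    # (infer_datetime_format is deprecated since pandas 2.0, where format inference is the default.)
    data['Month'] = pd.to_datetime(data['Month'])
    data = data.set_index('Month')
    data.tail(5)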
Month | Passengers |
01-08-1960 | 606 |
01-09-1960 | 508 |
01-10-1960 | 461 |
01-11-1960 | 390 |
01-12-1960 | 432 |
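The preprocessing step can be tried on a few rows before touching the full file. The sketch below is illustrative only: the toy frame reuses the first three values from the head() output above, and the explicit format string is an assumption about how the Month column is written.

```python
import pandas as pd

# Toy rows shaped like the passenger data (first three values from the head() output).
toy = pd.DataFrame({"Month": ["1949-01", "1949-02", "1949-03"],
                    "Passengers": [112, 118, 132]})

# An explicit format avoids relying on format inference (assumes 'YYYY-MM' strings).
toy["Month"] = pd.to_datetime(toy["Month"], format="%Y-%m")
toy = toy.set_index("Month")

# With a DatetimeIndex, rows can be selected by date strings.
first_quarter = toy.loc["1949-01":"1949-03", "Passengers"]
print(isinstance(toy.index, pd.DatetimeIndex))  # True
print(first_quarter.sum())                      # 362
```

Each month is stored as a timestamp for the first day of that month, which is why date-based slicing works on the index.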
Plotting the Data
Explanation: We plot the time series data to visualize trends, seasonality, and any patterns.
    plt.figure(figsize=(20,10))
    plt.xlabel("Month")
    plt.ylabel("Number of Passengers")
    plt.plot(data)
Output: [figure: line plot of monthly passenger numbers]
Rolling Mean and Standard Deviation
The rolling mean and rolling standard deviation are the mean and standard deviation computed over a sliding window of fixed length, recalculated at every time step.
Explanation: We calculate the rolling mean and standard deviation with a window of 12 months to smooth the time series and observe trends more clearly. Rolling mean helps to identify the long-term trend, while rolling standard deviation shows the variability over the window.
    rolmean = data.rolling(window=12).mean()
    rolstd = data.rolling(window=12).std()
    print(rolmean, rolstd)
Plotting Rolling Statistics
Explanation: We plot the actual data, rolling mean, and rolling standard deviation to visualize how the rolling statistics smooth out the time series and highlight the trend.
    plt.figure(figsize=(20,10))
    actual = plt.plot(data, color='red', label='Actual')
    mean_6 = plt.plot(rolmean, color='green', label='Rolling Mean')
    std_6 = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
Output: [figure: actual series with its 12-month rolling mean and rolling standard deviation]
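How the window behaves is easier to see on a handful of numbers. The following sketch uses a made-up six-point series and a window of 3 instead of 12, purely so the values can be checked by hand.

```python
import numpy as np
import pandas as pd

# Made-up series with a steady upward step of 2.
s = pd.Series([10, 12, 14, 16, 18, 20], dtype=float)

roll_mean = s.rolling(window=3).mean()
roll_std = s.rolling(window=3).std()

# The first window-1 entries are NaN because a full window is not yet available.
print(roll_mean.isna().sum())   # 2

# From the third point on, each value summarises the current and two previous points:
# mean of (10, 12, 14) is 12, and the sample standard deviation is about 2.
print(roll_mean.round(6).tolist())
print(roll_std.round(6).tolist())
```

With window=12, as used on the passenger data, the first 11 values of both rolling statistics are NaN for the same reason.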
Dickey-Fuller Test
The Dickey-Fuller test is a statistical test for stationarity. A stationary time series has a constant mean and variance over time, a property that many time series models assume.
Explanation: We run the augmented Dickey-Fuller test with adfuller from statsmodels, letting autolag='AIC' choose the number of lags. The null hypothesis is that the series has a unit root, that is, it is non-stationary. If the p-value is below 0.05 (equivalently, if the test statistic is more negative than the 5% critical value), we reject the null hypothesis and conclude that the series is stationary. Otherwise, we cannot rule out non-stationarity.
    from statsmodels.tsa.stattools import adfuller

    print('Dickey-Fuller Test: ')
    dftest = adfuller(data['Passengers'], autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','Lags Used','No. of Obs'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
Output: [Dickey-Fuller test results table]
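To make the test statistic less of a black box, here is a simplified sketch of the basic (non-augmented) Dickey-Fuller regression written with NumPy only. It is not what adfuller does internally: adfuller also adds lagged differences and computes p-values. The function name dickey_fuller_stat, the synthetic series, and the seed are all assumptions made for illustration.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of rho in: diff(y)[t] = c + rho * y[t-1] + error (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                    # y[t] - y[t-1]
    X = np.column_stack([np.ones(len(dy)), y[:-1]])    # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)      # ordinary least squares
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)              # stationary white noise
walk = np.cumsum(rng.normal(size=500))    # random walk, which has a unit root

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print(stat_noise, stat_walk)
```

A strongly negative statistic is evidence against a unit root. For this constant-only regression the large-sample 5% critical value is roughly -2.86, so the white-noise series lands far below it, while a random walk usually does not.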
Log Transformation
A log transformation is applied to stabilize the variance of a time series. Taking logarithms compresses large values more than small ones, so seasonal swings that grow with the level of the series become roughly constant in size.
Explanation: We apply np.log to the passenger counts. This addresses the growing variance but not the trend, so the log series is a step towards stationarity rather than a stationary series in itself.
    plt.figure(figsize=(20,10))
    data_log = np.log(data)
    plt.plot(data_log)
Output: [figure: log-transformed passenger series]
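The variance-stabilizing effect can be checked on a synthetic series. The sketch below assumes a made-up series with 1% growth per step and a yearly swing of about 10% of the current level. It is not the passenger data.

```python
import numpy as np
import pandas as pd

t = np.arange(120)                                    # ten years of monthly steps
seasonal = 1 + 0.1 * np.sin(2 * np.pi * t / 12)       # swing proportional to the level
raw = pd.Series(100 * np.exp(0.01 * t) * seasonal)    # swings grow as the level grows
logged = np.log(raw)

raw_std = raw.rolling(window=12).std()
log_std = logged.rolling(window=12).std()

# Compare the first full window (index 11) with the last one.
print(raw_std.iloc[11], raw_std.iloc[-1])   # the raw spread grows with the level
print(log_std.iloc[11], log_std.iloc[-1])   # the log spread stays the same
```

The log series still trends upward, which is why the later steps remove the trend separately.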
Rolling Mean and Standard Deviation of Log Data
Explanation: We calculate the 12-month rolling mean and standard deviation of the log-transformed data and plot the rolling mean (red) over the log series to observe the smoothed trend.
    plt.figure(figsize=(20,10))
    MAvg = data_log.rolling(window=12).mean()
    MStd = data_log.rolling(window=12).std()
    plt.plot(data_log)
    plt.plot(MAvg, color='red')
Output: [figure: log series with its 12-month rolling mean in red]
Stationarity Function
To avoid repeating these steps, we wrap the rolling statistics and the Dickey-Fuller test in a single function.
Explanation: The function plots the series with its 12-month rolling mean and standard deviation, which shows trend and variability, and then prints the Dickey-Fuller results, which indicate whether the series is stationary. It expects a DataFrame with a 'Passengers' column.
    def stationarity(timeseries):
        # Rolling statistics over a 12-month window
        rolmean = timeseries.rolling(window=12).mean()
        rolstd = timeseries.rolling(window=12).std()

        plt.figure(figsize=(20,10))
        actual = plt.plot(timeseries, color='blue', label='Actual')
        mean_6 = plt.plot(rolmean, color='red', label='Rolling Mean')
        std_6 = plt.plot(rolstd, color='black', label='Rolling Std')
        plt.legend(loc='best')
        plt.title('Rolling Mean & Standard Deviation')
        plt.show(block=False)

        # Augmented Dickey-Fuller test on the 'Passengers' column
        print('Dickey-Fuller Test: ')
        dftest = adfuller(timeseries['Passengers'], autolag='AIC')
        dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','Lags Used','No. of Obs'])
        for key, value in dftest[4].items():
            dfoutput['Critical Value (%s)' % key] = value
        print(dfoutput)

    stationarity(data_log)
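The printed table still has to be read by eye. As a small, hypothetical helper (interpret_adf is not part of statsmodels or of the original walkthrough), the sketch below turns a result tuple in the order adfuller returns it into an explicit decision. The example tuple uses made-up numbers, not results from the passenger data.

```python
def interpret_adf(result, alpha=0.05):
    """Summarise an adfuller-style result tuple.

    Expects (test statistic, p-value, lags used, number of observations,
    critical values dict, ...), the order in which adfuller returns them.
    """
    stat, pvalue, lags, nobs, crit = result[:5]
    return {
        "test_statistic": stat,
        "p_value": pvalue,
        "lags_used": lags,
        "n_obs": nobs,
        "critical_values": dict(crit),
        # Null hypothesis: unit root (non-stationary). Reject it when p < alpha.
        "reject_unit_root": pvalue < alpha,
    }

# Made-up numbers for illustration only; not output from the passenger data.
example = (-3.5, 0.008, 2, 140, {"1%": -3.48, "5%": -2.88, "10%": -2.58}, 0.0)
summary = interpret_adf(example)
print(summary["reject_unit_root"])   # True, because 0.008 < 0.05
```

Returning a dictionary rather than printing keeps the decision available for later steps, for example to choose whether further transformation is needed.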
Exponential Moving Average for Smoothing
This technique uses an exponential moving average to smooth the time series, highlighting the overall trend while reducing short-term fluctuations.
Explanation: We apply an exponential moving average to the log-transformed data. Unlike the rolling mean, it weights every past observation, with weights that decay exponentially with age, so it responds to recent changes while still damping short-term noise. With halflife=12, an observation 12 months old carries half the weight of the current one.
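The weighting can be made concrete by recomputing the average by hand. This sketch assumes pandas' default adjust=True behaviour for ewm and uses a made-up five-point series. The names alpha and manual are introduced here for illustration.

```python
import numpy as np
import pandas as pd

halflife = 12
# Smoothing factor chosen so that a weight halves every `halflife` steps.
alpha = 1 - 0.5 ** (1 / halflife)

x = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])   # made-up values
pandas_ewm = x.ewm(halflife=halflife).mean()

# Manual version: a weighted average in which an observation k steps in the past
# gets weight (1 - alpha) ** k.
manual = []
for t in range(len(x)):
    weights = (1 - alpha) ** np.arange(t, -1, -1)   # oldest ... newest
    manual.append(np.sum(weights * x.values[: t + 1]) / np.sum(weights))

print(np.isclose((1 - alpha) ** halflife, 0.5))   # True
print(np.allclose(pandas_ewm.values, manual))     # True
```

A shorter halflife makes the average track the data more closely. A longer one gives a smoother curve that lags further behind.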
    expma_log = data_log.ewm(halflife=12).mean()
    plt.figure(figsize=(20,10))
    plt.plot(data_log)
    plt.plot(expma_log, color='red')
Output: [figure: log series with its exponential moving average in red]
Difference Between Log Data and Exponential Moving Average
Explanation: We subtract the exponential moving average from the log-transformed data to remove the trend and bring the series closer to stationarity. We then check the stationarity of the detrended series using the stationarity function.
    log_diff = data_log - expma_log
    plt.figure(figsize=(20,10))
    log_diff.dropna(inplace=True)
    plt.plot(log_diff)
    stationarity(log_diff)
Time series analysis provides essential insights into data that is sequentially ordered over time. By applying techniques such as rolling statistics, stationarity testing, log transformation, and exponential moving averages, one can uncover trends and seasonal patterns and prepare a series for forecasting. The steps demonstrated here, from basic visualization to detrending, lay the groundwork for models such as ARIMA that assume a stationary series.
This exploration highlights the process of smoothing data and preparing it for modelling, particularly the importance of transforming and comparing different forms of data, such as log-transformed data and its exponential moving average. As we continue to develop more advanced models, maintaining clarity and focus in our analysis remains crucial to achieve reliable and actionable results.
Neha Vittal Annam