ECO742

Aims and Objectives

This course introduces advanced topics in time series analysis. By the end of the course, students should be able to build pattern recognition models for real-world time series data. The main software used for statistical programming in this course is R, and students should be able to use R and its related packages. The course content changes from year to year. This year's content includes (but is not limited to):

What Will be Covered

Multivariate time series and exploratory analysis
- Check ECO665 Exploratory Time Series Analysis Notes
- Correlation, Cross-Correlation, PCA
- Visualization
Mixed Effects Models (Hierarchical Linear Models, Multilevel Linear Models, Growth Curve Analysis)
Similarities of time series patterns

In general, two types of similarity are discussed in the literature: structural similarity and shape similarity. The first concerns the structure of the data generating process; the second concerns the patterns of the trajectories. We will cover the following similarity (and association) measures in some detail (we will skip some structural similarity measures, since they are part of the Time Series Analysis and Econometrics courses):

Correlation Based: \(r_{x,y} = \frac{cov(X,Y)}{\sqrt{var(X)} \sqrt{var(Y)}}\)
Euclidean Distance: \(\mathrm{D}(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{i=1}^n (x_i-y_i)^2}\)
Mutual Information (Information Theoretic)
\(I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)\)
For brief background on these entropy functions, see the pages on information entropy and mutual information.
Dynamic Time Warping (wiki page). For the R package, see the article: Toni Giorgino (2009). Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31(7), 1-24.
Longest Common Subsequence (wiki page)
Brownian Distance Correlation (wiki page)
Maximal Information Coefficient (Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., Sabeti, P. C. (2011), “Detecting Novel Associations in Large Data Sets”, Science 334, 6062, 1518-1524) and website. There is an ongoing discussion about this measure, and we may go through some of the contributions. The R code is available from this website.
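As a small illustration of the measures listed above, the following base R sketch computes the correlation, the Euclidean distance, a binned estimate of mutual information and a textbook dynamic time warping distance for two simulated series. The data, the helper names (`entropy`, `mutual_info`, `dtw_distance`) and the choice of 10 bins are assumptions made for this example only; the dtw package cited above provides a far more complete DTW implementation.

```r
# Two simulated series of equal length (toy data, not course data)
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))
y <- x + rnorm(n, sd = 0.5)

# Correlation: cov(X, Y) / (sqrt(var(X)) * sqrt(var(Y)))
r_manual  <- cov(x, y) / (sqrt(var(x)) * sqrt(var(y)))
r_builtin <- cor(x, y)

# Euclidean distance: sqrt(sum((x_i - y_i)^2))
d_manual  <- sqrt(sum((x - y)^2))
d_builtin <- as.numeric(dist(rbind(x, y)))

# Shannon entropy (in nats) of a probability vector or table
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from
# equal-width bins (a crude plug-in estimate; the bin count is an assumption)
mutual_info <- function(x, y, bins = 10) {
  bin_x <- cut(x, breaks = bins)
  bin_y <- cut(y, breaks = bins)
  p_xy  <- table(bin_x, bin_y) / length(x)
  entropy(rowSums(p_xy)) + entropy(colSums(p_xy)) - entropy(p_xy)
}
mi_xy <- mutual_info(x, y)
mi_xx <- mutual_info(x, x)   # equals the binned entropy H(X)

# Dynamic time warping distance: absolute-difference local cost,
# unit steps, no window constraint
dtw_distance <- function(a, b) {
  n_a <- length(a)
  n_b <- length(b)
  D <- matrix(Inf, n_a + 1, n_b + 1)
  D[1, 1] <- 0
  for (i in seq_len(n_a)) {
    for (j in seq_len(n_b)) {
      cost <- abs(a[i] - b[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n_a + 1, n_b + 1]
}
dtw_xy <- dtw_distance(x, y)
```

Because the diagonal alignment is always admissible, `dtw_xy` can never exceed `sum(abs(x - y))`; this is one way to see why DTW is more forgiving of small timing shifts than a point-by-point distance.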

See OZER-IMER, Itir and OZKAN, Ibrahim (2013), On the Co-Movements of Exchange Rates, ch. 1, p. 12-37 in Mirdala, Rajmund ed., Financial Aspects of Recent Trends in the Global Economy, vol. 2, ASERS Publishing, for an application of some of the similarity measures listed above and of hierarchical clustering based on these measures.
Clustering of time series based on common patterns
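To make the idea of hierarchical clustering on a similarity measure concrete, here is a minimal base R sketch that turns a correlation matrix into a dissimilarity and clusters six simulated series. The data, the `1 - cor` dissimilarity and the average linkage are assumptions chosen for illustration; they are not taken from the chapter cited above.

```r
# Six simulated series: three noisy sine waves, three noisy cosine waves
set.seed(42)
t_idx  <- seq_len(150)
base_a <- sin(2 * pi * t_idx / 50)
base_b <- cos(2 * pi * t_idx / 50)
series <- cbind(
  a1 = base_a + rnorm(150, sd = 0.2),
  a2 = base_a + rnorm(150, sd = 0.2),
  a3 = base_a + rnorm(150, sd = 0.2),
  b1 = base_b + rnorm(150, sd = 0.2),
  b2 = base_b + rnorm(150, sd = 0.2),
  b3 = base_b + rnorm(150, sd = 0.2)
)

# Correlation-based dissimilarity: highly correlated series are "close"
d_cor <- as.dist(1 - cor(series))

# Agglomerative clustering with average linkage, cut into two groups
hc     <- hclust(d_cor, method = "average")
groups <- cutree(hc, k = 2)
groups
```

Calling `plot(hc)` draws the dendrogram. Swapping `d_cor` for a distance matrix built from another measure (Euclidean, DTW, and so on) changes only the first step.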
- Clustering in General: K-Means Clustering, Hierarchical Clustering
- Time Series Clustering
Time Series Decision Tree
- Simple Decision Tree
- Model Based Decision Tree
Motif discovery in time series (if time permits)
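For the K-means item in the clustering topic above, the sketch below treats each z-normalised series as one point and partitions the points with `kmeans()`. The simulated series, the helper `noisy_copies()` and the choice of two centres are assumptions for illustration only.

```r
# Eight simulated series: four cyclical, four trending
set.seed(7)
t_idx <- seq_len(120)

# Hypothetical helper: k noisy copies of a base pattern, one per column
noisy_copies <- function(base, k, noise_sd = 0.2) {
  sapply(seq_len(k), function(i) base + rnorm(length(base), sd = noise_sd))
}

series <- cbind(
  noisy_copies(sin(2 * pi * t_idx / 40), 4),
  noisy_copies(t_idx / 60 - 1, 4)
)
colnames(series) <- c(paste0("cyc", 1:4), paste0("trend", 1:4))

# z-normalise each series (column) so that level and scale do not drive the result
z <- scale(series)

# kmeans() clusters rows, so transpose: one row per series
km <- kmeans(t(z), centers = 2, nstart = 20)
km$cluster
```

K-means on raw time points compares series position by position, so it is sensitive to shifts in timing; this is one of the issues behind the question of whether clustering time series makes sense.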

References and Suggested Readings

Ruey S. Tsay, Analysis of Financial Time Series, Wiley, 2010
R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, John Wiley & Sons, 2000
K. Fukunaga, Statistical Pattern Recognition, Morgan Kaufmann, 1990
Judea Pearl, The Art and Science of Cause and Effect, 1997 Lecture
T. W. Liao, Clustering of Time Series Data - A survey, Pattern Recognition, 2005, 38, p:1857-1874
A. K. Palit, D. Popovic, Computational Intelligence in Time Series Forecasting, Springer-Verlag 2005
Stefano Iacus, Simulation and Inference for Stochastic Differential Equations with R examples, Springer Science, 2008

Steps for Learning from Data

1. Data Collection, Raw Data Preparation

Data Sources (Mostly Economic, Finance related):
- Search engines: Google, Yahoo, Bing, etc.
- World Bank (the R package for World Bank data is WDI)
- Central Bank of the Republic of Turkey
- OECD
- Statistics offices: Turkish Statistical Institute, European Commission (Eurostat), etc.
- Others like Quandl, Penn World Table, Maddison Project, Groningen Growth and Development Centre
- Some other databases, please do share the addresses with me!!
Some useful R packages for data manipulation and preparation
- plyr
- dplyr, see introduction page
- data.table, Extension of data.frame for fast manipulation of large data
- reshape and reshape2
- stringr
Some packages to Load/Read Data
- foreign
- xlsx, XLConnect: for Excel

2. Data Preprocessing, First Steps in Exploratory Data Analysis

Numerical summaries:

The summary() and aggregate() functions are the first step, but several good packages are also available for numerical summaries, among them Hmisc (the describe() function) and skimr (the skim() function).
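A minimal sketch of this first step, using only base R and the built-in AirPassengers series (the data set and the data frame layout are assumptions made for the example):

```r
# Built-in monthly series, 1949-1960
ap <- AirPassengers

# Five-number summary plus mean of the whole series
summary(ap)

# Long-format data frame: one row per month
df <- data.frame(
  year       = rep(1949:1960, each = 12),
  month      = rep(1:12, times = 12),
  passengers = as.numeric(ap)
)

# Group-wise summaries with the formula interface of aggregate()
yearly_mean  <- aggregate(passengers ~ year,  data = df, FUN = mean)
monthly_mean <- aggregate(passengers ~ month, data = df, FUN = mean)

head(yearly_mean)
```

The yearly means show the trend and the monthly means show the seasonal profile, which is often enough to decide which plots to draw next.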

Plotting and Graphical Summaries
- Some useful time series plot functions are available in the forecast package, such as tsdisplay() and seasonplot(); tsdiag() from the stats package is also useful.

ggplot2: the book for the package is ggplot2: Elegant Graphics for Data Analysis; another good book that uses this package is R Graphics Cookbook (see the book website).
lattice package and its book Lattice: Multivariate Data Visualization with R and the book website
googleVis package. See example page
Displaying time series, spatial, and space-time data with R book and the accompanying website
Table and Plot -> Slopegraph in-class example
Example of Univariate TS plots (ECO665 Course)
History of R Financial Time Series Plotting, by the blogger Timely Portfolio

Missing Values and Treatments (imputations/interpolations/deletions, etc):

Several functions and packages are available for missing value treatment. Among functions, na.approx(), na.spline() and na.locf() from the zoo package may be used. For imputation, Amelia, mi, mice and imputation are packages one may want to check. Also refer to the missing data section of the SocialSciences task view.
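To show what interpolation and last-observation-carried-forward do, here is a base R stand-in that needs no extra packages. The toy vector and the helper `locf()` are assumptions for this example; in practice the zoo functions named above are the usual route.

```r
# Toy series with gaps (made-up values)
y   <- c(10, NA, 14, NA, NA, 20, 21, NA)
idx <- seq_along(y)
obs <- !is.na(y)

# Linear interpolation between observed points; rule = 1 leaves
# values outside the observed range as NA instead of extrapolating
y_linear <- approx(idx[obs], y[obs], xout = idx, rule = 1)$y

# Last observation carried forward (hypothetical helper)
locf <- function(v) {
  filled <- v
  for (i in seq_along(filled)[-1]) {
    if (is.na(filled[i])) filled[i] <- filled[i - 1]
  }
  filled
}
y_locf <- locf(y)

rbind(original = y, linear = y_linear, locf = y_locf)
```

Interpolation fills the interior gaps with 12, 16 and 18 but leaves the trailing gap missing, while LOCF repeats the last known value everywhere, including the end of the series.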

Some Preliminary Transformation Helpers (if needed)
- Aggregation of time series: several functions are available for aggregating time series. Examples are the base function aggregate(); the xts functions apply.daily(), apply.weekly(), apply.monthly(), apply.quarterly(), apply.yearly(), to.daily(), to.weekly(), to.monthly(), to.quarterly(), to.yearly(), to_period(); and the zoo function rollapply().
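A short sketch of temporal aggregation with the base aggregate() method for ts objects, using the built-in AirPassengers series. The rolling mean uses stats::filter() as a base R stand-in for a rolling-window function such as rollapply(); the data set and window length are assumptions for the example.

```r
ap <- AirPassengers   # monthly totals, 1949-1960

# Monthly -> quarterly and monthly -> yearly totals
quarterly_total <- aggregate(ap, nfrequency = 4, FUN = sum)
yearly_total    <- aggregate(ap, nfrequency = 1, FUN = sum)

# Trailing 12-month moving average; the first 11 values are NA
roll12 <- stats::filter(ap, rep(1 / 12, 12), sides = 1)

head(quarterly_total)
```

Use `FUN = sum` for flow variables such as passenger counts and `FUN = mean` (or the last value) for stock variables such as prices.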

Temporal disaggregation: the tempdisagg package is a good starting point. Check the in-class example for temporal disaggregation.
Normalization and Standardization etc.
Classical Decomposition of Time Series: the decompose() function (using classical moving averages) and the stl() function (using loess, i.e. local polynomial regression) of the stats package (part of base R) can be used for decomposition.
Filtering of Time Series: the mFilter package implements the Baxter-King filter (M. Baxter and R.G. King. Measuring business cycles: Approximate bandpass filters. The Review of Economics and Statistics, 81(4):575-93, 1999), the Butterworth filter (D.S.G. Pollock. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics, 99:317-334, 2000), the Christiano-Fitzgerald filter (L. Christiano and T.J. Fitzgerald. The bandpass filter. International Economic Review, 44(2):435-65, 2003), the Hodrick-Prescott filter (R.J. Hodrick and E.C. Prescott. Postwar US business cycles: an empirical investigation. Journal of Money, Credit, and Banking, 29(1):1-16, 1997) and the trigonometric regression filter. See the package Reference Manual for details.
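The two decomposition functions mentioned above can be tried on the built-in AirPassengers series. Taking logs, so that an additive decomposition is reasonable, and using `s.window = "periodic"` are assumptions made for this sketch.

```r
# Log transform turns the multiplicative seasonality into an additive one
ap <- log(AirPassengers)

# Classical decomposition by moving averages (additive by default)
dec <- decompose(ap)
dec$figure                 # the 12 estimated seasonal effects

# Loess-based decomposition with a fixed (periodic) seasonal pattern
fit_stl   <- stl(ap, s.window = "periodic")
stl_parts <- fit_stl$time.series   # columns: seasonal, trend, remainder

# The three STL components add back up to the original series
recon <- rowSums(stl_parts)
```

`plot(dec)` and `plot(fit_stl)` show the components. Note that the moving-average trend from decompose() is missing for the first and last six months, whereas stl() returns a trend for the full sample.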

Hodrick-Prescott is one of the most widely used filters in economics, so we will discuss it in some detail. There is an ongoing discussion about the choice of this filter's smoothing parameter. One optimal filtering approach will be discussed and implemented as an in-class activity.
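As background for that discussion, the Hodrick-Prescott trend is the series \(\tau\) that minimises \(\sum_t (y_t-\tau_t)^2 + \lambda \sum_t (\Delta^2 \tau_t)^2\), which has the closed-form solution \(\tau = (I + \lambda D'D)^{-1} y\), where \(D\) is the second-difference matrix. The base R sketch below implements this directly; the function name `hp_filter`, the use of the built-in JohnsonJohnson series and the conventional quarterly value \(\lambda = 1600\) are assumptions for illustration, and this is not the mFilter implementation or the optimal filter treated in class.

```r
# Hodrick-Prescott filter via its closed-form solution (dense matrices,
# so suitable only for short series)
hp_filter <- function(y, lambda = 1600) {
  y <- as.numeric(y)
  n <- length(y)
  D <- diff(diag(n), differences = 2)          # (n - 2) x n second differences
  trend <- solve(diag(n) + lambda * crossprod(D), y)
  list(trend = trend, cycle = y - trend)
}

# Quarterly example: log earnings per share, 1960-1980
y  <- log(JohnsonJohnson)
hp <- hp_filter(y, lambda = 1600)

# A larger lambda gives a smoother trend; a purely linear series has
# zero second differences and is therefore returned unchanged as trend
hp_line <- hp_filter(1:50, lambda = 1600)
max(abs(hp_line$cycle))
```

Re-running the function with different values of `lambda` and plotting the resulting trends is a quick way to see how much the estimated cycle depends on the smoothing parameter.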

3. Knowledge Extraction (Modelling and Inference)

Traditional Modelling (in Economics mostly) in Time Domain (we will be discussing mostly in this domain)
- Box-Jenkins Modelling: Please refer to Forecasting: Principles and Practice, by R. J. Hyndman and G. Athanasopoulos (open access textbook), chapter 8, ARIMA Models. Also check the in-class discussions of AR-MA-ARMA modelling and ARIMA modelling.
- Markov Chains and Markov Processes (A very brief introduction and some applications)
- Hidden Markov Models
- Intervention Analysis (an intro to transfer function models and the ARMAX models)
- Multivariate Time Series Regressions (skipped. Refer to Econometrics Course)
- Growth Curve Analysis (Applications will be assigned. Refer to in-class discussions)
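For the Box-Jenkins item in the list above, a minimal base R sketch of the simulate, identify, estimate, forecast cycle is given here. The AR(1) coefficient of 0.7, the sample size and the seed are assumptions chosen for illustration.

```r
# Simulate an AR(1) process with coefficient 0.7
set.seed(123)
y <- arima.sim(model = list(ar = 0.7), n = 500)

# Identification: for an AR(1), the partial autocorrelation
# should cut off after lag 1
pacf_vals <- pacf(y, plot = FALSE)$acf[1:5]

# Estimation: ARIMA(1, 0, 0) with a mean term
fit     <- arima(y, order = c(1, 0, 0))
phi_hat <- unname(coef(fit)["ar1"])

# Forecast five steps ahead, with standard errors
fc <- predict(fit, n.ahead = 5)
fc$pred
fc$se
```

`tsdiag(fit)` plots the standardized residuals, their autocorrelations and Ljung-Box p-values, which covers the diagnostic-checking step of the cycle.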
Challenges in real world time series
- Very Long Time Series (we will not discuss)
- A large number of dimensions (very wide data, let’s say) and associations
  - Similarity of Time Series
- Clustering (of time series)
  - Issues of clustering of time series (does it make sense to find the groups?): In-class discussion. Please do read some related material before joining our discussion.
  - Introduction to clustering algorithms
  - hierarchical clustering
  - K-means clustering
  - Soft clustering example: fuzzy c-means clustering
- Classification example: Decision Trees. See an example for a brief introduction and an introductory example for time series.
- Symbolic Aggregations of Time Series (optional topic to discuss)
  - A very simple example
  - SAX
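A very small sketch of the SAX idea in base R: z-normalise the series, average it over equal-length segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equally likely regions. The function name `sax` and its arguments are assumptions for this example, not the interface of any particular package.

```r
sax <- function(x, n_segments, alphabet_size) {
  # For simplicity, require segments of equal length
  stopifnot(length(x) %% n_segments == 0)

  # 1. z-normalise
  z <- (x - mean(x)) / sd(x)

  # 2. Piecewise aggregate approximation: one mean per segment
  #    (matrix() fills by column, so each column is one consecutive segment)
  paa <- colMeans(matrix(z, ncol = n_segments))

  # 3. Breakpoints giving equiprobable regions under N(0, 1)
  breaks <- qnorm(seq_len(alphabet_size - 1) / alphabet_size)

  # 4. Map each segment mean to a letter
  symbols <- letters[findInterval(paa, breaks) + 1]
  paste(symbols, collapse = "")
}

sax(1:16, n_segments = 4, alphabet_size = 4)   # "abcd"
sax(16:1, n_segments = 4, alphabet_size = 4)   # "dcba"
```

Once series are reduced to short strings, string tools (distance between words, counting repeated substrings) can be used for similarity search and motif discovery.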