ARIMA in Forex / Fitting ARIMA-GARCH model on Forex timeseries | Elite Trader


DRNN-ARIMA Approach to Short-term Trend Forecasting in Forex Market

Abstract: The Foreign Exchange (Forex) market is a highly liquid and highly volatile financial market. Traders earn money by placing buy or sell orders on currency pairs after analyzing market behaviour. There are two types of analysis, namely fundamental analysis and technical analysis; technical analysis is based on historical price data. Several attempts have been made to forecast Forex prices, and among them time series forecasting is significant. According to the Efficient Market Hypothesis (EMH) and random walk theory, future prices in financial markets are independent of historical prices. To overcome this challenge faced by existing methods of technical analysis, a short-term trend forecasting approach for Forex prices was proposed by integrating a DRNN and an ARIMA model. Our results demonstrate the potential for forecasting future Forex prices with reasonable accuracy.

Published in: 18th International Conference on Advances in ICT for Emerging Regions (ICTer)

Date of Conference: September

Date Added to IEEE Xplore: 17 January

Publisher: IEEE

Comparing ANN Based Models with ARIMA for Prediction of Forex Rates

Joarder Kamruzzaman (a) and Ruhul A. Sarker (b)
(a) Gippsland School of Computing and IT, Monash University, Churchill, VIC
(b) School of CS, UNSW@ADFA, Canberra, ACT
ASOR BULLETIN, Volume 22 Number 2, June

Abstract

In the dynamic global economy, accuracy in forecasting foreign currency exchange (Forex) rates, or at least predicting the trend correctly, is of crucial importance for any future investment. The use of computational intelligence based techniques for forecasting has proved extremely successful in recent times. In this paper, we developed and investigated three Artificial Neural Network (ANN) based forecasting models using Standard Backpropagation (SBP), Scaled Conjugate Gradient (SCG) and Backpropagation with Bayesian Regularization (BPR) for the Australian foreign exchange market, to predict six different currencies against the Australian dollar. Five moving average technical indicators are used to build the models. These models were evaluated using three performance metrics, and a comparison was made with the best known conventional forecasting model, ARIMA. All the ANN based models outperform the ARIMA model. It is found that the SCG based model performs best when measured on the two most commonly used metrics, and shows competitive results when compared with the BPR based model on the third indicator. Experimental results demonstrate that ANN based models can closely forecast the forex market.

Introduction

The foreign exchange market has experienced unprecedented growth over the last few decades. Exchange rates play an important role in controlling the dynamics of the exchange market. As a result, appropriate prediction of exchange rates is a crucial factor for the success of many businesses and fund managers. Although the market is well known for its unpredictability and volatility, a number of groups (banks, agencies and others) predict exchange rates using numerous techniques.

Exchange rate prediction is one of the demanding applications of modern time series forecasting. The rates are inherently noisy, non-stationary and deterministically chaotic [3, 22]. These characteristics suggest that there is no complete information that could be obtained from the past behaviour of such markets to fully capture the dependency between future rates and those of the past. One general assumption made in such cases is that the historical data incorporate all of that behaviour. As a result, the historical data become the major input to the prediction process. However, it is not clear how good these predictions are. The purpose of this paper is to investigate and compare two well-known prediction techniques, under different parameter settings, for several different exchange rates.

For more than two decades, Box and Jenkins' Auto-Regressive Integrated Moving Average (ARIMA) technique [1] has been widely used for time series forecasting. Because of its popularity, the ARIMA model has been used as a benchmark to evaluate many new modelling approaches [8]. However, ARIMA is a general univariate model, developed on the assumption that the time series being forecast is linear and stationary [2].

Artificial Neural Networks, the well-known function approximators in prediction and system modelling, have recently shown great applicability in time-series analysis and forecasting []. ANNs support multivariate analysis. Multivariate models can rely on greater information, where not only the lagged time series being forecast, but also other indicators (such as technical, fundamental and inter-market indicators for financial markets), are combined to act as predictors. In addition, ANN is more effective in describing the dynamics of non-stationary time series due to its unique non-parametric, assumption-free, noise-tolerant and adaptive properties. ANNs are universal function approximators that can map any nonlinear function without a priori assumptions about the data [2].

In several applications, Tang and Fishwick [17], Jhee and Lee [10], Wang and Leu [18], Hill et al. [7], and many other researchers have shown that ANNs perform better than ARIMA models, specifically for more irregular series and for multiple-period-ahead forecasting. Kaastra and Boyd [11] provided a general introduction to how a neural network model should be developed to model financial and economic time series; many useful, practical considerations were presented in their article. Zhang and Hu [23] analysed backpropagation neural networks' ability to forecast an exchange rate. Wang [19] cautioned against the dangers of one-shot analysis since the inherent nature of the data could vary. Klein and Rossin [12] proved that the quality of the data also affects the predictive accuracy of a model. More recently, Yao et al. [20] evaluated the capability of a backpropagation neural network model as an option price forecasting tool. They also recognised that neural network models are context sensitive, and that when studies of this type are conducted they should be as comprehensive as possible across different markets and different neural network models.

In this paper, we apply ARIMA and ANNs to predict the exchange rates of the Australian dollar against six other currencies: US Dollar (USD), Great British Pound (GBP), Japanese Yen (JPY), Singapore Dollar (SGD), New Zealand Dollar (NZD) and Swiss Franc (CHF). A total of … weeks of data (closing rate of the week) are used to build the models and 65 weeks of data to evaluate them. Under ANNs, three models using standard backpropagation, scaled conjugate gradient and Bayesian regularization were developed. The outcomes of all these models were compared with ARIMA based on three different error indicators. The results show that the ANN models perform much better than the ARIMA model. The scaled conjugate gradient and Bayesian regularization models show competitive results, and both forecast more accurately than standard Backpropagation, which has been studied considerably in other work.

After the introduction, ARIMA, the ANN based forecasting models and the performance metrics are briefly introduced. In the following two sections, data collection and experimental results are presented. Finally, conclusions are drawn.

ARIMA: An Introduction

The Box-Jenkins method [1, 2] of forecasting is different from most conventional optimization based methods. The technique does not assume any particular pattern in the historical data of the series to be forecast. It uses an iterative approach of identifying a possibly useful model from a general class of models. The chosen model is then checked against the historical data to see whether it accurately describes the series. If the specified model is not satisfactory, the process is repeated using another model designed to improve on the original one, until a satisfactory model is found.

A general class of Box-Jenkins models for a stationary time series is the ARIMA, or autoregressive integrated moving-average, models. This group includes the AR model with only autoregressive terms, the MA model with only moving average terms, and ARIMA models with both autoregressive and moving-average terms. The Box-Jenkins methodology allows the analyst to select the model that best fits the data. The details of AR, MA and ARIMA models can be found in Jarrett [6, 9].

Artificial Neural Network: An Introduction

In this section we first briefly present artificial neural networks and then the learning algorithms used in this study to train them.

Artificial Neuron

In the quest to build an intelligent machine, in the hope of achieving human-like performance in fields such as speech and pattern recognition, natural language processing and decision making in fuzzy situations, we have but one naturally occurring model: the human brain itself, a highly powerful computing device. It follows that one natural idea is to simulate the functioning of the brain directly on a computer. The general conjecture is that thinking about computation in terms of the brain metaphor, rather than the conventional computer, will lead to insights into the nature of intelligent behaviour. This conjecture is strongly supported by the very unique structure of the human brain.

Digital computers can perform complex calculations extremely fast without errors and are capable of storing vast amounts of information; human beings cannot approach these capabilities. On the other hand, humans routinely perform tasks like common sense reasoning, talking, walking and interpreting a visual scene in real time, effortlessly. The human brain consists of hundreds of billions of neurons, each an independent biological information processing unit. On average, each neuron is connected to ten thousand surrounding neurons, all acting in parallel to build a massively parallel architecture. What we do in about a hundred computational steps, computers cannot do in a million steps. The underlying reason is that, even though each neuron is an extremely slow device compared to state-of-the-art digital components, the massive parallelism gives the human brain the vast computational power necessary to carry out complex tasks. The human brain is also highly fault tolerant: we continue to function even though neurons are constantly dying. We are also better at dealing with fuzzy situations by finding the closest matches of new problems to old ones. Inexact matching is something brain-style models seem to be good at, because of the diffuse and fluid way in which knowledge is represented. All these serve as strong motivation for the idea of building an intelligent machine modelled after biological neurons, now known as artificial neural networks.

Artificial neural network models are very simplified versions of our understanding of the biological neuron, which is yet far from complete. Each neuron's input fibres, called dendrites, receive excitatory signals from the output fibres, called axons, of thousands of surrounding neurons. When the total summation of excitatory signals becomes sufficient, it causes the neuron to fire, sending an excitatory signal to the other neurons connected to it. Figure 1 shows a basic artificial neuron model. Each neuron receives an input x_j from another neuron j, which is multiplied by the connection strength, called weight ω_j (the synaptic strength in a biological neuron), to produce the total net input as the weighted sum of all inputs:

net = Σ_j ω_j x_j

The output of the neuron is produced by passing the net input through an activation function. The commonly used activation functions are the hard limiter, sigmoidal and gaussian activation functions.

Fig. 1. An artificial neuron.

Neural Network Architecture

Neural networks can be very useful for realizing an input-output mapping when the exact relationship between input and output is unknown or too complex to be determined mathematically. Because of their ability to learn complex mappings, they have recently been used for modelling nonlinear economic relationships. By presenting a data set of input-output pairs iteratively, a neural network can be trained to determine a set of weights that approximates the mapping.

The multilayer feedforward network, as shown in Fig. 2, is one of the most commonly used neural network architectures. It consists of an input layer, an output layer and one or more intermediate layers called hidden layers. All the nodes at each layer are connected to each node at the upper layer by interconnection strengths called weights. The x_i are the inputs, the ω's are the weights, and the y_k are the outputs produced by the network. All the interconnecting weights between layers are initialized to small random values at the beginning. During training, inputs are presented at the input layer and the associated target output is presented at the output layer. A training algorithm is used to attain a set of weights that minimizes the difference between the target output and the actual output produced by the network.

Fig. 2. A multilayer feedforward ANN structure.

Training Algorithms

There are many different neural net learning algorithms in the literature. No study has been reported that analytically determines the generalization performance of each algorithm. In this study we experimented with three different learning algorithms, namely standard Backpropagation (BP), the Scaled Conjugate Gradient algorithm (SCG) and Backpropagation with regularization (BPR), in order to evaluate which algorithm predicts the exchange rates of the Australian dollar most accurately. In the following we describe the three algorithms briefly.

Standard BP: BP [16] uses the steepest gradient descent technique to minimize the sum-of-squared error E over all training data. During training, each desired output d_j is compared with the actual output y_j, and E is calculated as the sum of squared errors at the output layer. The weight ω_j is updated in the n-th training cycle according to

Δω_j(n) = −η ∂E/∂ω_j + α Δω_j(n−1)

where η and α are the learning rate and the momentum factor, respectively. The learning rate controls the step size in each iteration. For a large-scale problem Backpropagation learns very slowly, and its convergence depends largely on the user choosing suitable values of η and α.

SCG: In conjugate gradient (CG) methods, a search is performed along conjugate directions, which generally produces faster convergence than steepest descent directions [5]. In steepest descent search, each new direction is perpendicular to the old one; the approach to the minimum is a zigzag path, and one step can be mostly undone by the next. In a CG method, a new search direction spoils as little as possible the minimization achieved by the previous one, and the step size is adjusted in each iteration. The general procedure for determining the new search direction is to combine the new steepest descent direction with the previous search direction so that the current and previous search directions are conjugate, as governed by

ω_{k+1} = ω_k + α_k p_k,    p_{k+1} = −E′(ω_{k+1}) + β_k p_k

where ω_k is the weight vector at the k-th iteration, p_k and p_{k+1} are the conjugate directions in successive iterations, and α_k and β_k are calculated in each iteration. An important drawback of the CG algorithm is the requirement of a line search in each iteration, which is computationally expensive. Moller introduced SCG to avoid the time-consuming line search of conventional CG. SCG needs the Hessian, which is approximated by

E″(ω_k) p_k ≈ (E′(ω_k + σ_k p_k) − E′(ω_k)) / σ_k + λ_k p_k

where E′ and E″ are the first and second derivatives of E, and p_k, σ_k and λ_k are the search direction, the parameter controlling the second-derivative approximation, and the parameter regulating the indefiniteness of the Hessian, respectively. Considering machine precision, the value of σ should be as small as possible (≤ 10^-4). A detailed description of the algorithm can be found in [15].

BPR: A desired neural network model should produce small error not only on the training data but also on out-of-sample data. To produce a network with better generalization ability, MacKay [14] proposed a method to constrain the size of the network parameters by regularization. Regularization forces the network to settle to a set of weights and biases with smaller values. This makes the network response smoother and less likely to overfit [5] or capture noise. In the regularization technique, the cost function F is defined as

F = γE + ((1 − γ)/n) Σ_{j=1}^{n} ω_j²

where E is the sum-squared error and γ (< 1) is the performance ratio parameter, whose magnitude dictates the emphasis of the training. A large γ drives the error E small, whereas a small γ emphasizes parameter size reduction at the expense of error, yielding a smoother network response. The optimum value of γ can be determined using Bayesian regularization in combination with the Levenberg-Marquardt algorithm [4].

Neural Network Forecasting Model

Technical and fundamental analyses are the two major financial forecasting methodologies. In recent times, technical analysis has drawn particular academic interest due to increasing evidence that markets are less efficient than originally thought [13]. Like many other economic time series, exchange rates exhibit their own trend, cycle, season and irregularity. In this study, we used time-delay moving averages as technical data. The advantage of the moving average is its tendency to smooth out some of the irregularity that exists between market days [21]. In our model, we used moving average values of past weeks as inputs to the neural network to predict the following week's rate. The indicators are MA5, MA10, MA20, MA60, MA… and X_i, namely the moving averages over one week, two weeks, one month, one quarter and half a year, and the last week's closing rate, respectively. The predicted value is X_{i+1}. So the neural network model has six inputs for the six indicators, one hidden layer and one output unit to predict the exchange rate. Historical data are used to train the model; once trained, the model is used for forecasting.

Performance Metrics

The forecasting performance of the above models is evaluated against three widely used statistical metrics, namely Normalized Mean Square Error (NMSE), Mean Absolute Error (MAE) and Directional Symmetry (DS). These criteria are defined in Table 1. NMSE and MAE measure the deviation between actual and forecast values; smaller values indicate higher forecasting accuracy. DS measures the correctness of the predicted directions; a higher value indicates better trend prediction.

Table 1: Performance metrics used in the experiment.

NMSE = Σ_k (x_k − x̂_k)² / Σ_k (x_k − x̄)² = (1/(σ² N)) Σ_k (x_k − x̂_k)²
MAE = (1/N) Σ_k |x_k − x̂_k|
DS = (100/N) Σ_k d_k, where d_k = 1 if (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, and 0 otherwise

Experimental Results

In this section, we present the data collection procedure and the results of the experiments.

Data Collection

The data used in this study are the foreign exchange rates of six different currencies against the Australian dollar from January to July, made available by the Reserve Bank of Australia. We considered the exchange rates of the US Dollar, British Pound, Japanese Yen, Singapore Dollar, New Zealand Dollar and Swiss Franc. As outlined earlier, weekly data were considered, of which the first portion was used in training and the remaining 65 weekly observations for evaluating the models. The historical rates for USD, GBP, SGD, NZD and CHF are plotted in Figure 3, and for JPY in Figure 4.

Figure 3. Historical rates for USD, GBP, SGD, NZD and CHF.
Figure 4. Historical rates for Japanese Yen.

Simulation Results

Simulations were performed with the different neural networks and the ARIMA model. The performance of a neural network depends on a number of factors, e.g., the initial weights chosen, the learning parameters used during training (described earlier) and the number of hidden units. For each algorithm, we trained 30 different networks with different initial weights and learning parameters. The number of hidden units was varied between 3~7, and training was terminated at iteration numbers between 1 and …. The simulations were done in MATLAB using the SBP, SCG and BPR modules from the neural network toolbox; the best results obtained by each algorithm are presented below. The ARIMA model (with parameter setting (1,0,1)) was run in Minitab on an IBM PC.

After a model is built, the exchange rate is forecast for each currency over the test data. Prediction performance is measured in terms of NMSE, MAE and DS over 35 weeks and 65 weeks by comparing the forecast and actual exchange rates. Figures 5(a)~(c) and 6(a)~(c) present the performance metrics graphically over 35 and 65 weeks respectively.

Fig. 5(a) NMSE over 35 weeks. Fig. 5(b) MAE over 35 weeks. Fig. 5(c) DS over 35 weeks.
Fig. 6(a) NMSE over 65 weeks. Fig. 6(b) MAE over 65 weeks. Fig. 6(c) DS over 65 weeks.

From Figures 5, 6 and 8, it is clear that the quality of the ARIMA forecast deteriorates as the number of periods in the forecasting (testing) phase increases. In other words, ARIMA could be more suitable for shorter-term forecasting than longer-term. However, the results show that the neural network models outperform the conventional ARIMA model for both shorter- and longer-term forecasting, which suggests that ANNs are more suitable for financial modelling.

As can be seen in Figures 5 and 6, both SCG and BPR forecasts are better than SBP in terms of all metrics; in our experiments this was observed consistently across all currencies. In terms of the most commonly used criteria, i.e., NMSE and MAE, SCG performs better than BPR in all currencies except the Japanese Yen. In terms of the indicator DS, SCG yields slightly better performance for the Swiss Franc, BPR is slightly better for the US Dollar and British Pound, and both perform equally for the Japanese Yen, Singapore Dollar and New Zealand Dollar. Although we report only the best predictions in this paper, sample outputs for the best and worst predictions (by NMSE) produced by SBP for the British Pound are shown in Figure 7. The actual and forecast time series of the six currency rates using the ARIMA and SCG models are shown in Figures 8 and 9 respectively. From Figures 5 and 6, one can easily see the superiority of the ANN based models over ARIMA.

Figure 7. Sample worst and best predictions.
Figure 8. Forecasting of different currencies by the ARIMA model over 65 weeks: (a) USD/AUD (b) GBP/AUD (c) JPY/AUD (d) SGD/AUD (e) NZD/AUD (f) CHF/AUD.
Figure 9. Forecasting of different currencies by the SCG based neural network model over 65 weeks: (a) USD/AUD (b) GBP/AUD (c) JPY/AUD (d) SGD/AUD (e) NZD/AUD (f) CHF/AUD.

Conclusion

In this study, we investigated three ANN based forecasting models for predicting six foreign currencies against the Australian dollar using historical data and moving average technical indicators, and made a comparison with the traditional ARIMA model. All the ANN based models outperformed the ARIMA model measured on three different performance metrics. The results demonstrate that ANN based models can forecast Forex rates closely. Among the three ANN based models, the SCG based model yields the best results measured on the two most popular metrics, and shows results comparable to the BPR based model when measured on the indicator DS.

References

[1] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA.
[2] L. Cao and F. Tay, "Financial Forecasting Using Support Vector Machines," Neural Computing & Applications, vol. 10.
[3] G. Deboeck, Trading on the Edge: Neural, Genetic and Fuzzy Systems for Chaotic Financial Markets, Wiley, New York.
[4] F. D. Foresee and M. T. Hagan, "Gauss-Newton approximation to Bayesian regularization," Proc. IJCNN.
[5] M. T. Hagan, H. B. Demuth and M. H. Beale, Neural Network Design, PWS Publishing, Boston, MA.
[6] J. Hanke and A. Reitsch, Business Forecasting, Allyn and Bacon, Inc., Boston.
[7] T. Hill, M. O'Connor and W. Remus, "Neural Network Models for Time Series Forecasts," Management Science, vol. 42.
[8] H. B. Hwarng and H. T. Ang, "A Simple Neural Network for ARMA(p,q) Time Series," OMEGA: Int. Journal of Management Science, vol. 29.
[9] J. Jarrett, Business Forecasting Methods, Basil Blackwell, Oxford.
[10] W. C. Jhee and J. K. Lee, "Performance of Neural Networks in Managerial Forecasting," Intelligent Systems in Accounting, Finance and Management, vol. 2.
[11] I. Kaastra and M. Boyd, "Designing a Neural Network for Forecasting Financial and Economic Time-Series," Neurocomputing, vol. 10.
[12] B. D. Klein and D. F. Rossin, "Data Quality in Neural Network Models: Effect of Error Rate and Magnitude of Error on Predictive Accuracy," OMEGA: Int. Journal of Management Science, vol. 27.
[13] B. LeBaron, "Technical trading rule profitability and foreign exchange intervention," Journal of Int. Economics, vol. 49.
[14] D. J. C. MacKay, "Bayesian interpolation," Neural Computation, vol. 4.
[15] A. F. Moller, "A scaled conjugate gradient algorithm for fast supervised learning," Neural Networks, vol. 6.
[16] D. E. Rumelhart, J. L. McClelland and the PDP research group, Parallel Distributed Processing, vol. 1, MIT Press.
[17] Z. Tang and P. A. Fishwick, "Back-Propagation Neural Nets as Models for Time Series Forecasting," ORSA Journal on Computing, vol. 5(4).
[18] J. Wang and J. Y. Leu, "Stock Market Trend Prediction Using ARIMA-based Neural Networks," Proc. of IEEE Int. Conf. on Neural Networks, vol. 4.
[19] S. Wang, "An Insight into the Standard Back-Propagation Neural Network Model for Regression Analysis," OMEGA: Int. Journal of Management Science, vol. 26.
[20] J. Yao, Y. Li and C. L. Tan, "Option Price Forecasting Using Neural Networks," OMEGA: Int. Journal of Management Science, vol. 28.
[21] J. Yao and C. L. Tan, "A case study on using neural networks to perform technical forecasting of forex," Neurocomputing, vol. 34.
[22] S. Yaser and A. Atiya, "Introduction to Financial Forecasting," Applied Intelligence, vol. 6.
[23] G. Zhang and M. Y. Hu, "Neural Network Forecasting of the British Pound/US Dollar Exchange Rate," OMEGA: Int. Journal of Management Science, vol. 26.
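The three performance metrics used above (NMSE, MAE, DS) translate directly into code. Below is a minimal pure-Python sketch; the actual/forecast series at the bottom are hypothetical numbers for illustration only, and DS follows the convention of counting a flat or matching pair of moves (product ≥ 0) as a correct direction.

```python
def nmse(actual, forecast):
    # Normalized Mean Square Error: squared forecast error normalized by
    # the variance of the actual series (smaller is better).
    mean = sum(actual) / len(actual)
    num = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    den = sum((a - mean) ** 2 for a in actual)
    return num / den

def mae(actual, forecast):
    # Mean Absolute Error (smaller is better).
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def ds(actual, forecast):
    # Directional Symmetry: percentage of periods in which the predicted
    # direction of change matches the actual direction (larger is better).
    hits = sum(
        1 for k in range(1, len(actual))
        if (actual[k] - actual[k - 1]) * (forecast[k] - forecast[k - 1]) >= 0
    )
    return 100.0 * hits / (len(actual) - 1)

# Hypothetical actual vs. forecast exchange-rate series
actual = [0.52, 0.53, 0.51, 0.54, 0.55]
forecast = [0.51, 0.54, 0.52, 0.53, 0.56]
print(nmse(actual, forecast), mae(actual, forecast), ds(actual, forecast))
```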

Exchange rate forecasting has long been an issue of interest to academics, industries, and governments, where changes in exchange rates may affect real economic activity and financial markets. Many studies have shown that traditional economic models cannot beat random walk models for out-of-sample forecasting of exchange rates, because most of the traditional models are linear, while exchange rate series are inherently dynamic and non-linear. In recent years, many deep models have been proposed, and although they are mainly used in natural language processing studies, their excellent ability to capture nonlinear properties has led more and more studies to try to apply them in exchange rate prediction. This paper mainly refers to the model from Yilmaz and Arabaci (), which splits the forecast of exchange rate into linear and non-linear components, using the Autoregressive Integrated Moving Average (ARIMA) model and Self-Attention (SA) mechanism to forecast the return of USD/CAD, AUD/USD, and GBP/USD respectively. Comparing this model with the ARIMA-LSTM model using the Long Short-term Memory (LSTM) framework in Yilmaz and Arabaci () and the random walk model, the results show that the ARIMA-SA model has worse predictive power than the ARIMA-LSTM model. Moreover, unlike the results shown by Yilmaz and Arabaci (), the predictive power of the ARIMA-LSTM model in terms of daily exchange rate returns is inferior to that of the random walk model.
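The decomposition described above (a linear component fitted first, then a second model fitted to the linear model's residuals) can be sketched very roughly as follows. This is a toy illustration, not the paper's implementation: the "linear" stage is a plain least-squares AR(1) instead of a full ARIMA, and the "nonlinear" stage is a stub function where an LSTM or self-attention model would go.

```python
def fit_ar1(series):
    # Least-squares AR(1) without intercept: x_t ≈ a * x_{t-1}.
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def hybrid_forecast(returns, nonlinear_model):
    # Stage 1: linear component of the next return, via AR(1).
    a = fit_ar1(returns)
    linear_pred = a * returns[-1]
    # Stage 2: nonlinear component, fitted on the linear model's residuals
    # (in the papers discussed above this would be an LSTM or SA network).
    residuals = [returns[t] - a * returns[t - 1] for t in range(1, len(returns))]
    return linear_pred + nonlinear_model(residuals)

# Stub "nonlinear" model: just the residual mean, standing in for a deep model.
mean_model = lambda resid: sum(resid) / len(resid)

returns = [0.01, -0.02, 0.015, -0.01, 0.005, -0.003]
print(hybrid_forecast(returns, mean_model))
```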

Recently, I wrote about fitting mean-reversion time series analysis models to financial data and using the models' predictions as the basis of a trading strategy. Continuing our exploration of time series modelling, let's research the autoregressive and conditionally heteroskedastic family of time series models. In particular, we want to understand the autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroskedasticity (GARCH) models. Why? Well, they are both referenced frequently in the quantitative finance literature, and it's about time I got up to speed, so why not join me!

What follows is a summary of what I learned about these models, a general fitting procedure and a simple trading strategy based on the forecasts of a fitted model.

Let's get started!

What are these time series analysis models?

Several definitions are necessary to set the scene. I don't want to reproduce the theory I've been wading through; rather, here is my very high-level summary of what I've learned about time series modelling, in particular the ARIMA and GARCH models and how they are related to their component models:

At its most basic level, fitting ARIMA and GARCH models is an exercise in uncovering the way in which observations, noise and variance in a time series affect subsequent values of the time series.  Such a model, properly fitted, would have some predictive utility, assuming of course that the model remained a good fit for the underlying process for some time in the future.

ARIMA

An ARMA model (note: no 'I') is a linear combination of an autoregressive (AR) model and a moving average (MA) model. An AR model is one whose predictors are the previous values of the series. An MA model is structurally similar to an AR model, except that the predictors are the noise terms. An autoregressive moving average model of order (p, q), ARMA(p,q), is a linear combination of the two and can be defined as:

\begin{equation} \label{eq:poly}
X_{t} = a_{1}X_{t-1} + a_{2}X_{t-2} + \dots + a_{p}X_{t-p} + w_{t} + b_{1}w_{t-1} + b_{2}w_{t-2} + \dots + b_{q}w_{t-q}
\end{equation}

where $w_{t}$ is white noise and $a_{i}$ and $ b_{i}$ are coefficients of the model.
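As a sanity check on this definition, base R can simulate an ARMA process with known coefficients and then recover them with arima() (a minimal sketch; the coefficient values here are arbitrary, not taken from the post):

```r
# simulate an ARMA(2,1) process:
# X_t = 0.5*X_{t-1} - 0.25*X_{t-2} + w_t + 0.4*w_{t-1}
set.seed(42)
x <- arima.sim(model = list(ar = c(0.5, -0.25), ma = 0.4), n = 5000)

# fit an ARMA(2,1), i.e. ARIMA(2,0,1), and inspect the estimated coefficients
fit <- arima(x, order = c(2, 0, 1))
round(coef(fit), 2)  # ar1, ar2, ma1 should be close to 0.5, -0.25, 0.4
```

With 5000 observations the estimates land close to the true values; with the short, noisy windows used later in the trading strategy the uncertainty is much larger.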

An ARIMA(p,d,q) model is simply an ARMA(p,q) model differenced 'd' times, or integrated (I), to produce a stationary series.
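The 'I' step can be made concrete in base R: fitting an ARIMA(1,1,1) to a series gives (up to numerical details) the same coefficients as fitting an ARMA(1,1) to its first differences. A minimal sketch on simulated data:

```r
set.seed(1)
# a random walk driven by ARMA(1,1) increments: non-stationary in levels,
# stationary after one round of differencing
x <- cumsum(arima.sim(model = list(ar = 0.4, ma = 0.3), n = 2000))

fit.arima  <- arima(x, order = c(1, 1, 1))  # d = 1: differencing done internally
fit.diffed <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE)  # by hand

coef(fit.arima)   # essentially identical to coef(fit.diffed)
```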

GARCH

Finally, a GARCH model attempts to also explain the heteroskedastic behaviour of a time series (that is, the characteristic of volatility clustering) as well as the serial influences of the previous values of the series (explained by the AR component) and the noise terms (explained by the MA component).  A GARCH model uses an autoregressive process for the variance itself, that is, it uses past values of the variance to account for changes to the variance over time.
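For a GARCH(1,1), that variance recursion is var_t = omega + alpha * eps_{t-1}^2 + beta * var_{t-1}. A minimal base-R simulation (illustrative parameter values, not fitted to any data) makes the volatility clustering visible:

```r
set.seed(7)
n <- 2000
omega <- 1e-5; alpha <- 0.1; beta <- 0.85  # alpha + beta < 1 => finite variance
eps <- numeric(n); sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha - beta)    # start at the unconditional variance
eps[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  # today's variance depends on yesterday's shock and yesterday's variance
  sigma2[t] <- omega + alpha * eps[t - 1]^2 + beta * sigma2[t - 1]
  eps[t] <- sqrt(sigma2[t]) * rnorm(1)
}
# volatility clustering: squared returns are autocorrelated
# even though the returns themselves are not
acf1.sq <- cor(eps[-1]^2, eps[-n]^2)
```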

So how do we apply these models?

With that context setting out of the way, I next fit an ARIMA/GARCH model to the EUR/USD exchange rate and use it as the basis of a trading system. The model's parameters are re-estimated each day using a fitting procedure; the fitted model is then used to predict the next day's return, and a position is entered accordingly and held for one trading day. If the prediction is the same as for the previous day, the existing position is maintained.

A rolling window of log returns is used to fit an optimal ARIMA/GARCH model at the close of each trading day. The fitting procedure is based on a brute-force search over the parameters that minimize the Akaike Information Criterion (AIC), but other methods can be used. For example, we could choose parameters that minimize the Bayesian Information Criterion (BIC), which may help to reduce overfitting by penalizing complex models (that is, models with a large number of parameters). This fitting procedure was inspired by Michael Halls-Moore's post about an ARIMA+GARCH trading strategy for the S&P, and I borrowed some of his code.
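Swapping the criterion is a one-line change: BIC() replaces AIC() in the search loop. A toy base-R sketch of BIC-based order selection (simulated series, p and q capped at 3 for speed):

```r
# order selection by BIC instead of AIC: identical loop, different criterion
set.seed(123)
x <- arima.sim(model = list(ar = 0.6, ma = -0.3), n = 1000)

best.bic <- Inf; best.order <- c(0, 0, 0)
for (p in 0:3) for (q in 0:3) {
  if (p == 0 && q == 0) next  # p and q can't both be zero
  fit <- tryCatch(arima(x, order = c(p, 0, q)),
                  error = function(e) NULL, warning = function(w) NULL)
  if (!is.null(fit) && BIC(fit) < best.bic) {
    best.bic <- BIC(fit)      # BIC penalizes extra parameters more heavily than AIC
    best.order <- c(p, 0, q)
  }
}
best.order
```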

I chose a fixed-length rolling window to fit the model, but the window length is itself a parameter for optimization. There is a case for using as much data as possible in the rolling window, but this may fail to capture the evolving model parameters quickly enough to adapt to a changing market. I won't explore this too much here, but it would be interesting to investigate the strategy's performance as a function of the lookback window.

Here's the code:

### ARIMA/GARCH trading model
library(quantmod)
library(timeSeries)
library(rugarch)

# get data and initialize objects to hold forecasts
# NB: the data file name and window length are placeholders (the originals were
# lost from the source) - point them at your own data and preferred window
EURUSD <- read.csv("EURUSD.csv", header = TRUE)
EURUSD[, 1] <- as.Date(as.character(EURUSD[, 1]), format = "%d/%m/%Y")
returns <- diff(log(EURUSD$C)) ## ttr::ROC can also be used: calculates log returns by default
window.length <- 1000
forecasts.length <- length(returns) - window.length
forecasts <- vector(mode = "numeric", length = forecasts.length)
directions <- vector(mode = "numeric", length = forecasts.length)
p.val <- vector(mode = "numeric", length = forecasts.length)

# loop through every trading day, estimate optimal model parameters from rolling window
# and predict next day's return
for (i in 0:forecasts.length) {
  roll.returns <- returns[(1 + i):(window.length + i)] # create rolling window
  final.aic <- Inf
  final.order <- c(0, 0, 0)
  # estimate optimal ARIMA model order
  for (p in 0:5) for (q in 0:5) { # limit possible order to p,q <= 5
    if (p == 0 && q == 0) next # p and q can't both be zero
    arimaFit <- tryCatch(arima(roll.returns, order = c(p, 0, q)),
                         error = function(err) FALSE,
                         warning = function(err) FALSE)
    if (!is.logical(arimaFit)) {
      current.aic <- AIC(arimaFit)
      if (current.aic < final.aic) { # retain order if AIC is reduced
        final.aic <- current.aic
        final.order <- c(p, 0, q)
        final.arima <- arima(roll.returns, order = final.order)
      }
    } else next
  }
  # specify and fit the GARCH model
  spec <- ugarchspec(variance.model = list(garchOrder = c(1, 1)),
                     mean.model = list(armaOrder = c(final.order[1], final.order[3]),
                                       include.mean = TRUE),
                     distribution.model = "sged")
  fit <- tryCatch(ugarchfit(spec, roll.returns, solver = "hybrid"),
                  error = function(e) e, warning = function(w) w)
  # calculate next day prediction from fitted model;
  # model does not always converge - assign 0 to prediction and p.val in this case
  if (is(fit, "warning")) {
    forecasts[i + 1] <- 0
    print(0)
    p.val[i + 1] <- 0
  } else {
    next.day.fore <- ugarchforecast(fit, n.ahead = 1)
    x <- next.day.fore@forecast$seriesFor
    directions[i + 1] <- ifelse(x[1] > 0, 1, -1) # directional prediction only
    forecasts[i + 1] <- x[1] # actual value of forecast
    print(forecasts[i + 1])
    # analysis of residuals
    resid <- as.numeric(residuals(fit, standardize = TRUE))
    ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
    p.val[i + 1] <- ljung.box$p.value
  }
}

dates <- EURUSD[, 1]
forecasts.ts <- xts(forecasts, dates[(window.length):length(returns)])

# create lagged series of forecasts and sign of forecast
ag.forecasts <- Lag(forecasts.ts, 1)
ag.direction <- ifelse(ag.forecasts > 0, 1, ifelse(ag.forecasts < 0, -1, 0))

# create the ARIMA/GARCH returns for the directional system
ag.direction.returns <- ag.direction * returns[(window.length):length(returns)]
ag.direction.returns[1] <- 0 # remove NA

# create the backtests for ARIMA/GARCH and Buy & Hold
ag.curve <- cumsum(ag.direction.returns)
buy.hold <- xts(returns[(window.length):length(returns)],
                dates[(window.length):length(returns)])
buy.hold.curve <- cumsum(buy.hold)
both.curves <- cbind(ag.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")

# plot both curves together
myColors <- c("darkorange", "blue")
plot(x = both.curves[, "Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", major.ticks = "quarters",
     minor.ticks = FALSE, col = "darkorange")
lines(x = both.curves[, "Buy and hold returns"], col = "blue")
legend(x = "bottomleft", legend = c("Strategy", "B&H"),
       lty = 1, col = myColors)

First, the directional predictions only: buy when a positive return is forecast and sell when a negative return is forecast. The results of this approach are shown below (no allowance for transaction costs):
[Figure: cumulative returns of the directional ARIMA/GARCH strategy vs buy and hold, EUR/USD]
You might have noticed that in the model fitting procedure above, I retained the actual forecast return values as well as the direction of the forecast return. I want to investigate the predictive power of the magnitude of the forecast return value. Specifically, does filtering trades when the magnitude of the forecast return is below a certain threshold improve the performance of the strategy? The code below performs this analysis for a small return threshold. For simplicity, I converted the forecast log returns to simple returns to enable manipulation of the sign of the forecast and easy implementation.

# Test entering a trade only when prediction exceeds a threshold magnitude
simp.forecasts <- exp(ag.forecasts) - 1 # convert forecast log returns to simple returns
threshold <- 0.000025 # a small threshold (placeholder: original value lost from the source)
ag.threshold <- ifelse(simp.forecasts > threshold, 1,
                       ifelse(simp.forecasts < -threshold, -1, 0))
ag.threshold.returns <- ag.threshold * returns[(window.length):length(returns)]
ag.threshold.returns[1] <- 0 # remove NA
ag.threshold.curve <- cumsum(ag.threshold.returns)
both.curves <- cbind(ag.threshold.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")

# plot both curves together
plot(x = both.curves[, "Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", major.ticks = "quarters",
     minor.ticks = FALSE, col = "darkorange")
lines(x = both.curves[, "Buy and hold returns"], col = "blue")
legend(x = "bottomleft", legend = c("Strategy", "B&H"), lty = 1, col = myColors)

And the results overlaid with the raw strategy:
[Figure: strategy returns filtered on forecast magnitude, overlaid with the raw strategy]
It occurred to me that the ARIMA/GARCH model we fit on certain days may be a better or worse representation of the underlying process than on other days. Perhaps filtering trades when we have less confidence in our model would improve performance. This approach requires that the statistical significance of each day's model fit be evaluated, and a trade only entered when this significance exceeds a certain threshold. There are a number of ways this could be accomplished. Firstly, we could visually examine the correlogram of the model residuals and make a judgement on the goodness of fit on that basis. Ideally, the correlogram of the residuals would resemble a white noise process, showing no serial correlation. The correlogram of the residuals can be constructed in R as follows:
acf(fit@fit$residuals, main = "ACF of Model Residuals")
[Figure: ACF of model residuals]
While this correlogram suggests a good model fit, it is obviously not a great approach as it relies on subjective judgement, not to mention the availability of a human to review each day's model. A better approach would be to examine the Ljung-Box statistics for the model fit. The Ljung-Box test is a hypothesis test for evaluating whether the autocorrelations of the residuals of a fitted model differ significantly from zero. In this test, the null hypothesis is that the autocorrelation of the residuals is zero; the alternative is that the residuals exhibit serial correlation. Rejection of the null in favour of the alternative would imply that the model is not a good fit, as there is unexplained structure in the residuals. The Ljung-Box statistic is calculated in R as follows:

ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
ljung.box

	Box-Ljung test

data:  resid
X-squared = , df = 20, p-value =

The p-value in this case provides evidence that the residuals are independent and that this particular model is a good fit. By way of explanation, the Ljung-Box test statistic (X-squared in the code output above) grows larger for increasing autocorrelation of the residuals. The p-value is the probability of obtaining a value as large as or larger than the test statistic under the null hypothesis. Therefore, a high p-value in this case is evidence for independence of the residuals. Note that the test applies to all lags up to the one specified in the Box.test() function.
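As a quick sanity check on how to read Box.test() output, compare white noise (where the test should usually fail to reject) with an obviously autocorrelated series (where it should reject). A minimal base-R sketch:

```r
set.seed(11)
white <- rnorm(500)                                 # independent "residuals"
correlated <- arima.sim(list(ar = 0.5), n = 500)    # serially correlated series

p.white <- Box.test(white, lag = 20, type = "Ljung-Box")$p.value      # typically large
p.corr  <- Box.test(correlated, lag = 20, type = "Ljung-Box")$p.value # near zero
```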

Applying the Ljung-Box test to each day's model fit reveals very few days where the null hypothesis of independent residuals is rejected, so extending the strategy to also filter out any trades triggered by a poor model fit is unlikely to add much value:
[Figure: strategy returns with both the magnitude and goodness-of-fit filters applied]

Time series analysis conclusions and future work

The ARIMA/GARCH strategy outperforms a buy and hold strategy on the EUR/USD over the backtest period; however, the performance is nothing spectacular. It seems possible to improve the strategy by filtering on characteristics such as the magnitude of the prediction and the goodness of fit of the model, although the latter does not add much value in this particular example. Another filtering option could be to calculate the 95% confidence interval for each day's forecast and only enter a trade when the sign of each limit is the same, although this would greatly reduce the number of trades actually taken.
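A minimal sketch of that confidence-interval filter, using base R's predict() on a plain ARIMA fit (simulated data standing in for a returns window; adapting it to the rugarch forecast object would use the forecast's sigma instead of predict()'s standard error):

```r
set.seed(99)
x <- arima.sim(list(ar = 0.3), n = 500, sd = 0.01)  # stand-in for a returns window
fit <- arima(x, order = c(1, 0, 0))

fc <- predict(fit, n.ahead = 1)
lower <- fc$pred[1] - 1.96 * fc$se[1]  # approximate 95% interval
upper <- fc$pred[1] + 1.96 * fc$se[1]

# trade only when the whole interval has one sign; 0 means stand aside
signal <- if (sign(lower) == sign(upper)) sign(fc$pred[1]) else 0
```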

There are many other varieties of the GARCH model, for example exponential, integrated, quadratic, threshold, structural and switching, to name a few. These may or may not provide a better representation of the underlying process than the simple GARCH(1,1) model used in this example. For an exposition of these and other flavours of GARCH, see Bollerslev, Engle and Nelson (referenced below).

An area of research that I have found highly interesting recently is forecasting with time series analysis through the intelligent combination of disparate models. For example, by taking the average of the individual predictions of several models or seeking consensus or a majority vote on the sign of the prediction. To borrow some machine learning nomenclature, this 'ensembling' of models can often produce more accurate forecasts than any of the constituent models. Perhaps a useful approach would be to ensemble the predictions of the ARIMA/GARCH model presented here with a suitably trained artificial neural network or other statistical learning method. We could perhaps expect the ARIMA/GARCH model to capture any linear characteristics of the time series, while the neural network may be a good fit for the non-linear characteristics. This is all pure speculation, potentially with some backing from this paper, but an interesting research avenue nevertheless.
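A toy sketch of the sign-consensus idea (the model names here are hypothetical, just labelling one directional forecast each):

```r
# hypothetical per-model directional forecasts for one day: +1 long, -1 short
votes <- c(arima.garch = 1, neural.net = 1, other.model = -1)

# majority vote on sign; 0 would mean no consensus, stand aside
ensemble.signal <- sign(sum(sign(votes)))
ensemble.signal  # 1 here: two of the three models vote long
```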

If you have any ideas for improving the forecast accuracy of time series analysis models, I'd love to hear about them in the comments.
Finally, credit where credit is due: although I worked my way through numerous sources of information on financial time series modelling, I found Michael Halls-Moore's detailed posts on the subject extremely helpful. He starts from the beginning and works through various models of increasing complexity. As stated in the main post, I also borrowed from his ARIMA + GARCH trading strategy for the S&P in designing the EUR/USD strategy presented here, particularly the approach of determining model parameters through iterative minimization of the Akaike Information Criterion. The ideas around filtering trades on the basis of the results of the Ljung-Box test and the absolute magnitude of the forecast value were my own (although I'm sure I'm not the first to come up with them).

Found this post useful? Chances are you'll love our exploration of the Hurst Exponent.

Other references I found particularly useful:

Bollerslev, T. (). Financial Econometrics: Past Developments and Future Challenges, in Journal of Econometrics, Vol. ,
Bollerslev, T., Engle, R.F. and Nelson, D.B. (). GARCH Models, in: Engle, R.F., and McFadden, D.L. (eds.) Handbook of Econometrics, Vol. 4, Elsevier, Amsterdam,
Engle, R. (). New Frontiers for ARCH Models, in Journal of Applied Econometrics, Vol. 17,
Qi, M. and Zhang, G.P. (). Trend Time Series Modelling and Forecasting with Neural Networks, in IEEE Transactions on Neural Networks, Vol. 19, No. 5,
Tsay, R. (). Conditional Heteroscedastic Models, in Tsay, R. Analysis of Financial Time Series, Third Edition, Wiley,
Here you can download the code and data used in this analysis: arima_garch
