UNDER SUPERVISION OF
Prof. Siba Panda
2015-2017
DECLARATION
I hereby declare that the research project titled "The
Application of Pair Trading to Stock Markets", submitted
by me, is based on original work carried out by me. I
certify that it has not been submitted anywhere else. I
further declare that Mukesh Patel School of Technology
Management and Engineering-NMIMS (deemed-to-be university)
will hold the copyright on the project report
submitted by me to the college (MPSTME).
Thanking You
Gunjan Dadhich
ACKNOWLEDGMENT
It is my proud privilege to express my gratitude to the
several persons who helped me, directly or indirectly, to
conduct this research project. I express my heartfelt
indebtedness and owe a deep sense of gratitude to my
faculty guides, Prof. Siba Panda and Prof. Sarada
Samantaray, for their sincere guidance and inspiration in
completing this project. I am extremely thankful to
Mr. Anshul Gupta, Mr. Hemant Palivela and all faculty
members of M.Tech Data Science at MPSTME for their
coordination and cooperation, and for their kind guidance
and encouragement. I also thank all my friends who
supported and encouraged me to complete this project. I
will always be indebted to them. The study has helped me
to explore new avenues of knowledge related to my topic,
and I am sure it will help me in the future.
Gunjan Dadhich
A-607
M.Tech (Data Science)
TABLE OF CONTENTS
ABSTRACT
This project examines the usefulness of a hedge fund trading strategy known as
pairs trading, applied to different stocks. The return of a simplified pairs
trading strategy is modeled using a mean-reverting process for the price
spread. According to the comparative statics of the model, high mean reversion and
high volatility of the spread give rise to a high overall return from trading. Analyzing
energy-sector stocks (specifically HPCL and BPCL) traded on the National Stock
Exchange, we present empirical evidence that pairs trading can produce a relatively
stable profit. We use a static model in which the hedge ratio is calculated from
historical prices. Linear regression is used to estimate the hedge ratio, and the ADF
test is used to check for co-integration between the stocks. We calculate the return
from pair trading and plot the trading strategy on the spread: when the spread moves
beyond a certain extent, we sell the expensive stock and buy the cheap one. The data
were taken from the Yahoo Finance website and cleaned and formatted as required
using R, which is also used to implement the pair trading. A second stock pair, AAPL
and QQQ, is used to implement pair trading in R with the PairTrading library. We also
suggest that pair trading may work well with high-frequency data, with entry and exit
points calculated on the high-frequency data rather than from the static model. HFT
processing is included as the future scope of this project. This project focuses on an
in-depth study of the pair trading concept and a basic implementation of it in R
using the quantmod library.
2. INTRODUCTION
Pair trading, a form of statistical arbitrage, is a popular trading strategy among
hedge funds because it offers reduced risk and the potential to produce returns in
any market environment. Pair trading has existed in some form for as long as markets
have. Jesse Livermore, one of the most famous traders of his time, is said to have
used pair trading in the late 1800s: he would identify a strong stock and then short
what he called its sister stock. Pair trading came into wide use at large investment
banks and hedge funds in the 1980s, helped by the increased use of computers.
In a pairs trading strategy, the trader identifies two stocks whose prices are highly
correlated, meaning that the two prices have moved closely together over their price
histories, and then opens a long position in one and a short position in the other.
The strategy depends on the correlation of the two stocks and on the hedge ratio. In
futures hedging, the hedge ratio compares the value of the futures positions bought or
sold with the value of the underlying commodity being hedged; more generally, it
compares the value of the hedged part of a security position with the size of the
entire position. For example, if we are long one unit of P, how many units of Q should
we sell short? That quantity is the hedge ratio. In this study we apply pair trading
to two Indian stocks from the same industry, Hindustan Petroleum Corporation Limited
(HPCL) and Bharat Petroleum Corporation Limited (BPCL). We find the correlation
between the two stocks, fit their historical prices with a regression model to
calculate the hedge ratio, and then create the spread for the pair. The strategy is
implemented in the statistical programming language R using the quantmod package.
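The hedge ratio question ("long one unit of P, how many units of Q to short?") can be illustrated with a minimal sketch on simulated prices. The series `q_price` and `p_price`, the true ratio of 1.5 and the noise levels are assumptions chosen for illustration; they are not the HPCL/BPCL data.

```r
# Simulated example: P moves with Q at a (hypothetical) ratio of 1.5
set.seed(123)
n <- 500
q_price <- 100 + cumsum(rnorm(n))            # Q follows a random walk starting near 100
p_price <- 1.5 * q_price + rnorm(n, sd = 2)  # P tracks 1.5 * Q plus noise

# Regression through the origin: the slope estimates the hedge ratio
fit <- lm(p_price ~ q_price - 1)
hedge_ratio <- unname(coef(fit)[1])

# Long 1 unit of P and short hedge_ratio units of Q gives the spread
spread <- p_price - hedge_ratio * q_price
print(hedge_ratio)
```

The estimated slope should land close to the ratio used to generate the data, and the resulting spread fluctuates around a stable level instead of wandering with the prices.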
Gatev, Goetzmann, and Rouwenhorst (2006) perform empirical tests of pairs trading on
common stocks. They demonstrate that a pairs trading strategy is profitable even
after taking transaction costs into account. Jurek and Yang (2007) compare the
performance of their optimal mean-reversion strategy with that of Gatev, Goetzmann,
and Rouwenhorst (2006) using simulated data, and show that their strategy provides
even better performance. Although pairs trading has been applied primarily as a stock
market strategy, there is no need to limit it to that asset class. A pairs trading
strategy generally requires only two highly correlated prices.
High-frequency data may also be used with a pairs trading strategy. In pairs trading,
a trader takes opposing long and short positions in two assets when the difference
between their prices hits a chosen opening threshold. These positions are closed when
a chosen closing threshold is reached. The price difference that the trader uses to
judge when to open and close a position is commonly referred to as the spread between
the pair of assets. The two stocks are expected to move together because they are
close substitutes for each other. Examples of pairs include oil companies, large
financial institutions, and some credit card companies. Pairs trading strategies seek
to exploit temporary mispricing of assets within the market; they therefore rely on
mean reversion and build market-neutral portfolios whose net market exposure is
negligible.
Recently, with the growing popularity of HFT, various studies have examined the
applicability of pairs trading strategies to high-frequency environments. Bowen et
al. (2010) examine the sensitivity of high-frequency strategies to market attributes,
noting that the primary returns to their strategy arise in the first and last hours
of the trading day, when trading volume is expected to be highest.
The history of pair trading is interesting. In the mid-1980s, the Wall Street quant
Nunzio Tartaglia assembled a team of physicists, mathematicians, and computer
scientists to uncover arbitrage opportunities in the equities markets. Tartaglia's
group of former academics used sophisticated statistical methods to develop high-tech
trading programs, executable through automated trading systems, that took the
intuition and trader's skill out of arbitrage and replaced them with disciplined,
consistent filter rules. Among other things, Tartaglia's programs identified pairs of
securities whose prices tended to move together. They traded these pairs with great
success in 1987, a year in which the group reportedly made a $50 million profit for
the firm. Although the Morgan Stanley group was disbanded in 1989 after a couple of
bad years of performance, pair trading has since become an increasingly popular
market-neutral investment strategy used by individual and institutional traders as
well as hedge funds. The increased popularity of quantitative statistical arbitrage
strategies has also apparently affected profits. In a New York Times interview, David
Shaw, head of one of the most successful modern quant shops and himself an early
Tartaglia acolyte, suggests that recent pickings for quant shops have become slim; he
attributes the success of his firm, D.E. Shaw, to early entry into the business.
Tartaglia's own explanation for pairs trading is psychological. He claims that human
beings don't like to trade against human nature, which wants to buy stocks after they
go up, not down. Could pairs traders be the self-controlled investors taking
advantage of the undisciplined over-reaction displayed by individual investors?
The space of normalized, cum-dividend prices, i.e. cumulative total returns with
dividends reinvested, is the basic space for the pairs trading strategies in this
project. The main observation about our motivating models of the HPCL-BPCL variety is
that they imply perfect collinearity of prices, which is readily rejected by the
data. On the other hand, Bossaerts (1988) finds evidence of price co-integration for
the US stock market. We would like to keep the notion of the empirically observed
co-movement of prices without unnecessarily restrictive assumptions; hence we proceed
in the spirit of the co-integrated prices approach.
More precisely, our matching in price space can be interpreted as follows. Suppose
that prices obey a statistical model of the form
$$p_{it} = \sum_{l=1}^{k} \beta_{il}\, p_{lt} + \varepsilon_{it}, \qquad k < n \tag{1}$$
where $\varepsilon_{it}$ denotes a weakly dependent error in the sense of Bossaerts
(1988). Assume also that $p_{it}$ is weakly dependent after differencing once. Under
these assumptions, the price vector $p_t$ is co-integrated of order 1 with
co-integrating rank $r = n - k$, in the sense of Engle and Granger (1987) and
Bossaerts (1988). Thus, there exist $r$ linearly independent vectors
$\{\alpha_q\}_{q=1,\dots,r}$ such that $z_q = \alpha_q' p_t$ are weakly dependent.
That is, $r$ linear combinations of prices will not be driven by the $k$ common
non-stationary components $p_l$. Note that this interpretation does not imply that
the market is inefficient; on the contrary, it says that certain assets are weakly
redundant, so that any deviation of their price from a linear combination of the
prices of other assets is expected to be temporary and to revert.
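The idea that a linear combination of non-stationary prices can itself be weakly dependent can be sketched with simulated data. Everything here is an illustrative assumption: the single common random-walk component `common`, the loadings 1.0 and 0.5, and the helper `df_tstat`, which runs a plain Dickey-Fuller style regression in base R and is not a full ADF test.

```r
set.seed(1)
n <- 1000
common <- cumsum(rnorm(n))            # shared non-stationary component (random walk)
p1 <- 50 + 1.0 * common + rnorm(n)    # price 1 loads on the common component with 1.0
p2 <- 30 + 0.5 * common + rnorm(n)    # price 2 loads on it with 0.5

# The combination p1 - 2 * p2 cancels the common component,
# leaving a constant (-10) plus weakly dependent noise
z <- p1 - 2 * p2

# Dickey-Fuller style regression: diff(x) on lagged x, with a constant.
# A strongly negative t-statistic on the lag suggests mean reversion.
df_tstat <- function(x) {
  dx <- diff(x)
  xlag <- x[-length(x)]
  coef(summary(lm(dx ~ xlag)))["xlag", "t value"]
}

c(var_p1 = var(p1), var_z = var(z), t_z = df_tstat(z))
```

Each price inherits the wandering behaviour of the random walk, while the combination `z` stays close to its mean. The statistic from `df_tstat` does not follow the usual t distribution, so for a formal decision a packaged test with proper critical values (for example `adf.test` in the tseries package) would be the safer choice.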
To interpret pairs as co-integrated prices, we need to assume that, for $n \gg k$,
there are co-integrating vectors that have only two nonzero coordinates. In that case
the sum or difference of scaled prices will revert to zero, and a trading rule can be
constructed to exploit the expected temporary deviations. Our strategy relies on
exactly this conclusion. In principle, we could construct trading strategies with
trios, quadruples, etc. of stocks, which would presumably capture more co-integrated
prices and give better profits.
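For a single pair, the two-coordinate case can be written out explicitly. Here $p_t$ is restricted to the two prices $(p_{1t}, p_{2t})'$, and the symbols $\beta$ (the scaling, i.e. the hedge ratio), $\mu$ (the level the combination reverts to, zero for suitably normalized prices) and $\delta$ (a deviation threshold) are labels introduced for illustration only.

```latex
\begin{aligned}
z_t &= \alpha' p_t = p_{1t} - \beta\, p_{2t}, \qquad \alpha = (1,\,-\beta)' \\
z_t - \mu > \delta &\;\Rightarrow\; \text{sell asset 1, buy } \beta \text{ units of asset 2} \\
z_t - \mu < -\delta &\;\Rightarrow\; \text{buy asset 1, sell } \beta \text{ units of asset 2}
\end{aligned}
```

If $z_t$ is weakly dependent, either deviation is expected to be temporary, which is what the trading rule tries to exploit.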
The hypothesis that a linear combination of two stocks can be weakly dependent may
be understood as saying that a co-integrating vector can be partitioned into two
parts, such that the two corresponding portfolios are priced within a weakly
dependent error of another stock. Given the large universe of stocks, this statement
is always empirically valid and provides the basis of our formation procedure.
3.5 The Bankruptcy Risk
The unpredicted risk of bankruptcy is one of the reasons why the returns on individual
securities cannot be taken as stationary. The sensitivity of pairs trading to the
default premium suggests that the strategy may work because we are pairing two firms,
the first of which may have a constant or decreasing probability of bankruptcy (short
end), while the second may have a temporarily increasing probability of bankruptcy
(long end). Unexpected improvements in the short end are then followed by improvement
in the long end if that stock survives. In other words, the source of the profit is
the improving ex-post (non-)realization of bankruptcy risk in the long (loser) stock.
In that case, we would expect asymmetry in the profits from the long and the short
components, with most of the profits coming from the long end. We have to test long
and short positions separately to see whether this is driving our results.
4. Research Methodology.
In this study, we first select the pair of stocks, HPCL and BPCL, obtain their
historical prices, and check whether the two stocks are correlated. Once correlation
is found, we run a regression model to confirm it and to find the hedge ratio, which
tells us how many units of BPCL we should sell for each unit of HPCL held long. This
ratio is used to create the spread between the prices of HPCL and BPCL. We then
define our trading strategy as follows.
1) For each time point in the time series, calculate the risk-adjusted spread
between the two assets of the pair.
2) Call the amount by which the spread deviates from a measure of the historical
spread the signal. If the signal is greater than or equal to the opening threshold,
open a position if not already in one.
3) If the spread is above its historical mean, we expect that stock 1 is
overpriced and stock 2 is under-priced, so we short-sell stock 1 and buy
stock 2. Conversely, if the spread is below its historical mean, we
buy stock 1 and short-sell stock 2.
4) If the signal is less than the closing threshold, close any existing position
in the pair.
5) If the signal is greater than the stop-loss threshold, we close the position.
6) If a position is open on the last time point in the data series, we close the
position.
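The six steps can be sketched as a small function. The function name `pair_signal`, its arguments and the default thresholds (in standard-deviation units) are hypothetical choices made for illustration, not the settings used in this project; the historical mean `mu` and standard deviation `sigma` of the spread are passed in by the caller.

```r
# Returns a position per time point: +1 long the spread (buy stock 1, sell stock 2),
# -1 short the spread (sell stock 1, buy stock 2), 0 flat.
pair_signal <- function(spread, mu, sigma,
                        open_thr = 1, close_thr = 0.25, stop_thr = 3) {
  z <- (spread - mu) / sigma            # steps 1-2: standardized deviation (signal)
  pos <- integer(length(z))
  for (i in seq_along(z)) {
    prev <- if (i > 1) pos[i - 1] else 0L
    if (prev == 0L) {
      # steps 2-3: open when the signal reaches the opening threshold
      if (abs(z[i]) >= open_thr && abs(z[i]) < stop_thr) {
        pos[i] <- if (z[i] > 0) -1L else 1L
      }
    } else if (abs(z[i]) < close_thr || abs(z[i]) > stop_thr) {
      pos[i] <- 0L                      # steps 4-5: take profit or stop loss
    } else {
      pos[i] <- prev                    # otherwise keep the open position
    }
  }
  pos[length(pos)] <- 0L                # step 6: flat on the last time point
  pos
}

toy_spread <- c(0, 1.5, 0.8, 0.1, -1.2, -3.5, 0, 2, 2)
positions <- pair_signal(toy_spread, mu = 0, sigma = 1)
print(positions)
```

On the toy series the function shorts the spread at 1.5, closes near zero, goes long at -1.2, is stopped out at -3.5, and closes the final short position on the last observation.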
Relation between the two prices: the data for the year-long interval 2012-2013
were downloaded using the quantmod package in R.
Fig1: Price plots of the two stocks; red is HPCL and green is BPCL.
The ~ sign separates the dependent variable from the independent variables. The
expression Stock_y ~ Stock_x is a formula that specifies a linear model with one
independent variable and an intercept. If we wanted to fit the same model without
the intercept, we would specify the formula as Stock_y ~ Stock_x - 1. This tells R
to omit the intercept (force it to zero). In the trading application we run the
model without the intercept, because the trader is interested only in the slope
coefficient relating the two stocks (the hedge ratio) and not in the intercept.
model1 <- lm(pdtHPC ~ pdtBPC - 1)
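The effect of dropping the intercept can be seen in a minimal sketch on simulated data. The variables `stock_x` and `stock_y` and the slope of 2 are illustrative assumptions, not the project's series.

```r
set.seed(42)
stock_x <- rnorm(200)
stock_y <- 2 * stock_x + rnorm(200, sd = 0.5)

with_int <- lm(stock_y ~ stock_x)       # intercept and slope
no_int   <- lm(stock_y ~ stock_x - 1)   # slope only, intercept forced to zero
alt_form <- lm(stock_y ~ 0 + stock_x)   # equivalent way to drop the intercept

names(coef(with_int))   # "(Intercept)" "stock_x"
names(coef(no_int))     # "stock_x"
```

Both `- 1` and `0 +` remove the intercept, so the fitted object holds a single coefficient that can be read directly as the slope.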
Whenever a regression is performed, it is very important to analyze the residuals (e)
of the fitted model. If everything goes according to plan, the residuals will be
normally distributed with no visible pattern in the data, no auto-correlation and no
heteroskedasticity. The residuals can be extracted from the regression object through
its residuals component.
res<- model1$residuals
plot(res)
acf(res)
The function acf() computes, and by default plots, estimates of the
autocovariance or autocorrelation function. Applied to the residuals, it
lets us check whether they are serially correlated.
Fig3: The ACF plot showing the autocorrelation of the regression residuals.
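To read the autocorrelations numerically instead of from the plot, acf() can be called with plot = FALSE. The sketch below contrasts simulated white noise with a random walk; both series are illustrative assumptions and stand in for well-behaved and badly behaved residuals.

```r
set.seed(7)
n <- 500
white_noise <- rnorm(n)          # no serial dependence
trending <- cumsum(rnorm(n))     # random walk: strongly autocorrelated

acf_wn <- acf(white_noise, plot = FALSE)
acf_tr <- acf(trending, plot = FALSE)

band <- 1.96 / sqrt(n)           # approximate 95% band for white noise
lag1_wn <- acf_wn$acf[2]         # element 1 is lag 0 (always 1), element 2 is lag 1
lag1_tr <- acf_tr$acf[2]

c(band = band, lag1_white_noise = lag1_wn, lag1_random_walk = lag1_tr)
```

Residuals whose autocorrelations stay inside the band behave like white noise, while values far outside it, as for the random walk, indicate structure the regression has not captured.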
The summary function below is used to obtain the results of the linear regression
model fit.
summary(model1)
Along with the other outputs, the p-values and t-statistics are used to evaluate the
statistical significance of the coefficients. The smaller the p-value, the stronger
the evidence that the true coefficient differs from zero. The coefficient of the
independent variable is significant in this example. The coefficients can be
extracted through the coefficients component of the fitted model.
p-value:             < 2.2e-16
Coefficient:         1.582053
Adjusted R-squared:  0.9637
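The quantities reported above can be pulled out of a fitted model programmatically. This is a minimal sketch on simulated data; `x`, `y` and the slope of 1.5 are assumptions for illustration and will not reproduce the numbers in the table.

```r
set.seed(11)
x <- rnorm(300)
y <- 1.5 * x + rnorm(300)
fit <- lm(y ~ x - 1)

s <- summary(fit)
coef_table <- coef(s)                    # columns: Estimate, Std. Error, t value, Pr(>|t|)
slope   <- coef_table["x", "Estimate"]
p_value <- coef_table["x", "Pr(>|t|)"]
adj_r2  <- s$adj.r.squared

print(coef_table)
```

Note that for a model without an intercept R computes R-squared about zero and not about the mean, so it is not directly comparable with the R-squared of a model that includes an intercept.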
Now that our data frames for HPCL and BPCL are loaded into memory, let's extract
some prices.
The data are in the following format.
BPCL.NS.Open  BPCL.NS.High  BPCL.NS.Low  BPCL.NS.Close  BPCL.NS.Volume  BPCL.NS.Adjusted
635.5         635.5         635.5        635.5          0               260.825
635.55        657.6         632          652            1429000         267.597
656.45        656.45        639.2        640.2          1609600         262.754
642           648.4         628.1        629.7          1943100         258.444
631           637.35        614.1        620            1845500         254.463
619           635.95        619          628.9          1144200         258.116
Table 2: The format of the data downloaded from Yahoo Finance by the
quantmod library
The data is on daily basis
OPEN: The price at which the stock opened that day.
HIGH: The highest price the stock reached that day.
LOW: The lowest price the stock reached that day.
CLOSE: The price of the last trade when the market closed that day.
p-value:             0.008721
Coefficient:         0.14799
Adjusted R-squared:  0.5061
Table 3: Regression results on the returns of the two stocks.
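We now decide our trading strategy. The spread is defined as the HPCL price
minus the hedge ratio times the BPCL price, so once the spread exceeds our
upper threshold, HPCL is expensive relative to BPCL and we sell HPCL and buy
BPCL. Once the spread drops below our lower threshold, we buy HPCL and sell
BPCL.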
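# Indices where the spread is at or beyond one standard deviation from its mean
ind_Sell <- which(spread_T >= meanT + sdT)
ind_Buy <- which(spread_T <= meanT - sdT)
# Walk through the spread, holding at most one long or one short unit of it
for (i in 1:spread_L) {
  spTemp <- sp[i]
  if (spTemp < lowerThr) {
    # Spread is cheap: buy it unless already long
    if (totalP <= 0) {
      totalP <- totalP + tradeQty
      prices_B[i] <- spTemp
    }
  } else if (spTemp > upperThr) {
    # Spread is expensive: sell it unless already short
    if (totalP >= 0) {
      totalP <- totalP - tradeQty
      prices_S[i] <- spTemp
    }
  }
}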
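The same threshold loop can be run on a toy spread to see which points are recorded as buys and sells. The function name `run_threshold_trades`, the toy values and the gross profit figure in spread units are illustrative assumptions; transaction costs and the two individual legs are ignored.

```r
run_threshold_trades <- function(sp, lower, upper, qty) {
  n <- length(sp)
  buys <- rep(NA_real_, n)
  sells <- rep(NA_real_, n)
  position <- 0
  for (i in seq_len(n)) {
    if (sp[i] < lower && position <= 0) {
      position <- position + qty        # buy the spread unless already long
      buys[i] <- sp[i]
    } else if (sp[i] > upper && position >= 0) {
      position <- position - qty        # sell the spread unless already short
      sells[i] <- sp[i]
    }
  }
  list(buys = buys, sells = sells, position = position)
}

toy_sp <- c(0, -2, -3, 0, 2, 3, 0, -2)
res <- run_threshold_trades(toy_sp, lower = -1, upper = 1, qty = 1000)

# Gross profit in spread units; only meaningful when the final position is flat
gross_pnl <- 1000 * (sum(res$sells, na.rm = TRUE) - sum(res$buys, na.rm = TRUE))
print(gross_pnl)
```

On this series the loop buys at -2, sells at 2 and again at 3 (flipping to a short position), and buys back at -2, so the position ends flat and the gross profit is the sum of sale levels minus the sum of purchase levels, scaled by the quantity.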
Fig 7: The graph shows the points (red dots) where we opened a trading position
and the points (yellow dots) where we stopped trading.
Fig 9: Shows that the two stocks AAPL and QQQ move together.
library(PairTrading)
pair.price<-cbind(tAAPL,tQQQ)
Here we take the adjusted prices of the two stocks, which include the adjustment
for dividends.
reg1<-EstimateParameters(pair.price, method = lm)
The EstimateParameters function calculates the spread, the hedge ratio and the
premium for the two stocks. It is a predefined function in the PairTrading library.
reg1$hedge.ratio
plot(reg1$spread)
Fig 11: The spread after obtaining the hedge ratio from the historical values
barplot(signal,col="blue",space = 0, border =
"blue",xaxt="n",yaxt="n",xlab="",ylab="")
par(new=TRUE)
plot.ts(params$spread, type="l", col = "red",
lwd = 3, main = "Spread & Signal")
abline(h = upperThr, col = "blue", lwd = 2)
6. Future Scope
In this project we have worked with a static model: the hedge ratio is calculated
from the historical prices, and the trading strategy is fixed for a long interval of
time. Recently, with the growing popularity of HFT, various studies have examined the
applicability of pairs trading strategies to high-frequency environments. Bowen et
al. (2010) examine the sensitivity of high-frequency strategies to market attributes,
noting that the primary returns to their strategy arise in the first and last hours
of the trading day, when trading volume is expected to be highest.
As future scope, we can build a dynamic model that recalculates the hedge ratio
over time and adjusts the trading strategy frequently to maximize profit.
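One simple way to make the hedge ratio dynamic is to re-estimate it on a rolling window. The sketch below is only one possible approach under stated assumptions: the function name `rolling_hedge_ratio` and the 60-observation window are hypothetical, and the estimate is the no-intercept least-squares slope, matching the static regression used in this project.

```r
rolling_hedge_ratio <- function(y, x, window = 60) {
  n <- length(y)
  stopifnot(length(x) == n, n >= window)
  hr <- rep(NA_real_, n)                 # undefined until a full window is available
  for (t in window:n) {
    idx <- (t - window + 1):t
    # Slope of a regression through the origin: sum(x * y) / sum(x^2)
    hr[t] <- sum(x[idx] * y[idx]) / sum(x[idx]^2)
  }
  hr
}

# Sanity check on exact data: y is always 2 * x, so every window gives 2
x_demo <- as.numeric(1:100)
y_demo <- 2 * x_demo
hr_demo <- rolling_hedge_ratio(y_demo, x_demo, window = 60)
tail(hr_demo, 3)
```

Each estimate uses only the observations up to time t, so the resulting spread could be computed without looking ahead; the window length trades responsiveness against noise and would need to be chosen empirically.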
7. References
Bakshi, G. and Z. Chen, 1997, Stock Valuation in Dynamic Economies, working paper,
Ohio State University.
D'Avolio, G., 2002, The Market for Borrowing Stock, Journal of Financial Economics,
66, 271-306.
Bossaerts, P., 1988, Common Nonstationary Components of Asset Prices, Journal of
8. Coding.
# Getting a better understanding of linear regression
x <- rnorm(1000)
y <- (x - 2) + rnorm(1000)
lmout <- lm(y ~ x)
summary(lmout)
# The fitted line is close to y = -2 + x (one run gave y = -2.007 + 0.96x)
plot(lmout$residuals)
# The residuals scatter around zero; they reflect the simulated noise term
plot(lmout)
# The residual plot is random and does not follow any pattern
res <- lmout$residuals
plot(res, type = "l")
# acf checks whether the residuals are autocorrelated
acf(res)
lmout$coefficients
require(quantmod)
symbols <- c("HINDPETRO.NS","BPCL.NS")
#write.csv(BPCL.NS,"BPCL.CSV")
#write.csv(BPCL.NS,"BPCL.CSV")
getSymbols(symbols,from='2010-01-01',to = '2013-01-01')
summary(HINDPETRO.NS)
HINDPETRO.NS[,6]
tHPC <- HINDPETRO.NS[,6]
tBPC <- BPCL.NS[,6]
plot(tHPC,tBPC)
plot.ts(tHPC,type='l',col="red",main = "Price HPC(RED) vs. BPC(Green)",ylab='Ad.Price',ylim=c(100,500),xlim=c(0,1000))
par(new=TRUE)
plot.ts(tBPC,type='l',col="green",ylab='Ad.Price',ylim=c(100,500),xlim=c(0,1000))
View(tHPC)
View(tBPC)
length(tBPC)
length(tHPC)
tBPC<- tBPC[1:763]#length(tBPC)
cor(tHPC,tBPC) #0.2831817
#We will use the data to compute a simple hedge ratio
#and then we will apply this hedge ratio to the out of sample data.
# Calculate the daily price changes (first differences); [-1] drops the leading NA
pdtHPC <- diff(tHPC)[-1]
pdtBPC <- diff(tBPC)[-1]
cor(pdtHPC,pdtBPC) # 0.329, so the price changes are correlated
# The plot below gives a good understanding of the relationship between the two stocks.
plot.ts(pdtHPC,type='l',col="red",main = "Return HPC(RED) vs. BPC(Green)",ylab='RETURN',ylim=c(-15,15),xlim=c(0,10))
par(new=TRUE)
plot.ts(pdtBPC,type='l',col="green",ylab='RETURN',ylim=c(-15,15),xlim=c(0,10))
# Build the model
length(pdtHPC)
length(pdtBPC)
# Regress HPC price changes on BPC price changes without an intercept;
# the slope is the hedge ratio
model <- lm(pdtHPC ~ pdtBPC - 1)
summary(model)
model$coefficients[1]
res1 <- model$residuals
plot(res1)
acf(res1)
model$coefficients
hr <- as.numeric(model$coefficients[1])
hr
# Spread price (in-sample)
# tHPC = X + hr * tBPC
# X = tHPC - hr * tBPC
# X is nothing but the spread
spread_T <- tHPC - hr * tBPC
fix(spread_T)
# Calculate the mean of the spread
meanT <- as.numeric(mean(spread_T,na.rm=TRUE)) ; meanT
# Calculate the standard deviation of the spread
sdT <- as.numeric(sd(spread_T,na.rm=TRUE)) ;sdT
# Similarly, calculate the one and two standard deviation bands for the spread
upperThr <- meanT + 1 * sdT
lowerThr <- meanT - 1 * sdT
upperThr2 <- meanT + 2 * sdT
lowerThr2 <- meanT - 2 * sdT
sdT
upperThr2
lowerThr2
?abline
plot(spread_T)
spread_T
plot(spread_T, main = "HPC vs. BPC spread (in-sample period)")
abline(h = meanT, col = "red", lwd =2)
abline(h = meanT + 1 * sdT, col = "blue", lwd=1.5)
abline(h = meanT - 1 * sdT, col = "blue", lwd=1.5)
abline(h = upperThr2, col = "blue", lwd=2)
abline(h = lowerThr2, col = "blue", lwd=2)
# The buy and sell markers are added after the trading loop below,
# once prices_B and prices_S have been created
spread_T[1] ;spread_T[125] #all the values are below the 1sd
spread_T
spread_L <- length(spread_T)
prices_B <- c(rep(NA,spread_L))
prices_S <- c(rep(NA,spread_L))
prices_B
prices_S
sp <- as.numeric(spread_T)
sp
spread_L
View(spread_T)
tradeQty<-1000
totalP <- 0
tradep<- 0
for (i in 1:spread_L) {
spTemp<- sp[i]
if(spTemp < lowerThr) {
if(totalP <= 0){
totalP <- totalP + tradeQty
prices_B[i] <- spTemp
}
} else if(spTemp > upperThr) {
if(totalP >= 0){
totalP <- totalP - tradeQty
prices_S[i] <- spTemp
}
}
}
plot(spread_T, main = "HPC vs. BPC spread (in-sample period)")
abline(h = meanT, col = "red", lwd =2)
abline(h = meanT + 1 * sdT, col = "blue", lwd = 2)
abline(h = meanT - 1 * sdT, col = "blue", lwd = 2)
points(xts(prices_B,index(spread_T)), col="green", cex=1.9, pch=19)
points(xts(prices_S,index(spread_T)), col="red", cex=1.9, pch=19)
?cex
xts(prices_B,index (spread_T) )
?points
# The code below is for the stocks AAPL and QQQ, using library(PairTrading)
library(PairTrading)
# Combining the prices of the two stocks
pair.price<-cbind(tAAPL,tQQQ)
head(pair.price)
summary(pair.price)
tQQQ
return.pairtrading), main =
"Performance of pair trading")
plot(tAAPL)
par(new=TRUE)
plot(tQQQ)
plot.ts(tAAPL,type='l',col="red",main = "AAPL(RED) vs. QQQ(Green)")
par(new=TRUE)
plot(tQQQ,type='l',col="green")
if(signal[i - 1] == 1){
if(spread[i] >= take.profit.lower){
signal[i] <- 0
}else{
signal[i] <- signal[i - 1]
}
}else if(signal[i - 1] == -1){
if(spread[i] <= take.profit.upper){
signal[i] <- 0
}else{
signal[i] <- signal[i - 1]
}
}
}
}
return(signal)
}
**************************** END OF REPORT ****************************