
Automated Trading Strategies with R
3rd April 2014

Richard Pugh, Commercial Director


rpugh@mango-solutions.com

Agenda

Overview of Mango
Data Analytics
Introduction to Backtesting
The Backtesting Project
Leveraging Oracle R Enterprise
Summary

Overview of Mango Solutions

Mango in a nutshell
Providers of analytic products and services
Specialise in analytic application development
Unique mix of business-focused statisticians and
mainstream software developers
Private company founded in 2002
Offices in UK & China
Global Team of 65 and expanding
ISO 9001 Accredited
Partner with Oracle on R project

Analytic Application Development

Data Analytics

Data Analytics
Companies are awash with structured and
unstructured data
The insight locked in this data can help us to
make better decisions and gain a competitive
advantage
Data Analytics can help to extract the key
information from our data

Data Analytic Examples

Who is a good driver?

How do we win more games?

What bonus should I pay?

Will someone like this?

When might this break?

What are they likely to want?

Challenges of Integrating Analytics


Clear questions are needed
Data may not be analytic-ready
Sophisticated analytics require niche technology
that can be difficult to integrate
The language of analytics can be difficult to
penetrate and requires specialists
Integrating the right analytics is key

Introduction to Backtesting

Introduction to Backtesting
Algorithmic trading accounts for a large percentage of market trades
Backtesting is the process of testing a trading strategy using historical data
It allows an automated trading strategy to be developed and evaluated

Backtesting Example

Buy every stock beginning with A and sell all stocks beginning with Z
How do we know if this works?

Backtest!
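
As an illustration only (this is not the project's code), a naive rule like this can be backtested in a few lines of R; the sketch below assumes a matrix of historical daily closing prices with tickers as column names.

# Toy backtest of the "buy A..., sell Z..." rule (illustration only).
# 'prices' is assumed to be a numeric matrix of daily closes, one column per ticker.
backtestAZ <- function(prices) {
  rets   <- diff(log(prices))                        # daily log returns
  longs  <- grepl("^A", colnames(prices))            # buy stocks beginning with A
  shorts <- grepl("^Z", colnames(prices))            # sell stocks beginning with Z
  spread <- rowMeans(rets[, longs,  drop = FALSE]) -
            rowMeans(rets[, shorts, drop = FALSE])   # equal-weighted long/short return
  cumsum(spread)                                     # cumulative strategy return
}

Running such a rule over a long history shows whether it would have made money; the key factors on a later slide (costs, hurdles, optimisation) determine whether that answer can be trusted.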

Backtesting Example

[Chart: results of the example backtest, showing the Long and Short legs]

Key Factors in Backtesting

Easy selection and execution of strategies


Performance of backtest
Optimisation across sectors, styles, etc
Comparison with hurdle (e.g. interest rates)
Transaction costs
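
Two of these factors, the hurdle comparison and transaction costs, can be made concrete with a small sketch; every name and number below is hypothetical, and weekly data is assumed for the annualisation.

# Subtract transaction costs from per-period strategy returns and compare the
# annualised result with a hurdle rate (e.g. an interest rate).
netOfCosts <- function(stratRets, turnover, costBps = 10, hurdleRate = 0.02) {
  costs      <- turnover * costBps / 10000   # cost drag implied by turnover
  netRets    <- stratRets - costs            # returns net of costs
  annualised <- mean(netRets) * 52           # rough annualisation for weekly data
  list(annualised  = annualised,
       beatsHurdle = annualised > hurdleRate)
}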

The Backtesting Project

The Backtesting Project


Mango engaged by a major hedge fund to create a backtesting solution
Goal: a competitive advantage over off-the-shelf solutions
Particular complexity around transaction costs (futility switching) and optimisation
A framework with the possibility for extensions

The Backtesting Solution

[Architecture diagram: a Backtest Application with a Universe Feed, accessed through a Graphical User Interface and a Scripting Interface, built on Raw Data & Alpha Storage, an Analytic Engine, and an Analytic Code Management UI]
The Backtesting Solution

[Architecture diagram: the same components, with a bespoke C layer added for access to the Raw Data & Alpha Storage]

The Backtesting Project


Outcome
Deemed a success
Used to drive an industry-beating fund
The scripting interface proved more popular than the graphical user interface
The code management interface allowed new routines to be added without impacting the rest of the application

The Backtesting Project


Constraints & Challenges
A performance bottleneck meant the application was restricted to weekly data
Creation of the C layer for data access was unexpected
As the number of power users increased, more sophisticated code management would have helped

Leveraging Oracle R Enterprise

Leveraging Oracle R Enterprise


The project was operated on a shared-cost basis, with Mango retaining the IP
Mango is now looking to develop the application further and release it as a product
ORE was identified as the ideal way to replace non-performant parts of the application
Oracle products are familiar to Mango

Steps to Integrating with ORE

1. Use Oracle for object management
2. Replace functions with their ORE equivalents
3. Use embedded scripts for execution
4. Expose the interface via SQL
5. Build the user interface

ORE Functions
> apropos("^ore")
 [1] "OREShowDoc"           "ore.attach"           "ore.connect"          "ore.const"
 [5] "ore.corr"             "ore.create"           "ore.crosstab"         "ore.datastore"
 [9] "ore.datastoreSummary" "ore.delete"           "ore.detach"           "ore.disconnect"
[13] "ore.doEval"           "ore.drop"             "ore.esm"              "ore.exec"
[17] "ore.exists"           "ore.frame"            "ore.freq"             "ore.get"
[21] "ore.getXlevels"       "ore.getXnlevels"      "ore.glm"              "ore.glm.control"
[25] "ore.groupApply"       "ore.hash"             "ore.hiveOptions"      "ore.hour"
[29] "ore.indexApply"       "ore.is.connected"     "ore.lazyLoad"         "ore.lm"
[33] "ore.load"             "ore.ls"               "ore.make.names"       "ore.mday"
[37] "ore.minute"           "ore.month"            "ore.neural"           "ore.odmAI"
[41] "ore.odmAssocRules"    "ore.odmDT"            "ore.odmGLM"           "ore.odmKMeans"
[45] "ore.odmNB"            "ore.odmNMF"           "ore.odmOC"            "ore.odmSVM"
[49] "ore.predict"          "ore.pull"             "ore.pull"             "ore.pull"
[53] "ore.push"             "ore.push"             "ore.rank"             "ore.recode"
[57] "ore.rm"               "ore.rollmax"          "ore.rollmean"         "ore.rollmin"
[61] "ore.rollsd"           "ore.rollsum"          "ore.rollvar"          "ore.rowApply"
[65] "ore.save"             "ore.scriptCreate"     "ore.scriptDrop"       "ore.second"
[69] "ore.showHiveOptions"  "ore.sort"             "ore.stepwise"         "ore.summary"
[73] "ore.sync"             "ore.tableApply"       "ore.toXML"            "ore.univariate"
[77] "ore.year"             "oreOut"

Step #1: Oracle for Object Management


Ported the application to use Oracle for object (data) management
A suite of ore.* functions allows easy storage and retrieval of R objects
Immediate benefit in performance for data I/O
Code base simplification (no need for the bespoke C layer)
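
In its simplest form the datastore pattern looks like the sketch below (object and datastore names are hypothetical, and an open ORE connection is assumed); the project's own wrapper functions on the next slide follow the same pattern.

# Store an arbitrary R object in an Oracle datastore, then reload it by name.
prices <- matrix(rnorm(20), nrow = 5)                       # any R object
ore.save(prices, name = "RAWDATA_DEMO", overwrite = TRUE)   # save it in the database

rm(prices)
ore.load("RAWDATA_DEMO")    # reload the object(s) stored under that datastore name
ore.datastore()             # list the datastores available in this schema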

Step #1: Oracle for Object Management


> writeRdaObject
function(object, fileName, category = "RawData", dataMethod = .backTest$dataMethod, ...) {
  # ... (elided: returnObject, the datastore name, is derived from the arguments) ...
  ore.save(object, name = returnObject, overwrite = TRUE)
  # ...
}
> loadRdaObject
function(fileName, category = "RawData", dataMethod = .backTest$dataMethod, ...) {
  # ... (elided: returnObject, the datastore name, is derived from the arguments) ...
  get(ore.load(returnObject))
}

> grep("^RAW", ore.datastore()[[1]], value = TRUE)
 [1] "RAWDATA_BRD_NO"    "RAWDATA_BRD_SECT"  "RAWDATA_DIVYIELD"  "RAWDATA_FISCALYR1"
 [5] "RAWDATA_FISCALYR2" "RAWDATA_FY13M"     "RAWDATA_FY23M"     "RAWDATA_HIGHY1"
 [9] "RAWDATA_HIGHY2"    "RAWDATA_IH6Y1"     "RAWDATA_IH6Y2"     "RAWDATA_IH7Y1"

> ore.datastoreSummary("RAWDATA_PRICE")
  object.name  class     size  length row.count col.count
1       getIt matrix 34776172 4336119      3917      1107

Step #2: Replace Functions with ore*


The ORE library contains many optimised
versions of existing R functions
There are also new functions not available in
Base R
Using these ORE functions improves performance
and simplifies the code base


Step #2: Replace Functions with ore*


MMIN <- function(data, Lag, ...) {
  # ...
  rMin <- apply(data, 2, ore.rollmin, K = Lag, align = "right")
  # ...
}

> myMat
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    7    1    1    1
[2,]    2    4    6    2    3
[3,]    4    0    4    2    3
[4,]    2    2    5    4    4
[5,]    4    1    2    2    1
[6,]    5    4    4    4    1
[7,]    2    3    0    0    4
[8,]    4    4    4    4    4

> MMIN(myMat, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA
[3,]    2    0    1    1    1
[4,]    2    0    4    2    3
[5,]    2    0    2    2    1
[6,]    2    1    2    2    1
[7,]    2    1    0    0    1
[8,]    2    3    0    0    1
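
Note that with Lag = 3 the first two rows of the result are NA: a right-aligned rolling window cannot produce a value until it has seen Lag observations.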

Step #3: Use Embedded Scripts


R Scripts Stored and Managed in the Database
Execution controlled by Oracle Database and
performed on database server
Set of ore.* functions for managing and executing
scripts
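
The pattern in its simplest form is sketched below ("mySummary" is a hypothetical script name, and an ORE session opened with ore.connect() is assumed); the project's doBacktest script on the next slide uses the same calls.

# Register a function as a named script in the database, run it there, and pull the result back.
ore.scriptCreate("mySummary",
                 function(n) summary(rnorm(n)))   # store the function in the database

res <- ore.doEval(FUN.NAME = "mySummary",         # ask the database server to run it
                  n = 1000)
ore.pull(res)                                     # bring the result back to the client

ore.scriptDrop("mySummary")                       # tidy up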

Step #3: Use Embedded Scripts


try(ore.scriptDrop("doBacktest"))
ore.scriptCreate("doBacktest", function(alphaName, alphaDesc, alphaCat, alphaFormula,
    optimMethod, optimFactors, numBaskets, lowerThreshold, upperThreshold, portName,
    portDesc = alphaDesc) {

  # Load the backTest package
  require(backTest)

  # Now do the backtest
  myAlpha <- runAlpha(alphaName, alphaDesc, alphaCat, alphaFormula)
  myClass <- switch(optimMethod,
    "Simple" = Classify(simpleOpt(myAlpha, .myData$Factors[optimFactors]), numBaskets),
    Classify(myAlpha, numBaskets))
  myPort <- createPort(myAlpha, myClass, lowerThreshold, upperThreshold, Splits = numBaskets)
  fullReport(myPort, portName, portDesc, alphaName, optimFactors, optimMethod,
    theDir = "/home/oracle/Results", fileName = "backTestReport.pdf")
})

Step #3: Use Embedded Scripts


alphaForm <- c(
  "aUpDwnY1 = (IH7Y1-IH8Y1)/pmax(IH7Y1+IH8Y1,IH6Y1)*100",
  "aUpDwnY2 = (IH7Y2-IH8Y2)/pmax(IH7Y2+IH8Y2,IH6Y2)*100",
  "aUpDwnSc = UPR((UPR(aUpDwnY1)+UPR(aUpDwnY2)))",
  "aFyrevs = UPR((UPR(FY13M)+UPR(FY23M)))",
  "UPR(aUpDwnSc+aFyrevs)")
res <- ore.doEval(FUN.NAME = "doBacktest", ore.connect = TRUE,
  alphaName = "aRevSc", alphaDesc = "Simple Revision Alpha", alphaCat = "Revisions",
  alphaFormula = alphaForm, optimMethod = "Simple", optimFactors = c("Style", "Sector"),
  numBaskets = 5, lowerThreshold = .5, upperThreshold = 2, portName = "OptRevScore")

   user  system elapsed
  0.134   0.037 240.697

> 240.697/60
[1] 4.01161
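
The timing illustrates why the embedded approach helps: the local R session uses almost no CPU (the user and system times), while the roughly four minutes of elapsed time are spent on the database server, where the script and the data both live.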


An Aside: the Backtest Report

The report is automatically generated and emailed to the fund manager

Another Aside: getting interactive!

Results are stored as ore objects in the database
I can access the objects for more in-depth analysis
> x <- loadPort("OptRevScore", "aRevSc")
Loading object STRATEGIES_REVISIONS_AREVSC_OPTREVSCORE_PORT
> names(x)
[1] "baskets"  "bRets"    "alpha"    "relRets"  "hMat"     "classed"  "tCosts"
[8] "turnOver" "costData"

> ls("package:backTest", pattern = "*lot")
 [1] "alphaPlot"     "dayPlot"       "monthPlot"     "pairsPlot"     "plotPort"
 [6] "qRetsPlot"     "qSharpePlot"   "qTranCostPlot" "qTurnOverPlot" "qVolsPlot"
[11] "textPlot"      "turnOverPlot"

Another Aside: getting interactive!

> plotPort(x, removeTcosts = TRUE, title = "Simple Optimised Revision Strategy")

Another Aside: getting interactive!

> textPlot(x)

Another Aside: getting interactive!

> monthPlot(x, "2012-01")

Another Aside: getting interactive!

> pairsPlot(x, start = "2010-01-1")

Step #4: Expose via SQL Interface


R Scripts Stored and Managed in the Database
Execution controlled by Oracle Database and
performed on database server
Set of ore.* functions for managing and executing
scripts
Outputs can be stored as XML or PNG (blobs)

Updated Application

[Architecture diagrams, shown as a build: the original Backtest Application (Universe Feed, Graphical User Interface, Scripting Interface, Bespoke C layer, Raw Data & Alpha Storage, Analytic Engine, Analytic Code Management UI); the same application with the Bespoke C layer removed; and the final version, in which a SQL Interface and Embedded Scripts connect the interfaces and analytic code to the database]

Benefits of ORE
Significant immediate benefits in performance and
code management
Database script management makes deployment
very simple
Script and SQL interfaces allow for close
integration into business processes in a controlled
manner

Summary

Summary
Oracle R Enterprise provides a sophisticated platform
for integrating R into business processes
Adds scalability and performance improvements to the flexible R environment
Integrating a legacy application with ORE proved easy to achieve
We have this running on demo servers if you want to see it.

Discussion
