Algorithm Zoo

Capturing Markets with Code

Top 10 March returns in SPX in less than 6 lines


Time to look into Julia for your financial data-mining adventures. The following snippet uses Julia release-0.1 and will not work with cutting-edge Julia.


julia> using TradingInstrument

julia> spx = yahoo("^GSPC", 1,1,1900, 3,5,2013, "m");

julia> sp = simple_return!(spx, "Close");

julia> s = indexmonth(sp, 3);

julia> sortby!(s, [(:Close_RET, Sort.Reverse)]);

julia> head( s[["Date", "Close_RET"]], 10)
10x2 DataFrame:
               Date Close_RET
[1,]     2000-03-01 0.0967199
[2,]     2009-03-02 0.0854045
[3,]     1956-03-01 0.0692545
[4,]     2010-03-01 0.0587964
[5,]     1979-03-01 0.0551516
[6,]     1986-03-03 0.0527939
[7,]     1998-03-02 0.0499457
[8,]     1952-03-03 0.0477214
[9,]     1967-03-01   0.03941
[10,]    1999-03-01 0.0387942
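
For readers who want to see the moving parts, here is a rough sketch of the same pipeline in plain Python: compute period-over-period returns, keep the rows that fall in a given month, and sort descending. The function names `simple_returns` and `top_month_returns` are made up for this illustration, the 0.0 padding of the first return is an assumption about how the Julia package behaves, and the prices are invented, not SPX data.

```python
from datetime import date

def simple_returns(closes):
    """Period-over-period simple returns; the first row has no prior close,
    so it is padded with 0.0 (an assumption, see the note above)."""
    return [0.0] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]

def top_month_returns(dates, closes, month, n=10):
    """Return the n largest (date, return) pairs among rows dated in `month`."""
    rets = simple_returns(closes)
    rows = [(d, r) for d, r in zip(dates, rets) if d.month == month]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

# Invented monthly closes, for illustration only.
dates = [date(2001, 2, 1), date(2001, 3, 1), date(2001, 4, 2),
         date(2002, 2, 1), date(2002, 3, 1)]
closes = [100.0, 110.0, 99.0, 100.0, 105.0]

top = top_month_returns(dates, closes, month=3, n=2)
for d, r in top:
    print(d, round(r, 4))
```

The filter-then-sort order mirrors the indexmonth and sortby! steps in the Julia session above.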

thymeforjulia


She’s still in chef school, but this Julia will graduate sooner than many are expecting. She’s at the head of her class, has the basics down and is well on her way to making mouth-watering packages. A nice start for trader/hackers is the Thyme package. It’s not the perfect omelette, but it’s a tasty start. You can get your favorite csv data into a DataFrame and make some basic transformations to it, such as calculating returns, moving averages, equity curves and lag/lead observations.

There is already similar functionality in R and Python’s pandas, so we’ll take a look at all three at the end in a little competition. Let’s use a common dataset for comparison, Yahoo’s GSPC time series from 1950 to 2012. This dataset is SPX daily data and includes 15,851 rows.

To demonstrate our first Thyme function, let’s import that data into Julia.

Julia console
julia> require("Thyme")
julia> using Thyme

julia> spx = read_stock("GSPC.csv"); # no function to import over http yet 

julia> require("DataFrames")
julia> using DataFrames

julia> head(spx)
6x7 DataFrame:
              Date  Open  High   Low Close  Volume Adj Close
[1,]    1950-01-03 16.66 16.66 16.66 16.66 1260000     16.66
[2,]    1950-01-04 16.85 16.85 16.85 16.85 1890000     16.85
[3,]    1950-01-05 16.93 16.93 16.93 16.93 2550000     16.93
[4,]    1950-01-06 16.98 16.98 16.98 16.98 2010000     16.98
[5,]    1950-01-09 17.08 17.08 17.08 17.08 2520000     17.08
[6,]    1950-01-10 17.03 17.03 17.03 17.03 2160000     17.03

R’s quantmod and Python’s pandas have similar methods of getting this data into the proper structure. Here’s a hint on how that’s done.

ipython console
In [1]: from pandas import *
In [2]: from pandas.io.data import DataReader
In [3]: spx = DataReader("^GSPC", "yahoo", datetime(1950,1,1), datetime(2012,12,31))
R console
R> require(quantmod)
R> getSymbols("^GSPC", from="1950-01-01", to="2012-12-31")
R> spx = GSPC

The basic transformations you can do with quantmod and pandas can be done with Thyme. Lagging and leading functions are not in Julia base, but they are in Thyme.

Julia console
julia> lag(spx["Close"], 2)
15851-element Float64 DataArray
 NA
 NA
 16.66
 

julia> lead(spx["Close"], 2)
15851-element Float64 DataArray
 
 1426.19
 NA
 NA
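
If the direction of the shift is hard to keep straight, a small plain-Python sketch may help. It is not Thyme's implementation: `lag` and `lead` here are stand-in functions written for this illustration, `None` plays the role of NA, and the sketch assumes 0 <= n <= len(xs). The prices are the first six closes from the head(spx) output above.

```python
def lag(xs, n):
    """Shift values n rows later; the first n rows have nothing to look back to."""
    return [None] * n + list(xs[:len(xs) - n])

def lead(xs, n):
    """Shift values n rows earlier; the last n rows have nothing to look ahead to."""
    return list(xs[n:]) + [None] * n

closes = [16.66, 16.85, 16.93, 16.98, 17.08, 17.03]

print(lag(closes, 2))   # two leading None values, then the series from its start
print(lead(closes, 2))  # the series from its third value, then two trailing None values
```

Both results keep the original length, which is what lets the shifted series sit next to the original as a new column.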

The lag and lead functions also have bang versions, lag! and lead!. These modify the DataFrame by adding a column.

Julia console
julia> lag!(spx, "Close", 2)
15851x8 DataFrame:
              Date  Open  High   Low Close  Volume Adj Close Close_lag_2
[1,]    1950-01-03 16.66 16.66 16.66 16.66 1260000     16.66          NA
[2,]    1950-01-04 16.85 16.85 16.85 16.85 1890000     16.85          NA
[3,]    1950-01-05 16.93 16.93 16.93 16.93 2550000     16.93       16.66
[4,]    1950-01-06 16.98 16.98 16.98 16.98 2010000     16.98       16.85
 

Every quant wants to know about returns. There are two that you’d expect, log_return and simple_return, and a speciality function called equity that generates an equity curve. log_return and simple_return are padded with 0.0 instead of NA while equity is padded with NA. All three of these functions also have bang versions, which we’ll use to continue to modify our spx dataset.

Julia console
julia> log_return!(spx, "Close")
15851x9 DataFrame:
              Date  Open  High   Low Close  Volume Adj Close Close_lag_2   Close_ret
[1,]    1950-01-03 16.66 16.66 16.66 16.66 1260000     16.66          NA         0.0
[2,]    1950-01-04 16.85 16.85 16.85 16.85 1890000     16.85          NA     0.01134
[3,]    1950-01-05 16.93 16.93 16.93 16.93 2550000     16.93       16.66  0.00473654
[4,]    1950-01-06 16.98 16.98 16.98 16.98 2010000     16.98       16.85  0.00294898
 


julia> equity!(spx, "Close")
15851x10 DataFrame:
              Date  Open  High   Low Close  Volume Adj Close Close_lag_2   Close_ret Close_equity
[1,]    1950-01-03 16.66 16.66 16.66 16.66 1260000     16.66          NA         0.0           NA
[2,]    1950-01-04 16.85 16.85 16.85 16.85 1890000     16.85          NA     0.01134       1.0114
[3,]    1950-01-05 16.93 16.93 16.93 16.93 2550000     16.93       16.66  0.00473654      1.01621
[4,]    1950-01-06 16.98 16.98 16.98 16.98 2010000     16.98       16.85  0.00294898      1.01921
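
The arithmetic behind those three columns is short enough to sketch in plain Python. This is only an illustration under stated assumptions: the function names are reused for readability but are not Thyme's code, `None` stands in for NA, and computing the equity curve as a running product of (1 + simple return) is a guess at the method that happens to reproduce the numbers shown above.

```python
import math

def simple_return(prices):
    """p[t] / p[t-1] - 1, with the first row padded with 0.0."""
    return [0.0] + [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]

def log_return(prices):
    """log(p[t] / p[t-1]), with the first row padded with 0.0."""
    return [0.0] + [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]

def equity(prices):
    """Growth of one unit invested at the first price; the first row is None."""
    curve, level = [None], 1.0
    for r in simple_return(prices)[1:]:
        level *= 1.0 + r
        curve.append(level)
    return curve

closes = [16.66, 16.85, 16.93, 16.98]  # first four closes from the table above

print([round(r, 5) for r in log_return(closes)])
print([None if e is None else round(e, 5) for e in equity(closes)])
```

Because the returns compound, each equity value is just the current close divided by the first close.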
 

Moving averages are another staple in the quant cookbook. Instead of creating a special function called sma, Thyme generalizes the function in moving and allows the passing of any valid function. To create a simple moving average, pass in mean. You can also pass in max, min, var, kurtosis, skewness, etc. Let’s use the bang version on our spx object.

Julia console
julia> moving!(spx, "Adj Close", mean, 2)
15851x11 DataFrame:
             Date  Open  High   Low Close  Volume Adj Close Close_lag_2   Close_ret Close_equity mean_2
[1,]    1950-01-03 16.66 16.66 16.66 16.66 1260000     16.66          NA         0.0           NA     NA
[2,]    1950-01-04 16.85 16.85 16.85 16.85 1890000     16.85          NA     0.01134       1.0114 16.755
[3,]    1950-01-05 16.93 16.93 16.93 16.93 2550000     16.93       16.66  0.00473654      1.01621  16.89
[4,]    1950-01-06 16.98 16.98 16.98 16.98 2010000     16.98       16.85  0.00294898      1.01921 16.955
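
The idea of one generic window function that accepts any statistic can be sketched in plain Python as follows. The name `moving` is borrowed for readability only; this is not Thyme's code, `None` stands in for NA, and the sketch assumes 1 <= n <= len(xs).

```python
from statistics import mean

def moving(xs, f, n):
    """Apply f to each window of n consecutive values.

    The first n-1 positions have no full window, so they are padded with None.
    """
    values = [f(xs[i:i + n]) for i in range(len(xs) - n + 1)]
    return [None] * (n - 1) + values

closes = [16.66, 16.85, 16.93, 16.98, 17.08, 17.03]

sma_2 = moving(closes, mean, 2)   # simple moving average over 2 rows
high_3 = moving(closes, max, 3)   # rolling maximum over 3 rows
print(sma_2)
print(high_3)
```

Passing the statistic as an argument means a new rolling measure needs no new function, only a different `f`.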
 

Time for some speed trials. Let’s take the original dataset for the challenge. Thyme will use the moving function and pass in the base Julian skewness. For R we’ll use zoo::rollapply and pass in the PerformanceAnalytics::skewness function. pandas has a dedicated function for this called rolling_skew. Let the Iron Programming Languages begin!

Julia console
julia> @elapsed moving!(spx, "Close", skewness, 100)

0.5475790500640869
R console
R> system.time(rollapply(Cl(GSPC), FUN=skewness, width=100))
   user  system elapsed
 21.879   0.150  22.306
ipython console
In [5]: %timeit rolling_skew(spx["Close"], 100)
1000 loops, best of 3: 1.31 ms per loop

R is a bit slow, mainly because rollapply uses an R-level loop, so the comparison really isn’t fair. Python is so fast that you need timeit to measure it; I’m sure it’s a Cython loop and not a Python loop. Julia’s Thyme package did okay: nowhere near pandas, but over 40 times faster than the R method.
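
To get a feel for what an interpreted per-window loop costs, here is a plain-Python sketch that recomputes skewness from scratch for every window and times it. Everything in it is an assumption made for illustration: the input is a synthetic series, `skewness` is the plain population formula (libraries may use a different, for example bias-adjusted, estimator), and `rolling_skew_naive` is a made-up name. The timing will differ from machine to machine, so no expected number is given.

```python
import time

def skewness(window):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(window)
    m = sum(window) / n
    m2 = sum((x - m) ** 2 for x in window) / n
    m3 = sum((x - m) ** 3 for x in window) / n
    return m3 / m2 ** 1.5 if m2 > 0 else float("nan")

def rolling_skew_naive(xs, width):
    """Recompute skewness for every full window; pad the front with None."""
    values = [skewness(xs[i:i + width]) for i in range(len(xs) - width + 1)]
    return [None] * (width - 1) + values

# Synthetic series, not market data.
series = [float((i * 37) % 101) for i in range(2000)]

start = time.perf_counter()
result = rolling_skew_naive(series, 100)
elapsed = time.perf_counter() - start
print("windows:", len(result) - 99, "seconds:", elapsed)
```

Each window is processed independently, so the work grows with the series length times the window width; compiled implementations win mostly by running that same loop outside the interpreter.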

The nice thing about moving! is that, apart from some housekeeping code, it’s a fairly simple function. It’s a loop defined in a nice single line of code. The full listing below shows moving! calling mvg, which was factored out for DRY reasons since the same code is also used by moving. Aside from some (admittedly) bizarre and expensive NA padding, it’s fairly straightforward.

moving! function
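# Apply f to every window of n consecutive elements of x, then pad the front
# with NAs so the result has the same length as x.
function mvg(x,f,n)
  foo = [f(x[i:i+(n-1)]) for i=1:length(x)-(n-1)]      # one value per full window
  bar = [nas(DataVector[float(n)], n-1) ; float(foo)]  # n-1 leading NAs, then the values
end

# Add a column named "<f>_<n>" (e.g. "mean_2") holding the moving statistic of col.
function moving!(df::DataFrame, col::ASCIIString, f, n::Int64)
  new_col = strcat(string(f), "_", string(n))
  within!(df, quote
         $new_col = $mvg($df[$col], $f, $n)
         end);
end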

Julia is the new kid in the kitchen. Time to start paying attention. She may cook your bacon when you’re not looking.