Machine Learning Strategy [CHI]
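Financial Data Analysis Course Project
1. Data description
- stockclose.csv: daily closing prices of 3,660 Chinese stocks from 1990-12-19 to 2019-10-16
- sh000001: daily open, high, low, close, and volume of the Shanghai Composite Index (上证指数) from 1990-12-19 to 2019-10-16, with the close on 1990-12-19 set to a base of 100
2. Background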
| Name of technical indicator: | Type of technical indicator: | 
|---|---|
| Momentum | Oscillators | 
| Accumulation Distribution Oscillator | Oscillators | 
| Fast Stochastics %K | Stochastics | 
| Median Price | Indicators | 
| Fast Stochastics %D | Stochastics | 
| Negative Volume Index | Indexes | 
| Highest High | Indicators | 
| Positive Volume Index | Indexes | 
| Lowest Low | Indicators | 
| Slow Stochastics %K | Stochastics | 
| Price and Volume Trend | Indicators | 
| Slow Stochastics %D | Stochastics | 
| Accumulation Distribution Line | Oscillators | 
| Acceleration Between Times | Oscillators | 
| Relative Strength Index | Indexes | 
| Bollinger Bands | Indicators | 
| Volume Rate of Change | Indicators | 
| PercentB | Indicators | 
| Price Rate of Change | Indicators | 
| Bandwidth | Indicators | 
| On-Balance Volume | Indicators | 
| Volatility | Stochastics | 
| Chaikin Volatility | Stochastics | 
| Williams %R | Stochastics | 
| Williams Accumulation Distribution Line | Indicators | 
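- Many stochastic indicators can be used to predict rises and falls in stock prices. This project uses machine learning to evaluate how well these indicators predict, and builds an optimal portfolio of buy and sell positions.
3. Approach
- Pick one stock (000001.SZ) for exploratory data analysis. Compute log returns from the daily closing prices, then analyze weekly, monthly, quarterly, and yearly returns (return distributions and the Sharpe ratio).
- Add buy and sell signal columns based on moving averages, then analyze the return obtained by actually trading on those signals.
- Using the stochastic indicators (Fast Stochastics, Slow Stochastics, Williams %R, Ex-post Volatility), add columns computed from the past month of data (20 days) to predict whether the price will subsequently rise or fall (the notebook labels each day by its forward 20-day return).
- Add a label column marking whether the price actually rose or fell, then evaluate the indicators with a decision tree. Search for the best model by varying the rolling window and the model parameters.
- Add six more indicators (Price and Volume Trend, On-Balance Volume, Price Rate of Change, Volume Rate of Change, Accumulation Distribution Line, Williams Accumulation Distribution Line) and check performance with a decision tree.
- Finally, evaluate performance with AdaBoost and random forest as well, and build a strategy portfolio.
4. Conclusions
- The strategy based on moving-average buy and sell signals reacts slowly when prices fall, so losses are possible, but it makes it correspondingly easy to capture excess returns when prices rise. The annualized return came out at 0.71%, so it does not look like a good strategy.
- The four indicators with a 20-day rolling window gave a classification accuracy of 67%, which is a fairly good result.
- With cross-validation over other rolling windows (2 months, 1 quarter, half a year, 1 year), model accuracy rose to 80%; the best rolling window was 2 months (40 days) and the best decision-tree depth was 4.
- With six additional indicators, 10 in total, accuracy was 75%; with tuned parameters it reached 81%.
- AdaBoost and random forest reached accuracies of 83% and 82% respectively.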
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf
%matplotlib inline
closes = pd.read_csv("stockclose.csv", index_col=0, parse_dates=True)
closes.head()
| | 000001.SZ | 000002.SZ | 000004.SZ | 000005.SZ | 000006.SZ | 000007.SZ | 000008.SZ | 000009.SZ | 000010.SZ | 000011.SZ | ... | 688020.SH | 688022.SH | 688028.SH | 688029.SH | 688033.SH | 688066.SH | 688088.SH | 688122.SH | 688333.SH | 688388.SH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990-12-19 | NaN | NaN | NaN | 3.83 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
| 1990-12-20 | NaN | NaN | NaN | 3.83 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
| 1990-12-21 | NaN | NaN | NaN | 3.83 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
| 1990-12-24 | NaN | NaN | NaN | 3.83 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
| 1990-12-25 | NaN | NaN | NaN | 3.83 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 
5 rows × 3660 columns
stockIndex = '000001.SZ'
asset = closes[stockIndex].dropna()['1999':]
plt.style.use("bmh")
asset.plot()
plt.legend()
asset
# plt.style.available
1999-01-04    14.55
1999-01-05    14.49
1999-01-06    14.58
1999-01-07    14.90
1999-01-08    15.04
              ...  
2019-10-25    16.88
2019-10-28    16.66
2019-10-29    16.91
2019-10-30    16.43
2019-10-31    16.26
Name: 000001.SZ, Length: 5042, dtype: float64
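We pick one stock for analysis and first retrieve its closing-price series to date. The price has fluctuated markedly with macroeconomic conditions since listing.
For example, it fell sharply after the 2008 financial crisis and also dropped noticeably around the 1998 Asian financial crisis; in recent years the price has stayed at a relatively low level.
Its return distributions are shown below: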
from math import log
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
def cal_log_return(ts, name):
    """Plot weekly, monthly, quarterly and yearly log returns; return them as a list."""
    def freq_log_return(lts, freq):
        # Log return per period: change in the last log price of consecutive periods
        last = lts.resample(freq).last()
        return last - last.shift()
    lts = np.log(ts)
    lr = [freq_log_return(lts, freq) for freq in ("W", "M", "Q", "Y")]
    fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6), sharex=True)
    freqs = ["Weekly return", "Monthly return", "Quarterly return", "Yearly return"]
    for i in range(2):
        for j in range(2):
            axes[i][j].plot(lr[2*i+j])
            axes[i][j].set_title(freqs[2*i+j])
    fig.suptitle(name, fontsize=20)
    # The four series have different lengths, so return a list rather than a ragged ndarray
    return lr
lr = cal_log_return(asset, stockIndex)
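def distr(lrlist, name):
    """Plot histograms of the weekly, monthly, quarterly and yearly returns."""
    fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
    fig.suptitle(name + " return distributions", fontsize=20)
    freqs = ["Weekly", "Monthly", "Quarterly", "Yearly"]
    for i in range(2):
        for j in range(2):
            # Fewer bins for the lower-frequency (bottom-row) series; drop the leading NaN
            axes[i][j].hist(lrlist[2*i+j].dropna(), bins=int(110 / (10)**i))
            axes[i][j].set_title(freqs[2*i+j] + " distribution")
    plt.show()
distr(lr, stockIndex)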
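Sharpe Ratio and Maximum Drawdown
Using a one-year rolling window, compute the annualized Sharpe ratio of asset A and its maximum drawdown, and plot both.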
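For reference, the two quantities can be written as follows. This is the usual textbook form with the risk-free rate taken as zero, which is what the code in this section computes; the symbols $r_t$ (daily simple return), $P_t$ (closing price) and $W$ (window length, 252 here) are notation introduced for this note only.

```latex
\[
\mathrm{Sharpe}_t = \sqrt{W}\,\frac{\hat{\mu}_t}{\hat{\sigma}_t},\qquad
\hat{\mu}_t = \frac{1}{W}\sum_{i=0}^{W-1} r_{t-i},\qquad
\hat{\sigma}_t^{2} = \frac{1}{W-1}\sum_{i=0}^{W-1}\left(r_{t-i}-\hat{\mu}_t\right)^{2}
\]
\[
\mathrm{DD}_t = \frac{P_t}{\max_{0\le i<W} P_{t-i}} - 1,\qquad
\mathrm{MDD}_t = \min_{0\le i<W}\mathrm{DD}_{t-i}
\]
```

Because the code passes `min_periods=1`, the earliest windows contain fewer than $W$ observations, so the first year of values is based on shorter samples.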
rolling_window = 252  # one-year rolling window (trading days)
return_series = asset.pct_change()
# One-year rolling Sharpe ratio (risk-free rate taken as zero)
anualMean = return_series.rolling(
    window=rolling_window, min_periods=1, center=False).mean()
anualStd = return_series.rolling(
    window=rolling_window, min_periods=1, center=False).std()
Sharpratio = np.sqrt(rolling_window)*(anualMean/anualStd)['2000':]
Sharpratio.plot()
plt.legend()
plt.title("Sharpe Ratio, One-Year Rolling Window", fontsize=16)
# Calculate the max drawdown in the past window days for each day
rolling_max = asset.rolling(rolling_window, min_periods=1).max()
daily_drawdown = asset/rolling_max - 1.0
# Calculate the minimum (negative) daily drawdown
max_daily_drawdown = daily_drawdown.rolling(rolling_window, min_periods=1).min()
# Plot the results
daily_drawdown['2000':].plot()
max_daily_drawdown['2000':].plot()
# Show the plot
plt.title("Maximum drawdown, one-year rolling window", fontsize=15)
plt.show()
Moving-Average Strategy and Its Evaluation
short_window = 20 
long_window = 120
signals = pd.DataFrame(index = asset.index)
signals['signal'] = 0
# Create short simple moving average over the short window
signals['short_mavg'] = asset.rolling(short_window, min_periods=1, center=False).mean()
# Create long simple moving average over the long window
signals['long_mavg'] = asset.rolling(long_window, min_periods=1, center=False).mean()
signals.plot()
- Generate Signals when short window average hits the long window average
# Signal is 1.0 while the short average is above the long average, else 0.0.
# Build it as a whole Series to avoid chained assignment (SettingWithCopyWarning).
signal = (signals['short_mavg'] > signals['long_mavg']).astype(float)
signal.iloc[:short_window] = 0.0  # no signal during the first `short_window` rows
signals['signal'] = signal
plt.scatter(signals.index, signals['signal'].values)
signals['positions'] = signals['signal'].diff()
signals['positions']
1999-01-04    NaN
1999-01-05    0.0
1999-01-06    0.0
1999-01-07    0.0
1999-01-08    0.0
             ... 
2019-10-25    0.0
2019-10-28    0.0
2019-10-29    0.0
2019-10-30    0.0
2019-10-31    0.0
Name: positions, Length: 5042, dtype: float64
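The `positions` column above is the day-to-day change of the 0/1 signal. A minimal sketch on made-up values (the toy series below is hypothetical, not taken from the data) shows how `diff()` turns a holding signal into buy and sell events:

```python
import pandas as pd

# Toy signal: 0 = flat, 1 = holding (hypothetical values for illustration)
signal = pd.Series([0, 0, 1, 1, 1, 0, 0, 1], dtype=float)

# diff() is +1 on the row where the signal switches on (buy)
# and -1 on the row where it switches off (sell); the first row is NaN
positions = signal.diff()

buy_days = positions[positions == 1.0].index.tolist()
sell_days = positions[positions == -1.0].index.tolist()
print(buy_days, sell_days)
```

Rows where the signal does not change get 0.0, which is why almost every entry in the output above is zero.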
- Plot the signals
# Initialize the plot figure
fig = plt.figure()
# Add a subplot and label for y-axis
ax1 = fig.add_subplot(111,  ylabel='Price')
# Plot the short and long moving averages
signals[['short_mavg', 'long_mavg']].plot(ax=ax1, lw=2.)
# Plot the buy signals
ax1.plot(signals.loc[signals.positions == 1.0].index, 
         signals.short_mavg[signals.positions == 1.0],
         '^', markersize=8, color='red')
         
# Plot the sell signals
ax1.plot(signals.loc[signals.positions == -1.0].index, 
         signals.short_mavg[signals.positions == -1.0],
         'v', markersize=7, color='green')
         
# Show the plot
plt.show()
Back-testing
# Set the initial capital
initial_capital= float(10000.0)
# Create a DataFrame `positions`
positions = pd.DataFrame(index=signals.index).fillna(0.0)
# Hold 100 shares whenever the signal is on
positions['Value'] = 100*signals['signal']  
# Initialize the portfolio with value owned   
portfolio = positions.multiply(asset, axis=0)
portfolio.tail() 
| | Value |
|---|---|
| 2019-10-25 | 1688.0 | 
| 2019-10-28 | 1666.0 | 
| 2019-10-29 | 1691.0 | 
| 2019-10-30 | 1643.0 | 
| 2019-10-31 | 1626.0 | 
pos_diff = positions.diff()
# Add `holdings` to portfolio
portfolio['holdings'] = (positions.multiply(asset, axis=0)).sum(axis=1)
portfolio['cash'] = initial_capital - (pos_diff.multiply(asset, axis =0)).sum(axis=1).cumsum()
# Add `total` to portfolio
portfolio['total'] = portfolio['cash'] + portfolio['holdings']
# Add `returns` to portfolio
portfolio['returns'] = portfolio['total'].pct_change()
# Print the last lines of `portfolio`
print(portfolio.tail())
             Value  holdings    cash    total   returns
2019-10-25  1688.0    1688.0  9967.0  11655.0  0.000086
2019-10-28  1666.0    1666.0  9967.0  11633.0 -0.001888
2019-10-29  1691.0    1691.0  9967.0  11658.0  0.002149
2019-10-30  1643.0    1643.0  9967.0  11610.0 -0.004117
2019-10-31  1626.0    1626.0  9967.0  11593.0 -0.001464
- Portfolio value over time
# Create a figure
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Portfolio value (yuan)')
# Plot the equity curve in yuan
portfolio['total'].plot(ax=ax1, lw=2.)
ax1.plot(portfolio.loc[signals.positions == 1.0].index, 
         portfolio.total[signals.positions == 1.0],
         '^', markersize=10, color='red')
ax1.plot(portfolio.loc[signals.positions == -1.0].index, 
         portfolio.total[signals.positions == -1.0],
         'v', markersize=10, color='green')
plt.title("Portfolio value over time", fontsize = 15)
# Show the plot
plt.show()
# Isolate the returns of your strategy
returns = portfolio['returns']
# annualized Sharpe ratio
sharpe_ratio = np.sqrt(252) * (returns.mean() / returns.std())
# Print the Sharpe ratio
print(sharpe_ratio)
0.1752452210952673
# Define a trailing 252 trading day window
window = 252
# Calculate the max drawdown in the past window days for each day
rolling_max = portfolio['total'].rolling(window, min_periods=1).max()
daily_drawdown = portfolio['total']/rolling_max - 1.0
# Calculate the minimum (negative) daily drawdown
max_daily_drawdown = daily_drawdown.rolling(window, min_periods=1).min()
# Plot the results
daily_drawdown.plot()
max_daily_drawdown.plot()
# Show the plot
plt.show()
# Number of calendar days covered by the back-test
days = (portfolio['total'].index[-1] - portfolio['total'].index[0]).days
# Compound annual growth rate (CAGR) from the first to the last portfolio value
cagr = (portfolio['total'].iloc[-1] / portfolio['total'].iloc[0]) ** (365.0/days) - 1
# Print CAGR
print("Annualized return:", cagr)
Annualized return: 0.007119633448701146
Summary
The results above show that the strategy's annualized return is low (under 1%); it performs poorly on this stock.
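The annualized figure can be checked by hand from numbers already printed in this section: an initial capital of 10000, a final portfolio total of 11593, and a sample running from 1999-01-04 to 2019-10-31. The helper below is a small sketch written for this check (the function name `cagr` and its signature are not part of the notebook).

```python
from datetime import date

def cagr(start_value, end_value, start_date, end_date):
    """Compound annual growth rate over calendar days, using a 365-day year."""
    days = (end_date - start_date).days
    return (end_value / start_value) ** (365.0 / days) - 1

# Values taken from the back-test output: 10000 initial capital, 11593 final total
rate = cagr(10000.0, 11593.0, date(1999, 1, 4), date(2019, 10, 31))
print(rate)
```

A total gain of about 16% spread over almost 21 years compounds to roughly 0.7% per year, in line with the value printed by the notebook.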
Stochastics
- Stochastics: The different indicators are very capable of signalling when the market is considered overbought or oversold as well as upward and downward trend patterns.
I mainly analyze the Stochastics indicators and use them to build a decision tree. The indicators analyzed are:
- Fast Stochastics
- Slow Stochastics
- William's R
- Ex-post Volatility
The data used are the price and volume series of the Shanghai Composite Index (上证指数).
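As a point of reference for the indicators listed above, here is a minimal sketch of their common textbook definitions. It is not the notebook's exact code: the function name `stochastics`, the column names, and the toy prices are assumptions made for illustration, and the notebook's own Williams %R uses a lagged close instead of the current one.

```python
import pandas as pd

def stochastics(high, low, close, window=20, smooth=3):
    """Textbook Fast %K, Slow %K (moving average of Fast %K) and Williams %R."""
    lowest_low = low.rolling(window, min_periods=1).min()
    highest_high = high.rolling(window, min_periods=1).max()
    price_range = highest_high - lowest_low
    # Position of the close inside the recent high-low range, scaled to 0..100
    fast_k = (close - lowest_low) / price_range * 100
    # Smoothed version of Fast %K
    slow_k = fast_k.rolling(smooth, min_periods=1).mean()
    # Same information measured from the top of the range, scaled to -100..0
    williams_r = (highest_high - close) / price_range * -100
    return pd.DataFrame({"fast_k": fast_k, "slow_k": slow_k, "williams_r": williams_r})

# Hypothetical toy prices for illustration only
high = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0])
low = pd.Series([9.0, 9.5, 10.5, 10.0, 11.0])
close = pd.Series([9.5, 10.5, 11.0, 10.5, 12.5])
out = stochastics(high, low, close, window=3)
print(out.round(2))
```

Under these definitions Williams %R is simply Fast %K shifted down by 100, so the two carry the same information.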
Fast Stochastic
asset.head()
1999-01-04    14.55
1999-01-05    14.49
1999-01-06    14.58
1999-01-07    14.90
1999-01-08    15.04
Name: 000001.SZ, dtype: float64
asset = pd.read_excel("sh000001.xlsx", index_col=0, parse_dates=True).dropna(axis=1)
asset.head()
asset['close'].plot()
plt.title("Asset value over time", fontsize = 15)
window = 20  # monthly frequency (about 20 trading days)
start_index = '1996'
Pc = asset['close']
Pl = asset['low'].rolling(window, min_periods=1, center=False).min()
Ph = asset['high'].rolling(window, min_periods=1, center=False).max()
fastStochastics = (Pc - Pl)/ (Ph - Pl)*100
plt.figure(figsize=(24,4))
fastStochastics['2010':].plot()
plt.title("Fast Stochastics", fontsize=17)
# Add to the original DataFrame
asset['FastStochastics'] = fastStochastics[start_index:]
William's R
plt.figure(figsize=(24,4))
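# Note: the textbook Williams %R uses the current close, (Ph - Pc) / (Ph - Pl) * (-100).
# The line below uses the close from window-1 days earlier, so these values are a lagged variant.
perc_R = (Ph - Pc.shift(window-1))/(Ph-Pl) *(-100)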
perc_R['2010':].plot()
plt.title("William's R", fontsize = 17)
asset["WilliamsR"] = perc_R[start_index:]
Slow Stochastics
slowStochastics = fastStochastics.rolling(3, min_periods=1, center=False).mean()
plt.figure(figsize=(24,4))
slowStochastics['2010':].plot()
plt.title("Slow Stochastics", fontsize=17)
asset['SlowStochastics'] = slowStochastics[start_index: ]
#Set Label
asset['month_return'] = asset['close'].shift(-20)/asset['close'] -1
asset['label'] = 1 * (asset['month_return'] > 0.05) + (-1) * (asset['month_return'] < -0.05)
asset[start_index:].head()
| date | open | high | low | close | amount | vol | FastStochastics | WilliamsR | SlowStochastics | month_return | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1996-01-02 | 550.26 | 550.26 | 537.38 | 537.87 | 223.12 | 48247800 | 0.462351 | -5.793546 | 1.497209 | -0.009854 | 0 | 
| 1996-01-03 | 535.23 | 542.74 | 530.79 | 542.42 | 252.57 | 55619200 | 10.331349 | -13.262859 | 4.500095 | -0.009347 | 0 | 
| 1996-01-04 | 541.94 | 558.94 | 539.76 | 558.76 | 451.49 | 110591300 | 24.846762 | -10.233632 | 11.880154 | -0.039516 | 0 | 
| 1996-01-05 | 559.03 | 561.36 | 536.12 | 536.37 | 597.19 | 138637100 | 4.956916 | -5.845252 | 13.378342 | -0.000615 | 0 | 
| 1996-01-08 | 533.33 | 540.04 | 527.91 | 539.17 | 287.29 | 69620700 | 10.109535 | -17.983480 | 13.304404 | -0.030269 | 0 | 
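The labelling rule used above (class 1 if the forward 20-day return exceeds 5%, class -1 if it is below -5%, class 0 otherwise) can be sketched as a small function. The name `make_labels` and the toy prices are hypothetical. One deliberate difference from the notebook: rows without a forward return are returned as missing, whereas the notebook's expression assigns them class 0 because comparisons with NaN evaluate to False.

```python
import pandas as pd

def make_labels(close, horizon=20, threshold=0.05):
    """Label each row +1 / -1 / 0 from the return over the next `horizon` rows."""
    forward_return = close.shift(-horizon) / close - 1
    label = pd.Series(0, index=close.index)
    label[forward_return > threshold] = 1
    label[forward_return < -threshold] = -1
    # The last `horizon` rows have no forward return yet; mark them as missing
    # instead of silently assigning class 0
    return label.where(forward_return.notna())

# Hypothetical prices, with a one-row horizon so the result is easy to verify
close = pd.Series([100.0, 110.0, 100.0, 102.0, 96.0])
labels = make_labels(close, horizon=1, threshold=0.05)
print(labels.tolist())
```

Keeping the unlabeled tail out of training avoids teaching the model that the most recent rows are all "no move".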
Ex-post Volatility
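# Rolling standard deviation of the 20-day return series.
# Caution: `month_return` is forward-looking (it uses close.shift(-20)),
# so this feature contains information from up to 20 days after each date.
ex_volatility = asset['month_return'].rolling(window, min_periods=1, center=False).std()[start_index:]
asset['Ex-postVolatility'] = ex_volatility
plt.figure(figsize = (12,6))
ex_volatility.plot()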
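The figures above show the four Stochastic indicators that were computed.
The first three fluctuate too sharply for a single full-sample plot to be informative, so only a sub-period is shown.
- Add moving-average signals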
short_window = 5
mid_window = 20
long_window = 250
signals = pd.DataFrame(index = asset.index)
asset['short_signal'] = 0.0
asset['long_signal'] = 0.0
signals['short_mavg'] = asset['close'].rolling(window=short_window, min_periods=1, center=False).mean()
signals['mid_mavg'] = asset['close'].rolling(window=mid_window, min_periods=1, center=False).mean()
signals['long_mavg'] = asset['close'].rolling(window=long_window, min_periods=1, center=False).mean()
signals
| date | short_mavg | mid_mavg | long_mavg |
|---|---|---|---|
| 1990-12-19 | 100.000000 | 100.000000 | 100.000000 | 
| 1990-12-20 | 102.195000 | 102.195000 | 102.195000 | 
| 1990-12-21 | 104.506667 | 104.506667 | 104.506667 | 
| 1990-12-24 | 107.017500 | 107.017500 | 107.017500 | 
| 1990-12-25 | 109.664000 | 109.664000 | 109.664000 | 
| ... | ... | ... | ... | 
| 2019-10-10 | 2924.700000 | 2977.107500 | 2835.175720 | 
| 2019-10-11 | 2932.998000 | 2976.497500 | 2835.945800 | 
| 2019-10-14 | 2953.536000 | 2976.911500 | 2836.750080 | 
| 2019-10-15 | 2969.032000 | 2975.227000 | 2837.547160 | 
| 2019-10-16 | 2979.802000 | 2973.102500 | 2838.176600 | 
7046 rows × 3 columns
# Build each signal as a whole Series to avoid chained assignment (SettingWithCopyWarning)
short_signal = (signals['short_mavg'] > signals['mid_mavg']).astype(float)
short_signal.iloc[:short_window] = 0.0  # no signal during the first `short_window` rows
asset['short_signal'] = short_signal
long_signal = (signals['mid_mavg'] > signals['long_mavg']).astype(float)
long_signal.iloc[:mid_window] = 0.0  # no signal during the first `mid_window` rows
asset['long_signal'] = long_signal
asset.head()
| date | open | high | low | close | amount | vol | FastStochastics | WilliamsR | SlowStochastics | month_return | label | Ex-postVolatility | short_signal | long_signal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990-12-19 | 96.05 | 100.00 | 95.79 | 100.00 | 0.49 | 126000 | NaN | NaN | NaN | 0.342500 | 1 | NaN | 0.0 | 0.0 | 
| 1990-12-20 | 104.30 | 104.39 | 99.98 | 104.39 | 0.09 | 19700 | NaN | NaN | NaN | 0.285947 | 1 | NaN | 0.0 | 0.0 | 
| 1990-12-21 | 109.07 | 109.13 | 103.73 | 109.13 | 0.02 | 2800 | NaN | NaN | NaN | 0.230093 | 1 | NaN | 0.0 | 0.0 | 
| 1990-12-24 | 113.57 | 114.55 | 109.13 | 114.55 | 0.03 | 3200 | NaN | NaN | NaN | 0.167351 | 1 | NaN | 0.0 | 0.0 | 
| 1990-12-25 | 120.09 | 120.25 | 114.55 | 120.25 | 0.01 | 1500 | NaN | NaN | NaN | 0.107443 | 1 | NaN | 0.0 | 0.0 | 
Decision Tree
- Train a simple decision-tree model, using all data before 2018 to predict stock returns afterwards
from IPython.display import Image
from sklearn import tree
import pydotplus
%matplotlib inline
asset = asset[:'2019-09']
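end_index = '2018'
# Note: partial-string slicing on a DatetimeIndex is inclusive at both ends,
# so both slices below contain all of 2018 and the train and test sets overlap.
train_data = asset[start_index:end_index]
test_data = asset[end_index:]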
clf = tree.DecisionTreeClassifier(max_depth=4)
clf = clf.fit(train_data[['FastStochastics', 'SlowStochastics', 'WilliamsR', 
                          'Ex-postVolatility', 'short_signal', 'long_signal']].values,
              train_data['label'].values)
# Export the fitted tree to a DOT file, then render it as a PNG.
# export_graphviz returns None when out_file is given, so nothing is assigned.
dot_file = "temptree.dot"
tree.export_graphviz(clf, out_file=dot_file,
                     feature_names=["FastStochastics", "SlowStochastics", "WilliamsR",
                                    "Ex-postVolatility", "short_signal", "long_signal"],
                     class_names=['-1', '0', '1'],
                     filled=True, rounded=True)
graph = pydotplus.graph_from_dot_file(dot_file)
Image(graph.create_png())
clf.score(test_data[['FastStochastics', 'SlowStochastics', 'WilliamsR',
                     'Ex-postVolatility','short_signal','long_signal']].values, test_data['label'].values)
0.6666666666666666
Note
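Note: the out-of-sample accuracy here is 67%, which already looks like a decent result. However, later experiments in the group assignment show that this figure can go higher when the rolling window is smaller.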
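One caveat when reading that accuracy: slicing a DatetimeIndex with a partial string such as `'2018'` is inclusive, so `asset[start_index:'2018']` and `asset['2018':]` both contain the year 2018. A sketch of a strictly disjoint chronological split, on a hypothetical toy frame standing in for `asset`, could look like this:

```python
import pandas as pd

# Hypothetical daily frame standing in for `asset`
idx = pd.date_range("2017-12-25", "2018-01-08", freq="D")
frame = pd.DataFrame({"x": range(len(idx))}, index=idx)

split_date = pd.Timestamp("2018-01-01")
train = frame[frame.index < split_date]    # strictly before the split date
test = frame[frame.index >= split_date]    # split date onwards

print(len(train), len(test))
```

Comparing the index against an explicit timestamp makes the boundary unambiguous, and each row lands in exactly one of the two sets.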
Model Optimization
The best parameters are chosen by cross-validation; for this model the main parameter to tune is the maximum depth, max_depth.
from sklearn.model_selection import train_test_split
X = asset[start_index:][['FastStochastics','SlowStochastics','WilliamsR','Ex-postVolatility']]
Y = asset.label[start_index:].values
X_train,X_test, Y_train, Y_test = train_test_split(X, Y,test_size=0.33,random_state=10)
from sklearn.pipeline import Pipeline
pipeline = Pipeline([('clf',tree.DecisionTreeClassifier(criterion='gini'))]) 
from sklearn.model_selection import GridSearchCV
hyperparameters = { 
    'clf__max_depth': (3, 4, 5, 6, 8, 9) 
}  
clf = GridSearchCV(pipeline, hyperparameters, cv=10, n_jobs=1,verbose=1)
#print(Y_test)
clf.fit(X_train, Y_train)
Fitting 10 folds for each of 6 candidates, totalling 60 fits
GridSearchCV(cv=10,
             estimator=Pipeline(steps=[('clf', DecisionTreeClassifier())]),
             n_jobs=1, param_grid={'clf__max_depth': (3, 4, 5, 6, 8, 9)},
             verbose=1)
Report The Model
from sklearn.metrics import classification_report
print('Best score: %0.3f' % clf.best_score_)
print('Best parameters:')
print(clf.best_params_)
predictions = clf.predict(X_test)
#print(clf.predict(X_test))
print(classification_report(Y_test, predictions))
clf.predict(X_test)
Best score: 0.589
Best parameters:
{'clf__max_depth': 4}
              precision    recall  f1-score   support
          -1       0.47      0.22      0.30       372
           0       0.64      0.80      0.71      1037
           1       0.43      0.38      0.40       492
    accuracy                           0.58      1901
   macro avg       0.51      0.47      0.47      1901
weighted avg       0.55      0.58      0.55      1901
array([0, 0, 1, ..., 0, 1, 1])
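The grid search above shuffles the observations (`train_test_split` with a random state, then 10-fold cross-validation), which mixes past and future rows. As an alternative sketch, not the method used in this notebook, scikit-learn's `TimeSeriesSplit` keeps every validation fold after its training fold. The data below are synthetic stand-ins for the indicator matrix and labels.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

# Synthetic, time-ordered data standing in for the indicators and labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = np.where(X_demo[:, 0] > 0.5, 1, np.where(X_demo[:, 0] < -0.5, -1, 0))

# Each fold trains on an initial segment and validates on the segment that follows it
cv = TimeSeriesSplit(n_splits=5)
search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"max_depth": [3, 4, 5, 6, 8, 9]},
    cv=cv,
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

Scores from an ordered split are usually lower than from a shuffled one, but they are closer to what a model trained only on the past would achieve.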
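Strategy
This part implements a complete strategy workflow, including cross-validation over rolling windows of different lengths and over hyperparameters, followed by a back-test once the prediction model has been trained.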
from datetime import datetime
feature_list = ['FastStochastics','WilliamsR', 'SlowStochastics', 
                'Ex-postVolatility', 'short_signal','long_signal']
all_days = list(X.index)
start_backtest = 4500  # back-test start (row position in all_days)
end_backtest = 4600
#start_day = all_days.index(start_month)
day_length = len(all_days)
print(all_days[start_backtest])
2014-08-04 00:00:00
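Training Procedure
The following shows the training procedure for a single decision-tree model. Rolling windows of several lengths are tried (2 months, one quarter, half a year, and one year); within each window the past data are treated as cross-sectional data for cross-validation, and the best parameters are selected.
The model refresh interval (refreshFreq) is configurable. Here it is set to 1 day, so every day the decision tree is retrained on the preceding rolling window of data. For large samples it can be raised to about 5 (one week) to save computation time.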
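The rolling retraining scheme described above can be sketched as an index generator. The helper name `walk_forward_windows` and the small numbers are hypothetical; the notebook builds the same train and test slices inline from `all_days`, without the cap at `end` used here.

```python
def walk_forward_windows(start, end, window, refresh=1):
    """Yield (train_positions, test_positions) pairs for a rolling-window back-test."""
    for day in range(start, end, refresh):
        train = range(day - window, day)              # the `window` rows before `day`
        test = range(day, min(day + refresh, end))    # rows predicted with this model
        yield train, test

# Hypothetical small numbers for illustration
windows = list(walk_forward_windows(start=10, end=14, window=4, refresh=2))
for train, test in windows:
    print(list(train), list(test))
```

Because each label in this notebook looks 20 rows ahead, a live version would likely need to drop the final 20 rows of every training window, since their labels are not yet known on the prediction day; the sketch does not do this.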
## Cross-validate over rolling-window lengths and maximum depths
# Global settings
rolling_windows = [40, 60 ,120, 250]
refreshFreq = 1  # model refresh interval in days; about 5 is a reasonable compromise for large samples
hyperparameters = { 
    'clf__max_depth': (3, 4, 5, 6) 
}  
def DecissionTreeClf(featureList, prob = False):
    """Walk-forward decision-tree classification for each rolling-window length."""
    # Use the feature list passed in (the original read the global `feature_list`)
    X = asset[featureList][start_index:]
    Y = asset['label'][start_index:]
    reports = []
    for rw in rolling_windows:
        report = pd.DataFrame(index = X.index[start_backtest:])
        report['Correct'] = 0  # whether the prediction was correct
        report['Signal'] = 0   # strategy signal (predicted label)
        report['Return'] = 0   # strategy return (not filled in by this function)
        for day in range(start_backtest, end_backtest, refreshFreq):
            pipeline = Pipeline([('clf',tree.DecisionTreeClassifier(criterion='gini'))])
            clf = GridSearchCV(pipeline, hyperparameters, cv=5, n_jobs=1,verbose=1)
            # Train on the `rw` rows before `day`, predict the next `refreshFreq` rows
            train_days = all_days[day-rw:day]
            test_days = all_days[day:day+refreshFreq]
            X_train, Y_train = X.loc[train_days], Y.loc[train_days] 
            clf.fit(X_train, Y_train)
            prediction = clf.predict(X.loc[test_days])
            report.loc[test_days,'Correct'] = (prediction == Y.loc[test_days])
            report.loc[test_days, 'Signal'] = prediction
            if prob:
                # Assumes all three classes (-1, 0, 1) occur in the training window,
                # so predict_proba returns three columns in that order
                prediction_prob = clf.predict_proba(X.loc[test_days])
                report.loc[test_days, 'Prob'] = prediction_prob[0][0] - prediction_prob[0][2]
        reports.append(report)
        print("===========")
        print("Finish Rolling Window {}".format(rw))
    return reports
reports = DecissionTreeClf(feature_list)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
...
===========
Finish Rolling Window 40
Fitting 5 folds for each of 4 candidates, totalling 20 fits
...
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
=========== Finish Rolling Window 60 Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits =========== Finish Rolling Window 120 Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits
Fitting 5 folds for each of 4 candidates, totalling 20 fits Fitting 5 folds for each of 4 candidates, totalling 20 fits =========== Finish Rolling Window 250
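best_report = None
best_pre = 0
for report in reports:
    # Share of backtest days on which the prediction was correct, in percent
    corRate = 100*(report['Correct'] == 1).sum()/(end_backtest-start_backtest)
    print("Prediction accuracy: {}%".format(corRate))
    print("Number of signals: ", (report['Signal'] != 0).sum())
    # Keep the report with the highest accuracy seen so far
    if corRate > best_pre:
        best_report = report
        best_pre = corRate
Prediction accuracy: 80.0%
Number of signals:  55
Prediction accuracy: 76.0%
Number of signals:  61
Prediction accuracy: 74.0%
Number of signals:  53
Prediction accuracy: 61.0%
Number of signals:  32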
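Summary: With a suitably chosen rolling window, the decision tree's prediction accuracy improves markedly (70% -> 80%).
This supports, from another angle, the view that stock-market information has short-term memory.
In our experiments the shortest rolling windows (40 or 60 trading days) consistently performed best, so from here on we use only a 40-day rolling window to keep training fast. To train with other rolling windows, change the parameter set earlier.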
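The window selection above can be sketched in plain Python. This is only an illustration: the flags and the helper names `hit_rate` and `best_window` are hypothetical, and the hit rate here divides by the number of flags rather than by `end_backtest-start_backtest` as the notebook does.

```python
def hit_rate(correct_flags):
    """Percentage of days whose prediction was marked correct (flag == 1)."""
    return 100 * sum(1 for flag in correct_flags if flag == 1) / len(correct_flags)

def best_window(flags_by_window):
    """Return (window, accuracy) for the window with the highest hit rate."""
    rates = {w: hit_rate(flags) for w, flags in flags_by_window.items()}
    window = max(rates, key=rates.get)
    return window, rates[window]

# Hypothetical correctness flags per rolling window (1 = correct, 0 = wrong)
flags_by_window = {
    40: [1, 1, 1, 1, 0],
    60: [1, 1, 1, 0, 0],
    250: [1, 0, 1, 0, 0],
}
window, rate = best_window(flags_by_window)
print(window, rate)
```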
def backtest(report, initial_capital = 5*10**6):
    # Work on a copy so the assignments below do not trigger SettingWithCopyWarning
    report = report[:(end_backtest-start_backtest) + 20].copy()
    # Zero out the second column (position 1) after the end of the backtest period
    report.iloc[(end_backtest-start_backtest)+1:,1] = 0
    # Every signal is reversed 20 trading days later, closing the position it opened
    report.loc[:,'Signal2'] = report['Signal'].shift(20).fillna(0.0)
    report.loc[:, 'Signal'] = report['Signal'] - report['Signal2']
    # Trade 100 shares per unit of signal
    report.loc[:,'Signal'] = report['Signal'] * 100
    portfolio = pd.DataFrame(index = report.index).fillna(0.0)
    # Cash paid (positive) or received (negative) for each day's trade
    portfolio.loc[:,'change'] = report['Signal'].multiply(asset['close'], axis=0)
    # Shares held after each day's trade, and their market value
    portfolio.loc[:,'position'] = report['Signal'].cumsum()
    portfolio.loc[:,'holdings'] = portfolio['position'].multiply(asset['close'], axis = 0)
    portfolio.loc[:,'cash'] = initial_capital - portfolio['change'].cumsum()
    portfolio.loc[:,'total'] = portfolio['cash'] + portfolio['holdings']
    portfolio.iloc[:(end_backtest-start_backtest) + 20,:].total.plot()
    plt.title("Capital Line", fontsize = 15)
    portfolio.loc[:,['holdings','cash']][:(end_backtest-start_backtest) + 20].plot()
    plt.title("Holdings and Cash", fontsize = 15)
    return portfolio
portfolio = backtest(reports[0])
#plt.figure()
#asset.loc['2012-07':'2013-02', 'close'].plot()
portfolio.tail()
| date | change | position | holdings | cash | total |
|---|---|---|---|---|---|
| 2015-01-22 | -334334.0 | 400.0 | 1337336.0 | 5369227.0 | 6706563.0 |
| 2015-01-23 | -335176.0 | 300.0 | 1005528.0 | 5704403.0 | 6709931.0 |
| 2015-01-26 | -338318.0 | 200.0 | 676636.0 | 6042721.0 | 6719357.0 |
| 2015-01-27 | -335296.0 | 100.0 | 335296.0 | 6378017.0 | 6713313.0 |
| 2015-01-28 | -330574.0 | 0.0 | 0.0 | 6708591.0 | 6708591.0 |
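The bookkeeping behind the columns in this table can be illustrated with a small plain-Python sketch. The function `run_backtest` and its inputs are hypothetical and only mirror the arithmetic of `change`, `position`, `holdings`, `cash` and `total`; they are not the notebook's `backtest` function.

```python
def run_backtest(trades, prices, initial_capital):
    """Track position, holdings, cash and total equity day by day.

    trades[i] is the number of shares bought (+) or sold (-) on day i,
    prices[i] is that day's closing price.
    """
    rows = []
    position = 0
    cash = initial_capital
    for shares, price in zip(trades, prices):
        change = shares * price        # cash paid out (negative when selling)
        position += shares
        cash -= change
        holdings = position * price    # market value of the shares held
        rows.append({"change": change, "position": position,
                     "holdings": holdings, "cash": cash,
                     "total": cash + holdings})
    return rows

# Hypothetical example: buy 100 shares on day 0 and sell them on day 2
rows = run_backtest(trades=[100, 0, -100], prices=[10.0, 11.0, 12.0],
                    initial_capital=5000.0)
for row in rows:
    print(row)
```

As in the last row of the table, once the position returns to zero the holdings are zero and the total equals the cash balance.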
Summary
Adding New Indicators
- Momentum indicators
def price_momentum(data, n):
    return data / data.shift(n)
def acceleration_between_times(data, n):
    mo = price_momentum(data, n)
    return mo - mo.shift(n)
asset['Momentum'] = price_momentum(asset['close'],20)
asset['Acc_Between_Times'] = acceleration_between_times(asset['close'], 20)
asset.tail()
| date | open | high | low | close | amount | vol | FastStochastics | WilliamsR | SlowStochastics | month_return | label | Ex-postVolatility | short_signal | long_signal | Momentum | Acc_Between_Times |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-09-24 | 2979.48 | 3002.90 | 2973.76 | 2985.34 | 206245.81 | 16390276800 | 65.888764 | -83.361962 | 71.018973 | NaN | 0 | 0.010400 | 1.0 | 1.0 | 1.042524 | 0.068855 |
| 2019-09-25 | 2977.67 | 2977.67 | 2955.43 | 2955.43 | 203782.75 | 16854313600 | 48.172718 | -88.355150 | 60.021286 | NaN | 0 | 0.009240 | 1.0 | 1.0 | 1.018345 | 0.035331 |
| 2019-09-26 | 2964.48 | 2970.04 | 2928.26 | 2929.09 | 216277.57 | 18893254400 | 32.571225 | -90.037316 | 48.877569 | NaN | 0 | 0.008638 | 0.0 | 1.0 | 1.012209 | 0.025423 |
| 2019-09-27 | 2929.49 | 2939.08 | 2920.93 | 2932.17 | 153311.01 | 13290577600 | 34.395546 | -92.809335 | 38.379830 | NaN | 0 | 0.006694 | 0.0 | 1.0 | 1.014269 | 0.020405 |
| 2019-09-30 | 2927.92 | 2936.48 | 2905.19 | 2905.19 | 142591.71 | 11664680800 | 13.507064 | -74.612245 | 26.824612 | NaN | 0 | 0.005715 | 0.0 | 1.0 | 1.006566 | 0.000150 |
asset['ado'] = (asset['high'] - asset['open'] + asset['close'] - asset['low']) * 50 / (
    asset['high'] - asset['low']) 
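A worked example of the oscillator formula above, for a single bar, may help. The helper `ado` is a hypothetical name, and returning `None` for a flat bar (high equal to low) is an assumption of this sketch; the pandas expression above would divide by zero in that case.

```python
def ado(open_, high, low, close):
    """Accumulation/Distribution Oscillator for one bar, on a 0-100 scale."""
    if high == low:
        return None   # flat bar: the range is zero, so the ratio is undefined
    return ((high - open_) + (close - low)) * 50 / (high - low)

# Hypothetical bar: opens at 10, trades between 8 and 12, closes at 11
value = ado(open_=10.0, high=12.0, low=8.0, close=11.0)
print(value)

# The oscillator reaches 100 when the bar opens at its low and closes at its high
strongest = ado(open_=8.0, high=12.0, low=8.0, close=12.0)
print(strongest)
```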
- Index
window = 20  # monthly frequency (about 20 trading days)
delta = asset['close'].pct_change()
U = delta.copy()
U[delta<=0]=0.0
D = abs(delta.copy())
D[delta>0]=0.0
U_average = U.rolling(window).mean()
D_average = D.rolling(window).mean()
asset['RSI']= 100-100/(1+U_average/D_average)
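The RSI cell above averages up moves and down moves of the daily percentage change over the window. A minimal plain-Python sketch of the same arithmetic for the most recent window follows; the function name `rsi` and the price list are hypothetical, and treating a window with no down moves as 100 is an assumption of this sketch.

```python
def rsi(prices, window):
    """RSI from simple averages of up and down percentage moves over `window` days."""
    returns = [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]
    recent = returns[-window:]
    if len(recent) < window:
        return None                      # not enough history yet
    up = sum(r for r in recent if r > 0) / window
    down = sum(-r for r in recent if r <= 0) / window
    if down == 0:
        return 100.0                     # no down moves: treated as 100 in this sketch
    return 100 - 100 / (1 + up / down)

# Hypothetical closes: +10%, -10%, +10%, so up moves are twice the down moves
value = rsi([100.0, 110.0, 99.0, 108.9], window=3)
print(round(value, 2))
```

With up moves twice as large as down moves on average, the ratio is 2 and the RSI is 100 - 100/3, roughly 66.67.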
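def positive_volume_index(asset):
    PVI = asset['close']*0
    PVI.iloc[0] = 100          # initial value is 100
    length = len(PVI)
    for i in range(1,length):
        # Volume fell: carry the previous value forward
        if asset['vol'].iloc[i] < asset['vol'].iloc[i-1]:
            PVI.iloc[i] = PVI.iloc[i-1]
        # Volume rose or was unchanged: apply the day's percentage price change
        if asset['vol'].iloc[i] >= asset['vol'].iloc[i-1]:
            PVI.iloc[i] = PVI.iloc[i-1]*(1+(asset['close'].iloc[i]-asset['close'].iloc[i-1])/asset['close'].iloc[i-1])
    return PVI
asset['PVI'] = positive_volume_index(asset)
def negative_volume_index(asset):
    NVI = asset["close"]*0
    NVI.iloc[0] = 100          # initial value is 100
    length = len(NVI)
    for i in range(1,length):
        # Volume rose: carry the previous value forward
        if asset['vol'].iloc[i] > asset['vol'].iloc[i-1]:
            NVI.iloc[i] = NVI.iloc[i-1]
        # Volume fell or was unchanged: apply the day's percentage price change
        if asset['vol'].iloc[i] <= asset['vol'].iloc[i-1]:
            NVI.iloc[i] = NVI.iloc[i-1]*(1+(asset['close'].iloc[i]-asset['close'].iloc[i-1])/asset["close"].iloc[i-1])
    return NVI
asset['NVI'] = negative_volume_index(asset)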
Indicators
- Price and Volume Trend
- On-Balance Volume
- Price Rate of Change
- Volume Rate of Change
- Accumulation Distribution Line
- William’s Accumulation Distribution Line
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
stocks = pd.read_excel("sh000001.xlsx",index_col=0,parse_dates=True).dropna(axis = 1)
stocks.head()  # please change this to stocks = asset
| date | open | high | low | close | amount | vol |
|---|---|---|---|---|---|---|
| 1990-12-19 | 96.05 | 100.00 | 95.79 | 100.00 | 0.49 | 126000 |
| 1990-12-20 | 104.30 | 104.39 | 99.98 | 104.39 | 0.09 | 19700 |
| 1990-12-21 | 109.07 | 109.13 | 103.73 | 109.13 | 0.02 | 2800 |
| 1990-12-24 | 113.57 | 114.55 | 109.13 | 114.55 | 0.03 | 3200 |
| 1990-12-25 | 120.09 | 120.25 | 114.55 | 120.25 | 0.01 | 1500 |
Price and Volume Trend
# Daily PVT increment: percentage change in the close times the day's volume
dPVT=(stocks["close"]-stocks["close"].shift(1))/stocks["close"].shift(1)*stocks["vol"]
dPVT.iloc[0]=0  # the first day has no previous close
# PVT is the running sum of the daily increments
PVT=dPVT.cumsum()
PVT.plot()
[Figure: PVT plotted against date]
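PVT is a running sum of daily increments, which a short plain-Python sketch can make concrete. The function `price_volume_trend` and its inputs are hypothetical and stand in for the pandas columns used above.

```python
from itertools import accumulate

def price_volume_trend(closes, volumes):
    """Running sum of (percentage price change * volume), starting at 0."""
    increments = [0.0] + [
        (closes[i] - closes[i - 1]) / closes[i - 1] * volumes[i]
        for i in range(1, len(closes))
    ]
    return list(accumulate(increments))

# Hypothetical data: +10% on 2000 shares, then -10% on 3000 shares
pvt = price_volume_trend([100.0, 110.0, 99.0], [1000, 2000, 3000])
print(pvt)
```

The first day contributes nothing, the second adds about 200, and the third subtracts about 300, leaving the series near -100.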
On-Balance Volume
# +1 when the close rises, -1 when it falls, 0 when it is unchanged (and on the first day)
direction=np.sign(stocks["close"].diff()).fillna(0)
# OBV adds the day's volume on up days and subtracts it on down days
OBV=(direction*stocks["vol"]).cumsum()
OBV.plot()
[Figure: OBV plotted against date]
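Volume and Price Rate of Change
N=70
# Note: these ratios divide the value N days ago by today's value;
# the conventional rate of change is 100*(x/x.shift(N)-1).
PROC=100*(stocks["close"].shift(N)/stocks["close"]-1)
PROC=PROC.dropna()
VROC=100*(stocks["vol"].shift(N)/stocks["vol"]-1)
VROC=VROC.dropna()
PROC.plot()
[Figure: PROC plotted against date]
VROC.plot()
[Figure: VROC plotted against date]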
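The cell above divides the value N days ago by today's value, whereas the conventional rate of change divides today's value by the earlier one. The sketch below contrasts the two on the index's first close (100.00) and the close 70 trading days later (120.73), both taken from the tables in this notebook; the function names are hypothetical.

```python
def rate_of_change(values, n):
    """Conventional rate of change: percent change versus n periods ago."""
    return [None if i < n else 100 * (values[i] / values[i - n] - 1)
            for i in range(len(values))]

def inverted_rate_of_change(values, n):
    """Ratio as computed in the cell above: value n periods ago relative to today."""
    return [None if i < n else 100 * (values[i - n] / values[i] - 1)
            for i in range(len(values))]

closes = [100.00, 120.73]   # closes on 1990-12-19 and 1991-04-01
conventional = rate_of_change(closes, 1)[-1]
inverted = inverted_rate_of_change(closes, 1)[-1]
print(round(conventional, 2), round(inverted, 6))
```

The inverted form reproduces the PROC value of about -17.17 reported for 1991-04-01, while the conventional form gives about +20.73 for the same pair of prices.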
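Accumulation Distribution Line
# Close location value ((close-low)-(high-close))/(high-low) times the day's volume.
# Note: this is the per-day money flow volume; it is not accumulated over time here.
ADL=(2*stocks["close"]-stocks["low"]-stocks["high"])/(stocks["high"]-stocks["low"])*stocks["vol"]
ADL.plot()
[Figure: ADL plotted against date]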
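The cell above computes a per-day value. The conventional accumulation/distribution line is the running total of those daily values, which could be sketched as follows. This is an illustration under stated assumptions, not the notebook's method: the helper names are hypothetical, and assigning zero flow to a flat bar (high equal to low) is a choice made here to avoid dividing by zero.

```python
def money_flow_volume(high, low, close, volume):
    """Close location value times volume for one bar."""
    if high == low:
        return 0.0   # assumption: a flat bar contributes no flow
    return (2 * close - low - high) / (high - low) * volume

def accumulation_distribution_line(bars):
    """Running total of money flow volume over (high, low, close, volume) bars."""
    total = 0.0
    line = []
    for high, low, close, volume in bars:
        total += money_flow_volume(high, low, close, volume)
        line.append(total)
    return line

# Hypothetical bars: a close near the high, a flat bar, then a close at the low
bars = [(12.0, 8.0, 11.0, 1000), (10.0, 10.0, 10.0, 500), (12.0, 8.0, 8.0, 1000)]
adl = accumulation_distribution_line(bars)
print(adl)
```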
William’s Accumulation Distribution Line
prev_close=stocks["close"].shift(1)
# True low and true high extend the day's range to include the previous close
true_low=np.minimum(stocks["low"],prev_close)
true_high=np.maximum(stocks["high"],prev_close)
# Up day: close minus true low; down day: close minus true high; unchanged or first day: 0
daily=np.where(stocks["close"]>prev_close,stocks["close"]-true_low,
      np.where(stocks["close"]<prev_close,stocks["close"]-true_high,0.0))
# WADL is the running sum of the daily values
WADL=pd.Series(daily,index=stocks.index).cumsum()
WADL.plot()
[Figure: WADL plotted against date]
Decision Tree
from sklearn import tree
%matplotlib inline
from IPython.display import Image
import pydotplus
from sklearn.datasets import load_iris
import os
stocks['month_return'] = stocks['close'].shift(-20)/stocks['close']-1
stocks['label'] = 1 * (stocks['month_return'] > 0.05) + (-1) * (stocks['month_return'] < - 0.05)
short_window = 5
mid_window = 20
long_window = 250
signals = pd.DataFrame(index = stocks.index)
stocks['short_signal'] = 0.0
stocks['long_signal'] = 0.0
signals['short_mavg'] = stocks['close'].rolling(window=short_window, min_periods=1, center=False).mean()
signals['mid_mavg'] = stocks['close'].rolling(window=mid_window, min_periods=1, center=False).mean()
signals['long_mavg'] = stocks['close'].rolling(window=long_window, min_periods=1, center=False).mean()
# Assign through .loc on the frame itself to avoid chained assignment (SettingWithCopyWarning)
stocks.loc[stocks.index[short_window:], 'short_signal'] = np.where(signals['short_mavg'].iloc[short_window:] > signals['mid_mavg'].iloc[short_window:], 1.0, 0.0)
stocks.loc[stocks.index[mid_window:], 'long_signal'] = np.where(signals['mid_mavg'].iloc[mid_window:] > signals['long_mavg'].iloc[mid_window:], 1.0, 0.0)
stocks.head()
| date | open | high | low | close | amount | vol | month_return | label | short_signal | long_signal |
|---|---|---|---|---|---|---|---|---|---|---|
| 1990-12-19 | 96.05 | 100.00 | 95.79 | 100.00 | 0.49 | 126000 | 0.342500 | 1 | 0.0 | 0.0 |
| 1990-12-20 | 104.30 | 104.39 | 99.98 | 104.39 | 0.09 | 19700 | 0.285947 | 1 | 0.0 | 0.0 |
| 1990-12-21 | 109.07 | 109.13 | 103.73 | 109.13 | 0.02 | 2800 | 0.230093 | 1 | 0.0 | 0.0 |
| 1990-12-24 | 113.57 | 114.55 | 109.13 | 114.55 | 0.03 | 3200 | 0.167351 | 1 | 0.0 | 0.0 |
| 1990-12-25 | 120.09 | 120.25 | 114.55 | 120.25 | 0.01 | 1500 | 0.107443 | 1 | 0.0 | 0.0 |
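The `label` column follows a three-class rule on the 20-day forward return: 1 above +5%, -1 below -5%, and 0 otherwise. A minimal sketch of that rule, using returns that appear in this notebook's tables, is shown below; the function name `label_return` is hypothetical.

```python
def label_return(forward_return, threshold=0.05):
    """Map a forward return to 1 (up), -1 (down) or 0 (within the threshold)."""
    if forward_return > threshold:
        return 1
    if forward_return < -threshold:
        return -1
    return 0

# Forward returns taken from rows shown in this notebook
labels = [label_return(r) for r in (0.342500, -0.049532, -0.059979)]
print(labels)

# A missing forward return (NaN) fails both comparisons and is labelled 0
nan_label = label_return(float("nan"))
print(nan_label)
```

The NaN case matters for the last 20 rows of the data, where no forward return exists yet and the label therefore defaults to 0.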
stocks['PVT']=PVT
stocks['OBV']=OBV
stocks['PROC']=PROC
stocks['VROC']=VROC
stocks['ADL']=ADL
stocks['WADL']=WADL
stocks_plus=stocks.dropna()
stocks_plus.head()
| date | open | high | low | close | amount | vol | month_return | label | short_signal | long_signal | PVT | OBV | PROC | VROC | ADL | WADL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1991-04-01 | 120.69 | 120.73 | 119.68 | 120.73 | 5.67 | 1199900 | -0.049532 | 0 | 0.0 | 0.0 | 18470.101449 | 4095700.0 | -17.170546 | -89.499125 | 1.199900e+06 | 23.70 |
| 1991-04-02 | 121.21 | 121.29 | 120.73 | 121.21 | 0.39 | 81600 | -0.059979 | -1 | 0.0 | 0.0 | 18794.527855 | 4177300.0 | -13.876743 | -75.857843 | 5.828571e+04 | 24.18 |
| 1991-04-03 | 121.71 | 121.71 | 121.21 | 121.71 | 0.98 | 206300 | -0.070249 | -1 | 0.0 | 0.0 | 19645.530247 | 4383600.0 | -10.336045 | -98.642753 | 2.063000e+05 | 24.68 |
| 1991-04-04 | 122.20 | 122.20 | 121.24 | 121.72 | 5.05 | 1051700 | -0.076487 | -1 | 0.0 | 0.0 | 19731.940567 | 5435300.0 | -5.890569 | -99.695731 | 0.000000e+00 | 25.16 |
| 1991-04-05 | 121.07 | 121.73 | 121.03 | 121.54 | 6.04 | 1158900 | -0.081701 | -1 | 0.0 | 0.0 | 18018.154829 | 4276400.0 | -1.061379 | -99.870567 | 5.297829e+05 | 24.97 |
asset['PVT']=PVT
asset['OBV']=OBV
asset['PROC']=PROC
asset['VROC']=VROC
asset['ADL']=ADL
asset['WADL']=WADL
print(stocks_plus['label'][:'2019'])
print(stocks_plus[['PVT','OBV','PROC','VROC','ADL','WADL','short_signal','long_signal']][:'2019'])
date
1991-04-01    0
1991-04-02   -1
1991-04-03   -1
1991-04-04   -1
1991-04-05   -1
             ..
2019-09-04    0
2019-09-05    0
2019-09-06    0
2019-09-09    0
2019-09-10    0
Name: label, Length: 6956, dtype: int32
                     PVT           OBV       PROC       VROC           ADL  \
date                                                                         
1991-04-01  1.847010e+04  4.095700e+06 -17.170546 -89.499125  1.199900e+06   
1991-04-02  1.879453e+04  4.177300e+06 -13.876743 -75.857843  5.828571e+04   
1991-04-03  1.964553e+04  4.383600e+06 -10.336045 -98.642753  2.063000e+05   
1991-04-04  1.973194e+04  5.435300e+06  -5.890569 -99.695731  0.000000e+00   
1991-04-05  1.801815e+04  4.276400e+06  -1.061379 -99.870567  5.297829e+05   
...                  ...           ...        ...        ...           ...   
2019-09-04  5.712950e+10  6.541108e+12  -1.606135  -0.969128  2.254959e+10   
2019-09-05  5.742333e+10  6.571652e+12  -2.383233 -34.861292 -1.186953e+10   
2019-09-06  5.752310e+10  6.593334e+12  -3.126750  -5.104213  2.087867e+10   
2019-09-09  5.772708e+10  6.617672e+12  -4.166970 -19.782897  2.078481e+10   
2019-09-10  5.769899e+10  6.593671e+12  -4.339997 -10.028372  1.046992e+10   
                WADL  short_signal  long_signal  
date                                             
1991-04-01     23.70           0.0          0.0  
1991-04-02     24.18           0.0          0.0  
1991-04-03     24.68           0.0          0.0  
1991-04-04     25.16           0.0          0.0  
1991-04-05     24.97           0.0          0.0  
...              ...           ...          ...  
2019-09-04  16008.02           1.0          1.0  
2019-09-05  16036.47           1.0          1.0  
2019-09-06  16054.47           1.0          1.0  
2019-09-09  16079.61           1.0          1.0  
2019-09-10  16073.40           1.0          1.0  
[6956 rows x 8 columns]
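# Note: the partial-string slice [:'2019'] includes all of 2019, so the training
# rows overlap the 2019 period that is scored at the end of this cell.
clf = tree.DecisionTreeClassifier(max_depth = 6)
clf = clf.fit(stocks_plus[['PVT','OBV','PROC','VROC','ADL','WADL','short_signal','long_signal']][:'2019'], stocks_plus['label'][:'2019'])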
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=['PVT','OBV','PROC','VROC','ADL','WADL','short_signal','long_signal'],
                                class_names=['-1','0','1'],
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)  
graph.write_png("stocks_DecisionTree.png")
Image(graph.create_png())
clf.score(stocks[['PVT','OBV','PROC','VROC','ADL','WADL','short_signal','long_signal']]['2019':], stocks['label']['2019':])
0.7526315789473684
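Because the slice `[:'2019']` in the training call includes every row dated in 2019, the score above is measured partly on rows the tree was fitted on. One way to keep the two periods apart is a strict split by date, sketched here in plain Python. The helper `split_by_date` and the sample rows are hypothetical; in pandas the corresponding training slice would be `[:'2018']`.

```python
def split_by_date(rows, first_test_date):
    """Split (date, label) rows so that training strictly precedes the test period.

    Dates are ISO strings (YYYY-MM-DD), which sort correctly as text.
    """
    train = [row for row in rows if row[0] < first_test_date]
    test = [row for row in rows if row[0] >= first_test_date]
    return train, test

# Hypothetical rows around the turn of the year
rows = [("2018-12-27", 0), ("2018-12-28", 1), ("2019-01-02", 0), ("2019-01-03", -1)]
train, test = split_by_date(rows, "2019-01-01")
print(len(train), len(test))
```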
Separate Decisions and Voting
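# Set global variables
rolling_windows = [40]
refreshFreq = 10  # how often the model is refit, in trading days (10 here); a smaller value such as 5 refits more often at a higher compute cost on large samples
hyperparameters = {
    'clf__max_depth': (3, 4, 5, 6)
}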
print(asset.columns)
Index(['open', 'high', 'low', 'close', 'amount', 'vol', 'FastStochastics',
       'WilliamsR', 'SlowStochastics', 'month_return', 'label',
       'Ex-postVolatility', 'short_signal', 'long_signal', 'Momentum',
       'Acc_Between_Times', 'ado', 'RSI', 'PVI', 'NVI', 'PVT', 'OBV', 'PROC',
       'VROC', 'ADL', 'WADL'],
      dtype='object')
Mom_featureList = ['Momentum', 'Acc_Between_Times', 'ado']
Sto_featureList = ['FastStochastics','WilliamsR', 'SlowStochastics', 'Ex-postVolatility']
Ind_featureList = ['RSI','PVI','NVI']
Indicators_fearureList=['PVT','OBV','PROC','VROC','ADL','WADL']
Ave_featureList = ['short_signal','long_signal']
feature_list = Mom_featureList + Sto_featureList + Ind_featureList + Ave_featureList + Indicators_fearureList
reportsMom = DecissionTreeClf(Mom_featureList+Ave_featureList)
reportsSto = DecissionTreeClf(Sto_featureList+Ave_featureList)
reportsInd = DecissionTreeClf(Ind_featureList+Ave_featureList)
reportsIndicator = DecissionTreeClf(Indicators_fearureList+Ave_featureList)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
...
=========== Finish Rolling Window 40
#backtest(reportsMom[0])
#print("Prediction accuracy: {}%".format((reportsMom[0].Correct == True).sum()))
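The heading of this section mentions voting, but the cells shown here stop at the four per-group reports. A minimal sketch of how the daily signals could be combined by majority vote follows. The function name `majority_vote`, the example signal lists, and the rule that ties fall back to 0 (no position) are all assumptions made for illustration, not part of the original notebook.

```python
from collections import Counter


def majority_vote(signals_per_model):
    """Combine one day's per-model signals (-1, 0, 1) into a single signal.

    Ties fall back to 0 (no position); this tie rule is an assumption.
    """
    counts = Counter(signals_per_model).most_common()
    top_label, top_count = counts[0]
    # A tie exists when the runner-up received as many votes as the leader.
    if len(counts) > 1 and counts[1][1] == top_count:
        return 0
    return top_label


# Hypothetical signals from the four feature groups on three days.
mom = [1, -1, 0]
sto = [1, -1, 1]
ind = [1, 1, -1]
indicator = [-1, -1, 0]

# zip(...) yields one tuple of four signals per day.
voted = [majority_vote(day) for day in zip(mom, sto, ind, indicator)]
print(voted)
```

With the real reports, the four lists would presumably be the `Signal` columns of `reportsMom[0]`, `reportsSto[0]`, `reportsInd[0]` and `reportsIndicator[0]`, aligned on the same dates.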
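## Cross-validate over different rolling-window lengths and different maximum depths
# Set global variables
rolling_windows = [40, 60, 120, 250]
refreshFreq = 1 # How often the model is refit, in days (1 here); for large samples, about 5 is a reasonable compromise on compute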
hyperparameters = { 
    'clf__max_depth': (3, 4, 5, 6) 
} 
def DecissionTreeClf(featureList, prob = False):
    X = asset[featureList][start_index:]  # use the features passed in, not the global feature_list
    Y = asset['label'][start_index:]
    reports = []
    for rw in rolling_windows:
        report = pd.DataFrame(index = X.index[start_backtest:])
        report['Correct'] = 0 # whether each prediction was correct
        report['Signal'] = 0 # strategy signal
        report['Return'] = 0 # strategy return
        for day in range(start_backtest, end_backtest, refreshFreq):
            pipeline = Pipeline([('clf',tree.DecisionTreeClassifier(criterion='gini'))])
            clf = GridSearchCV(pipeline, hyperparameters, cv=5, n_jobs=1,verbose=1)
            train_days = all_days[day-rw:day]
            test_days = all_days[day:day+refreshFreq]
            X_train, Y_train = X.loc[train_days], Y.loc[train_days] 
            clf.fit(X_train, Y_train)
            prediction = clf.predict(X.loc[test_days])
            report.loc[test_days,'Correct'] = (prediction == Y.loc[test_days])
            report.loc[test_days, 'Signal'] = prediction
            if prob:
                prediction_prob = clf.predict_proba(X.loc[test_days])
                print(prediction_prob)
                report.loc[test_days, 'Prob'] = prediction_prob[0][0] - prediction_prob[0][2]
        reports.append(report)
        print("===========")
        print("Finish Rolling Window {}".format(rw))
    return reports
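The loop above trains on the `rw` days before `day` and predicts the next `refreshFreq` days. The sketch below reproduces only that slicing on integer positions so the window layout can be inspected without fitting any model. The helper name `walk_forward_windows` and the guard against `start < rolling_window` are assumptions added here; the original code has no such check, and a negative slice start would silently select the wrong rows.

```python
def walk_forward_windows(n_days, rolling_window, refresh_freq, start, end):
    """Yield (train_positions, test_positions) like all_days[day-rw:day]
    and all_days[day:day+refreshFreq] in the loop above."""
    if start < rolling_window:
        # Assumed guard: day - rolling_window would be negative and the
        # slice would wrap around to the end of the sample.
        raise ValueError("start must be at least rolling_window")
    for day in range(start, end, refresh_freq):
        train = list(range(day - rolling_window, day))
        # Python slicing truncates at the end of the data; mirror that here.
        test = list(range(day, min(day + refresh_freq, n_days)))
        yield train, test


windows = list(walk_forward_windows(n_days=12, rolling_window=4,
                                    refresh_freq=3, start=4, end=12))
for train, test in windows:
    print("train", train, "test", test)
```

Every training window ends strictly before its test window begins, so the outer loop does not leak future data into the fit.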
Building a decision tree on all indicators combined
reportsTotal = DecissionTreeClf(Mom_featureList+Sto_featureList+Ind_featureList+Ave_featureList+Indicators_fearureList)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
...
=========== Finish Rolling Window 40
...
=========== Finish Rolling Window 60
...
=========== Finish Rolling Window 120
...
=========== Finish Rolling Window 250
backtest(reportsTotal[0])
# NOTE: .sum() returns the number of correct predictions, not a percentage
print("Prediction accuracy: {}%".format((reportsTotal[0].Correct == True).sum()))
Prediction accuracy: 81%
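The print statement above formats the result of `.sum()` with a percent sign, although `.sum()` on a boolean comparison gives a count of correct days. A small sketch of reporting both the count and the rate is shown below; the helper name `accuracy_summary` and the example flags are hypothetical, and plain Python lists stand in for the `Correct` column.

```python
def accuracy_summary(correct_flags):
    """Return (number correct, total, percentage correct) for a list of flags."""
    flags = [bool(flag) for flag in correct_flags]
    n_correct = sum(flags)
    total = len(flags)
    # Avoid dividing by zero when no predictions were made.
    pct = 100.0 * n_correct / total if total else float("nan")
    return n_correct, total, pct


# Hypothetical correctness flags for four backtest days.
n_correct, total, pct = accuracy_summary([True, False, True, True])
print("Correct: {} of {} ({:.1f}%)".format(n_correct, total, pct))
```

On the pandas side, the corresponding rate would presumably be `(reportsTotal[0].Correct == True).mean() * 100`, since the mean of a boolean Series is the fraction of `True` values.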
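AdaBoost
The original assignment asks us to pool all indicators into a single decision tree model. In our ML experience, simply pooling features this way tends not to perform well, so we instead train the model with a classic ensemble method; here we use the AdaBoost algorithm.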
from sklearn.ensemble import AdaBoostClassifier
X = asset[feature_list][start_index:]
Y = asset['label'][start_index:]
hyperAda = {
    'Ada__n_estimators': (10, 30, 60, 100, 200)
}
reports = []
for rw in rolling_windows:
    report = pd.DataFrame(index=X.index[start_backtest:])
    report['Correct'] = 0  # whether each prediction was correct
    report['Signal'] = 0  # strategy signal
    report['Return'] = 0  # strategy return
    for day in range(start_backtest, end_backtest, refreshFreq):
        pipeline = Pipeline(
            [('Ada', AdaBoostClassifier(tree.DecisionTreeClassifier(max_depth=1)))])
        clf = GridSearchCV(pipeline, hyperAda, cv=5, n_jobs=1, verbose=1)
        train_days = all_days[day-rw:day]
        test_days = all_days[day:day+refreshFreq]
        X_train, Y_train = X.loc[train_days], Y.loc[train_days]
        clf.fit(X_train, Y_train)
        prediction = clf.predict(X.loc[test_days])
        report.loc[test_days, 'Correct'] = (prediction == Y.loc[test_days])
        report.loc[test_days, 'Signal'] = prediction
    reports.append(report)
    print("===========")
    print("Finish Rolling Window {}".format(rw))
Fitting 5 folds for each of 5 candidates, totalling 25 fits
...
===========
Finish Rolling Window 40
===========
Finish Rolling Window 60
===========
Finish Rolling Window 120
===========
Finish Rolling Window 250
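# Note: .sum() is a count of correct predictions; it equals a percentage only when exactly 100 days are tested
print("AdaBoost out-of-sample prediction accuracy: {}%".format((reports[0].Correct == True).sum()))
portfolioAda = backtest(reports[0])
#plt.figure()
#asset.loc['2012-07':'2013-02', 'close'].plot()
portfolioAda.tail()
AdaBoost out-of-sample prediction accuracy: 83%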
| date | change | position | holdings | cash | total |
|---|---|---|---|---|---|
| 2015-01-22 | -334334.0 | 400.0 | 1337336.0 | 5446548.0 | 6783884.0 |
| 2015-01-23 | -335176.0 | 300.0 | 1005528.0 | 5781724.0 | 6787252.0 |
| 2015-01-26 | -338318.0 | 200.0 | 676636.0 | 6120042.0 | 6796678.0 |
| 2015-01-27 | -335296.0 | 100.0 | 335296.0 | 6455338.0 | 6790634.0 |
| 2015-01-28 | -330574.0 | 0.0 | 0.0 | 6785912.0 | 6785912.0 |
Random Forest
Next, a random forest (a bagging algorithm) is trained in the same rolling-window setup.
from sklearn.ensemble import RandomForestClassifier
hyperRF = {
    'RF__n_estimators': (100, 50, 30)
}
reports = []
for rw in rolling_windows:
    report = pd.DataFrame(index=X.index[start_backtest:])
    report['Correct'] = 0  # whether the prediction was correct
    report['Signal'] = 0  # strategy signal
    report['Return'] = 0  # strategy return
    for day in range(start_backtest, end_backtest, refreshFreq):
        # Pipeline for the (currently disabled) grid search; the step must be an instance, not the class
        pipeline = Pipeline(
            [('RF', RandomForestClassifier())])
        #clf = GridSearchCV(pipeline, hyperRF, cv=5, n_jobs=1, verbose=1)
        clf = RandomForestClassifier()
        # Train on the previous rw days, then predict the next refreshFreq days
        train_days = all_days[day-rw:day]
        test_days = all_days[day:day+refreshFreq]
        X_train, Y_train = X.loc[train_days], Y.loc[train_days]
        #print(Y_train)
        clf.fit(X_train.values, Y_train.values)
        # Predict on plain arrays as well, matching how the model was fitted
        prediction = clf.predict(X.loc[test_days].values)
        report.loc[test_days, 'Correct'] = (prediction == Y.loc[test_days])
        report.loc[test_days, 'Signal'] = prediction
    reports.append(report)
    print("===========")
    print("Finish Rolling Window {}".format(rw))
===========
Finish Rolling Window 40
===========
Finish Rolling Window 60
===========
Finish Rolling Window 120
===========
Finish Rolling Window 250
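The random forest cell keeps its `GridSearchCV` line commented out. As a rough illustration of how that search over `hyperRF` could be run on a single training window, here is a minimal sketch on synthetic data. `X_demo`, `y_demo`, the ±1 label encoding, and the 100/20 split are assumptions made for the example and are not taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data: 120 days, 4 indicator columns, labels in {-1, +1}
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = np.where(X_demo[:, 0] > 0, 1, -1)

# The step name 'RF' must match the 'RF__' prefix used in the parameter grid
pipeline = Pipeline([("RF", RandomForestClassifier(random_state=0))])
hyperRF = {"RF__n_estimators": (100, 50, 30)}

# Fit on the first 100 days, predict the remaining 20
search = GridSearchCV(pipeline, hyperRF, cv=5, n_jobs=1)
search.fit(X_demo[:100], y_demo[:100])
demo_prediction = search.predict(X_demo[100:])
best_n = search.best_params_["RF__n_estimators"]
print("best n_estimators:", best_n)
```

Plain `cv=5` mirrors the commented-out line; for time-ordered data a time-aware splitter such as `sklearn.model_selection.TimeSeriesSplit` may be the safer choice.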
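# Note: .sum() is a count of correct predictions; it equals a percentage only when exactly 100 days are tested
print("Random Forest out-of-sample prediction accuracy: {}%".format((reports[0].Correct == True).sum()))
portfolioRF = backtest(reports[0])
#plt.figure()
#asset.loc['2012-07':'2013-02', 'close'].plot()
portfolioRF.tail()
Random Forest out-of-sample prediction accuracy: 82%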
| date | change | position | holdings | cash | total |
|---|---|---|---|---|---|
| 2015-01-22 | -334334.0 | 400.0 | 1337336.0 | 5440857.0 | 6778193.0 |
| 2015-01-23 | -335176.0 | 300.0 | 1005528.0 | 5776033.0 | 6781561.0 |
| 2015-01-26 | -338318.0 | 200.0 | 676636.0 | 6114351.0 | 6790987.0 |
| 2015-01-27 | -335296.0 | 100.0 | 335296.0 | 6449647.0 | 6784943.0 |
| 2015-01-28 | -330574.0 | 0.0 | 0.0 | 6780221.0 | 6780221.0 |
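Note:
Our strategy over the stock pool is as follows. For each stock, the classifier assigns a confidence level to the day's return class. We subtract the probability of the negative class from the probability of the positive class and treat the difference as a kind of "absolute value". This score is the main basis for judging whether the stock will rise or fall on the next day, and the stocks with the largest scores make up that day's portfolio.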
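The scoring rule in the note can be sketched as follows. This is only an illustration under stated assumptions: labels are encoded as +1 (up) and -1 (down), one fitted classifier exists per stock, and the names `confidence_scores`, `models`, and `features_today` are hypothetical rather than taken from the notebook. The data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def confidence_scores(models, features_today):
    """Return P(up) - P(down) per ticker, sorted from highest to lowest."""
    scores = {}
    for ticker, clf in models.items():
        x = np.asarray(features_today[ticker]).reshape(1, -1)
        proba = clf.predict_proba(x)[0]
        # Map each class label to its probability; a missing class counts as 0
        by_class = dict(zip(clf.classes_, proba))
        scores[ticker] = by_class.get(1, 0.0) - by_class.get(-1, 0.0)
    return pd.Series(scores).sort_values(ascending=False)


# Synthetic stock pool: four hypothetical tickers with 60 days of history each
rng = np.random.default_rng(0)
models, features_today = {}, {}
for ticker in ["A", "B", "C", "D"]:
    X_hist = rng.normal(size=(60, 4))
    noise = rng.normal(scale=0.5, size=60)
    y_hist = np.where(X_hist[:, 0] + noise > 0, 1, -1)
    clf = RandomForestClassifier(n_estimators=30, random_state=0)
    models[ticker] = clf.fit(X_hist, y_hist)
    features_today[ticker] = rng.normal(size=4)

scores = confidence_scores(models, features_today)
top_picks = scores.head(2)  # the two most confident "up" calls form the portfolio
print(top_picks)
```

Each score lies in [-1, 1]. Values near +1 are confident "up" calls, and values near 0 mean the classifier is undecided.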
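Summary
As the figure shows, the monthly returns of the few stocks selected with the highest prediction confidence fluctuate mainly around 5%, which is a very good final result.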
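For reference, a monthly return series like the one summarized here can be derived from a daily portfolio value series such as the `total` column of the backtest tables. The sketch below uses a synthetic series built to grow 5% per month. The function name `monthly_returns` and the sample data are assumptions for illustration, not the notebook's own code or results.

```python
import numpy as np
import pandas as pd


def monthly_returns(total):
    """Simple month-over-month returns from a daily portfolio value series."""
    # Take the last observed value in each calendar month
    month_end = total.groupby(total.index.to_period("M")).last()
    return month_end.pct_change().dropna()


# Synthetic daily values: flat within each month, 5% higher every month
dates = pd.bdate_range("2015-01-01", "2015-03-31")
values = 100.0 * 1.05 ** (dates.month.to_numpy() - 1)
total_demo = pd.Series(values, index=dates)

returns_demo = monthly_returns(total_demo)
print(returns_demo)
```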