Portfolio Allocation

Our paper: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance.

Presented at NeurIPS 2020: Deep RL Workshop.

The Jupyter notebooks are available on our GitHub and Google Colab.

Tip

Check our previous tutorials, Single Stock Trading and Multiple Stock Trading, for a detailed explanation of the FinRL architecture and modules.

Overview

To begin with, we would like to explain the logic of portfolio allocation using Deep Reinforcement Learning. We use the Dow 30 constituents as an example throughout this article, because they are among the most widely followed stocks.

Let’s say that we have one million dollars at the beginning of 2019. We want to invest this $1,000,000 in the stock market, in this case the Dow Jones 30 constituents. Assume no margin, no short sales, and no treasury bills (we use all the money to trade only these 30 stocks). The weight of each individual stock is therefore non-negative, and the weights of all the stocks add up to one.

We hire a smart portfolio manager, Mr. Deep Reinforcement Learning. Mr. DRL gives us daily advice: the portfolio weights, i.e., the proportions of money to invest in each of these 30 stocks. Every day we simply rebalance the portfolio according to these weights. The basic logic is as follows.

image/portfolio_allocation_1.png

Portfolio allocation is different from multiple stock trading: we rebalance the portfolio weights at every time step, and we have to use all the available money.
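As a toy illustration of the daily rebalancing arithmetic (all numbers hypothetical), the portfolio return is simply the weighted sum of the individual stock returns:

import numpy as np

# Toy example: one rebalancing step with 3 stocks (hypothetical numbers).
weights = np.array([0.5, 0.3, 0.2])             # today's target weights from the agent
stock_returns = np.array([0.01, -0.02, 0.03])   # next-day returns of the 3 stocks
portfolio_value = 1_000_000

portfolio_return = np.dot(weights, stock_returns)   # 0.005
portfolio_value *= 1 + portfolio_return
print(portfolio_value)   # 1005000.0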

The traditional and most popular approach to portfolio allocation is mean-variance optimization, or modern portfolio theory (MPT):

image/portfolio_allocation_2.png
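For reference, a standard statement of the mean-variance problem, in our notation (the figure may use a different but equivalent form): minimize portfolio variance subject to a target return, full investment, and no short sales:

\min_{w} \; w^{\top} \Sigma w
\quad \text{s.t.} \quad \mu^{\top} w \ge r_{\text{target}}, \qquad \mathbf{1}^{\top} w = 1, \qquad w \ge 0

where w is the vector of portfolio weights, \Sigma is the covariance matrix of stock returns, and \mu is the vector of expected returns.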

However, MPT does not perform particularly well on out-of-sample data. MPT is computed from stock returns alone; if we want to take other relevant factors into account, for example technical indicators such as MACD or RSI, MPT has no natural way to combine this information.

We introduce FinRL, a DRL library that makes it easy for beginners to get exposure to quantitative finance. FinRL is designed specifically for automated stock trading, for educational and demonstrative purposes.

This article focuses on one of the use cases in our paper: portfolio allocation. We use a single Jupyter notebook that includes all the necessary steps.

Problem Definition

This problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The components of the reinforcement learning environment are:

  • Action: the portfolio weight of each stock, within [0, 1]. We use a softmax function to normalize the actions so that they sum to 1.

  • State: {covariance matrix, MACD, RSI, CCI, ADX}; the state space shape is (34, 30): 34 rows (the 30 rows of the covariance matrix plus one row per technical indicator) and 30 columns (one per stock). See the sketch after this list.

  • Reward function: r(s, a, s′) = p_t, where p_t is the cumulative portfolio value at time t.

  • Environment: portfolio allocation for Dow 30 constituents.
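To make the state shape concrete, here is a shapes-only sketch with random placeholder values (not real market data):

import numpy as np

# The state stacks the 30x30 covariance matrix on top of
# one row per technical indicator (MACD, RSI, CCI, ADX).
n_stocks = 30
covs = np.random.rand(n_stocks, n_stocks)    # placeholder covariance matrix
indicators = np.random.rand(4, n_stocks)     # 4 indicators x 30 stocks
state = np.append(covs, indicators, axis=0)
print(state.shape)   # (34, 30)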

The covariance matrix is a good feature because portfolio managers use it to quantify the risk (standard deviation) associated with a particular portfolio.

We also assume no transaction costs, since we want to keep this portfolio allocation case simple as a starting point.

Load Python Packages

Install the unstable development version of FinRL:

# Install the unstable development version in Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

Import Packages:

# import packages
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import datetime

from finrl import config
from finrl import config_tickers
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.env.EnvMultipleStock_trade import StockEnvTrade
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot, backtest_strat, baseline_strat

import os
# create the working directories if they do not exist yet
for directory in [config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR]:
    os.makedirs("./" + directory, exist_ok=True)

Download Data

FinRL uses a YahooDownloader class to extract data.

class YahooDownloader:
    """
    Provides methods for retrieving daily stock data from Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
        fetch_data()
            Fetches data from yahoo API
    """

Download and save the data in a pandas DataFrame:

# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date = '2008-01-01',
                     end_date = '2020-12-01',
                     ticker_list = config_tickers.DOW_30_TICKER).fetch_data()
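A quick sanity check of the download (the exact columns may vary slightly across FinRL versions):

# Expect one row per (date, ticker) pair and 30 tickers.
print(df.shape)
print(df.tic.nunique())
df.head()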

Preprocess Data

FinRL uses a FeatureEngineer class to preprocess data.

class FeatureEngineer:
    """
    Provides methods for preprocessing the stock price data

    Attributes
    ----------
        df: DataFrame
            data downloaded from Yahoo API
        feature_number : int
            number of features we used
        use_technical_indicator : boolean
            use technical indicator or not
        use_turbulence : boolean
            use turbulence index or not

    Methods
    -------
        preprocess_data()
            main method to do the feature engineering
    """

Perform Feature Engineering: covariance matrix + technical indicators:

# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     use_turbulence=False).preprocess_data()

# add covariance matrix as states
df = df.sort_values(['date','tic'], ignore_index=True)
df.index = df.date.factorize()[0]

cov_list = []
# look back is one year (252 trading days)
lookback = 252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
df.head()
image/portfolio_allocation_3.png

Build Environment

FinRL uses an EnvSetup class to set up the environment.

class EnvSetup:
    """
    Provides methods for setting up the training and trading environments

    Attributes
    ----------
        stock_dim: int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount: int
            start money
        transaction_cost_pct : float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        tech_indicator_list: list
            a list of technical indicator names (modified from config.py)
    Methods
    -------
        create_env_training()
            create env class for training
        create_env_validation()
            create env class for validation
        create_env_trading()
            create env class for trading
    """

Initialize an environment class:
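The notebook's exact initialization is omitted here; a minimal sketch, assuming EnvSetup's constructor takes the attributes listed in its docstring (all hyperparameter values are illustrative):

# Illustrative values; the exact signature may vary by FinRL version.
stock_dimension = len(df.tic.unique())                     # 30 Dow constituents
tech_indicators = ['macd', 'rsi_30', 'cci_30', 'dx_30']    # MACD, RSI, CCI, ADX

env_setup = EnvSetup(stock_dim=stock_dimension,
                     hmax=100,
                     initial_amount=1000000,
                     transaction_cost_pct=0,    # no transaction cost in this example
                     reward_scaling=1e-4,
                     tech_indicator_list=tech_indicators)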

User-defined environment: a simulation environment class. The environment for portfolio allocation:

import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym
    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date
    Methods
    -------
    step()
        at each step the agent will return actions, then
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return other functions
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step

    """
    metadata = {'render.modes': ['human']}

    def __init__(self,
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold,
                lookback=252,
                day=0):
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action space: one non-negative weight per stock, shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,))
        # observation space: covariance matrix + technical indicators, Shape = (34, 30)
        self.observation_space = spaces.Box(low=0,
                                            high=np.inf,
                                            shape=(self.state_space + len(self.tech_indicator_list),
                                                   self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                               axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize the portfolio value
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                sharpe = (252**0.5) * df_daily_return['daily_return'].mean() / \
                         df_daily_return['daily_return'].std()
                print("Sharpe: ", sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal, {}

        else:
            # actions are the portfolio weights;
            # normalize them to sum to 1: shift so the minimum is zero,
            # then divide by the sum (a simple alternative to softmax)
            norm_actions = (np.array(actions) - np.array(actions).min()) / \
                           (np.array(actions) - np.array(actions).min()).sum()
            weights = norm_actions
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # load next state
            self.day += 1
            self.data = self.df.loc[self.day, :]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs),
                                   [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                                   axis=0)
            # calculate portfolio return:
            # sum of individual stocks' returns * weights
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values) - 1) * weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value * (1 + portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value
            self.reward = new_portfolio_value
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day, :]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                               axis=0)
        self.portfolio_value = self.initial_amount
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def save_asset_memory(self):
        # return the daily portfolio return at each time step
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        df_account_value = pd.DataFrame({'date': date_list, 'daily_return': portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
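To sanity-check the environment before training, we can roll it out with random actions; a minimal sketch, assuming the preprocessed df from above (the constructor values, including turbulence_threshold, are illustrative):

# Smoke test: step through the environment with random weights.
train = data_split(df, '2009-01-01', '2019-01-01')

env = StockPortfolioEnv(df=train,
                        stock_dim=30,
                        hmax=100,
                        initial_amount=1000000,
                        transaction_cost_pct=0,
                        reward_scaling=1e-4,
                        state_space=30,
                        action_space=30,
                        tech_indicator_list=['macd', 'rsi_30', 'cci_30', 'dx_30'],
                        turbulence_threshold=250)    # illustrative value

obs = env.reset()
print(obs.shape)    # (34, 30): 30x30 covariance matrix + 4 indicator rows

done = False
while not done:
    action = np.random.uniform(0, 1, size=30)    # raw scores; normalized inside step()
    obs, reward, done, info = env.step(action)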

Implement DRL Algorithms

FinRL uses a DRLAgent class to implement the algorithms.

class DRLAgent:
    """
    Provides implementations for DRL algorithms

    Attributes
    ----------
        env: gym environment class
             user-defined class
    Methods
    -------
        train_PPO()
            the implementation for PPO algorithm
        train_A2C()
            the implementation for A2C algorithm
        train_DDPG()
            the implementation for DDPG algorithm
        train_TD3()
            the implementation for TD3 algorithm
        DRL_prediction()
            make a prediction in a test dataset and get results
    """

Model Training:

We use A2C for portfolio allocation because it is stable, cost-effective, fast, and works well with large batch sizes. A sketch of the training call follows.
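The training call itself is not reproduced above; a minimal sketch, assuming the EnvSetup and DRLAgent APIs listed earlier (the method signatures and hyperparameters may differ across FinRL versions):

# Train on data before the trade period (dates follow the notebook's setup).
train = data_split(df, '2009-01-01', '2019-01-01')
env_train = env_setup.create_env_training(data=train,
                                          env_class=StockPortfolioEnv)

agent = DRLAgent(env=env_train)
model_a2c = agent.train_A2C(model_name="a2c")   # assumed signature; pass model params as needed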

Trading: assume that we have $1,000,000 of initial capital on 2019/01/01. We use the trained A2C model to perform portfolio allocation of the Dow 30 stocks.

trade = data_split(df, '2019-01-01', '2020-12-01')

env_trade, obs_trade = env_setup.create_env_trading(data = trade,
                                                    env_class = StockPortfolioEnv)

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=model_a2c,
                                                      test_data = trade,
                                                      test_env = env_trade,
                                                      test_obs = obs_trade)
image/portfolio_allocation_4.png

The output actions or the portfolio weights look like this:

image/portfolio_allocation_5.png

Backtesting Performance

FinRL uses a set of functions to do the backtesting with Quantopian's pyfolio.

from pyfolio import timeseries

DRL_strat = backtest_strat(df_daily_return)
perf_func = timeseries.perf_stats
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")
print("==============DRL Strategy Stats===========")
perf_stats_all
print("==============Get Index Stats===========")
baseline_perf_stats = BaselineStats('^DJI',
                                    baseline_start='2019-01-01',
                                    baseline_end='2020-12-01')

# plot
dji, dow_strat = baseline_strat('^DJI', '2019-01-01', '2020-12-01')
import pyfolio
%matplotlib inline
with pyfolio.plotting.plotting_context(font_scale=1.1):
    pyfolio.create_full_tear_sheet(returns=DRL_strat,
                                   benchmark_rets=dow_strat, set_context=False)

The left table shows the stats for the backtested strategy's performance; the right table shows the stats for the index (DJIA).
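To cross-check a couple of the headline numbers by hand (using 252 trading days per year, matching the environment's own Sharpe computation):

# Recompute annualized return and Sharpe ratio from the daily returns.
daily = df_daily_return['daily_return']
annual_return = (1 + daily).prod() ** (252 / len(daily)) - 1
sharpe = (252 ** 0.5) * daily.mean() / daily.std()
print(annual_return, sharpe)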

Plots: