Portfolio Allocation¶
Our paper: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance.
Presented at NeurIPS 2020: Deep RL Workshop.
The Jupyter notebooks are available on our GitHub and Google Colab.
Tip
FinRL Single Stock Trading at Google Colab.
FinRL Multiple Stocks Trading at Google Colab.
Check our previous tutorials: Single Stock Trading and Multiple Stock Trading for detailed explanation of the FinRL architecture and modules.
Overview¶
To begin with, we would like to explain the logic of portfolio allocation using deep reinforcement learning. We use the Dow 30 constituents as an example throughout this article, because they are among the most popular stocks.
Let's say that we have a million dollars at the beginning of 2019. We want to invest this $1,000,000 in the stock market, in this case the Dow Jones 30 constituents. Assume no margin, no short selling, and no treasury bills (we use all the money to trade only these 30 stocks), so that the weight of each individual stock is non-negative and the weights of all the stocks add up to one.
We hire a smart portfolio manager, Mr. Deep Reinforcement Learning. Mr. DRL gives us daily advice that includes the portfolio weights, i.e., the proportions of money to invest in these 30 stocks. So every day we just need to rebalance the portfolio weights of the stocks. The basic logic is as follows.

Portfolio allocation is different from multiple stock trading because we are essentially rebalancing the weights at each time step, and we have to use all available money.
The traditional and most popular approach to portfolio allocation is mean-variance optimization, or modern portfolio theory (MPT):

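The original figure presents one variant of this optimization; a standard reconstruction of the mean-variance problem is

$$\min_{w}\; w^{\top}\Sigma w \quad \text{s.t.}\quad \mu^{\top}w \ge r_{\text{target}},\qquad \mathbf{1}^{\top}w = 1,\qquad w_i \ge 0,$$

where $w$ is the vector of portfolio weights, $\Sigma$ is the covariance matrix of stock returns, $\mu$ is the vector of expected returns, and $r_{\text{target}}$ is the required portfolio return.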
However, MPT does not perform well on out-of-sample data. MPT is computed from stock returns alone; if we want to take other relevant factors into account, for example technical indicators such as MACD or RSI, MPT may not be able to combine this information effectively.
We introduce FinRL, a DRL library that helps beginners get exposure to quantitative finance. FinRL is designed specifically for automated stock trading, with an educational and demonstrative purpose.
This article focuses on one of the use cases in our paper: portfolio allocation. We use one Jupyter notebook to cover all the necessary steps.
Problem Definition¶
The problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP), and then formulate our trading goal as a maximization problem.
The components of the reinforcement learning environment are:
Action: the portfolio weight of each stock, within [0, 1]. We use a softmax function to normalize the actions so that the weights sum to 1 (see the sketch after this section).
State: {covariance matrix, MACD, RSI, CCI, ADX}. The state space shape is (34, 30): 34 rows (the 30×30 covariance matrix stacked with one row per technical indicator, 30 + 4 = 34) and 30 columns, one per stock.
Reward function: r(s, a, s′) = p_t, where p_t is the cumulative portfolio value.
Environment: portfolio allocation for the Dow 30 constituents.
The covariance matrix is a good feature because portfolio managers use it to quantify the risk (standard deviation) associated with a particular portfolio.
We also assume no transaction cost, since we want to keep this portfolio allocation case simple as a starting point.
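To make these definitions concrete, here is a minimal, self-contained sketch (with illustrative numbers; this is not FinRL code) of the softmax weight normalization, the covariance-based risk measure, and the one-step portfolio value update:

import numpy as np

def softmax_weights(actions):
    """Softmax normalization: non-negative weights that sum to 1."""
    exp = np.exp(actions - actions.max())  # subtract the max for numerical stability
    return exp / exp.sum()

raw_actions = np.array([0.2, -1.0, 0.5])   # raw agent output for 3 stocks
weights = softmax_weights(raw_actions)
assert np.isclose(weights.sum(), 1.0)

# portfolio risk from the covariance matrix: sigma_p = sqrt(w' Cov w)
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
portfolio_vol = np.sqrt(weights @ cov @ weights)

# one rebalancing step: p_{t+1} = p_t * (1 + sum_i w_i * r_i)
close_yesterday = np.array([100.0, 50.0, 200.0])
close_today = np.array([101.0, 49.0, 204.0])
daily_returns = close_today / close_yesterday - 1
portfolio_value = 1_000_000 * (1 + (weights * daily_returns).sum())

Note that the sample environment later in this article normalizes actions with a min-shift, (a − min a) / Σ(a − min a), rather than a softmax; both map raw actions to non-negative weights that sum to one.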
Load Python Packages¶
Install the unstable development version of FinRL:
# Install the unstable development version in a Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
Import Packages:
# import packages
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')  # set the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import datetime

from finrl import config
from finrl import config_tickers
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.env.EnvMultipleStock_trade import StockEnvTrade
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot, backtest_strat, baseline_strat

import os
# create the working directories if they do not exist yet
if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)
Download Data¶
FinRL uses a YahooDownloader class to extract data.
class YahooDownloader:
"""
Provides methods for retrieving daily stock data from Yahoo Finance API
Attributes
----------
start_date : str
start date of the data (modified from config.py)
end_date : str
end date of the data (modified from config.py)
ticker_list : list
a list of stock tickers (modified from config.py)
Methods
-------
fetch_data()
Fetches data from Yahoo API
"""
Download and save the data in a pandas DataFrame:
# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date='2008-01-01',
                     end_date='2020-12-01',
                     ticker_list=config_tickers.DOW_30_TICKER).fetch_data()
Preprocess Data¶
FinRL uses a FeatureEngineer class to preprocess data.
class FeatureEngineer:
"""
Provides methods for preprocessing the stock price data
Attributes
----------
df: DataFrame
data downloaded from Yahoo API
feature_number : int
number of features we used
use_technical_indicator : boolean
use technical indicator or not
use_turbulence : boolean
use turbulence index or not
Methods
-------
preprocess_data()
main method to do the feature engineering
"""
Perform Feature Engineering: covariance matrix + technical indicators:
# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     use_turbulence=False).preprocess_data()

# add covariance matrix as states
df = df.sort_values(['date', 'tic'], ignore_index=True)
df.index = df.date.factorize()[0]

cov_list = []
# look back is one year
lookback = 252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date', 'tic']).reset_index(drop=True)
df.head()

Build Environment¶
FinRL uses an EnvSetup class to set up the environment.
class EnvSetup:
"""
Provides methods for setting up the stock trading environment
Attributes
----------
stock_dim: int
number of unique stocks
hmax : int
maximum number of shares to trade
initial_amount: int
start money
transaction_cost_pct : float
transaction cost percentage per trade
reward_scaling: float
scaling factor for reward, good for training
tech_indicator_list: list
a list of technical indicator names (modified from config.py)
Methods
-------
create_env_training()
create env class for training
create_env_validation()
create env class for validation
create_env_trading()
create env class for trading
"""
Initialize an environment class:
User-defined Environment: a simulation environment class. The environment for portfolio allocation follows; a hedged EnvSetup initialization sketch is given after the class definition:
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym

    Attributes
    ----------
    df: DataFrame
        input data
    stock_dim : int
        number of unique stocks
    hmax : int
        maximum number of shares to trade
    initial_amount : int
        start money
    transaction_cost_pct: float
        transaction cost percentage per trade
    reward_scaling: float
        scaling factor for reward, good for training
    state_space: int
        the dimension of input features
    action_space: int
        equals stock dimension
    tech_indicator_list: list
        a list of technical indicator names
    turbulence_threshold: int
        a threshold to control risk aversion
    day: int
        an increment number to control date

    Methods
    -------
    step()
        at each step the agent will return actions, then
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        return the current state
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step
    """
    metadata = {'render.modes': ['human']}

    def __init__(self,
                 df,
                 stock_dim,
                 hmax,
                 initial_amount,
                 transaction_cost_pct,
                 reward_scaling,
                 state_space,
                 action_space,
                 tech_indicator_list,
                 turbulence_threshold,
                 lookback=252,
                 day=0):
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,))
        # Shape = (34, 30): covariance matrix + technical indicators
        self.observation_space = spaces.Box(low=0,
                                            high=np.inf,
                                            shape=(self.state_space + len(self.tech_indicator_list),
                                                   self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                               axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize the portfolio value
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                sharpe = (252**0.5) * df_daily_return['daily_return'].mean() / \
                         df_daily_return['daily_return'].std()
                print("Sharpe: ", sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal, {}

        else:
            # actions are the portfolio weights; shift by the minimum and
            # divide by the sum so the weights are non-negative and sum to 1
            norm_actions = (np.array(actions) - np.array(actions).min()) / (np.array(actions) - np.array(actions).min()).sum()
            weights = norm_actions
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # load next state
            self.day += 1
            self.data = self.df.loc[self.day, :]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs),
                                   [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                                   axis=0)
            # calculate portfolio return:
            # sum of individual stocks' returns * weights
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values) - 1) * weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value * (1 + portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value
            self.reward = new_portfolio_value
            # self.reward = self.reward * self.reward_scaling

            return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day, :]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                               axis=0)
        self.portfolio_value = self.initial_amount
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def save_asset_memory(self):
        # return the portfolio return at each time step
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        df_account_value = pd.DataFrame({'date': date_list, 'daily_return': portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
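Since the notebook's environment initialization cell is not reproduced here, the following is a minimal sketch of wiring the data into EnvSetup, using only the attributes listed in its docstring. The hmax, cost, and scaling values, and the indicator names, are illustrative assumptions that may differ across FinRL versions:

# Hedged sketch: configure EnvSetup (values below are illustrative).
stock_dimension = len(df.tic.unique())  # 30 Dow constituents

env_setup = EnvSetup(stock_dim=stock_dimension,
                     hmax=100,                    # assumed value
                     initial_amount=1000000,      # $1,000,000 starting capital
                     transaction_cost_pct=0.0,    # no transaction cost in this case
                     reward_scaling=1e-4,         # assumed value
                     tech_indicator_list=['macd', 'rsi_30', 'cci_30', 'dx_30'])  # assumed names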
Implement DRL Algorithms¶
FinRL uses a DRLAgent class to implement the algorithms.
class DRLAgent:
"""
Provides implementations for DRL algorithms
Attributes
----------
env: gym environment class
user-defined class
Methods
-------
train_PPO()
the implementation for PPO algorithm
train_A2C()
the implementation for A2C algorithm
train_DDPG()
the implementation for DDPG algorithm
train_TD3()
the implementation for TD3 algorithm
DRL_prediction()
make a prediction in a test dataset and get results
"""
Model Training:
We use A2C for portfolio allocation because it is stable, cost-effective, fast, and works well with large batch sizes. A sketch of the training step follows.
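Because the training cell is not reproduced in this article, here is a minimal sketch of training following the DRLAgent docstring above. The training date range, the model_name argument, and the exact train_A2C signature are assumptions that may differ across FinRL versions:

# Hedged sketch: train an A2C agent on the pre-2019 data.
train = data_split(df, '2009-01-01', '2019-01-01')   # assumed training window
env_train, obs_train = env_setup.create_env_training(data=train,
                                                     env_class=StockPortfolioEnv)

agent = DRLAgent(env=env_train)
model_a2c = agent.train_A2C(model_name="a2c_portfolio")  # assumed signature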
Trading: Assume that we have $1,000,000 of initial capital on 2019/01/01. We use the A2C model to perform portfolio allocation of the Dow 30 stocks.
trade = data_split(df, '2019-01-01', '2020-12-01')

env_trade, obs_trade = env_setup.create_env_trading(data=trade,
                                                    env_class=StockPortfolioEnv)

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=model_a2c,
                                                      test_data=trade,
                                                      test_env=env_trade,
                                                      test_obs=obs_trade)

The output actions or the portfolio weights look like this:

Backtesting Performance¶
FinRL uses a set of functions to do the backtesting with Quantopian's pyfolio.
from pyfolio import timeseries

DRL_strat = backtest_strat(df_daily_return)
perf_func = timeseries.perf_stats
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")
print("==============DRL Strategy Stats===========")
perf_stats_all
print("==============Get Index Stats===========")
baseline_perf_stats = BaselineStats('^DJI',
                                    baseline_start='2019-01-01',
                                    baseline_end='2020-12-01')

# plot
dji, dow_strat = baseline_strat('^DJI', '2019-01-01', '2020-12-01')
import pyfolio
%matplotlib inline
with pyfolio.plotting.plotting_context(font_scale=1.1):
    pyfolio.create_full_tear_sheet(returns=DRL_strat,
                                   benchmark_rets=dow_strat, set_context=False)
The left table shows the stats for the backtesting performance; the right table shows the stats for the index (DJIA) performance.
Plots: