# Portfolio Allocation

Presented at NeurIPS 2020: Deep RL Workshop.

The Jupyter notebook code is available on our GitHub and Google Colab.

Tip

Check our previous tutorials, Single Stock Trading and Multiple Stock Trading, for a detailed explanation of the FinRL architecture and modules.

## Overview

To begin with, we explain the logic of portfolio allocation using Deep Reinforcement Learning (DRL). We use the Dow 30 constituents as an example throughout this article, because they are among the most popular stocks.

Suppose we have a million dollars at the beginning of 2019 and want to invest this \$1,000,000 in the stock market, in this case the Dow Jones 30 constituents. Assume no margin, no short sales, and no treasury bills (all the money is used to trade only these 30 stocks). It follows that the weight of each individual stock is non-negative, and the weights of all the stocks add up to one.

We hire a smart portfolio manager, Mr. Deep Reinforcement Learning. Mr. DRL gives us daily advice that includes the portfolio weights, that is, the proportions of money to invest in these 30 stocks. So every day we just need to rebalance the portfolio weights of the stocks. Portfolio allocation is different from multiple stock trading because we are essentially rebalancing the weights at each time step, and we have to use all available money.
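The daily rebalancing logic can be sketched in a few lines. This is a minimal illustration rather than FinRL code: the helper name `rebalance_step`, the three-stock prices, and the weights are made up for the example, and it assumes no transaction cost and a fully invested portfolio, as described above.

```python
def rebalance_step(portfolio_value, weights, prices_today, prices_tomorrow):
    """Return the portfolio value after holding `weights` for one day.

    Hypothetical helper for illustration; assumes no transaction cost.
    """
    if any(w < 0 for w in weights):
        raise ValueError("no short sale: weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("all money is invested: weights must sum to one")
    # The portfolio return is the weighted sum of the individual stock returns.
    portfolio_return = sum(
        w * (p1 / p0 - 1.0)
        for w, p0, p1 in zip(weights, prices_today, prices_tomorrow)
    )
    return portfolio_value * (1.0 + portfolio_return)


# Made-up example with three stocks instead of thirty.
value = rebalance_step(
    1_000_000,
    weights=[0.5, 0.3, 0.2],
    prices_today=[100.0, 50.0, 20.0],
    prices_tomorrow=[110.0, 50.0, 19.0],
)
```

Here the stock returns are +10%, 0%, and -5%, so the weighted portfolio return is 4% for the day.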

The traditional and most popular approach to portfolio allocation is mean-variance optimization, or modern portfolio theory (MPT). However, MPT does not perform well on out-of-sample data. MPT is computed from stock returns only; if we want to take other relevant factors into account, for example technical indicators such as MACD or RSI, MPT may not be able to combine this information well.
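For comparison, a mean-variance style baseline can be computed directly from a covariance matrix of returns. The sketch below uses the closed-form global minimum-variance portfolio, which is only one textbook instance of MPT, chosen here for brevity. Unlike the setting in this tutorial, the closed form does not enforce non-negative weights. The function name and the covariance values are made up for illustration.

```python
import numpy as np


def min_variance_weights(cov):
    """Global minimum-variance weights: solve cov @ x = 1, then rescale to sum to 1."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solving the linear system avoids inverting the matrix explicitly.
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()


# Made-up covariance matrix of returns for three stocks.
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.000],
    [0.000, 0.000, 0.160],
])
weights = min_variance_weights(cov)
portfolio_variance = weights @ cov @ weights
```

Note that the only input is the return covariance; there is no natural place to plug in MACD or RSI, which is the limitation the paragraph above points out.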

We introduce FinRL, a DRL library that helps beginners get exposure to quantitative finance. FinRL is designed specifically for automated stock trading, with an emphasis on education and demonstration.

## Problem Definition

This problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The components of the reinforcement learning environment are:

• Action: the portfolio weight of each stock, within [0,1]. We use the softmax function to normalize the actions so that they sum to 1.

• State: {Covariance Matrix, MACD, RSI, CCI, ADX}. The state space shape is (34, 30): 34 rows (the 30 rows of the covariance matrix plus 4 technical indicators) and 30 columns (one per stock).

• Reward function: r(s, a, s′) = p_t, where p_t is the cumulative portfolio value.

• Environment: portfolio allocation for Dow 30 constituents.
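The softmax normalization named in the action definition can be sketched as follows. The function name `softmax_normalization` and the sample actions are illustrative assumptions. The environment listed later in this tutorial normalizes by shifting and rescaling the actions instead; softmax is shown here because it is what the action definition refers to.

```python
import numpy as np


def softmax_normalization(actions):
    """Map raw actions to positive weights that sum to 1."""
    actions = np.asarray(actions, dtype=float)
    # Subtracting the maximum avoids overflow and does not change the result.
    exp = np.exp(actions - actions.max())
    return exp / exp.sum()


raw_actions = np.array([0.2, 0.9, 0.5, 0.0])  # made-up agent output in [0, 1]
weights = softmax_normalization(raw_actions)
```

Every weight is strictly positive under softmax, so no stock is ever fully excluded from the portfolio; larger raw actions simply receive larger shares.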

The covariance matrix is a good feature because portfolio managers use it to quantify the risk (standard deviation) associated with a particular portfolio.
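As a small worked example of that use, the standard deviation of a portfolio with weights w and return covariance Σ is sqrt(wᵀΣw). The helper name and the two-stock numbers below are made up for illustration.

```python
import numpy as np


def portfolio_std(weights, cov):
    """Portfolio risk as the standard deviation sqrt(w' @ cov @ w)."""
    weights = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(np.sqrt(weights @ cov @ weights))


# Two made-up stocks with standard deviations 0.2 and 0.3 and covariance 0.01.
cov = np.array([
    [0.04, 0.01],
    [0.01, 0.09],
])
weights = np.array([0.5, 0.5])
risk = portfolio_std(weights, cov)
```

The resulting risk is lower than the weighted average of the two individual standard deviations (0.25), which is the diversification effect the covariance matrix captures.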

We also assume no transaction cost, because we want a simple portfolio allocation case as a starting point.

Install the unstable development version of FinRL:

```
# Install the unstable development version in Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
```

Import Packages:

```
# import packages
import os
import datetime

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')

from finrl import config
from finrl import config_tickers
# YahooDownloader is used below; the module path is an assumption for this FinRL version
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot, backtest_strat, baseline_strat

# create the output directories if they do not exist yet
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)
```

```
class YahooDownloader:
    """
    Provides methods for retrieving daily stock data from Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()
        Fetches data from yahoo API
    """
```

```
# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date = ...,  # start date not given here; set your own
                     end_date = '2020-12-01',
                     ticker_list = config_tickers.DOW_30_TICKER).fetch_data()
```

## Preprocess Data

FinRL uses a FeatureEngineer class to preprocess data.

```
class FeatureEngineer:
    """
    Provides methods for preprocessing the stock price data

    Attributes
    ----------
        df: DataFrame
        feature_number : int
            number of features we used
        use_technical_indicator : boolean
            use technical indicator or not
        use_turbulence : boolean
            use turbulence index or not

    Methods
    -------
    preprocess_data()
        main method to do the feature engineering
    """
```

Perform Feature Engineering: covariance matrix + technical indicators:

```
# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     use_turbulence=False).preprocess_data()


# add covariance matrix as states
df = df.sort_values(['date', 'tic'], ignore_index=True)
# factorize() returns (codes, uniques); the codes give one integer index per date
df.index = df.date.factorize()[0]

cov_list = []
# look back is one year (252 trading days)
lookback = 252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)

# attach each day's covariance matrix to that day's rows
df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date', 'tic']).reset_index(drop=True)
```

## Build Environment

FinRL uses an EnvSetup class to set up the environment.

```
class EnvSetup:
    """
    Provides methods for setting up the trading environments

    Attributes
    ----------
        stock_dim: int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount: int
            start money
        transaction_cost_pct : float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        tech_indicator_list: list
            a list of technical indicator names (modified from config.py)

    Methods
    -------
    create_env_training()
        create env class for training
    create_env_validation()
        create env class for validation
    """
```

Initialize an environment class:

User-defined Environment: a simulation environment class. The environment for portfolio allocation:

```
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt


class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym

    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date

    Methods
    -------
    step()
        at each step the agent will return actions, then
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return other functions
    save_asset_memory()
        return the daily portfolio return at each time step
    save_action_memory()
        return actions/positions at each time step

    """

    def __init__(self,
                 df,
                 stock_dim,
                 hmax,
                 initial_amount,
                 transaction_cost_pct,
                 reward_scaling,
                 state_space,
                 action_space,
                 tech_indicator_list,
                 turbulence_threshold,
                 lookback=252,
                 day=0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,))
        # Shape = (34, 30)
        # covariance matrix + technical indicators
        # unbounded below because covariances and indicators such as MACD can be negative
        self.observation_space = spaces.Box(low=-np.inf,
                                            high=np.inf,
                                            shape=(self.state_space + len(self.tech_indicator_list),
                                                   self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        # every row of a given day carries the same covariance matrix, so take the first
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                               axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize state: initial portfolio return + individual stock return + individual weights
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step (the first day has a return of 0)
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        # print(self.day)
        self.terminal = self.day >= len(self.df.index.unique())-1
        # print(actions)

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                # annualized Sharpe ratio with a zero risk-free rate
                sharpe = (252**0.5)*df_daily_return['daily_return'].mean() / \
                         df_daily_return['daily_return'].std()
                print("Sharpe: ", sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal, {}

        else:
            #print(actions)
            # actions are the portfolio weights;
            # shift by the minimum and rescale so they are non-negative and sum to 1
            # (a min-shift normalization, not a softmax)
            shifted_actions = np.array(actions) - np.array(actions).min()
            weights = shifted_actions / shifted_actions.sum()
            #print(weights)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # load next state
            self.day += 1
            self.data = self.df.loc[self.day, :]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs),
                                   [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                                   axis=0)
            # calculate portfolio return
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolio value
            self.reward = new_portfolio_value
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list],
                               axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date': date_list, 'daily_return': portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
```
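To see where the (34, 30) state shape comes from, the state assembly used in the environment can be reproduced on synthetic data. This is only a sketch: the random returns and indicator values are stand-ins for one day of real data, and the indicator names are placeholders rather than the column names FinRL produces.

```python
import numpy as np

stock_dim = 30
tech_indicator_list = ["macd", "rsi", "cci", "adx"]  # placeholder names

rng = np.random.default_rng(0)
# Synthetic stand-in for one year of daily returns of 30 stocks.
returns = rng.normal(0.0, 0.01, size=(252, stock_dim))
# Covariance across stocks (columns are the variables): shape (30, 30).
covs = np.cov(returns, rowvar=False)
# One row of 30 values per technical indicator: 4 rows in total.
indicators = [rng.normal(size=stock_dim).tolist() for _ in tech_indicator_list]

# Same call pattern as in the environment: stack the indicator rows under the covariance matrix.
state = np.append(covs, indicators, axis=0)
```

The first 30 rows of `state` are the covariance matrix and the last 4 rows are the indicators, one column per stock.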

## Implement DRL Algorithms

FinRL uses a DRLAgent class to implement the algorithms.

```
class DRLAgent:
    """
    Provides implementations for DRL algorithms

    Attributes
    ----------
        env: gym environment class
            user-defined class

    Methods
    -------
    train_PPO()
        the implementation for PPO algorithm
    train_A2C()
        the implementation for A2C algorithm
    train_DDPG()
        the implementation for DDPG algorithm
    train_TD3()
        the implementation for TD3 algorithm
    DRL_prediction()
        make a prediction in a test dataset and get results
    """
```

Model Training:

We use A2C for portfolio allocation, because it is stable, cost-effective, faster and works better with large batch sizes.

Trading: Assume that we have \$1,000,000 of initial capital on 2019/01/01. We use the A2C model to perform portfolio allocation of the Dow 30 stocks.

```
trade = data_split(df, '2019-01-01', '2020-12-01')

# [...] the opening of the trading-environment setup call is missing here
#                                          env_class = StockPortfolioEnv)

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=model_a2c,
                                                      # [...] remaining arguments are missing here
                                                      )
```

The output actions, that is, the portfolio weights, look like this:

## Backtesting Performance

FinRL uses a set of functions to do the backtesting with Quantopian pyfolio.

```
import pyfolio
from pyfolio import timeseries

DRL_strat = backtest_strat(df_daily_return)
perf_func = timeseries.perf_stats
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")
print("==============DRL Strategy Stats===========")
print(perf_stats_all)
print("==============Get Index Stats===========")
baseline_perf_stats = BaselineStats('^DJI',
                                    baseline_start = '2019-01-01',
                                    baseline_end = '2020-12-01')


# plot
dji, dow_strat = baseline_strat('^DJI', '2019-01-01', '2020-12-01')
# Jupyter magic: render the plots inside the notebook
%matplotlib inline
with pyfolio.plotting.plotting_context(font_scale=1.1):
    pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                   benchmark_rets=dow_strat, set_context=False)
```

The left table shows the backtesting performance statistics; the right table shows the statistics for the index (DJIA).
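One statistic worth understanding on its own is the Sharpe ratio, which the environment also prints at the end of an episode as sqrt(252) times the mean daily return divided by its standard deviation. A minimal standalone sketch of that formula follows; the function name and the sample returns are made up, and a zero risk-free rate is assumed, as in the environment code.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate.

    Returns None when the returns have zero standard deviation.
    """
    std = statistics.stdev(daily_returns)  # sample standard deviation
    if std == 0:
        return None
    return math.sqrt(periods_per_year) * statistics.mean(daily_returns) / std


daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]  # made-up daily returns
sharpe = annualized_sharpe(daily_returns)
```

The sample standard deviation is used here because that is what the pandas `std()` call in the environment computes by default.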

Plots: