This document describes the default agent used in the Deep Hedging code base. This is not the agent used in our Deep Hedging paper but is a more advanced version.
The Deep Hedging problem for a horizon $T$ hedged over $M$ time steps with $N$ hedging instruments is finding an optimal action function $a$ as a function of feature states $s_0,\ldots,s_{T-1}$ which solves the hedging objective described in the main document. The objective function $\mathrm{U}$ is an OCE monetary utility, which is itself given as
$$
\mathrm{U}(X) := \sup_y:\ \mathrm{E}\left[
u(X+y) - y
\right]
$$
A number of different functions $u$ are supported in objectives.py. Classic choices are the entropic utility with $u(x) = (1-\exp(-\lambda x))/\lambda$, or CVaR with $u(x)=(1+\lambda) \min(0, x)$, which corresponds to CVaR at confidence $p = \lambda/(1+\lambda)$. For example, $\lambda=1$ corresponds to CVaR@50%.
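As an illustration of the OCE formula (a standalone sketch, not the implementation in objectives.py; the function names are hypothetical), the utility of a sample of terminal gains can be approximated by a one-dimensional optimization over $y$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def entropy_u(x, lam=1.0):
    # entropic utility u(x) = (1 - exp(-lambda x)) / lambda
    return (1.0 - np.exp(-lam * x)) / lam

def cvar_u(x, lam=1.0):
    # CVaR utility u(x) = (1 + lambda) min(0, x), i.e. CVaR@p with p = lambda / (1 + lambda)
    return (1.0 + lam) * np.minimum(0.0, x)

def oce_utility(X, u, **kw):
    # U(X) = sup_y E[ u(X + y) - y ], computed by minimizing the negative over y
    res = minimize_scalar(lambda y: -(np.mean(u(X + y, **kw)) - y))
    return -res.fun

# toy example on a normal P&L sample
X = np.random.default_rng(0).normal(size=100_000)
print(oce_utility(X, entropy_u, lam=1.0))  # close to -lambda/2 for a standard normal
print(oce_utility(X, cvar_u, lam=1.0))     # the CVaR@50% monetary utility of X
```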
The agent in our problem setup is the functional $a$ which maps the current state $s_t$ to an action, which in this case is simply how many units of the $N$ hedging instruments to buy in $t$. In practice, $s_t$ represents the features selected from the available features at time $t$.
Initial Delta
The action at time zero is actually given by $a_0$ from the network described above plus an initial delta $a^\mathrm{init}$. The reason for this is that unless the portfolio $Z_T$ provided already contains the initial hedge, this hedge will look very different from subsequent hedges. Therefore, it is easier to train an additional $a^\mathrm{init}$.
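A minimal sketch of this mechanism (illustrative Keras code with hypothetical names, not the actual implementation in agent.py): the initial delta is simply one extra trainable weight per hedging instrument that is added to the network action at $t=0$.

```python
import tensorflow as tf

class InitialDeltaWrapper(tf.keras.layers.Layer):
    """Illustrative layer: adds a trainable initial hedge to the network action at t=0."""
    def __init__(self, nInst, **kwargs):
        super().__init__(**kwargs)
        # a^init: one free parameter per hedging instrument, trained with the rest of the agent
        self.a_init = self.add_weight(name="a_init", shape=(nInst,),
                                      initializer="zeros", trainable=True)

    def call(self, a_t, is_first_step):
        # a_t:           (batch, nInst) action proposed by the network
        # is_first_step: 0/1 flag marking t == 0
        return a_t + is_first_step * self.a_init
```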
Recurrence
The agent in agent.py supports both "recurrent" and non-recurrent features. It should be noted that since the state $s_t$ at time $t$ contains the previous action $a_{t-1}$ as well as the aggregate position $\delta_{t-1}$, strictly speaking even a "non-recurrent" agent is actually recurrent.
Classic States follow the standard recurrence
$$
h_t = F(s_t, h_{t-1})
$$
where $F$ is a neural network. This is the original recurrent network formulation; it suffers from both gradient explosion and long-term memory loss. To alleviate this somewhat, we restrict $h_t$ to $(-1,+1)$.
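For concreteness, a minimal sketch of such a bounded classic recurrence (again with illustrative names; the layer in agent.py is more general):

```python
import tensorflow as tf

class ClassicRecurrentState(tf.keras.layers.Layer):
    """Illustrative classic recurrent state h_t = F(s_t, h_{t-1}), bounded in (-1,+1)."""
    def __init__(self, nStates, **kwargs):
        super().__init__(**kwargs)
        # the tanh output keeps h_t in (-1,+1), which limits gradient explosion
        self.F = tf.keras.layers.Dense(nStates, activation="tanh")

    def call(self, s_t, h_prev):
        # concatenate current features and previous hidden state, then apply F
        return self.F(tf.concat([s_t, h_prev], axis=-1))
```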
Aggregate States represent aggregate statistics of the path, such as realized vol, skew, or other functions of the path. The prototype exponential aggregation update for such a hidden state $h$ is given as
$$
h_t = (1-z_t)\, h_{t-1} + z_t\, F(s_t, h_{t-1}), \qquad z_t\in(0,1),
$$
where $F$ is a neural network and $z_t$ is an "update gate vector". This is also known as a "gated recurrent unit" and is similar to an LSTM node.
In quant finance such states are often written in diffusion notation with $z_t \equiv \kappa_t dt$ where $\kappa_t\geq 0$ is a mean-reversion speed. In this case the $dt\downarrow 0$ limit becomes
$$
dh_t = \kappa_t\left( F(s_t, h_t) - h_t \right) dt .
$$
The appendix provides the derivation of this formula from its associated SDE.
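Heuristically, the limit can be read off the discrete update directly (a sketch; the appendix gives the derivation from the SDE):
$$
h_t = (1-\kappa_t\,dt)\,h_{t-dt} + \kappa_t\,dt\,F(s_t,h_{t-dt})
\quad\Longrightarrow\quad
\frac{h_t - h_{t-dt}}{dt} = \kappa_t\big( F(s_t,h_{t-dt}) - h_{t-dt}\big)
\;\longrightarrow\;
dh_t = \kappa_t\big( F(s_t,h_t) - h_t\big)\,dt .
$$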
Past Representation States: information gathered at fixed points in time, for example the spot level at a reset date. Such data are not accessible before their observation time.
The prototype equation for such a process is
$$
h_t = (1-z_t)\, h_{t-1} + z_t\, F(s_t, h_{t-1}),
$$
which looks similar to the aggregate case above, but where $z_t$ now only takes values in $\{0,1\}$. This allows encoding, for example, the spot level at a given fixed time $\tau$.
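As an example of this encoding (an illustrative choice of the gate, not a prescription of the code): with
$$
z_t = 1_{\{t=\tau\}} \qquad\text{and}\qquad F(s_t,h_{t-1}) = S_t,
$$
the hidden state equals $h_{-1}$ before $\tau$ and freezes at $h_t = S_\tau$ for all $t\geq\tau$, i.e. it records the spot level observed at the reset date $\tau$.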
Event States track, for example, whether a barrier was breached. The prototype equation looks similar to the previous example, but the event itself takes values in $\{0,1\}$:
$$
h_t = (1-z_t)\, h_{t-1} + z_t\, F(s_t, h_{t-1}), \qquad z_t,\ F(\,\cdot\,) \in \{0,1\},
$$
where we need to make sure that $h_{-1}\in\{0,1\}$, too.
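As an illustration (an example choice, not prescribed by the code): with $h_{-1}=0$,
$$
z_t = 1_{\{S_t \geq B\}} \qquad\text{and}\qquad F \equiv 1
\quad\Longrightarrow\quad
h_t = 1_{\{\exists\, u\leq t:\ S_u \geq B\}},
$$
so the hidden state switches to one at the first time the spot $S$ touches the barrier $B$ and stays there.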
Definition of the Network
When using the code, the agent is defined as follows: let $s_t\in\mathbb{R}^{m_\mathrm{s}}$ be the feature state of the network, and assume we are modelling $m_\mathrm{c}$ classic states, $m_\mathrm{a}$ aggregate states, $m_\mathrm{r}$ past representation states, and $m_\mathrm{e}$ event states. In case the sample paths generated by the market are all identical at time step zero, then
the initial states are plain variables $h^c_{-1}\in\mathbb{R}^{m_c},\ldots,h^e_{-1}\in\mathbb{R}^{m_e}$ which we will learn. See the comments below for further information on how to handle non-trivial initial market states.
Let $m:=m_s+m_c+m_a+m_r+m_e$ be the dimension of the input vector for the network in each step.
At step $t$, assume now that $(s_t,h^c_{t-1},h^a_{t-1},h^r_{t-1},h^e_{t-1})\in \mathbb{R}^m$.
We call $w\in\mathbb{N}$ the width of the network, and $d\in\mathbb{N}$ its depth. Also assume that $\alpha:\mathbb{R}\rightarrow \mathbb{R}$ is an activation function with the usual convention of elementwise application for vector arguments. We will also use a final_activation function $\ell:\mathbb{R}\rightarrow\mathbb{R}$ which will usually be linear.
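A minimal sketch of such a feed-forward agent network (a plausible Keras reading of the above with illustrative parameter names; the actual construction, including the hidden-state outputs, lives in agent.py):

```python
import tensorflow as tf

def make_agent_network(m, nInst, width=20, depth=3,
                       activation="relu", final_activation="linear"):
    """Illustrative dense network mapping the m-dimensional input
    (s_t, h^c_{t-1}, h^a_{t-1}, h^r_{t-1}, h^e_{t-1}) to nInst actions.
    The updated hidden states are omitted here for brevity."""
    x = inp = tf.keras.Input(shape=(m,))
    for _ in range(depth):
        # 'depth' hidden layers of 'width' units with activation alpha
        x = tf.keras.layers.Dense(width, activation=activation)(x)
    # final (usually linear) activation ell produces the action a_t
    out = tf.keras.layers.Dense(nInst, activation=final_activation)(x)
    return tf.keras.Model(inp, out)
```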
Usually, a state $s_t$ has different values for different samples. However, the initial state $s_0$ often does not: if we start a market simulation for today, then "today" is fixed for each simulated path.
However, there are a number of applications where this is not necessarily the case:
(1) Training with uncertainty in the initial state; an extreme case is training a model to learn to hedge a portfolio $Z_T$ across a range of different current states.
(2) Training for a number of payoffs $Z_T$ at the same time; in this case, the portfolio $Z_T$ may change per sample.
(3) Training for different risk-aversion parameters at the same time, to obtain an "efficient frontier".
Examples (2) and (3) are covered in https://arxiv.org/abs/2207.07467. Case (1) is supported by our world simulator: it allows starting "today" in an (approximately) invariant state, in which case the spot levels of the hedging instruments differ across samples at time $0$.
The variables affected in the Deep Hedging Framework here are:
The variable $y$ in the OCE objective.
The initial delta $a^\mathrm{init}$.
The initial hidden states $h_{-1}$ of our agents.
By default, these are all plain real variables, which corresponds to the case of a constant initial state $s_0$. They can all be turned into networks if this is not the case; the definitions follow the same logic as the network definition above.
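A minimal sketch of that switch (hypothetical helper, not the mechanism used in the code base): depending on whether the initial market state is constant across samples, the quantity is either a plain trainable variable or a small network of $s_0$.

```python
import tensorflow as tf

def make_initial_quantity(nOutput, init_state_dim=None):
    """Illustrative helper: return a plain trainable variable (constant s_0)
    or a small network of the initial state s_0 (sample-dependent s_0)."""
    if init_state_dim is None:
        # constant initial state: one trainable vector shared by all samples
        return tf.Variable(tf.zeros((nOutput,)), trainable=True)
    # non-trivial initial state: a small network mapping s_0 to the quantity
    return tf.keras.Sequential([
        tf.keras.Input(shape=(init_state_dim,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(nOutput),
    ])
```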