Trajectory Interpolator
========================
Quick Introduction
~~~~~~~~~~~~~~~~~~~~~
=========================================
Why do we need trajectory interpolation?
=========================================
For a majority of policies in robot learning research, we care about the action output of a policy $a_{t}=\pi(a_{t}|\cdot)$. Such a policy is typically categorized as a high-level decision making. It happens at a *low-frequency*, which is different from the *high-frequency* control $u$ a robot manipulator requires.
Therefore, in most cases, what actually happens is that the robot controllers will expand the high-level action commands into a sequence of fine-grained actions. More concretely, for an action $a_{t}$, we computes $T$ high-frequency control commands $\{u^{(t)}_{i}\}_{i=1}^{T}$ using trajectory interpolation. Interpolated control commands $\{u^{(t)}_{i}\}$ are converted to torque commands $\{\tau_{i}\}$ that are sent to the robot via robot's control api. For simplicity, we assume $a\in\mathcal{A}$, where $\mathcal{A}$ is the action space of joint positions and cartesian positions (Note that it can also contains velocities, but we do not consider in this codebase at the moment).
==============================================
How is a trajectory interpolation implemented?
==============================================
In our codebase, our default setting is 20Hz for $a_{t}$ and 500Hz for $u^{(t)}_{i}$. You might notice that Franka Emika Panda requires 1000Hz control loop. We fulfill this requirement by repeating each $u^{(t)}_{i}$ twice to avoid jittering motion. In summary, each action $a_{t}$ action from the high-level policy is converted to 25 $u^{(t)}_{i}$ commands by default. The following animation visualizes how trajectory interpolations work.
.. raw:: html
.. .. image:: ../_images/traj_interpolation_explanation.mp4
.. :width: 100%
A most straightforward way to compute interpolation commands is to interpolate between the current robot state and the given action command. However, this introduces jitterness in actual control. The reason is that the previous commands are computed based on $a_{t-1}$, not the current robot state $x_{t}$. There are always errors between $x_{t}$ and $a_{t-1}$. Therefore, if the interpolation is between $x_{t}$ and $a_{t}$, it introduces nonsmoothness in the high-frequency control commands $\{u^{(t)}_{i}\}$, and you would hear the noise when robots move. To ensure smoothness in the the commands, we interpolate between $a_{t-1}$ and $a_{t}$.
Trajectory Interpolator Type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this codebase, we majorly implement two types of interpolators: linear interpolators and min jerk interpolators. One interpolates linearly between $a_{t-1}$ and $a_{t}$, and the other interpolates to regularize the jerkiness between each point. Both types are implemented for joint positions and cartesian poses.
===================
Linear Interpolator
===================
Linear interpolator is very simple and straightforward. It aims to computes control commands $\{u^{(t)}_{i}\}$ that evenly divide between $a_{t-1}$ and $a_{t}$:
$$\hat{t} = \min(\max(\frac{i}{\alpha T}, 0), 1)$$
$$u^{(t)}_{i} = (a_{t} - a_{t-1})\hat{t} + a_{t-1}$$
where $\alpha \in [0.1, 1.0] $ which tunes the time fraction of trajectory interpolation within each $T$ steps. For example, if $alpha=0.33$, that means that the control command becomes $a_{t}$ within $\frac{1}{3}$ of an interpolation interval, and remain the same for the rest of $\frac{2}{3}$ within the $T$ step interpolation interval.
.. note::
You can set $\alpha$ smaller than $0.1$, but that would cause rapid motion that are not desired. Therefore, we clip the value with a lower bound $0.1$.
=====================
Min Jerk Interpolator
=====================
The way to regularize the jerkiness between each control commmand is to make sure that their high-order derivaives are continuous and smooth. This is the basic idea behind min jerk interpolators. The following formula shows how the min jerk interpolators are implemented in ``Deoxys``.
$$t' = \min(\max(\frac{i}{\alpha T}, 0), 1)$$
$$\hat{t} = 10{t'}^{3} - 15{t'}^{4} + 6{t'}^{5}$$
$$u^{(t)}_{i} = (a_{t} - a_{t-1})\hat{t} + a_{t-1}$$
where $\alpha \in [0.1, 1.0] $, which is defined in the same way as in linear interpolator case.