A useful technique in control theory is optimal control, which lets you quantify how good a controller is, and then pick the best one.
The problem is that this needs a system model; if the model is wrong, the resulting controller is unlikely to be optimal or even stable.
Pilco uses a Gaussian process to model the dynamics, capturing both model uncertainty and process noise.
This model is trained on data collected from the robot.
Pilco: Probabilistic Inference for Learning Control
Optimal control finds a policy for a system
minimizing a cost function over a horizon
\[
\begin{alignat}{2}
\text{find}&& \quad
\pi^*(\bm{x}) &= \argmin_{\pi(\bm{x})}
J(\bm{x}\tind\cdot, \bm{u}\tind\cdot)
\quad \text{where} \quad
J(\bm{x}\tind\cdot, \bm{u}\tind\cdot) = \class{c-orange}{\sum_{i=0}^{N}}
\class{c-purple}{c(\bm{x}\tind{i}, \bm{u}\tind{i})}\quad \\ \label{eq:optimal}
\text{s.t.}&& \quad
\class{c-green}{\bm{x}\tind{i+1}} &\class{c-green}{= f(\bm{x}\tind{i}, \bm{u}\tind{i})} \\ \nonumber
&&
\class{c-blue}{\bm{u}\tind{i}} &\class{c-blue}{= \pi(\bm{x}\tind{i})}\, \nonumber
\end{alignat}
\]
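To make the objective concrete, here is a minimal sketch of evaluating $J$ for one candidate policy by rolling the dynamics forward and summing the per-step cost. The names `policy`, `dynamics`, and `cost` are placeholder callables standing in for $\pi$, $f$, and $c$; none of this is Pilco's actual API.

```python
import numpy as np

def rollout_cost(policy, dynamics, cost, x0, horizon):
    """Evaluate J = sum_{i=0}^{N} c(x_i, u_i) along one rollout.

    policy, dynamics and cost are hypothetical callables standing in
    for pi, f and c in the formulation above.
    """
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for _ in range(horizon + 1):      # i = 0 .. N inclusive
        u = policy(x)
        J += float(cost(x, u))
        x = dynamics(x, u)            # x_{i+1} = f(x_i, u_i)
    return J

# Toy scalar system x_{i+1} = x_i + u_i with a quadratic cost and a
# hypothetical linear policy u = -0.5 x (illustration only).
J = rollout_cost(policy=lambda x: -0.5 * x,
                 dynamics=lambda x, u: x + u,
                 cost=lambda x, u: x**2 + u**2,
                 x0=2.0, horizon=10)
```

Optimal control then searches over the parameters of `policy` for the one minimizing this sum.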
Needs a system model
— Pilco learns a probabilistic one using Gaussian process regression
\[
\begin{align}
\left(\bm{x}\tind{i + 1} - \bm{x}\tind{i}\right)_j &= \class{c-cyan}{f_j(\bm{z}\tind{i})
\sim \mathrm{GP}(m_j(\bm{z}), K_j(\bm{z}_1, \bm{z}_2))},
&
\bm{z}\tind{i} &= \begin{bmatrix}\bm{x}\tind{i} \\ \bm{u}\tind{i}\end{bmatrix}
\end{align}
\]
Advantages: data-efficient, and robust to modelling errors because model uncertainty is represented explicitly
Let's recap Gaussian processes quickly.
A Gaussian, or normal, distribution is a distribution over scalars.
We can extend this to a distribution over vectors with a multivariate Gaussian.
Taking this further, a Gaussian process is a distribution over functions.
As a reminder, for Pilco the functions in question are the one-step update dynamics.
Gaussian processes
Scalar: $\class{c-green}{x} \sim \mathcal{N}(\mu, \sigma^2)$
Vector: $\class{c-green}{\begin{bmatrix}x_0 \\ x_1\end{bmatrix}} \sim \mathcal{N}(\mu, \Sigma)$
Function: $\class{c-green}{x(\cdot)}\sim \mathrm{GP}(\class{c-blue}{m(\cdot)}, K(\cdot,\cdot))$
Function: $\class{c-green}{\bm{x}\tind{i + 1} - \bm{x}\tind{i}}\sim \mathrm{GP}(\class{c-blue}{m(\cdot)}, K(\cdot,\cdot))$
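To make the recap concrete, here is a minimal from-scratch sketch of GP regression on one output dimension of the one-step dynamics, with a squared-exponential kernel and a toy linear system. The kernel, hyperparameters, and data here are all assumptions for illustration, not Pilco's actual setup (Pilco also optimizes hyperparameters and propagates uncertain inputs).

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel k(a, b) = sf2 * exp(-||a - b||^2 / (2 ell^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sf2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(Z, y, Zs, sn2=1e-4):
    """Posterior mean and variance of a zero-mean GP at test inputs Zs."""
    K = rbf(Z, Z) + sn2 * np.eye(len(Z))       # training covariance + noise
    Ks = rbf(Zs, Z)                            # test/train cross-covariance
    mean = Ks @ np.linalg.solve(K, y)
    var = rbf(Zs, Zs).diagonal() - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Toy training set: inputs z_i = [x_i, u_i], targets are the one-step
# deltas x_{i+1} - x_i of an assumed linear system (illustration only).
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 0.1 * Z[:, 1] - 0.05 * Z[:, 0]
mean, var = gp_posterior(Z, y, Zs=np.array([[0.2, 0.5]]))
```

The posterior variance is what gives Pilco its handle on model uncertainty: predictions far from the training data come back with wide error bars rather than confident guesses.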
Let's move on to the improvements suggested by the previous work.
The first was to remove the manual process of loading controllers onto the robot,
which also allows the USB cable tether to be eliminated.
Suggested Improvements
Automate data transfer with the hardware ✔
Improve physical release procedure ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
The missing piece here was a serial communication protocol.
For this you need two parts: an application layer, to represent structured messages as
bytestrings, for which I used protobuf;
and a framing layer, to delineate disjoint bytestrings in a continuous stream, for which I used
COBS.
With this in place, it was easy to define messages to change the controller policy and extract
state logs.
Data transfer
Application layer: Protocol buffers
syntax = "proto3";

message DebugMessage {
  string s = 1;
  DebugLevel level = 2;
}

message RobotMessage {
  oneof msg {
    LogBundle log_bundle = 1;
    DebugMessage debug = 2;
    LogEntry single_log = 3;
  }
}
Framing layer: COBS (Consistent Overhead Byte Stuffing)
Messages: F1 00 D5, 0F, C0 FF EE
→ Stream: 02 F1 02 D5 00 | 02 0F 00 | 04 C0 FF EE 00
(count and null-terminator bytes added by the encoding)
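A minimal COBS encoder sketch (not the project's actual implementation) that reproduces the example stream above:

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode one message with COBS and append the 0x00 frame delimiter.

    Zero bytes in the payload are eliminated by prefixing each zero-free
    block with a count byte giving the offset to the next zero, so 0x00
    can safely mark frame boundaries in the stream.
    """
    out = bytearray()
    block = bytearray()
    for byte in data:
        if byte == 0:
            out.append(len(block) + 1)   # count byte replaces the zero
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:        # maximum zero-free block length
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    out.append(0x00)                     # frame delimiter
    return bytes(out)

# The three example messages from the slide:
frames = [cobs_encode(m) for m in (b"\xf1\x00\xd5", b"\x0f", b"\xc0\xff\xee")]
```

The receiver simply splits the stream on 0x00 bytes and reverses the count-byte substitution, which makes resynchronization after a dropped byte trivial.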
Next we move onto improving the release procedure.
Suggested Improvements
Automate data transfer with the hardware ✔
Improve physical release procedure ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
For each controller, the robot must be released from vertical just before it starts
controlling itself.
This is done by hand, waiting for an LED, so human reaction time comes into play.
Switching to an audio cue gives a small improvement, but we can do even
better by reversing the roles and having the robot respond to the human instead of the other way around.
We do this by adding a switch that is released at the same time as the robot.
The other issue is that Pilco assumes the robot is released in the same orientation each time.
Since building a rig or learning this orientation was impractical, I chose instead to simply
inform Pilco of the initial orientation via the accelerometer.
Physical release procedure
Reaction times
Jose Shelton and Gideon Praveen Kumar. “Comparison between Auditory and Visual Simple
Reaction Times”. In: Neuroscience and Medicine 1.1 (2010)
Human visual — 330ms
Human audio — 280ms
Arduino switch release — really fast!
Repeatability of orientation
Build a rig to drop perfectly ✘ Impractical
Learn the initial orientation ✘ Must be online
Measure the initial orientation ✔ Use the accelerometer
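As a sketch of the "measure the initial orientation" option: while the robot is held still at release, the accelerometer reads only gravity, so roll and pitch fall out directly. The axis convention and function name here are assumptions for illustration, not the project's code.

```python
import math

def orientation_from_accel(ax, ay, az):
    """Roll and pitch (radians) from a static accelerometer reading.

    Assumes the robot is stationary, so the accelerometer measures only
    gravity. Axis convention (an assumption): x forward, y left, z up.
    Yaw is unobservable from gravity alone.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch
```

Reading this once, just before handing control to the learned policy, gives Pilco the actual initial state distribution instead of an assumed fixed one.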
Back to our list of improvements, we move on to adding an accurate simulation model.
Suggested Improvements
Automate data transfer with the hardware ✔
Improve physical release procedure ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
Pilco contains simulations primarily for testing: it's much faster to try learning
a controller on a system that doesn't require physical intervention to reset.
A simulated model of the 1m unicycle already existed; as part of this project, parameters
for the smaller model were roughly estimated from its dimensions and material properties.
Exactly matching the hardware's moments of inertia was not important.
Simulation models
Software-only tests of Pilco
Only have a model for the 1m unicycle
Approximations of physical parameters good enough
The last suggested improvement was the most involved: that the robot needs
to be able to fall over further in roll in order to learn.
Let's look at this one in more detail.
Suggested Improvements
Automate data transfer with the hardware ✔
Measure initial robot orientation with the accelerometer ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
Effect of roll limit
The roll limit refers to how far the robot can lean in roll before its wheel leaves the ground
and recovery becomes impossible.
Let's take a look at the simulation's performance under different roll limits.
An easy metric for performance is how long the robot stays upright, as a function of iteration number.
Here's what we get for an idealized unicycle.
\[
\theta_\text{max} =
\class{c-blue}{90°}
\]
Idealized
And here's the unicycle with the physical roll limit modelled. Clearly, there's a problem.
Earlier work suggested redesigning the frame to raise this limit; optimistically, 45 degrees is achievable.
\[
\theta_\text{max} =
\class{c-blue}{90°},\quad
\class{c-yellow}{17°}
\]
Idealized, Current
But that doesn't really help.
We can't get much more from this graph as is.
\[
\theta_\text{max} =
\class{c-blue}{90°},\quad
\class{c-orange}{45°},\quad
\class{c-yellow}{17°}
\]
Idealized, Proposed design, Current
Let's add a line to represent each run of the robot,
so that we can introduce another dimension: plotting the loss function over time.
Again, the end of each line indicates where the robot fell over.
It's already noticeable that the yellow and orange lines end at a far lower loss.
Eliminating the iterations axis makes this even more pronounced.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]
Falling over while the loss function is low makes the controller think it is doing quite
well. We need the loss to be close to 1 once a fall has occurred in order to
effectively teach the controller not to fall, or even get close to falling.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]
Pilco also tracks the expected distribution of the loss at a given time, shown here
for the 45th iteration. This tells us not only that the robot did not fall over, but
that the predicted chance of it doing so was very small.
Let's fix the loss function to penalize falling over in roll.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]
Before, showing distribution of predictions
We do this by adding an additional term including the roll angle.
As soon as we do this, we see that all of our runs last much longer.
I'll show this from the original view for good measure, so we can compare before and after one more time.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2} + \class{c-purple}{\tfrac{\theta^2}{\theta_\text{max}^2}}
\right]\right)
\]
After, showing distribution of predictions
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2} + \class{c-purple}{\tfrac{\theta^2}{\theta_\text{max}^2}}
\right]\right)
\]
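For reference, a direct transcription of the modified loss into code. The width h = 0.5 and the default θ_max = 17° are placeholder values (the slides give no numeric h), and the reading of d as the distance to the target state, φ as the angle penalized by the original loss, and θ as the roll angle follows the surrounding discussion.

```python
import math

def saturating_loss(d, phi, theta, h=0.5, theta_max=math.radians(17)):
    """Saturating loss 1 - exp(-0.5 [d^2/h^2 + phi^2/(4 pi)^2 + theta^2/theta_max^2]).

    d: distance d(x) to the target state; phi: angle penalized by the
    original loss; theta: roll angle. h and theta_max defaults are
    placeholders, not values taken from the project.
    """
    arg = (d / h) ** 2 + (phi / (4.0 * math.pi)) ** 2 + (theta / theta_max) ** 2
    return 1.0 - math.exp(-0.5 * arg)
```

Because the new term reaches θ²/θ_max² = 1 exactly at the roll limit, the loss is already well away from zero whenever the robot is about to fall in roll, which is what teaches the controller to avoid that region.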
This investigation showed that with a corrected loss, a simulated unicycle with a 17-degree
roll limit performs very similarly to one with a 45-degree limit, suggesting the hardware redesign offers little benefit.
With roll loss term
Suggested Improvements
Automate data transfer with the hardware ✔
Measure initial robot orientation with the accelerometer ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
✘ Improvement is small after fixing the loss function