\[
\newcommand{\tind}[1]{^{(#1)}}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\bm}[1]{\mathbf{#1}}
\]

Eric Wieser (`efw27@`)

MEng. Project 2016-2017

Supervised by Prof. Carl Rasmussen (`cer54@`)

Balance this unicycle

Fix problems identified in previous work

Review source code and fix bugs found

Improve the tools to make similar projects easier in future

- System overview
- Summary of the method, Pilco
- Errors in earlier work
- Suggested improvements from earlier work
- Results on the hardware
- Future work and conclusions

- Small
- Two actuators - drive wheel, horizontal flywheel
- Arduino-like controller
- C++
- Performs no learning

- Takes data from the robot
- Applies methods from the Pilco toolbox (Matlab)
- Sends a new controller to the robot

Optimal control finds a policy for a system minimizing a cost function over a horizon

\[ \begin{alignat}{2} \text{find}&& \quad \pi^*(\bm{x}) &= \argmin_{\pi(\bm{x})} J(\bm{x}\tind\cdot, \bm{u}\tind\cdot) \quad \text{where} \quad J(\bm{x}\tind\cdot, \bm{u}\tind\cdot) = \class{c-orange}{\sum_{i=0}^{N}} \class{c-purple}{c(\bm{x}\tind{i}, \bm{u}\tind{i})}\quad \\ \label{eq:optimal} \text{s.t.}&& \quad \class{c-green}{\bm{x}\tind{i+1}} &\class{c-green}{= f(\bm{x}\tind{i}, \bm{u}\tind{i})} \\ \nonumber && \class{c-blue}{\bm{u}\tind{i}} &\class{c-blue}{= \pi(\bm{x}\tind{i})}\, \nonumber \end{alignat} \]

Needs a system model: Pilco learns a probabilistic one using Gaussian process regression

\[ \begin{align} \bm{x}\tind{i + 1} - \bm{x}\tind{i} &= \class{c-cyan}{f_j(\bm{z}\tind{i}) \sim \mathrm{GP}(m_j(\bm{z}), K_j(\bm{z}_1, \bm{z}_2))}, & \bm{z}\tind{i} &= \begin{bmatrix}\bm{x}\tind{i} \\ \bm{u}\tind{i}\end{bmatrix} \end{align} \]
Advantages: data-efficient, tolerant of modelling mistakes (the model carries its own uncertainty)
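The finite-horizon cost $J$ can be sketched as a plain rollout. This is a deterministic, scalar illustration only: Pilco itself propagates distributions through the learned model, and the names `rollout_cost`, `State`, and `Control` are hypothetical.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical scalar types for illustration; the real state and control are vectors.
using State = double;
using Control = double;

// Accumulate J = sum_{i=0}^{N} c(x_i, u_i) along the trajectory generated by
// x_{i+1} = f(x_i, u_i) with u_i = pi(x_i).
double rollout_cost(State x0, std::size_t N,
                    const std::function<State(State, Control)>& f,
                    const std::function<Control(State)>& pi,
                    const std::function<double(State, Control)>& c) {
    double J = 0.0;
    State x = x0;
    for (std::size_t i = 0; i <= N; ++i) {  // N + 1 terms, matching the sum above
        const Control u = pi(x);            // policy picks the control
        J += c(x, u);                       // stage cost
        x = f(x, u);                        // step the dynamics
    }
    return J;
}
```

An optimizer would then search over the parameters of `pi` to make this value small.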

Scalar: $\class{c-green}{x} \sim \mathcal{N}(\mu, \sigma^2)$

Vector: $\class{c-green}{\begin{bmatrix}x_0 \\ x_1\end{bmatrix}} \sim \mathcal{N}(\mu, \Sigma)$

Function: $\class{c-green}{x(\cdot)}\sim \mathrm{GP}(\class{c-blue}{m(\cdot)}, K(\cdot,\cdot))$

Function: $\class{c-green}{\bm{x}\tind{i + 1} - \bm{x}\tind{i}}\sim \mathrm{GP}(\class{c-blue}{m(\cdot)}, K(\cdot,\cdot))$

- Incorrect use of Euler angles
- Incorrect integration of gyro — should be $\bm{q}\tind{t_2} = \exp \left(\tfrac{1}{2} \Delta t \bm{\omega}\right) \bm{q}\tind{t_1}$
- Integer overflow in encoder readings

  `int16_t curr_enc, last_enc; int32_t total = 0;` *setup*

  `total += curr_enc - last_enc;` *✘ Wrong + undefined behaviour*

  `total += static_cast<int16_t>(curr_enc - last_enc);` *✘ Undefined behaviour*

  `total += static_cast<int16_t>(static_cast<uint16_t>(curr_enc) - static_cast<uint16_t>(last_enc));` *✔*

  The undefined-behaviour labels assume a target where `int` is 16 bits, so integer promotion does not widen the subtraction.

  “When the compiler encounters [undefined behaviour] it is legal for it to make demons fly out of your nose” [1]
- Algebraic errors in the loss function
- Use of constraint-violating trajectories to learn dynamics
- Loss function not scaled to small robot
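The gyro integration formula in the list above can be sketched with a small quaternion type. This is a minimal illustration, not the project's code: `Quat`, `mul`, `exp_pure`, and `integrate_gyro` are hypothetical names, and the left-multiplication order is simply taken from the formula as written.

```cpp
#include <cmath>

struct Quat { double w, x, y, z; };

// Hamilton product a * b.
Quat mul(const Quat& a, const Quat& b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// Exponential of the pure quaternion (0, vx, vy, vz):
// exp(v) = (cos|v|, sin|v| * v / |v|).
Quat exp_pure(double vx, double vy, double vz) {
    const double n = std::sqrt(vx * vx + vy * vy + vz * vz);
    // sin(n)/n tends to 1 as n tends to 0; avoid dividing by zero.
    const double s = (n < 1e-12) ? 1.0 : std::sin(n) / n;
    return {std::cos(n), s * vx, s * vy, s * vz};
}

// q(t2) = exp(0.5 * dt * omega) * q(t1), with omega in rad/s.
Quat integrate_gyro(const Quat& q, double wx, double wy, double wz, double dt) {
    const double h = 0.5 * dt;
    return mul(exp_pure(h * wx, h * wy, h * wz), q);
}
```

Because both factors are unit quaternions, the result stays on the unit sphere up to rounding, which is the property lost when angular rates are added directly to Euler angles.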

Exposed metal + Loose fastening = Molten wires + Injury

- Always have a fuse or power switch
- Assume everything is a conductor unless designed not to be

Automate data transfer with the hardware ✔

Improve physical release procedure ✔

Add simulation model of the small unicycle ✔

Redesign the hardware to increase the roll limit **?**

```
syntax = "proto3";

message DebugMessage {
  string s = 1;
  DebugLevel level = 2;
}

message RobotMessage {
  oneof msg {
    LogBundle log_bundle = 1;
    DebugMessage debug = 2;
    LogEntry single_log = 3;
  }
}
```

Messages: **F1** **00** **D5**, **0F**, **C0** **FF** **EE**

→ Stream:

**02** **F1** **02** **D5** **00**
**02** **0F** **00**
**04** **C0** **FF** **EE** **00**

Count and null-terminator bytes
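The byte pattern in this example is consistent with Consistent Overhead Byte Stuffing (COBS): each count byte gives the distance to the next zero, and a single `00` ends the packet. A minimal encoder sketch follows; `cobs_frame` is a hypothetical name and this is not the published library's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode one message so that 0x00 appears only as the final delimiter.
std::vector<std::uint8_t> cobs_frame(const std::vector<std::uint8_t>& msg) {
    std::vector<std::uint8_t> out;
    out.push_back(0);              // placeholder for the first count byte
    std::size_t count_idx = 0;     // where the current count byte lives
    std::uint8_t count = 1;        // distance from count byte to next zero

    for (std::uint8_t b : msg) {
        if (b == 0) {
            out[count_idx] = count;    // close the current block at this zero
            count_idx = out.size();
            out.push_back(0);          // placeholder for the next count byte
            count = 1;
        } else {
            out.push_back(b);
            ++count;
            if (count == 0xFF) {       // block full: 254 non-zero bytes
                out[count_idx] = count;
                count_idx = out.size();
                out.push_back(0);
                count = 1;
            }
        }
    }
    out[count_idx] = count;
    out.push_back(0x00);               // packet delimiter
    return out;
}
```

Calling `cobs_frame` on the three messages above reproduces the three stream rows. A receiver can resynchronize after corruption by discarding bytes until the next `00`.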

Jose Shelton and Gideon Praveen Kumar. “Comparison between Auditory and Visual Simple Reaction Times”. In: Neuroscience and Medicine 1.1 (2010)

Human visual: 330 ms

Human audio: 280 ms

Arduino switch release: *really fast!*

- Build a rig to drop perfectly *✘ Impractical*
- Learn the initial orientation *✘ Must be online*
- Measure the initial orientation *✔ Use the accelerometer*
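Measuring the initial orientation from the accelerometer amounts to reading the direction of gravity while the robot is held still. A minimal sketch, assuming a sensor frame where an upright, stationary robot reads roughly `(0, 0, +g)`; the axis convention and the names `Tilt` and `tilt_from_accel` are assumptions, not taken from the robot's firmware.

```cpp
#include <cassert>
#include <cmath>

struct Tilt { double roll; double pitch; };  // radians

// Estimate tilt from one accelerometer sample taken at rest.
// Only the direction of (ax, ay, az) matters, so the units can be anything.
// Yaw cannot be recovered this way: gravity carries no heading information.
Tilt tilt_from_accel(double ax, double ay, double az) {
    assert(ax != 0.0 || ay != 0.0 || az != 0.0);  // need a non-zero reading
    const double roll = std::atan2(ay, az);
    const double pitch = std::atan2(-ax, std::sqrt(ay * ay + az * az));
    return {roll, pitch};
}
```

In practice several samples would likely be averaged before release, since a single reading is noisy.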

Software-only tests of Pilco

Only have a model for the 1m unicycle

Approximations of physical parameters good enough

Automate data transfer with the hardware ✔

Measure initial robot orientation with the accelerometer ✔

Add simulation model of the small unicycle ✔

Redesign the hardware to increase the roll limit **?**

\[
\theta_\text{max} =
\class{c-blue}{90°},\quad
\class{c-orange}{45°},\quad
\class{c-yellow}{17°}
\]

Idealized, proposed design, current

\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]

Before, showing distribution of predictions

\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2} + \class{c-purple}{\tfrac{\theta^2}{\theta_\text{max}^2}}
\right]\right)
\]

After, showing distribution of predictions

Before

With roll loss term
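The loss with the added roll term can be written as a short function. This is a sketch for illustration: `saturating_loss` is a hypothetical name, the arguments mirror the symbols in the equation, and Pilco itself evaluates the expectation of this quantity over a predicted state distribution rather than at a single point.

```cpp
#include <cassert>
#include <cmath>

// Loss = 1 - exp(-0.5 * [d^2/h^2 + phi^2/(4*pi)^2 + theta^2/theta_max^2]).
// Angles are in radians; d and h share the same length unit.
double saturating_loss(double d, double phi, double theta,
                       double h, double theta_max) {
    assert(h > 0.0 && theta_max > 0.0);  // both act as length scales
    const double four_pi = 4.0 * std::acos(-1.0);
    const double q = (d * d) / (h * h)
                   + (phi * phi) / (four_pi * four_pi)
                   + (theta * theta) / (theta_max * theta_max);
    return 1.0 - std::exp(-0.5 * q);
}
```

The loss is 0 at the target and saturates towards 1 far from it. With the roll term, a state with $\theta = \theta_\text{max}$ and everything else on target already costs about 0.39.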

- Designed and implemented a communication protocol (library published at packetio.readthedocs.io)
- Resolved extensive problems with the current software and hardware stack
- Achieved improved controller performance in simulation
- Was ultimately unsuccessful in balancing the real robot

- Investigate why learning failed experimentally
- Use automatic differentiation within Pilco

  Deriving gradients manually "restricts the [ML] community to only using computational structures we are capable of manually deriving gradients for"

  Justin Domke. Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox? 2009-02-17
- Apply a quadratic controller
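As a rough illustration of what automatic differentiation offers, forward-mode AD can be sketched with dual numbers that carry a value and its derivative together. This is a toy example in C++, not a proposal for how it would be wired into the Matlab toolbox; `Dual`, `dual_exp`, and `toy_loss` are hypothetical names.

```cpp
#include <cmath>

// A dual number: a value and its derivative with respect to one input.
struct Dual { double val; double der; };

Dual constant(double c) { return {c, 0.0}; }  // derivative of a constant is 0
Dual variable(double x) { return {x, 1.0}; }  // dx/dx = 1

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
// Product rule.
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}
// Chain rule for exp.
Dual dual_exp(Dual a) {
    const double e = std::exp(a.val);
    return {e, e * a.der};
}

// A one-dimensional saturating loss, 1 - exp(-x^2 / 2), written once;
// its derivative comes out of the same evaluation with no hand-derived gradient.
Dual toy_loss(Dual x) {
    return constant(1.0) - dual_exp(constant(-0.5) * x * x);
}
```

Evaluating `toy_loss(variable(1.0))` yields both the loss and its slope at $x = 1$, and the slope agrees with the hand-derived $x\,e^{-x^2/2}$.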

Areas are shaded where $\tau > 0$ and $\tau < 0$. $\tau_w > 0$ corresponds to a force driving the robot forwards, and $\tau_t > 0$ corresponds to a moment rotating the robot clockwise.

`https://eric-wieser.github.io/masters-presentation`