A useful technique in control theory is optimal control, which lets you quantify how good a controller is, and then pick the best one.
The problem is that this needs a system model; if the model is wrong, the resulting controller is unlikely to be optimal or even stable.
Pilco uses a Gaussian process to model the dynamics, capturing both model uncertainty and process noise.
This model is trained on data collected from the robot.
Pilco: Probabilistic Inference for Learning Control
Optimal control finds a policy for a system
minimizing a cost function over a horizon
\[
\begin{alignat}{2}
\text{find}&& \quad
\pi^*(\bm{x}) &= \argmin_{\pi(\bm{x})}
J(\bm{x}\tind\cdot, \bm{u}\tind\cdot)
\quad \text{where} \quad
J(\bm{x}\tind\cdot, \bm{u}\tind\cdot) = \class{c-orange}{\sum_{i=0}^{N}}
\class{c-purple}{c(\bm{x}\tind{i}, \bm{u}\tind{i})}\quad \\ \label{eq:optimal}
\text{s.t.}&& \quad
\class{c-green}{\bm{x}\tind{i+1}} &\class{c-green}{= f(\bm{x}\tind{i}, \bm{u}\tind{i})} \\ \nonumber
&&
\class{c-blue}{\bm{u}\tind{i}} &\class{c-blue}{= \pi(\bm{x}\tind{i})}\, \nonumber
\end{alignat}
\]
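To make the objective concrete, here is a minimal sketch of evaluating $J$ for one candidate policy by rolling the dynamics forward and summing the per-step cost. The names `policy`, `dynamics`, and `cost` are placeholder callables standing in for $\pi$, $f$, and $c$; none of this is Pilco's actual API.

```python
import numpy as np

def rollout_cost(policy, dynamics, cost, x0, horizon):
    """Evaluate J = sum_{i=0}^{N} c(x_i, u_i) along one rollout.

    policy, dynamics and cost are hypothetical callables standing in
    for pi, f and c in the formulation above.
    """
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for _ in range(horizon + 1):      # i = 0 .. N inclusive
        u = policy(x)
        J += float(cost(x, u))
        x = dynamics(x, u)            # x_{i+1} = f(x_i, u_i)
    return J

# Toy scalar system x_{i+1} = x_i + u_i with a quadratic cost and a
# hypothetical linear policy u = -0.5 x (illustration only).
J = rollout_cost(policy=lambda x: -0.5 * x,
                 dynamics=lambda x, u: x + u,
                 cost=lambda x, u: x**2 + u**2,
                 x0=2.0, horizon=10)
```

Optimal control then searches over the parameters of `policy` for the one minimizing this sum.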
Needs a system model
— Pilco learns a probabilistic one using Gaussian process regression
\[
\begin{align}
\left(\bm{x}\tind{i + 1} - \bm{x}\tind{i}\right)_j &= \class{c-cyan}{f_j(\bm{z}\tind{i})
\sim \mathrm{GP}(m_j(\bm{z}), K_j(\bm{z}_1, \bm{z}_2))},
&
\bm{z}\tind{i} &= \begin{bmatrix}\bm{x}\tind{i} \\ \bm{u}\tind{i}\end{bmatrix}
\end{align}
\]
Advantages: data-efficient, and robust to modelling errors because model uncertainty is represented explicitly
Let's recap Gaussian processes quickly.
A Gaussian, or normal, distribution is a distribution over scalars.
We can extend this to a distribution over vectors with a multivariate Gaussian.
Taking this further, a Gaussian process is a distribution over functions.
As a reminder, for Pilco the functions in question are the one-step update dynamics.
Gaussian processes
Scalar: $\class{c-green}{x} \sim \mathcal{N}(\mu, \sigma^2)$
Vector: $\class{c-green}{\begin{bmatrix}x_0 \\ x_1\end{bmatrix}} \sim \mathcal{N}(\mu, \Sigma)$
Function: $\class{c-green}{x(\cdot)}\sim \mathrm{GP}(\class{c-blue}{m(\cdot)}, K(\cdot,\cdot))$
Function: $\class{c-green}{\bm{x}\tind{i + 1} - \bm{x}\tind{i}}\sim \mathrm{GP}(\class{c-blue}{m(\cdot)}, K(\cdot,\cdot))$
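To make the recap concrete, here is a minimal from-scratch sketch of GP regression on one output dimension of the one-step dynamics, with a squared-exponential kernel and a toy linear system. The kernel, hyperparameters, and data here are all assumptions for illustration, not Pilco's actual setup (Pilco also optimizes hyperparameters and propagates uncertain inputs).

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel k(a, b) = sf2 * exp(-||a - b||^2 / (2 ell^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sf2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(Z, y, Zs, sn2=1e-4):
    """Posterior mean and variance of a zero-mean GP at test inputs Zs."""
    K = rbf(Z, Z) + sn2 * np.eye(len(Z))       # training covariance + noise
    Ks = rbf(Zs, Z)                            # test/train cross-covariance
    mean = Ks @ np.linalg.solve(K, y)
    var = rbf(Zs, Zs).diagonal() - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Toy training set: inputs z_i = [x_i, u_i], targets are the one-step
# deltas x_{i+1} - x_i of an assumed linear system (illustration only).
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 0.1 * Z[:, 1] - 0.05 * Z[:, 0]
mean, var = gp_posterior(Z, y, Zs=np.array([[0.2, 0.5]]))
```

The posterior variance is what gives Pilco its handle on model uncertainty: predictions far from the training data come back with wide error bars rather than confident guesses.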
Let's move on to the improvements suggested by the previous work.
The first was to remove the manual process of loading controllers onto the robot,
which also allows the USB cable tether to be eliminated.
Suggested Improvements
Automate data transfer with the hardware ✔
Improve physical release procedure ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
The missing piece here was a serial communication protocol.
For this you need two parts: an application layer, to represent structured messages as
bytestrings, for which I used protobuf;
and a framing layer, to delineate disjoint bytestrings in a continuous stream, for which I used
COBS.
With this in place, it was easy to define messages to change the controller policy and extract
state logs.
Data transfer
Application layer: Protocol buffers
syntax = "proto3";

message DebugMessage {
  string s = 1;
  DebugLevel level = 2;
}

message RobotMessage {
  oneof msg {
    LogBundle log_bundle = 1;
    DebugMessage debug = 2;
    LogEntry single_log = 3;
  }
}
Framing layer: COBS (Consistent Overhead Byte Stuffing)
Messages: F1 00 D5, 0F, C0 FF EE
→ Stream: 02 F1 02 D5 00 | 02 0F 00 | 04 C0 FF EE 00
(count and null-terminator bytes added by the encoding)
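A minimal COBS encoder sketch (not the project's actual implementation) that reproduces the example stream above:

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode one message with COBS and append the 0x00 frame delimiter.

    Zero bytes in the payload are eliminated by prefixing each zero-free
    block with a count byte giving the offset to the next zero, so 0x00
    can safely mark frame boundaries in the stream.
    """
    out = bytearray()
    block = bytearray()
    for byte in data:
        if byte == 0:
            out.append(len(block) + 1)   # count byte replaces the zero
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:        # maximum zero-free block length
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    out.append(0x00)                     # frame delimiter
    return bytes(out)

# The three example messages from the slide:
frames = [cobs_encode(m) for m in (b"\xf1\x00\xd5", b"\x0f", b"\xc0\xff\xee")]
```

The receiver simply splits the stream on 0x00 bytes and reverses the count-byte substitution, which makes resynchronization after a dropped byte trivial.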
Next we move onto improving the release procedure.
Suggested Improvements
Automate data transfer with the hardware ✔
Improve physical release procedure ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
For each controller, the robot must be released from vertical just before it starts
controlling itself.
This is done by hand, waiting for an LED, so human reaction time comes into play.
Switching to an audio cue gives a small improvement, but we can do even
better by reversing the roles and having the robot respond to the human instead of the other way around.
We do this by adding a switch that is released at the same time as the robot.
The other issue is that Pilco assumes the robot is released in the same orientation each time.
Since building a rig or learning this orientation was impractical, I chose instead to simply
inform Pilco of the initial orientation via the accelerometer.
Physical release procedure
Reaction times
Jose Shelton and Gideon Praveen Kumar. “Comparison between Auditory and Visual Simple
Reaction Times”. In: Neuroscience and Medicine 1.1 (2010)
Human visual — 330ms
Human audio — 280ms
Arduino switch release — really fast!
Repeatability of orientation
Build a rig to drop perfectly ✘ Impractical
Learn the initial orientation ✘ Must be online
Measure the initial orientation ✔ Use the accelerometer
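As a sketch of the "measure the initial orientation" option: while the robot is held still at release, the accelerometer reads only gravity, so roll and pitch fall out directly. The axis convention and function name here are assumptions for illustration, not the project's code.

```python
import math

def orientation_from_accel(ax, ay, az):
    """Roll and pitch (radians) from a static accelerometer reading.

    Assumes the robot is stationary, so the accelerometer measures only
    gravity. Axis convention (an assumption): x forward, y left, z up.
    Yaw is unobservable from gravity alone.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch
```

Reading this once, just before handing control to the learned policy, gives Pilco the actual initial state distribution instead of an assumed fixed one.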
Back to our list of improvements, we move on to adding an accurate simulation model.
Suggested Improvements
Automate data transfer with the hardware ✔
Improve physical release procedure ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
Pilco contains simulations primarily for testing: it's much faster to try learning
a controller on a system that doesn't require physical intervention to reset.
A simulated model of the 1m unicycle already existed; as part of this project, parameters
for the smaller model were roughly estimated from its dimensions and material properties.
Exactly matching the hardware's moments of inertia was not important.
Simulation models
Software-only tests of Pilco
Only have a model for the 1m unicycle
Approximations of physical parameters good enough
The last suggested improvement was the most involved: that the robot needs
to be able to fall over further in roll in order to learn.
Let's look at this one in more detail.
Suggested Improvements
Automate data transfer with the hardware ✔
Measure initial robot orientation with the accelerometer ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
Effect of roll limit
The roll limit refers to how far the robot can lean in roll before its wheel leaves the ground
and recovery becomes impossible.
Let's take a look at the simulation's performance under different roll limits.
An easy metric for performance is how long the robot stays upright, as a function of iteration number.
Here's what we get for an idealized unicycle.
\[
\theta_\text{max} =
\class{c-blue}{90°}
\]
Idealized
And here's the unicycle with the physical roll limit modelled. Clearly, there's a problem.
Earlier work suggested redesigning the frame to raise this limit; optimistically, 45 degrees is achievable.
\[
\theta_\text{max} =
\class{c-blue}{90°},\quad
\class{c-yellow}{17°}
\]
Idealized, Current
But that doesn't really help.
We can't get much more from this graph as is.
\[
\theta_\text{max} =
\class{c-blue}{90°},\quad
\class{c-orange}{45°},\quad
\class{c-yellow}{17°}
\]
Idealized, Proposed design, Current
Let's add a line to represent each run of the robot,
so that we can introduce another dimension: plotting the loss function over time.
Again, the end of each line indicates where the robot fell over.
It's already noticeable that the yellow and orange lines end at a far lower loss.
Eliminating the iterations axis makes this even more pronounced.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]
Falling over while the loss function is low makes the controller think it is doing quite
well. We need the loss to be close to 1 once a fall has occurred in order to
effectively teach the controller not to fall, or even get close to falling.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]
Pilco also tracks the expected distribution of the loss at a given time, shown here
for the 45th iteration. This tells us not only that the robot did not fall over, but
that the predicted chance of it doing so was very small.
Let's fix the loss function to penalize falling over in roll.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2}
\right]\right)
\]
Before, showing distribution of predictions
We do this by adding an additional term including the roll angle.
As soon as we do this, we see that all of our runs last much longer.
I'll show this from the original view for good measure, so we can compare before and after one more time.
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2} + \class{c-purple}{\tfrac{\theta^2}{\theta_\text{max}^2}}
\right]\right)
\]
After, showing distribution of predictions
\[
\text{Loss}(\mathbf{x}) = 1 - \exp\left(-\tfrac{1}{2}\left[
\tfrac{d(\mathbf x)^2}{h^2} +
\tfrac{\phi^2}{(4\pi)^2} + \class{c-purple}{\tfrac{\theta^2}{\theta_\text{max}^2}}
\right]\right)
\]
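For reference, a direct transcription of the modified loss into code. The width h = 0.5 and the default θ_max = 17° are placeholder values (the slides give no numeric h), and the reading of d as the distance to the target state, φ as the angle penalized by the original loss, and θ as the roll angle follows the surrounding discussion.

```python
import math

def saturating_loss(d, phi, theta, h=0.5, theta_max=math.radians(17)):
    """Saturating loss 1 - exp(-0.5 [d^2/h^2 + phi^2/(4 pi)^2 + theta^2/theta_max^2]).

    d: distance d(x) to the target state; phi: angle penalized by the
    original loss; theta: roll angle. h and theta_max defaults are
    placeholders, not values taken from the project.
    """
    arg = (d / h) ** 2 + (phi / (4.0 * math.pi)) ** 2 + (theta / theta_max) ** 2
    return 1.0 - math.exp(-0.5 * arg)
```

Because the new term reaches θ²/θ_max² = 1 exactly at the roll limit, the loss is already well away from zero whenever the robot is about to fall in roll, which is what teaches the controller to avoid that region.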
This investigation showed that with a corrected loss, a simulated unicycle with a 17-degree
roll limit performs very similarly to one with a 45-degree limit, suggesting the hardware redesign offers little benefit.
With roll loss term
Suggested Improvements
Automate data transfer with the hardware ✔
Measure initial robot orientation with the accelerometer ✔
Add simulation model of the small unicycle ✔
Redesign the hardware to increase the roll limit ?
✘ Improvement is small after fixing the loss function