Master's Thesis

Learning to Land on Flexible Structures

Using reinforcement learning to train a drone to land on and maintain contact with flexible tree branches. The policy learns to handle varying branch stiffness through domain randomization, outputting thrust and attitude commands via a neural network controller.
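The domain randomization over branch stiffness can be sketched as per-episode sampling. This is an illustrative sketch, not the thesis code: the function name and the log-uniform sampling choice are assumptions, with the range matching the K = 1 to K = 1000 span used in evaluation.

```python
import numpy as np

# Illustrative sketch of per-episode domain randomization over branch
# stiffness (function name and sampling scheme are assumptions).
# Log-uniform sampling covers K = 1..1000 without over-representing
# stiff branches.
def sample_branch_stiffness(rng, k_min=1.0, k_max=1000.0):
    log_k = rng.uniform(np.log(k_min), np.log(k_max))
    return float(np.exp(log_k))

rng = np.random.default_rng(seed=0)
stiffness_per_episode = [sample_branch_stiffness(rng) for _ in range(1000)]
```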

Supervisors: Dr. Emanuele Aucone, Prof. Stefano Mintchev (ETH Zurich, ERL)
Examiner: Prof. Dimos Dimarogonas (KTH)

Simulation Demo

RL-trained drone landing on a flexible tree branch and maintaining stable contact (K = 100)

Problem Setup

Simulation environment — drone initialized at 2 m height above a 1.5 m flexible branch, with randomized initial position and branch stiffness
Branch modeled with PD-controlled joints — stiffness varied across training episodes, producing different bending behavior under the same load
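A PD-controlled joint behaves as a spring-damper element. The sketch below is a minimal illustration of that model; the stiffness and damping values are made up here, whereas in the actual setup stiffness is varied per episode.

```python
# Minimal sketch of a PD-controlled branch joint acting as a
# spring-damper (parameter values are illustrative, not from the thesis).
def pd_joint_torque(angle, angular_velocity, k_stiffness, d_damping):
    # Restoring torque driving the joint back to its rest angle (0 rad).
    return -k_stiffness * angle - d_damping * angular_velocity

# The same 0.1 rad deflection: a stiff branch (K = 1000) resists it
# far more strongly than a very soft one (K = 1).
tau_soft = pd_joint_torque(0.1, 0.0, k_stiffness=1.0, d_damping=0.1)
tau_stiff = pd_joint_torque(0.1, 0.0, k_stiffness=1000.0, d_damping=0.1)
```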

Method

Training pipeline — the PPO/SAC agent outputs thrust and a reference attitude; a low-level PD attitude controller tracks these commands at 48 Hz, with rewards computed from the simulation environment
Policy network — 4-layer MLP [256, 256, 256, 128] with tanh activations. Inputs: position, attitude (quaternion), velocity, external force. Outputs: thrust + reference attitude
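The policy's forward pass can be sketched as follows. Layer sizes come from the caption above; the input and output dimensions are assumptions (3 position + 4 quaternion + 3 velocity + 3 external force = 13 inputs; 1 thrust + 4 reference-attitude quaternion = 5 outputs), as is the linear output head.

```python
import numpy as np

# Sketch of the policy forward pass. Hidden sizes from the caption;
# the 13-in / 5-out dimensions and the linear output head are assumptions.
rng = np.random.default_rng(0)
sizes = [13, 256, 256, 256, 128, 5]
layers = [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def policy_forward(obs):
    x = obs
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)  # tanh on the four hidden layers
    return x

action = policy_forward(np.zeros(13))  # thrust + reference attitude
```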

Training

Training curves — SAC converges in ~6 h (reward ≈ 0.85), PPO in ~14 h; both trained with Stable-Baselines3

Evaluation — Different Branch Stiffness

K = 1 — very soft branch
K = 10 — soft branch
K = 100 — medium-stiffness branch
K = 1000 — stiff branch
Success rate across initial positions (PPO, K = 1) — each dot marks an initial XY position, colored by success rate; success is high across most of the tested region

Contributions
