Master's Thesis

Learning to Land on Flexible Structures

Using reinforcement learning to train a drone to land on and maintain contact with flexible tree branches. The policy learns to handle varying branch stiffness through domain randomization, outputting thrust and attitude commands via a neural network controller.
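The domain randomization over branch stiffness can be sketched as per-episode sampling. This is an illustrative sketch, not the thesis code: the function name and the log-uniform sampling choice are assumptions, with the range matching the K = 1 to K = 1000 span used in evaluation.

```python
import numpy as np

# Illustrative sketch of per-episode domain randomization over branch
# stiffness (function name and sampling scheme are assumptions).
# Log-uniform sampling covers K = 1..1000 without over-representing
# stiff branches.
def sample_branch_stiffness(rng, k_min=1.0, k_max=1000.0):
    log_k = rng.uniform(np.log(k_min), np.log(k_max))
    return float(np.exp(log_k))

rng = np.random.default_rng(seed=0)
stiffness_per_episode = [sample_branch_stiffness(rng) for _ in range(1000)]
```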

Supervisors: Dr. Emanuele Aucone, Prof. Stefano Mintchev (ETH Zurich, ERL)
Examiner: Prof. Dimos Dimarogonas (KTH)

Simulation Demo

RL-trained drone landing on a flexible tree branch and maintaining stable contact (K = 100)

Problem Setup

Simulation environment — drone initialized at 2 m height above a 1.5 m flexible branch, with randomized initial position and branch stiffness
Branch modeled with PD-controlled joints — stiffness varied across training episodes, producing different bending behavior under the same load
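A PD-controlled joint behaves as a spring-damper element. The sketch below is a minimal illustration of that model; the stiffness and damping values are made up here, whereas in the actual setup stiffness is varied per episode.

```python
# Minimal sketch of a PD-controlled branch joint acting as a
# spring-damper (parameter values are illustrative, not from the thesis).
def pd_joint_torque(angle, angular_velocity, k_stiffness, d_damping):
    # Restoring torque driving the joint back to its rest angle (0 rad).
    return -k_stiffness * angle - d_damping * angular_velocity

# The same 0.1 rad deflection: a stiff branch (K = 1000) resists it
# far more strongly than a very soft one (K = 1).
tau_soft = pd_joint_torque(0.1, 0.0, k_stiffness=1.0, d_damping=0.1)
tau_stiff = pd_joint_torque(0.1, 0.0, k_stiffness=1000.0, d_damping=0.1)
```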

Method

Training pipeline — the PPO/SAC agent outputs thrust and a reference attitude; a low-level PD attitude controller tracks these commands at 48 Hz, with rewards computed from the simulation environment
Policy network — 4-layer MLP [256, 256, 256, 128] with tanh activations. Inputs: position, attitude (quaternion), velocity, external force. Outputs: thrust + reference attitude
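The policy's forward pass can be sketched as follows. Layer sizes come from the caption above; the input and output dimensions are assumptions (3 position + 4 quaternion + 3 velocity + 3 external force = 13 inputs; 1 thrust + 4 reference-attitude quaternion = 5 outputs), as is the linear output head.

```python
import numpy as np

# Sketch of the policy forward pass. Hidden sizes from the caption;
# the 13-in / 5-out dimensions and the linear output head are assumptions.
rng = np.random.default_rng(0)
sizes = [13, 256, 256, 256, 128, 5]
layers = [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def policy_forward(obs):
    x = obs
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)  # tanh on the four hidden layers
    return x

action = policy_forward(np.zeros(13))  # thrust + reference attitude
```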

Training

Training curves — SAC converges in ~6 h (reward ≈ 0.85), PPO in ~14 h; both trained with Stable-Baselines3

Evaluation — Different Branch Stiffness

K = 1 — very soft branch
K = 10 — soft branch
K = 100 — medium-stiffness branch
K = 1000 — stiff branch
Success rate across initial positions (PPO, K = 1) — each dot marks an initial XY position, colored by success rate; success is high across most of the tested region

Contributions
