I am a staff software engineer at Google DeepMind, working on infrastructure for large-scale training of machine learning models. Before joining Google DeepMind, I was a postdoc in the RISELab at UC Berkeley, where I worked on building a platform for developing more predictable and reliable autonomous vehicles. My research interests span systems for self-driving cars and robots, distributed systems, operating systems, and large-scale data processing.

Before joining Berkeley, I spent my time redesigning the software stack for warehouse-scale data centres as part of the Systems Research Group at the University of Cambridge. In my research there, I designed and built systems that aimed to increase cluster utilization without affecting performance, as well as easy-to-use large-scale data processing solutions. For my research at Cambridge, I received a Google Fellowship in Distributed Systems and an NSDI Best Paper Award. Previously, I graduated with an MEng degree in Computing and Software Engineering from Imperial College London, where I was awarded the Microsoft Research Prize for an outstanding master's thesis.

In the past, I was fortunate to work on large-scale distributed systems in the cluster management team at Google, in the Data Infrastructure team at Facebook, and at Microsoft Research.



Publications · Google Scholar

D3: A Dynamic Deadline-Driven Approach for Building Autonomous Vehicles [code]
Context-Aware Streaming Perception in Dynamic Environments
Falkirk Wheel: Rollback Recovery for Dataflow Systems
Pylot: A Modular Platform for Exploring Latency-Accuracy Tradeoffs in Autonomous Vehicles [code][website]
Firmament: fast, centralized cluster scheduling at scale [code]
Musketeer: All for One, One for All in Data Processing Systems [code][website]
Queues don't matter when you can JUMP them! [code][website]
Broom: sweeping out Garbage Collection from Big Data Systems



Selected talks

Firmament: fast, centralized cluster scheduling at scale. Berkeley RISELab Seminar 2017
Firmament: fast, centralized cluster scheduling at scale. OSDI 2016
Understanding cluster schedulers -- and why you'll want a better one. ContainerDays 2016
Musketeer: all for one, one for all in data processing systems. EuroSys 2015
Broom: sweeping out Garbage Collection from Big Data Systems. HotOS 2015
High-quality, flexible and scalable scheduling with Firmament. ContainerSched 2015



Service

PC member: USENIX ATC 2021, ERC for ASPLOS 2021, ACM SoCC 2020, ACM EdgeSys 2020, ACM EdgeSys 2019.