MLSys Seminar 80: ML for ML Compilers

Dustin Zubke
2 min read · Oct 11, 2023

I want to start writing more about machine learning, so today I will be writing some notes from the MLSys Seminar lecture on ML for ML Compilers (youtube, slides). I am summarizing the lecture, so none of these ideas are my own.

The lecture describes Mangpo Phothilimthana's work as a staff research scientist at Google DeepMind. Compilers convert high-level programming code into machine code that runs on computer hardware. These compilers use various heuristics to decide how to combine operations most efficiently. The heuristics must be fast to compute because compilation must happen quickly, but they may not always yield optimal results.

Mangpo’s work uses machine learning to optimize these heuristics for the compilation of ML workflows. As the slides state, the goal is to “automatically select optimal compiler configurations, at scale for all ML workloads in Google’s fleet”, where the fleet is the data centers of Tensor Processing Units (TPUs, Google’s proprietary hardware accelerators) that run Google’s ML workflows.

Slide 7 shows a subset of a computational graph and how the compiler will fuse different operations.

Fusing operations can optimize the workflow because the intermediate results between each operation do not need to be saved to memory, thus reducing memory use and IO time. However, fusion can consume more compute resources, so it essentially trades memory and IO for computation.
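To make the trade-off concrete, here is a toy sketch (my own illustration, not the XLA compiler's actual fusion machinery) contrasting an unfused computation, which materializes every intermediate array, with a conceptually fused one that keeps intermediates in registers:

```python
import numpy as np

def unfused(x):
    # Each step materializes a full intermediate array in memory.
    a = x * 2.0      # intermediate buffer 1
    b = a + 1.0      # intermediate buffer 2
    return np.sum(b) # reads buffer 2 back from memory

def fused(x):
    # A fused kernel computes the same result in one pass over the data,
    # keeping intermediates in registers: less memory traffic, same math.
    total = 0.0
    for v in x:
        total += v * 2.0 + 1.0
    return total

x = np.arange(4, dtype=np.float64)
print(unfused(x), fused(x))  # both compute sum(2x + 1)
```

The fused loop does the same arithmetic but never allocates the intermediate buffers, which is the memory-and-IO saving the lecture describes.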

Mangpo’s work uses an autotuner to aid compilation: it searches a space of different configurations and selects the best one according to a performance metric.
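The core loop of such a search can be sketched in a few lines. This is a generic illustration under my own assumptions (the configuration knob and the benchmark function are made up), not the actual autotuner from the talk:

```python
import itertools

def autotune(configs, measure, budget=100):
    """Try up to `budget` configurations and return the one with the
    lowest measured runtime. `measure` benchmarks one configuration."""
    best_cfg, best_time = None, float("inf")
    for cfg in itertools.islice(configs, budget):
        t = measure(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Hypothetical example: pretend runtime depends on a tile-size knob,
# and a fake benchmark says tile=32 is fastest.
space = ({"tile": t} for t in [8, 16, 32, 64, 128])
fake_runtime = lambda cfg: abs(cfg["tile"] - 32)
print(autotune(space, fake_runtime))
```

A real autotuner replaces the fake benchmark with actual compilation and execution on hardware, which is exactly why the search is expensive and why a learned cost model (described next in the lecture) is valuable.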

A learned policy model labels each node of the computation graph with a zero or one depending on whether that node should be fused with its consumer node. A second learned cost model then estimates the runtime of the predicted configuration, with the goal of finding the configuration with the minimum estimated cost. In this way, machine learning is used to find better compiled graphs.
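The division of labor between the two models might be sketched as follows. The function names, the stand-in fusion rule, and the launch-cost arithmetic are all illustrative placeholders, not the learned models from the lecture:

```python
def policy_model(graph):
    # Label each node 1 (fuse with its consumer) or 0 (don't).
    # A real policy is learned from graph features; here, a stand-in
    # rule fuses cheap elementwise ops.
    return {node: int(op in {"add", "mul"}) for node, op in graph.items()}

def cost_model(graph, fusion_labels):
    # Estimate the runtime of a fusion configuration. A real cost model
    # is learned; this placeholder charges one unit per kernel launch
    # and assumes each fused node saves half a launch.
    base = len(graph)
    saved = sum(fusion_labels.values())
    return base - 0.5 * saved

# node -> operation type, for a tiny three-node graph
graph = {"a": "mul", "b": "add", "c": "reduce"}
labels = policy_model(graph)
print(labels, cost_model(graph, labels))  # lower estimated cost is better
```

The autotuner can then compare the estimated cost of many candidate labelings without running each one on a TPU.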

Mangpo estimates that this approach saves between 2–10% of the computation time across Google’s fleet, which is considerable.
