This project aims to use computer vision to track a normally hidden resource. In Clash Royale, you place troops, spells, and buildings to defend your own King Tower and attack your opponent's towers. The resource used to deploy these units is called elixir: it is generated at a steady rate and consumed with each deployment. A core part of the game is that, at any given moment, you do not know your opponent's exact elixir count. Instead, you infer it from the timing of their placements and the cost of each unit.
As a computer scientist, I know this mental calculation can be automated! Thus, I introduce the Clash Royale Elixir Tracker, a project designed to sharpen my model-training skills with PyTorch. The program will take a series of images as input and output the current elixir range (the range of possible elixir values the opponent could hold). Each image first passes through a CNN that identifies the general area where a troop was deployed. That region is then cropped, preprocessed, and fed to a second CNN that outputs class predictions for the deployed card. Based on these predictions, the proposed elixir range is updated.
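Since the range update is the heart of the tracker, here is a minimal sketch of how that bookkeeping could work in plain Python. The `ElixirRange` class, the starting value of 5, the regeneration rate of 1 elixir per 2.8 seconds, and the cap of 10 are all my own assumptions for illustration, not constants taken from the project's code.

```python
class ElixirRange:
    """Track the interval of possible opponent elixir values.

    Assumed game constants (illustrative): elixir regenerates at
    1 per 2.8 seconds and is capped at 10.
    """
    REGEN_PER_SEC = 1 / 2.8
    CAP = 10.0

    def __init__(self, low=5.0, high=5.0):
        self.low = low    # least elixir the opponent could have
        self.high = high  # most elixir the opponent could have

    def tick(self, seconds):
        """Both bounds rise with natural regeneration, up to the cap."""
        gained = seconds * self.REGEN_PER_SEC
        self.low = min(self.CAP, self.low + gained)
        self.high = min(self.CAP, self.high + gained)

    def spend(self, cost):
        """A detected deployment proves the opponent had at least `cost`
        elixir, so both bounds drop by the card's cost (floored at 0)."""
        self.low = max(self.low - cost, 0.0)
        self.high = max(self.high - cost, 0.0)

    def __repr__(self):
        return f"[{self.low:.1f}, {self.high:.1f}]"
```

For example, detecting a 4-cost card narrows both bounds down by 4, while elapsed time pushes both bounds upward together until they hit the cap.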
Currently, the data pipeline has been initialized. Much of my work has focused on processing video data efficiently, since this type of data is extremely memory-intensive. For training, I recorded and labeled gameplay, storing it as an MP4 file. You might assume you could simply load every frame into one large NumPy array, but I would strongly caution against it: my own naive attempt turned a 3-minute video into 60GB of data. After experimenting with alternatives, such as seeking frames on demand, I settled on a nearly optimal storage scheme: each frame is heavily condensed first and then stored individually, reducing each video to roughly 2GB.
I am excited to keep training, and I will continue to post results!