If you are a developer using MACE, you have a choice:
Bundling is faster but increases APK size by hundreds of megabytes. Most developers choose online compilation with aggressive caching.
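One way to picture "online compilation with aggressive caching" is a cache file whose name is derived from the GPU it was compiled on, so a binary built for one device is never picked up on another. The sketch below is illustrative only: GpuIdentity, Fnv1a64, and CacheFileName are hypothetical helpers, not part of the MACE API, and the choice of device name plus driver version as the key is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical description of the GPU a compiled program was built for.
struct GpuIdentity {
  std::string device_name;     // assumed: the OpenCL device name string
  std::string driver_version;  // assumed: the OpenCL driver version string
};

// 64-bit FNV-1a hash, used here only to turn the key into a short file name.
inline std::uint64_t Fnv1a64(const std::string& text) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char byte : text) {
    hash ^= byte;
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// Builds a cache file name that changes whenever the device, the driver,
// or the model changes, so stale binaries are simply never found.
inline std::string CacheFileName(const GpuIdentity& gpu,
                                 const std::string& model_tag) {
  const std::string key =
      gpu.device_name + '\n' + gpu.driver_version + '\n' + model_tag;
  char hex[17];
  std::snprintf(hex, sizeof(hex), "%016llx",
                static_cast<unsigned long long>(Fnv1a64(key)));
  return "cl-program-" + std::string(hex) + ".bin";
}
```

With a scheme like this, a driver update produces a new file name, the lookup misses, and the kernels are compiled again on first run.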
The mace-cl-compiled-program.bin file is produced by the MACE compiler toolchain.
// Enable profiling in MACE
config.SetGPUPriority(mace::GPUPriority::HIGH);
config.SetGPUProfiling(true);

// After inference
double gpu_time_ms;
engine->GetGPUProfilingTime(&gpu_time_ms);
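If GPU-side profiling is not available, the effect of the pre-compiled binary can still be estimated with a plain wall-clock measurement around the inference call. This is a minimal sketch that does not depend on any MACE API: AverageWallTimeMs is a hypothetical helper, and run_once stands in for whatever function runs one inference in your application.

```cpp
#include <chrono>
#include <functional>

// Runs `run_once` one extra time as a warm-up (so first-run costs such as
// kernel compilation are excluded), then returns the average wall-clock
// time per call in milliseconds over `iterations` timed runs.
inline double AverageWallTimeMs(const std::function<void()>& run_once,
                                int iterations) {
  if (iterations <= 0) {
    return 0.0;
  }
  run_once();  // warm-up run, not timed
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    run_once();
  }
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = end - start;
  return elapsed.count() / iterations;
}
```

Measuring once with the binary present and once after deleting it gives a rough before and after comparison. Wall-clock time includes CPU overhead, so it only approximates time spent on the GPU.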
So, mace-cl-compiled-program.bin is a pre-compiled OpenCL kernel binary that was generated by the MACE framework for a specific hardware GPU (e.g., an Adreno GPU in a Snapdragon chip, or a Mali GPU).
Despite being a performance feature, mace-cl-compiled-program.bin can break.
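Because an OpenCL program binary is tied to the GPU and driver it was built for, a defensive loader can check that tie before trusting the file and fall back to compiling from source otherwise. The following is a sketch under assumptions: BinaryTag, KernelSource, and ChooseKernelSource are hypothetical names, not MACE functions, and recording the device name and driver version next to the binary is assumed rather than something the framework is known to do in this form.

```cpp
#include <string>

// Where the kernels for this run should come from.
enum class KernelSource { kPrecompiledBinary, kOnlineCompile };

// Hypothetical tag stored alongside the binary when it was generated.
struct BinaryTag {
  std::string device_name;
  std::string driver_version;
};

// Uses the pre-compiled binary only when it exists and was built for
// exactly the GPU and driver that are present now; otherwise recompiles.
inline KernelSource ChooseKernelSource(bool binary_present,
                                       const BinaryTag& built_for,
                                       const BinaryTag& current) {
  if (!binary_present) {
    return KernelSource::kOnlineCompile;
  }
  if (built_for.device_name != current.device_name ||
      built_for.driver_version != current.driver_version) {
    return KernelSource::kOnlineCompile;  // stale or foreign binary
  }
  return KernelSource::kPrecompiledBinary;
}
```

The fallback costs a slower first launch, but it avoids loading a binary that the current driver may reject.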
Before understanding the file, you must understand the framework that creates it.
MACE (Mobile AI Compute Engine) is an open-source deep learning inference framework developed by Xiaomi. Unlike TensorFlow Lite or NNAPI, MACE was designed specifically for heterogeneous computing on mobile devices. It specializes in running neural networks (like image segmentation, speech recognition, or super-resolution) using the device's GPU or DSP.
MACE is known for taking a trained neural network model (for example, a TensorFlow .pb file, or a Caffe or ONNX model) and executing it with low latency and low power consumption.
The compilation process may have included device-specific optimizations: vectorized loads, local memory usage, work-group sizing, and instruction reordering. These can make the model run 2-5x faster than generic OpenCL source.