MovieSMobileNet Patched May 2026

For a frame I_t of size H×W:

This yields a tensor of shape [T, P, 3, 128, 128].

Author: AI Research Lab
Date: April 12, 2026

The proliferation of streaming services necessitates robust automatic movie genre classification. While 3D Convolutional Neural Networks (3D CNNs) and Video Transformers achieve high accuracy, they are computationally prohibitive for real-time or edge applications. This paper introduces MovieSMobileNet, a novel architecture that combines a patched frame sampling strategy with a modified MobileNetV3 backbone. By dividing each frame into spatial patches and applying a temporal attention mechanism across patch sequences, MovieSMobileNet captures both local textures and short-term motion cues without 3D convolutions. Experimental results on MMAct and a subset of MovieNet show that the patched approach improves F1-score by 4.2% over standard frame aggregation, achieving 89.1% accuracy with only 5.2M parameters and 1.8 GFLOPs, which makes it suitable for mobile deployment.

Given a movie clip of T frames (e.g., T = 16), each frame is split into patches of N×N pixels (e.g., 16×16). Each patch is normalized and passed through a shared MobileNetV3-small backbone to extract a feature vector. A Temporal Patch Attention (TPA) layer then learns which patches change meaningfully over time, and a classifier outputs genre probabilities.
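The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the 256×256 frame size, the helper names (`extract_patches`, `placeholder_backbone`, `temporal_patch_attention`), the random weights, and the number of genres are all assumptions made for illustration. In particular, the backbone is replaced by a simple pooling-plus-projection stand-in rather than MobileNetV3-small, and the attention score (mean absolute temporal feature difference per patch) is only one plausible reading of "patches that change meaningfully over time". The patch size is set to 128 so that the patch tensor matches the shape [T, P, 3, 128, 128].

```python
import numpy as np

def extract_patches(clip, patch):
    """Split a clip [T, H, W, 3] into non-overlapping patches [T, P, 3, patch, patch]."""
    T, H, W, C = clip.shape
    assert H % patch == 0 and W % patch == 0, "frame size must be divisible by patch size"
    gh, gw = H // patch, W // patch
    x = clip.reshape(T, gh, patch, gw, patch, C)
    x = x.transpose(0, 1, 3, 5, 2, 4)            # [T, gh, gw, C, patch, patch]
    return x.reshape(T, gh * gw, C, patch, patch)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def placeholder_backbone(patches, w_feat):
    """Hypothetical stand-in for the shared backbone: global average pool + linear map."""
    pooled = patches.mean(axis=(-2, -1))         # [T, P, 3]
    return pooled @ w_feat                       # [T, P, D]

def temporal_patch_attention(feats, w_score):
    """Assumed form of TPA: weight each patch by how much its features vary over time."""
    diffs = np.abs(np.diff(feats, axis=0)).mean(axis=0)              # [P, D]
    weights = softmax(diffs @ w_score)                               # [P]
    clip_vec = (weights[:, None] * feats.mean(axis=0)).sum(axis=0)   # [D]
    return clip_vec, weights

# Illustrative sizes only (assumptions, not values from the paper except T = 16).
rng = np.random.default_rng(0)
T, H, W, patch, D, n_genres = 16, 256, 256, 128, 32, 5
clip = rng.random((T, H, W, 3))                  # synthetic clip with values in [0, 1)

raw = extract_patches(clip, patch)               # shape (16, 4, 3, 128, 128)

# Per-patch standardization (one possible normalization scheme).
mean = raw.mean(axis=(2, 3, 4), keepdims=True)
std = raw.std(axis=(2, 3, 4), keepdims=True)
patches = (raw - mean) / (std + 1e-6)

w_feat = rng.normal(size=(3, D))                 # random placeholder weights
w_score = rng.normal(size=D)
w_cls = rng.normal(size=(D, n_genres))

feats = placeholder_backbone(patches, w_feat)    # [T, P, D]
clip_vec, weights = temporal_patch_attention(feats, w_score)
probs = softmax(clip_vec @ w_cls)                # genre probabilities, [n_genres]

print(raw.shape, weights.shape, probs.shape)
```

In a real model the placeholder backbone would be swapped for a pretrained MobileNetV3-small applied to the patches flattened to [T·P, 3, 128, 128], and the three weight arrays would be learned rather than sampled.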