We present a method that aggregates four facial cues to help identify distraction among online learners: facial emotion detection, micro-sleep tracking, yawn detection, and iris distraction detection. In our proposed method, the first module identifies facial emotions using both 2D and 3D convolutional neural networks (CNNs), enabling a comparison between purely spatial and spatiotemporal features. The remaining three modules use a 3D facial mesh to localize eye and lip landmarks, from which they track iris position, detect yawns, and flag signs of micro-sleep such as drowsiness. The outputs of the four modules are combined into a single aggregate label displayed on an integrated user interface, which can also issue real-time alerts to students and instructors when required. In our experiments, the emotion, micro-sleep, yawn, and iris monitoring modules individually achieved accuracies of 72.5%, 95%, 97%, and 93%, respectively.
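To make the landmark-based modules concrete, the sketch below shows one way a 3D facial mesh (here MediaPipe Face Mesh, used purely for illustration) can supply eye and lip coordinates from which per-frame eye-closure and yawn cues are derived. This is not the authors' implementation: the landmark indices, aspect-ratio formulation, and thresholds are illustrative assumptions.

```python
# Minimal sketch, assuming MediaPipe Face Mesh as the 3D facial mesh backend.
# Landmark indices and thresholds are illustrative, not taken from the paper.
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

# Commonly used Face Mesh indices for the left eye and inner lips (assumed).
LEFT_EYE_TOP, LEFT_EYE_BOTTOM = 159, 145
LEFT_EYE_LEFT, LEFT_EYE_RIGHT = 33, 133
LIP_TOP, LIP_BOTTOM = 13, 14
LIP_LEFT, LIP_RIGHT = 78, 308


def aspect_ratio(lms, top, bottom, left, right):
    """Vertical opening divided by horizontal span, in normalized coordinates."""
    vertical = abs(lms[top].y - lms[bottom].y)
    horizontal = abs(lms[left].x - lms[right].x)
    return vertical / horizontal if horizontal > 0 else 0.0


def frame_cues(bgr_frame, face_mesh, ear_thresh=0.20, mar_thresh=0.60):
    """Return (eyes_closed, yawning) flags for one frame, or None if no face is found."""
    rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
    result = face_mesh.process(rgb)
    if not result.multi_face_landmarks:
        return None
    lms = result.multi_face_landmarks[0].landmark
    ear = aspect_ratio(lms, LEFT_EYE_TOP, LEFT_EYE_BOTTOM, LEFT_EYE_LEFT, LEFT_EYE_RIGHT)
    mar = aspect_ratio(lms, LIP_TOP, LIP_BOTTOM, LIP_LEFT, LIP_RIGHT)
    return ear < ear_thresh, mar > mar_thresh


if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    with mp_face_mesh.FaceMesh(refine_landmarks=True, max_num_faces=1) as fm:
        ok, frame = cap.read()
        if ok:
            print(frame_cues(frame, fm))
    cap.release()
```

In a full pipeline, per-frame flags like these would be accumulated over a sliding window (e.g., eyes closed for several consecutive frames suggesting micro-sleep) before being fused with the emotion and iris modules into the aggregate distraction label.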