This paper examines in depth the suitability of the essay datasets from the 2012 Automated Student Assessment Prize (ASAP) contest for automated essay scoring (AES). It evaluates the potential of deep learning and state-of-the-art NLP tools to predict not only holistic scores but also the finer-grained rubric scores, an area that remains underexplored yet is essential for providing formative feedback and for uncovering the AI reasoning behind AES. To enable such comparisons, this paper advocates for transparency in sharing AES processes and outcomes. Finally, by examining the distributions of holistic and rubric scores, it shows that the ASAP essay datasets are insufficient for training generalizable AES models. Findings reveal that the strength of agreement between human and machine graders on holistic scores does not carry over to rubric scores, and that what the machine learns barely exceeds the performance of a naïve predictor.
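As an illustration of the agreement comparison discussed above, the sketch below computes quadratic weighted kappa (QWK), the metric used in the ASAP contest, for a machine grader and for a naïve constant-score predictor against the same human scores. The essay scores are hypothetical, invented for this example; they are not drawn from the ASAP data.

```python
import numpy as np

def quadratic_weighted_kappa(a, b, min_rating, max_rating):
    """Quadratic weighted kappa, the agreement metric used in the ASAP contest."""
    n = max_rating - min_rating + 1
    a = np.asarray(a) - min_rating
    b = np.asarray(b) - min_rating
    # Observed score matrix (confusion matrix of rater a vs rater b).
    O = np.zeros((n, n))
    for i, j in zip(a, b):
        O[i, j] += 1
    # Expected matrix under chance agreement, from the marginal distributions.
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / len(a)
    # Quadratic penalty: disagreements are weighted by squared score distance.
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

# Hypothetical holistic scores on a 1-6 scale for ten essays.
human   = [3, 4, 2, 5, 3, 4, 3, 2, 5, 4]
machine = [3, 4, 3, 5, 3, 3, 3, 2, 4, 4]
naive   = [3] * len(human)  # naive baseline: always predict a score of 3

print(round(quadratic_weighted_kappa(human, machine, 1, 6), 3))  # 0.824
print(round(quadratic_weighted_kappa(human, naive, 1, 6), 3))    # 0.0
```

A constant predictor always yields a QWK of zero, since its observed and chance-expected agreement coincide; a trained model whose QWK sits near such a baseline has learned little beyond the score distribution itself.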