A five-mode evaluation framework that decomposes vision-language model (VLM) failures into perception, reasoning, and language-prior components across Biology and Chemistry visual question answering.