Natural Language Processing and Psychosis: On the Need for Comprehensive Psychometric Evaluation
Authors: Cohen, Alex S.; Rodriguez, Zachary; Warren, Kiara K.; Cowan, Tovah; Masucci, Michael D.; Edvard Granrud, Ole; Holmlund, Terje Bektesevic; Chandler, Chelsea; Foltz, Peter W.; Strauss, Gregory P.
Background and Hypothesis: Despite decades of “proof of concept” findings supporting the use of Natural Language Processing (NLP) in psychosis research, clinical implementation has been slow. One obstacle reflects the lack of comprehensive psychometric evaluation of these measures. There is overwhelming evidence that criterion and content validity can be achieved for many purposes, particularly using machine learning procedures. However, there has been very little evaluation of test-retest reliability, divergent validity (sufficient to address concerns of a “generalized deficit”), and potential biases from demographics and other individual differences.
Study Design: This article highlights these concerns in the development of an NLP measure for tracking clinically rated paranoia from video “selfies” recorded on smartphone devices. Patients with schizophrenia or bipolar disorder were recruited and tracked over a week-long epoch. A small NLP-based feature set from 499 language samples was modeled on clinically rated paranoia using regularized regression.
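As a rough illustration of what modeling a small feature set with regularized regression can look like, the sketch below fits a ridge (L2-penalized) regression by coordinate descent. It makes several assumptions: the abstract does not say which penalty was used, so ridge is a stand-in; the data are synthetic and generated in the script; and the names `ridge_fit`, `predict`, `lam`, and `true_beta` are hypothetical and do not come from the study.

```python
import random


def ridge_fit(X, y, lam=1.0, n_iter=200):
    """Minimize sum((y - X b)^2) + lam * sum(b^2) by coordinate descent.

    No intercept is fitted, for brevity; features are assumed roughly centered.
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    resid = list(y)  # residual y - X b, with b starting at zero
    for _ in range(n_iter):
        for j in range(n_features):
            col = [row[j] for row in X]
            # Add feature j's contribution back to get the partial residual.
            partial = [r + c * beta[j] for r, c in zip(resid, col)]
            num = sum(c * r for c, r in zip(col, partial))
            den = sum(c * c for c in col) + lam
            beta[j] = num / den  # closed-form one-dimensional ridge update
            resid = [r - c * beta[j] for r, c in zip(partial, col)]
    return beta


def predict(X, beta):
    """Return the linear prediction for each row of X."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Synthetic stand-in data (NOT the study's data): 200 samples, 5 features.
rng = random.Random(0)
n_samples, n_features = 200, 5
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
true_beta = [0.8, 0.0, -0.5, 0.0, 0.3]  # hypothetical generating weights
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.5) for row in X]

beta_weak = ridge_fit(X, y, lam=0.1)      # light penalty: close to least squares
beta_strong = ridge_fit(X, y, lam=500.0)  # heavy penalty: weights shrink toward 0
print("weak penalty:  ", [round(b, 2) for b in beta_weak])
print("strong penalty:", [round(b, 2) for b in beta_strong])
```

The penalty strength `lam` trades fit against coefficient size; in practice it would typically be chosen by cross-validation, ideally with samples from the same patient kept in the same fold.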
Study Results: While test-retest reliability was high, criterion and convergent/divergent validity were achieved only when considering moderating variables, notably whether a patient was away from home, around strangers, or alone at the time of the recording. Moreover, there were systematic racial and sex biases in the model, in part reflecting whether patients submitted videos when they were away from home, around strangers, or alone.
Conclusions: Advancing NLP measures for psychosis will require deliberate consideration of test-retest reliability, divergent validity, systematic biases, and the potential role of moderators. In our example, a comprehensive psychometric evaluation revealed clear strengths and weaknesses that can be systematically addressed in future research.