Using ICD-9 diagnostic codes for external validation of topic models derived from primary care electronic medical record clinical text data
| Content Provider | SAGE Publishing |
|---|---|
| Author | Meaney, Christopher; Escobar, Michael; Stukel, Therese A.; Austin, Peter C.; Kalia, Sumeet; Aliarzadeh, Babak; Moineddin, Rahim; Greiver, Michelle |
| Copyright Year | 2023 |
| Abstract | Background/Objectives: Unsupervised topic models are often used to facilitate improved understanding of large unstructured clinical text datasets. In this study we investigated how ICD-9 diagnostic codes, collected alongside clinical text data, could be used to establish concurrent, convergent and discriminant validity of learned topic models. Design/Setting: Retrospective open cohort design. Data were collected from primary care clinics located in Toronto, Canada from 01/01/2017 through 12/31/2020. Methods: We fit a non-negative matrix factorization topic model, with K = 50 latent topics/themes, to our input document term matrix (DTM). We estimated the magnitude of association between each Boolean-valued ICD-9 diagnostic code and each continuous latent topical vector. We identified the ICD-9 diagnostic codes most strongly associated with each latent topical vector, and qualitatively interpreted how these codes could be used for external validation of the learned topic model. Results: The DTM consisted of 382,666 documents and 2210 words/tokens. We correlated concurrently assigned ICD-9 diagnostic codes with learned topical vectors, and observed semantic agreement for a subset of latent constructs (e.g. conditions of the breast, disorders of the female genital tract, respiratory disease, viral infection, eye/ear/nose/throat conditions, conditions of the urinary system, and dermatological conditions). Conclusions: When fitting topic models to clinical text corpora, researchers can leverage contemporaneously collected electronic medical record data to investigate the external validity of fitted latent variable models. |
| Related Links | https://journals.sagepub.com/doi/pdf/10.1177/14604582221115667?download=true |
| ISSN | 14604582 |
| Issue Number | 1 |
| Volume Number | 29 |
| Journal | Health Informatics Journal (JHI) |
| e-ISSN | 17412811 |
| DOI | 10.1177/14604582221115667 |
| Language | English |
| Publisher | Sage Publications UK |
| Publisher Date | 2023-01-13 |
| Publisher Place | London |
| Access Restriction | Open |
| Rights Holder | © The Author(s) 2023 |
| Subject Keyword | clinical text data; convergent validity; ICD-9 codes; concurrent validity; topic model; discriminant validity; electronic medical record; non-negative matrix factorization; external validation |
| Content Type | Text |
| Resource Type | Article |
| Subject | Health Informatics |
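
The association step described in the abstract (relating each Boolean-valued ICD-9 code to each continuous topical vector, then picking the most strongly associated codes per topic) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the association measure, so the point-biserial (Pearson) correlation used here is an assumption, and the function names, topic labels, code labels, and toy values are all hypothetical.

```python
from math import sqrt


def point_biserial(flags, scores):
    """Pearson correlation between a 0/1 indicator and a continuous score.

    ASSUMPTION: the abstract does not state which association measure was
    used; point-biserial correlation is one plausible choice.
    """
    n = len(flags)
    if n == 0 or n != len(scores):
        raise ValueError("flags and scores must be non-empty and equal length")
    mean_f = sum(flags) / n
    mean_s = sum(scores) / n
    cov = sum((f - mean_f) * (s - mean_s) for f, s in zip(flags, scores))
    var_f = sum((f - mean_f) ** 2 for f in flags)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_f == 0 or var_s == 0:
        return 0.0  # a constant vector carries no association signal
    return cov / sqrt(var_f * var_s)


def top_codes_per_topic(code_flags, topic_weights, top_n=1):
    """Rank diagnostic codes by their association with each topic.

    code_flags:    dict mapping code -> list of 0/1 values, one per document
    topic_weights: dict mapping topic -> list of weights, one per document
    """
    ranked = {}
    for topic, weights in topic_weights.items():
        assoc = [
            (code, point_biserial(flags, weights))
            for code, flags in code_flags.items()
        ]
        assoc.sort(key=lambda pair: pair[1], reverse=True)
        ranked[topic] = assoc[:top_n]
    return ranked


# Hypothetical toy data: six documents, two topics, two diagnostic codes.
topic_weights = {
    "topic_respiratory": [0.9, 0.8, 0.1, 0.0, 0.7, 0.1],
    "topic_skin": [0.0, 0.1, 0.9, 0.8, 0.1, 0.7],
}
code_flags = {
    "466": [1, 1, 0, 0, 1, 0],  # illustrative respiratory-type code
    "692": [0, 0, 1, 1, 0, 1],  # illustrative dermatological-type code
}

ranked = top_codes_per_topic(code_flags, topic_weights)
for topic, pairs in ranked.items():
    print(topic, pairs)
```

In a real analysis the per-document topic weights would come from the fitted factorization (K = 50 in the study) rather than from hand-written lists, and the top-ranked codes for each topic would then be read qualitatively against that topic's top words.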