• Annals of Family Medicine · Apr 2022

    Using artificial intelligence to support rapid, mixed-methods analysis: Developing an automated qualitative assistant (AQUA).

    • Robert Lennon, William Calo, Erin Miller, Aleksandra Zgierska, Lauren Van Scoy, and Robert Fraleigh.
    • Ann Fam Med. 2022 Apr 1; 20 (20 Suppl 1).

    Abstract

    Context: Qualitative research - crucial for understanding human behavior - remains underutilized, in part due to the time and cost of annotating qualitative data (coding). Artificial intelligence (AI) has been suggested as a means to reduce those burdens. Older AI techniques (Latent Semantic Indexing/Latent Dirichlet Allocation, LSI/LDA) have fallen short, in part because qualitative data is rife with idiom, non-standard expressions, and jargon.

    Objective: To develop an AI platform using updated techniques to augment qualitative data coding.

    Study Design and Analysis: We previously completed traditional qualitative analysis of a large dataset, with 11 qualitative categories and 72 subcategories (categories), and a final Cohen's kappa ≥ 0.65 as a measure of human inter-coder reliability (ICR) after coding. We built our Automated Qualitative Assistant (AQUA) using a semi-classical approach, replacing LSI/LDA with a graph-theoretic topic extraction and clustering method. AQUA was given the previously identified qualitative categories and tasked with coding free-text data into those categories. Item coding was scored using cosine similarity.

    Population Studied: Pennsylvanian adults.

    Instrument: Free-text responses to five open-ended questions related to the COVID-19 pandemic (e.g., "What worries you most about the COVID-19 pandemic?").

    Outcome Measures: AQUA's coding was compared to human coding using Cohen's kappa, both on all categories in aggregate and on category clusters, to identify category groups amenable to AQUA support. AQUA's time to complete coding was compared to the time taken by the human coding team.

    Dataset: Five unlimited-length free-text survey answers from 538 respondents.

    Results: AQUA's kappa for all categories was low (kappa ≈ 0.45), reflecting the challenge of automated analysis of diverse language. However, for several 3-category combinations (with less linguistic diversity), AQUA performed comparably to human coders, with an ICR kappa range of 0.62 to 0.72 depending on the test-train split. AQUA's analysis (including human interpretation) took approximately 5 hours, compared to approximately 30 person-hours for traditional coding.

    Conclusions: AQUA enables qualitative researchers to identify categories amenable to automated coding, and to rapidly conduct that coding on the entirety of very large datasets. This saves time and money, and avoids the limitations inherent in restricting qualitative analysis to small samples of a given dataset.

    © 2021 Annals of Family Medicine, Inc.
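    The abstract describes scoring each item's coding with cosine similarity and comparing automated output to human coding with Cohen's kappa. The sketch below illustrates that comparison in Python, assuming a simple TF-IDF representation and invented category descriptions, responses, and labels; it is not the authors' implementation, and AQUA's actual graph-theoretic topic extraction and clustering pipeline is not reproduced here.

    # Minimal sketch (not AQUA itself): assign each free-text response to the most
    # cosine-similar category, then measure agreement with human coding via kappa.
    # Category names, descriptions, responses, and labels are illustrative only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import cohen_kappa_score
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical stand-ins for entries in the study's codebook.
    categories = {
        "health_worry": "fear of infection, illness, death, hospital capacity",
        "economic_worry": "job loss, income, bills, economic downturn",
        "social_isolation": "loneliness, missing family, cancelled events",
    }

    responses = [
        "My biggest fear is infection and ending up in the hospital.",
        "I worry about job loss and paying my bills with no income.",
        "The loneliness of missing family gatherings has been the hardest part.",
    ]
    human_labels = ["health_worry", "economic_worry", "social_isolation"]

    # Fit one TF-IDF vocabulary over category descriptions and responses.
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(list(categories.values()) + responses)
    category_vecs = matrix[: len(categories)]
    response_vecs = matrix[len(categories):]

    # Code each response into the category with the highest cosine similarity.
    names = list(categories.keys())
    sims = cosine_similarity(response_vecs, category_vecs)
    auto_labels = [names[row.argmax()] for row in sims]

    # Inter-coder reliability between automated and human coding.
    kappa = cohen_kappa_score(human_labels, auto_labels, labels=names)
    print("Automated labels:", auto_labels)
    print(f"Cohen's kappa vs. human coding: {kappa:.2f}")

    In the study, the analogous comparison was run on all categories in aggregate and on 3-category subsets, which is where automated coding approached human inter-coder reliability.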
