Implementation of Generative AI for Qualitative Data Analysis in Canvs and ATLAS.ti
April 22, 2023
In this article, I give you a glimpse into one successful and one disappointing application of generative AI in supporting qualitative data analysis.
The first example I am presenting is the software Canvs. They describe the product themselves as:
‘… an AI-powered insights platform that turns open-ended text from surveys, product reviews, and social media into actionable knowledge.’
What's important in this exploration of AI usage is the type of data typically analyzed with Canvs, and for which it was optimised: open-ended survey questions, product reviews, and social media data. These are unstructured qualitative data, but unlike qualitative interviews, they consist of short, self-contained data segments per respondent.
The second thing you need to know about Canvs is that they didn't just jump on the bandwagon when ChatGPT came along. They have been using machine learning models for ten years and have constantly evolved and improved them. They have developed the world's largest emotional English-language ontology, consisting of trillions of expressions, to understand emotional reactions. Canvs also knows the specific vocabularies of different industries. So, when you enter data, you can select the industry your data is coming from, like the finance sector, consumer goods sector, health sector, etc.
Below, you see the raw data of two example studies. In the first study, people were asked about their eating-out experience in a restaurant. One open-ended question asks respondents: ‘How was your experience?’ In addition, the data contains variables like the day of the week they were eating out, the location, and the NPS score they gave to the restaurant. Based on the score, the respondents were divided into two groups: detractors and promoters.
Figure 1: Raw data restaurant experience
In the second data set, respondents were asked about their favourite ice cream flavour and why it is their favourite. Also, here you see additional quantitative information like gender and age.
Figure 2: Sample study about favourite ice cream flavours.
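As a side note on the NPS grouping mentioned above: the standard Net Promoter Score convention treats scores of 0–6 as detractors, 7–8 as passives, and 9–10 as promoters. A minimal Python sketch of how such survey rows might be structured and grouped; the column names and example rows are my own invention, not Canvs' actual schema:

```python
import pandas as pd

# Hypothetical recreation of the restaurant survey's shape
# (column names and rows are invented for illustration).
df = pd.DataFrame({
    "response":  ["Great food, slow service.", "Loved it!", "Too noisy."],
    "weekday":   ["Friday", "Saturday", "Friday"],
    "location":  ["Downtown", "Airport", "Downtown"],
    "nps_score": [6, 10, 3],
})

def nps_group(score: int) -> str:
    """Standard NPS convention: 0-6 detractor, 7-8 passive, 9-10 promoter."""
    if score <= 6:
        return "detractor"
    if score >= 9:
        return "promoter"
    return "passive"

df["nps_group"] = df["nps_score"].apply(nps_group)
print(df[["nps_score", "nps_group"]])
```

The article's dashboard works with the detractor and promoter groups only; the sketch merely illustrates how the quantitative variables sit alongside the open-ended responses.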
The first step of analysis with canvs
After importing the data to canvs, you see the analysis results in the dashboard. The top part gives you an overview of how many respondents there are and how much of that data went into producing the summaries, the topic and emotion coding. Then a few insights are highlighted based on the NPS score. The emotional insights are in orange; the content-based insights are in blue.
If you are interested in how the emotional insights are generated, you can find more information on the canvs website.
Figure 3: Canvs dashboard
Let’s look at the summary coding:
What you should note here is the software’s ability to aggregate data. It is not producing hundreds or thousands of codes.
Figure 4: Summary coding in canvs
Canvs features a section labelled ‘Topics,’ which contains extracted keywords and their synonyms derived from the text. Topics are different from codes. While understanding participants' words is beneficial, an effective code must be more conceptual.
Figure 5: Identified keywords and their distribution in canvs
Other programs discussed elsewhere use topics generated in this way to establish coding foundations, and this is where things go wrong. If you stay this close to the data with your ‘coding’, you fall into what is referred to as the coding trap or code swamp. This leads to the creation of hundreds or even thousands of codes, ultimately resulting in drowning in a sea of codes.
When teaching aspiring qualitative researchers, I use the N-C-T model to explain the process of qualitative data analysis and coding. The letters stand for the following: N for Noticing, C for Collecting, and T for Thinking.
Topics are the components of text that software can notice. When coding data, it is crucial to collect similar occurrences of the identified topics and assign a name that is broader in scope. Failure to do so can lead to getting lost in the above-mentioned coding swamp and drowning.
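To make the ‘Collecting’ step concrete, here is a minimal, hypothetical sketch: surface topics noticed in the text are gathered under broader, more conceptual codes, so many topics collapse into a few codes. All labels are invented for illustration:

```python
# Hypothetical sketch of the N-C-T "Collecting" step: surface topics
# noticed in the text (keys) are collected under broader, more
# conceptual codes (values). All labels are invented for illustration.
topic_to_code = {
    "vanilla": "classic flavours",
    "chocolate": "classic flavours",
    "salted caramel": "novelty flavours",
    "childhood memories": "nostalgia",
    "grandma's recipe": "nostalgia",
}

noticed_topics = ["vanilla", "chocolate", "grandma's recipe", "salted caramel"]

# Collecting: many noticed topics, few conceptual codes.
collected = sorted({topic_to_code[t] for t in noticed_topics})
print(collected)  # a handful of codes instead of one code per topic
```

The point of the sketch is the many-to-few mapping: staying at the topic level produces one ‘code’ per surface expression, which is exactly the coding swamp described above.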
Canvs is equipped with the capability to produce more abstract codes. It lets you examine the topics and codes assigned to each response and review and modify them.
Figure 6: List of codes (left-hand side) and view of coded and tagged data (right-hand side).
Users may manually review and revise the generated codes in cases with only 120 responses, as shown in the example above. However, it raises the question of whether this practice continues when dealing with a larger volume of responses, say 1000 or more. I guess that users then tend to rely on the codes produced by the machine unless they come across anything that appears highly improbable.
The dashboard gives you an idea of what’s in the data, but you still don’t know any of the specifics. This is where generative AI comes in.
Putting generative AI to good use
Canvs has just released their new helper: AI Story Assist. It will give you a written summary of all responses to a question.
Figure 7: An example of a summary written by AI Story Assist
You can set filters if you want to drill down to respondent groups or only responses that contain certain words. AI Story Assist will then generate a new overview of all responses in the subset:
Figure 8: Working with filters.
And yet another feature is to ask AI Story Assist questions about your data, like:
- Did people mention nostalgia?
- Which ice cream would you recommend launching for lovers of nostalgia?
In the figure below, you find the answers:
Figure 9: Answers to questions formulated by analysts
This is what I call a useful application of generative AI and the new language models it has made available. I had written earlier about the effectiveness of ChatGPT for summarizing coded qualitative data. Canvs.ai puts it into action.
Recognizing the boundaries
Can generative AI replace the coding of qualitative data?
You can watch a video showing all the tests I have been running here.
In a previous blog post, I explored how ATLAS.ti and NVivo use machine learning to identify concepts and topics in data, similar to Canvs' topic identification. But as we have seen, topics are not a very useful base for codes.
ATLAS.ti recently introduced an update featuring generative AI-assisted coding, which promises to deliver qualitative insights in mere minutes instead of weeks. Their confidence in this groundbreaking feature is evident as the company's website now exclusively highlights the AI coding functionality, ignoring other great features the software offers.
This is what they promise:
Figure 10: The ATLAS.ti promise
In the following discussion, I will explore each of these promises concerning the AI feature in depth. I want to clarify that this analysis focuses solely on this specific feature. I have a great appreciation for the ATLAS.ti software, having used it for nearly 30 years and witnessing its progression from version 4 to version 23. I have authored numerous books, book chapters, and articles about this software.
As someone well-versed in ATLAS.ti, its functionality, and qualitative data analysis, I have gained valuable insights and experience in this field. When ATLAS.ti introduced this new feature, I was both hopeful and intrigued by their approach to incorporating AI. Thus, I decided to test it from a user's perspective, utilizing various projects I had previously analyzed by manually applying codes. Here's what I found:
Promise No. 1
10x Analysis Speed
The initial pledge is a tenfold boost in analysis speed, achieved through fully automated coding, allowing analysts to dedicate more time to refining and scrutinizing the data.
The software can indeed attach codes to your data within minutes. Coding a 1-hour interview manually takes approximately three hours, making this a significant time-saver. But does this really increase analysis speed?
To find out, I tested the software on several datasets and in the following, I will walk you through some examples.
Displayed below are 41 codes derived from manually coding a 1-hour interview, which explores the wartime experiences of a Vietnam veteran.
Figure 11: Codes generated after manually working with one interview
The AI tool generated 142 codes for this interview, aggregated into six categories:
Figure 12: AI-generated codes for the same interview
The AI seems to identify the central themes in the interview.
The ‘Emotional coping’ category appears to be similar to the ‘dealing with’ codes in the manually generated code system.
The 'Peacefulness' category refers to the peace demonstrations and anti-war protests of that era, which I also coded.
As I know the data, I know the stories that are behind the category ‘Personal development’. It is captured in my codes under labels like maturation or self-aware.
Behind each of the AI-generated categories are about 20 ‘subcodes’. Most of them have been applied only once. To understand what emotional coping or personal development means, I must go into the data and read it. Just knowing that the respondent talks about personal development is not sufficient. Why did that happen during the war? Or was it afterwards? And what type of development was the respondent talking about?
The interviewee discusses a period in the 1960s, describing his affectionate, nationalistic, and God-fearing family. By the way, this was categorized by the AI tool under 'military experience.' He also mentions departing from his family when he was 16 years old, which may seem perplexing, given his previous account of a warm family environment with solid family ties.
As the story unfolds, it becomes apparent that he confronted his homosexuality while serving in Vietnam. He suggests that the newfound liberty might have facilitated this, the looming uncertainty of the future, and the feeling that there may not be a tomorrow.
A few additional aspects led him to summarize his time in Vietnam as a maturational process. Just looking at a bunch of codes, I won’t know what the story is all about:
Figure 13: List of codes subsumed under the 'Personal development’ category.
Keep in mind that a 10x increase in speed is promised here. Navigating through the codes leads me on a winding journey across the dataset. For example, the two data segments categorized under 'Acceptance' are located in paragraphs 41 and 153, causing me to lose track of the narrative. It becomes increasingly challenging to comprehend the situation and determine proper coding when examining isolated data segments without a broader context. This issue is further exacerbated when coding multiple interviews as you bounce between various interviews and narratives.
Let's examine the 'Historical Violence' category, as its relevance to the interview topic was initially unclear to me. Upon closer inspection, it becomes evident that this category encompasses elements that I had coded under 'War Experience':
Figure 14: Subcodes of historical violence
The 'Negative historic event' refers to a significant revelation the interviewee experienced several years after returning from the war, which altered his perspective on the government and the conflict itself. This crucial turning point in the data would not typically be categorized as part of 'war experience'.
Let's examine the codes subsumed under 'Socio-Political Issues':
Figure 15: AI-generated subcodes for socio-political issues
Viewed from a societal perspective, alcohol abuse can be considered a socio-political issue, which the AI is capable of recognising. In the context of the interview, however, the respondent describes it as a coping mechanism. To classify it accurately, here too one needs to understand the broader context beyond the single data segment.
Some of the other codes could fall under socio-political issues, such as inequality, prejudice, discrimination, or social injustice. But many others are clearly misplaced in this category. Additionally, there is a lack of aggregation. Take a look at the number behind each code. It indicates how often the code was applied.
The example above represents a common issue that I have seen across the data sets I used for testing.
One could also argue that it is too early to categorise the data after just coding one interview, which is true. You can see from my manually generated coding list that I had not yet formed categories either.
When you code data manually, you review your list of codes and the data behind them after coding one or two interviews. The goal is to ‘clean the kitchen while you cook’ to avoid ending up with hundreds of codes to sort and organise later, or with codes of excessively high frequency because they are too broad. The metaphor of ‘cleaning the kitchen while you cook’ comes from Johnny Saldaña (2021); I consistently share it with my students, as it aptly illustrates the process.
While it may be tempting to have AI code the first interview and then clean up the data, the problem with this approach is that AI will not use the categories and codes you have generated during clean-up when coding subsequent interviews. Thus, you end up with a new set of categories and codes that you need to review and merge into the code system you started developing.
Next, I let AI code the second interview of the sample project; this added another 82 codes.
Sorting and organising the newly generated AI codes requires reading the data behind them. The disadvantage is that you jump into the data at arbitrary points: you can read the coded segment, but you don't know what was said a few lines earlier or later. As illustrated above, you need that context to evaluate how to code the data. So you begin to read what was said before and after, and wonder why on earth you are doing that instead of starting at line 1, working through the interview from beginning to end, and forgetting about the AI-generated codes.
Therefore, the AI coding does not lead to any time savings here either.
Let's move on to the next test - and this is what users generally would do: Ask AI to code all interviews at once. The test project contains a total of three interviews. Thus, it is smaller than most user projects. For comparison, I also let AI code a project with 50 documents.
When coding the three interviews at once, the AI tool generated 241 codes, compared to 142 when coding a single interview. They were divided into ten categories plus 124 standalone codes (not shown here), of which 95 were applied only once.
Figure 16: Resulting categories when coding multiple interviews at once
It took about 2 minutes to code the data. Thus, I can now focus on refinement. As should be obvious by now – this will not beat the time it takes me to code the data from scratch.
Let’s begin with the first identified category: Emotions
AI correctly collects data segments that are about anger, frustration, gratitude, regret, or sadness. But to what are these emotions related? Does the respondent talk about his own emotions or other people’s emotions? What is the context? Here is an example quote:
Figure 17: AI-coded data segment
Reading the data segment, you might get an idea of what the respondent is discussing. He recounts his visit to the Washington DC memorial.
The AI's seeming ability to comprehend the subject matter of this data segment and its connection to a trauma response is exciting. However, it does not actually understand. It inferred from the text that the moment was filled with sadness and remorse, but the respondent did not say such things. On the contrary, he said he could not express his feelings.
When manually coding the data, I used the code ‘facilitator: proactive action’, where ‘facilitator’ is a category under the meta-level code ‘Coping mechanisms’. The description of the visit was part of a larger narrative around dealing with the war experience, which is why I was able to identify the visit as a facilitator.
Figure 18: Category building based on manual coding.
In the screenshot above, you can see that I found 11 instances in the data representing proactive actions as a facilitating coping mechanism. The AI only found one instance that it coded with ‘Coping mechanism: trauma response’.
Those data segments that I subsumed under ‘proactive actions’ were coded by AI under the categories and codes: cross-cultural factors, mental health issues, emotions, grief, loss, and connection. So, it would have been lots of work to even come up with the idea that these data segments have something in common and can be subsumed under the same subcode of a category that still needed to be determined.
You can see the type of work that is involved when letting AI code just three interviews. It is a misconception that solely tidying up the code labels is sufficient. The list of codes is not synonymous with analysis either. Codes are a means to an end, representing only a minor yet essential aspect of the analysis process.
I also tested what comes out if I code 50 interviews using the AI-tool. This took about 2.5 hours. Fair enough. It would have taken me much longer to read all the documents. However, the problem is the AI tool generated 4143 codes! I don't think I have to elaborate on this result. It has all been said above.
Conclusion first promise
When examining just one data segment individually, the deceptive aspect is that the coding seems logical. Nonetheless, coding entails constantly comparing and contrasting data segments within and across documents. A code must be capable of gathering numerous instances of data segments sharing the same meaning. The effort required to transform AI-generated tags into appropriate categories and subcodes surpasses the time saved during the data coding process.
Promise No. 2
This promise of increased reliability is defeated very quickly. I coded the three interviews that I have been discussing above again and got the following results:
Figure 19: Codes generated by the AI tool on the first iteration
This time, there were no independent codes. All codes were subsumed under the ten categories you see in the figure above. I did not double-check all 240 codes, but from sampling, most code labels seem to be the same, just organized differently. ‘Coping mechanism’, for instance, was a category code in the first run and now became a subcode under ‘Resilience’.
As coding is quick, here is the third version:
Figure 20: Codes generated by the AI tool on the second iteration
An update initially solved this problem, but as of August 2023, it seems to be worse than when I first wrote this article in April 2023. Not only do I get different higher-order codes every time I have AI code the data, but I also get a larger number of codes, different on each run.
Promise No. 3
As we have seen above, the AI's coding approach requires extensive additional work to transform it into a meaningful code system. So, the AI coding alone won’t deliver insights yet, let alone lead to pattern recognition. Instead, what we end up with when we allow AI to code the data is a huge mess.
Returning to the metaphor of ‘clean the kitchen while you cook’ - this is not a mess that we have produced ourselves. It is a mess someone else has produced, namely the AI tool. And we have no idea yet what is behind all of the codes.
We won't see anything in the data if we still have apples, car keys, a pen, and a banana in one basket.
We first need to sort it out and put all fruits together in one basket, keys in another, pens in yet another basket, etc. The promised 10x accelerated analysis leading directly to insights and pattern recognition is misleading and false.
It is true that with manual coding methods, you may overlook some issues. However, the opposite is also true: human coders can see things that AI cannot, particularly when it comes to interpreting text within context.
Given the many ill-sorted codes that AI generates and the arbitrariness of it, it is definitely not worth the time and effort to wade through the many codes to find a few overlooked topics.
Conclusion third promise
It wasn't too long ago that ATLAS.ti proudly proclaimed on its website that qualitative researchers crafted the software for qualitative researchers. Now it seems to be software developed by tech guys who have a limited understanding of one of the essential aspects of software-supported qualitative data analysis – coding.
Why is canvs' automatic coding better than ATLAS.ti's AI coding?
The answer is twofold:
Firstly, canvs works with different types of data - responses to open-ended questions, social media data, and reviews. These are all short, self-contained pieces of information. I pointed this out at the very beginning. Going through response by response here works well. There is no larger context to consider that is mentioned a few paragraphs earlier or later in the response.
I doubt that the Canvs algorithm would perform significantly better than ATLAS.ti's AI coding when applied to interview data or longer documents that require a coherent understanding. Based on the tests I ran, my conclusion is that AI can currently only reasonably be used for coding fairly structured unstructured (qualitative) data, that is, short and self-contained responses.
Secondly, canvs uses its own algorithms that they have trained and developed over the last ten years to generate topics and codes.
When I had the ATLAS.ti AI tool code open-ended questions from a survey, I encountered the same problems as described above: I got around 150 codes for a single open-ended question answered by 99 respondents.
Learning from User Experience
I introduced my students to the AI coding tool during a recent class. However, they quickly dismissed it upon seeing the excessive number of codes generated, even for small amounts of data.
As the course advanced, we delved into sophisticated analysis tools and learned how to extract insights from data through memo writing. This procedure involves generating queries, reviewing the resulting coded data segments, and then summarizing and interpreting them.
When I said to the students: Imagine a button in the Quotation Reader that automatically generates a summary of the data and adds it to the memo – their eyes lit up.
The students were excited about the possibility of an AI Summary Assist, similar to the AI Story Assist offered by canvs. Such a tool would be much more helpful than a bunch of empty promises.
UPDATE
Just a few days after publishing this article, MAXQDA - a program similar to ATLAS.ti - released their AI function. It is exactly what my students and I had wished for. They have called it AI Assist.
Figure 22: This is what users want.
Another application I know will make users happy is being able to code data based on previously manually coded data. This means you begin to code a few interviews and develop a code system, including code definitions.
Then you give it to AI to learn about your code system. Next, you give AI the data you have coded under each code so it can learn what you have coded and how you have coded it. Once AI has learned from your examples, you give AI new data and ask it to apply the codes from your manually generated code system.
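One way such a workflow could be wired up is with few-shot prompting: the manually developed codebook and the human-coded examples are packed into a prompt, and the model is asked to apply one of the existing codes to a new segment. This is only a sketch under my own assumptions; the helper name, prompt format, and all code labels are hypothetical, and no specific vendor API is implied:

```python
def build_coding_prompt(codebook: dict[str, str],
                        examples: list[tuple[str, str]],
                        new_segment: str) -> str:
    """Assemble a few-shot prompt: code definitions first, then
    human-coded examples, then the new, uncoded data segment.
    (Hypothetical helper; the prompt format is an assumption.)"""
    lines = ["You are assisting with qualitative coding.",
             "Apply ONLY the codes defined below.", "", "Codebook:"]
    for code, definition in codebook.items():
        lines.append(f"- {code}: {definition}")
    lines.append("")
    lines.append("Examples coded by the researcher:")
    for segment, code in examples:
        lines.append(f'Segment: "{segment}" -> Code: {code}')
    lines.append("")
    lines.append(f'Segment: "{new_segment}" -> Code:')
    return "\n".join(lines)

# Invented illustrative codebook and examples.
codebook = {
    "coping: proactive action": "Respondent actively does something "
                                "to deal with the experience.",
    "war experience": "Descriptions of events during the war itself.",
}
examples = [
    ("I finally visited the memorial.", "coping: proactive action"),
]
prompt = build_coding_prompt(codebook, examples,
                             "I joined a veterans' support group.")
print(prompt)
```

The resulting prompt string would then be sent to a language model; whether the model applies the codes consistently across longer interviews is exactly the open question discussed in this article.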
I asked ChatGPT whether this is feasible for generative AI. This is what it answered:
“Hello! As an AI language model, I can try to help. The process you described involves initially manually coding a few interviews to create a code system, including code definitions. Once you've done that, you can provide the AI with your code system and the coded data under each code, allowing the AI to learn how to code effectively from your examples.
After the AI has learned from your examples, you can supply it with new data and ask it to apply the codes from your manually generated code system. Remember that while AI has advanced considerably, it may not always provide perfect results. Reviewing and verifying the AI-generated codes is essential to ensure accuracy and alignment with your established code system.”
The conclusion - for the time being
The current capabilities of AI for coding longer texts, such as interview transcripts, do not fully deliver on the three promises made: time-saving, increased reliability, and the discovery of deeper insights and patterns.
Instead, qualitative researchers would greatly benefit from AI assistance in summarising data. At this stage of the analysis process, the analyst is well-acquainted with the data, and any errors made by the AI tool can be swiftly rectified.
Another potential application could be to let AI learn from a human coder first. This would involve initiating a project by manually developing a code system, allowing AI to learn from it, and then delegating the subsequent coding tasks to AI.
This would save valuable time that I would rather spend on other things than refining AI-coded data, such as sipping on a good glass of wine while watching the sunset and chatting with a real-life friend.
Figure 23: Spending valuable time