Tempting Generative AI into Hallucinating
Jul 27, 2023
...when analysing qualitative data
You may have come across chatbots like ChatGPT or Bard, which are known to sometimes generate imaginative responses, a phenomenon often referred to as "hallucination." This occurs, for instance, when the chatbot faces a prompt that is not well represented in its training data. Despite being powered by complex algorithms that analyse the structure of human language, AI does not possess the ability to discern what is true or false (NYT, Weise and Metz, May 1, 2023).
Subbarao Kambhampati, a respected professor and artificial intelligence researcher at Arizona State University, has even advised caution when using these systems, stating, “If you don’t know an answer to a question already, I would not give the question to one of the systems.”
During one of my recent brown-bag seminars, I explored the current options for supporting qualitative data analysis with generative AI. I examined these features by applying them to data that I had previously analysed using traditional CAQDAS programs like MAXQDA or ATLAS.ti. Thus, I was familiar with the data. After the seminar, one of the participants raised a legitimate concern: my tests might be limited precisely because I already knew the data. As a result, I was able to formulate questions for which I knew there were elements of answers within the text.
This concern highlights an important issue—when someone uses AI without prior knowledge of the data, they face the challenge of validating the responses obtained from the system. Two factors exacerbate this concern:
- AI's tendency to generate responses even when lacking sufficient data to support them.
- People who miss the crucial step of familiarizing themselves with the data and just trust the outcome.
[If you are interested in watching the recording of this session, join the Qualitative Research Community. You will find all recordings of past seminars under Meetup.]
To address this concern, the suggestion was made to test AI by asking meaningful (or less meaningful) questions for which I knew there were no answers in the text. The purpose was to ensure that the system had safeguards against providing results not grounded in facts. In response to this recommendation, I conducted such tests. Here are the results.
Putting AI to the test
I used AILYZE for my tests and worked with two data sets. One data set examines the relationship between having children and happiness. I used two documents from this data set: one contains comments from a parenting blog, and the other contains comments on an article published in the NYT Magazine. If you want to know more about these data, I present them in this video tutorial.
The second data set consists of four one-hour-long interviews about friendship.
The following prompts are unrelated to the topic of investigation, and I know that there are no answers to them in the text.
Prompt: Document 1 contains comments from readers of a New York Times Magazine article. Document 2 contains comments written by readers of a parenting blog. What do those people write about becoming rich and famous?
The chatbot did not fall for my trap. Instead, it responded:
“Based on the document extracts, it can be inferred that the people in both the New York Times Magazine article and the parenting blog comments do not mention becoming rich and famous as a reason for having children.”
It did not just make up an answer; it also put the question into the right context. It then continued to summarise what the text is actually about:
“The comments primarily focus on the challenges and joys of parenting, the fulfilment and purpose that children bring to their lives, and the deep love and connection they feel towards their children… “
It did try to make sense of the question, though, by concluding:
“Overall, the comments suggest that the decision to have children is driven by a desire for love, connection, and personal growth rather than a pursuit of wealth or fame.”
When I started a new session and uploaded the documents again, the answer to the prompt was:
“Based on the document extracts, it is unclear whether the people in the comments discuss becoming rich and famous. The comments in Document 1 and Document 2 mainly focus on the topic of having children and the impact it has on happiness and fulfilment. There is no mention of becoming rich and famous in relation to this topic. Therefore, it can be inferred that the people in the comments do not discuss becoming rich and famous.”
This time, it wasn’t trying to make sense of the question. It again pointed out what the text was all about and clearly stated that there was no evidence related to my question in the data.
Far-fetched off-topic prompts
With the following prompts, I decided to challenge the chatbot with something entirely far-fetched. The chatbot consistently focused on pointing out the subject of the text, ignoring my question.
Prompt: What is the recommendation made for robbing a bank?
"Based on the document extracts provided, it is clear that the question being discussed is whether or not having children leads to happiness. The document extracts present a range of opinions and perspectives on this topic, with some individuals expressing that having children brings them joy and fulfilment, while others argue that it does not necessarily lead to happiness."
Prompt: What is the perspective of the respondents on Covid vaccination?
"Based on the document extracts provided, it is clear that there is a wide range of opinions and perspectives on the topic of parenting and its impact on happiness. Some individuals express a deep sense of fulfilment and joy in being parents, while others express regret or dissatisfaction. There are also those who believe that happiness is not the sole goal in life and that other factors such as purpose, accomplishment, and contentment are equally important."
Prompt: As I remember, a respondent went to the world environmental crisis conference and met a person there who became a friend. Tell me more about this story.
"Based on the document extracts provided, there is no mention of a respondent going to a world environmental crisis conflict and meeting a person who became a friend. Therefore, there is no story to tell about this particular situation."
Prompt: What kind of cars do the respondents prefer?
"Based on the document extracts provided, it is not possible to determine what kind of cars the respondents prefer. The extracts focus on topics such as friendships, meeting new people, and personal experiences, but there is no mention of cars or preferences related to cars. Therefore, it is not possible to make a plausible inference about the respondents' car preferences based on the given information."
When I asked questions about content that was not present in the data set, the chatbot did not generate hallucinatory responses. Instead, it responsibly provided summaries of the existing content, effectively informing me about the subjects the data set actually covered.
Prompts unrelated to the focus of the research
Next, I created prompts unrelated to the research's main focus. Using the children & happiness data set, I asked about unfair conditions at work. There are statements in the text where people talk about their work, but only as it relates to parenting.
Here is the prompt. Notice that I did not mention that the comments were from a parenting blog.
Prompt: The documents contain a number of comments as a reaction to a newspaper article and a blog. Summarise all comments where people complain about unfair conditions at work.
“The document extracts contain several comments from individuals who express dissatisfaction with their work conditions. These comments highlight various issues, such as long hours, lack of flexibility, and high levels of stress.
In one comment, a parent mentions the difficulty of balancing work and parenting responsibilities, stating that employers should be more accommodating to alleviate work-family tensions. Another comment suggests that the pressure to be a "perfect" parent can lead to unhappiness, as individuals constantly need to meet high expectations.
Furthermore, some people express frustration with the lack of recognition for their achievements and the feeling of being belittled by parents who believe that having children is the only path to fulfilment. They argue that they have accomplished significant goals in their lives and should be acknowledged for their achievements, even without having children.
Overall, these comments reflect a sense of dissatisfaction and frustration with work conditions and societal expectations. They highlight the need for more understanding and support from employers and society as a whole to create a more balanced and fulfilling work-life experience.”
In this last paragraph, the chatbot clearly hallucinates and offers a conclusion that appears to be based on the data it has been trained on.
Interestingly, however, when I asked this question again, the answer was as follows:
"The document extracts provided do not contain any comments about unfair conditions at work."
If I ask again and again, I get a different answer every time. When I checked the quotes the chatbot based its answer on, I noticed that it made a different selection every time the prompt was submitted. This looks like an issue that the developers of the application could fix.
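AILYZE's internals are not documented, so the following is only a plausible explanation for this behaviour: large language models typically sample each output token from a temperature-scaled probability distribution, and any temperature above zero makes repeated runs of the same prompt diverge. A minimal Python sketch of temperature scaling (the logit values are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into sampling probabilities.

    Lower temperatures sharpen the distribution toward the top-scoring
    candidate; higher temperatures flatten it, making alternatives
    (and thus alternative answers) more likely on repeated runs.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate continuations of the same prompt.
logits = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.1)  # near-deterministic: top choice > 99%
warm = softmax_with_temperature(logits, 1.5)  # spread out: top choice under 60%

print(cold)
print(warm)
```

At a low temperature the top candidate dominates and the answer is effectively fixed; at a higher temperature the probability mass spreads out, which is consistent with getting a different selection of quotes each time the same prompt is submitted.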
What I find more critical is that the chatbot cannot place the usage of the word “work” in the correct context. The chatbot assumes that when parents talk about the fact that their children mean a lot of work or that having a baby is stressful, they talk about a professional job.
Based on this assumption, the conclusions that the chatbot draws are as follows:
"These comments highlight the challenges and frustrations that individuals face in their professional lives."
"This comment highlights individuals' challenges and sacrifices in their chosen professions, emphasizing that hard work does not always lead to immediate happiness."
"Overall, these comments shed light on the negative aspects of work and the challenges individuals face in finding happiness and fulfilment in their professional lives."
I received similar results when asking about climate change in the children and happiness data. The chatbot found statements where people mentioned overpopulation and the state of the planet as reasons for not having children and turned this into different perspectives on climate change.
When I asked about the reasons for having children in the friendship project, the chatbot provided answers based on anything that seemed to be related to children:
"The respondent mentions several reasons for having children based on the document extracts provided. One reason mentioned is the desire for their child to have companionship and playmates. In Document 1, the respondent talks about how she met another mother at a mother and toddler group, and their children became friends. They would take turns hosting playdates and outings for their children to maintain their friendship. The respondent emphasizes the importance of her child having someone to play with, especially since they do not have any siblings. This suggests that one reason for having children is to provide them with social interaction and companionship.
Another reason mentioned is the desire to create a special bond and friendship with their child. In Document 1, the respondent talks about her close friendship with a girl named Charlotte when they were younger. They were inseparable and had a special bond, particularly through their shared love for the band Take That. The respondent expresses how special this friendship was and how it became even more meaningful after Charlotte passed away. This suggests that having a child can create a unique and special bond that can be cherished.
Additionally, the respondent mentions the desire to have someone to care for and love. The respondent talks about feeling guilty for not spending enough time with her friend before she passed away. She mentions how she would visit her in the hospital and spend time with her, showing her care and support. This suggests that having a child can fulfil the need for nurturing and caring for another person.
Overall, the reasons mentioned by the respondent for having children include providing companionship and playmates for their child, creating a special bond and friendship, and fulfilling the need for nurturing and caring for another person."
Next, I changed the prompt and asked more directly for issues related to work-life balance, effects on career and work distribution at home when having children – all in one prompt. These were too many issues at once, and the answer did not match the question at all.
When I only asked about work-life balance related to having children, I did get a meaningful answer:
"Based on the document extracts provided, it is clear that the effects of having children on one's career are varied and subjective. Some individuals express that having children has had a negative impact on their careers, while others believe that it has had a positive or neutral effect.
In Document 3, one individual mentioned that having children has made her career more difficult, as she has had to juggle parenting responsibilities with their professional obligations. She expresses frustration with the lack of flexibility in her work schedule and the added stress that comes with balancing work and family life. However, she also acknowledges that there is a sense of fulfilment in being a parent and that the challenges of parenting can be rewarding in their own way.
In contrast, Document 4 includes comments from individuals who believe that having children has not significantly impacted their careers. They mention that they have successfully balanced their professional and personal lives and that having children has not hindered their career progression."
The key lesson to learn here is the importance of maintaining a clear focus on the research topic. When you inquire about themes that may be present but are presented in a different context or share certain characteristics based on the chatbot's training data, it struggles to place and interpret them properly. To ensure accurate and relevant responses, framing your questions thoughtfully is crucial.
I encountered two other issues:
- The chatbot changed its answer every time I submitted a prompt. I am sure this can be fixed; the application is at a very early stage of development.
- The second, which is true for real-life human interaction as well: do not ask too many questions at once. Keep it simple. Ask one question at a time.
Inviting the chatbot to provide a creative answer
With the following prompt, I gave the chatbot the freedom to transfer the knowledge gained from the interviews to another context:
Prompt: Apply what you learned from the interviews about friendship to business.
"Based on the document extracts, it can be inferred that friendships are an important aspect of people's lives and can significantly impact their well-being and personal growth. The interviews reveal that friendships can be formed through various means, such as meeting people in person, connecting through social media platforms like Facebook, or through shared experiences like attending university or working together.
The interviews also highlight the importance of trust, respect, and communication in maintaining and nurturing friendships. Friends provide support, understanding, and companionship and can be a source of comfort and joy. They can also help individuals navigate through life's challenges and provide different perspectives and experiences.
From a business perspective, the concept of friendship can be applied to customer relationships. Building strong and lasting relationships with customers is crucial for the success of a business. Just as friends support and trust each other, businesses need to establish trust and provide value to their customers. This can be achieved through excellent customer service, personalized experiences, and consistent communication.
Based on the interviews, it is evident that friendships can evolve and change over time. People's interests, circumstances, and priorities may shift, leading to different types of friendships and activities. This understanding can be applied to business strategies as well. Businesses need to adapt to changing customer needs and preferences and continuously innovate to stay relevant and maintain strong relationships with their customers.
The key takeaway from the interviews is the importance of building and maintaining relationships. The business should focus on creating a customer-centric approach, where the business aims to establish long-term relationships with its customers. This can be achieved through personalized experiences, excellent customer service, and consistent communication.
The business should also emphasize the importance of trust and respect in customer relationships. The business should prioritize transparency, honesty, and integrity in customer interactions. This can help build trust and loyalty, leading to repeat business and positive word-of-mouth referrals.
Additionally, the plan should consider the evolving nature of friendships and customer relationships. The business should be adaptable and open to change, continuously seeking feedback from customers and adjusting its strategies accordingly. This can help the business stay relevant and meet its customers' changing needs and preferences.
The business should prioritize building strong, lasting customer relationships based on trust, respect, and effective communication. By focusing on customer relationships, the business can differentiate itself from competitors and create a loyal customer base that will support its growth and success."
These are impressive inferences made by the chatbot. Admittedly, the prompt was pulled out of thin air, but there are useful applications for this. For instance, when I asked using the children & happiness data set:
Prompt: Relate what you learn from the documents to self-consistency theory.
I got a very useful answer, first defining what self-consistency theory is and then how the data can be interpreted in light of it. This will undoubtedly be a delightful read for all academic researchers, as it opens up the opportunity to connect the research findings with the theories that typically inform their investigations or the theories that emerge during the data analysis process.
The response provided by the chatbot clearly demonstrates its capability to relate the content and key takeaways from the data to its existing knowledge base. As we have seen above, however, this is not always desirable, particularly when we ask about something that is conceptually close to the topic we are investigating.
When suitable, it is an interesting application to invite the chatbot to establish connections with external concepts or ideas. Doing so can lead us down a fruitful avenue of exploration.
I began this exploration with a quote from Subbarao Kambhampati, urging caution when presenting questions to a generative AI system. I propose a modification of the quote: "If you don't know your data, do not blindly ask questions to any of the AI systems."
This revised quote underscores the importance of being familiar with the data corpus before engaging in conversation with the chatbot about your data. Understanding the data is essential, as it forms the foundation for evaluating the outcomes effectively.
This also means that the amount of data you can analyse in this way is limited to what you can read within a meaningful time frame. While the chatbot might possess the capacity to handle vast amounts of data and extract seemingly reasonable and meaningful information, it doesn't guarantee its accuracy or relevance. There's a risk of receiving responses that may or may not genuinely reflect your data. As the data corpus grows larger, so does the pool of information it contains, increasing the likelihood that even seemingly improbable questions might not be rejected by the chatbot.
However, not being able to analyse a large amount of unstructured data is not a problem – because this is not what qualitative data analysis is all about. Often 10 to 12 interviews or four focus groups are sufficient to find an answer to a qualitative research question. The emphasis is not on numbers. Even if you conduct 60 interviews or 100, you still don’t have a representative sample. The only thing you get is more of the same information you already gathered from the initial ten interviews.
Using a chatbot tool allows you to forgo coding the data. This will be a huge time saver in many cases. Think of the chatbot as an extension of your brain, a trusted research assistant endowed with remarkable capabilities. It excels at remembering the complete content of the data, swiftly locating specific aspects within it, and providing an overview of how perspectives differ across documents. Furthermore, the chatbot can extract fitting quotes to bolster arguments and helps you to explore ideas. However, you need to call the shots and know what and how to ask questions.
If you want to learn more about using generative AI for qualitative research, contact us. We can offer you a tailor-made workshop for your team: [email protected]
Addendum: Prompts that worked
Just to give you an idea of the type of prompts that yielded good answers, here is a list:
Dataset: Children & Happiness
- Extract reasons for having (not having) children. Alternative: What are the reasons for having children, as mentioned by the respondents?
- Is there any evidence that fathers take on more of the responsibility of childcare as compared to mothers?
- How do participants see the relationship between children and happiness, and why?
- Tell me more about those that find immense joy and fulfilment in raising children.
- What are the viewpoints on parenting?
Comment: Asking for positive and negative viewpoints separately did not work that well, as the chatbot “felt” it was necessary to give a balanced answer: Here are the positive viewpoints, but I also found the following opposite viewpoints in the data (and vice versa).
Dataset: Interviews about Friendship
- What are the stories respondents tell about their best friend?
- What makes someone a best friend?
- Tell me more about the story with the watch.
- How frequently do the respondents meet their friends?
- What are the benefits of friendship?
- Why do friendships end?