11/3/25
Tidsskriftet.no: The Doctor on Using Artificial Intelligence: "I Have More Time to Breathe"
General practitioner Benedicte Wardemann demonstrates how Noteless transcribes patient consultations in real-time and systematically organizes the content into a completed journal note draft, capturing both medical issues and emotional nuances.

More and more general practitioners are using artificially intelligent assistants that transcribe during patient consultations. One of them is Benedicte Wardemann, a specialist in general medicine.

– If you’d like, I can play a patient and present a problem. Then we can see how the program picks up the different parts.

General practitioner Benedicte Wardemann at Vest Helse og Trening in Bærum demonstrates how her artificially intelligent assistant works. She presses the "new consultation" button on the PC before she begins to speak:

– Hi, I would like help with losing weight. I have tried before and have attempted many different diets. I might manage to lose five kilos before I notice that it becomes too difficult to maintain. Then I break down and gain the weight back. This is, of course, shameful for me, and I feel like I am failing at something that everyone else takes for granted.

As Wardemann speaks, the words are transcribed in real-time on the PC screen – via a small, round, black microphone on the desk. The name of the program is Noteless, an artificially intelligent assistant designed to help healthcare personnel with documentation work.

Wardemann continues role-playing a typical patient before switching to the doctor’s role in this fictional consultation.

Systematized Medical Journal Note

When the conversation is over, the general practitioner makes a few clicks on the computer. The result? A draft of a completed journal note. She reads the summary aloud:

Current situation: Wants help with weight loss. Has previously attempted various diets with only temporary effects. Manages to lose approximately five kilos before finding it too difficult to maintain the weight loss, then gains the weight back. Experiences this as psychologically distressing, with feelings of shame and lack of control.

The program systematically organizes the content of the patient consultation under the sections Current Situation, Findings, Assessment, and Measures. According to Wardemann, the note ends up being much longer than what she would have written herself.

– I think it’s fascinating that it uses such well-structured sentences. If I had written this myself, I would have used just one or two lines – at most. I also probably wouldn’t have included the fact that she experiences feelings of shame and lack of control, which are also part of the complexity of losing weight.

After verifying that everything in the draft note is correct, the general practitioner copies the text into the patient’s official medical record, saving valuable minutes in the process.

AI Research on the Rise

The fact that Wardemann’s AI-generated medical notes are longer, more detailed, and capture nuances she might not have prioritized herself is worth bearing in mind when examining research on language models. Several studies indicate that AI-generated responses to health-related questions are often perceived as more empathetic than those from actual doctors.

Recently, Tidsskriftet published a study titled Artificial Intelligence and Doctors’ Responses to Health Questions (1). In this study, 192 health-related questions and their corresponding answers from doctors were sourced from the website Studenterspør.no. The language model GPT-4 was then used to generate a new set of responses to the same questions. In a blind test, both the doctors’ and AI-generated responses were evaluated by a group of respondents with healthcare backgrounds.

The results?

The AI-generated responses were perceived as more empathetic, knowledgeable, and helpful than those given by actual doctors.

One of the researchers behind the study, Ib Jammer, an anesthesiologist with a Ph.D. at Haukeland University Hospital, explains the background of the research:

In one of his lectures, he had presented a similar study from the U.S. (2), which examined how ChatGPT responded to health-related questions.

– The results published at the time showed that ChatGPT’s responses were often rated as significantly better than those from human doctors. We found that intriguing. Could a computer really be better than us? How is that possible? And does this also hold true in Norwegian?

AI ASSISTANT: The small, round microphone captures the sound in the room, allowing the AI program to draft a journal note.
Photo: Leikny Havik Skjærseth

With this background, Jammer and his colleagues conducted their study. However, despite multiple studies showing that language model responses to health-related questions are often perceived as more knowledgeable, it is crucial to recognize the models' weaknesses.

Need for Regulation

– There have been cases where language models have made statements they should not have, says Ishita Barua, a doctor with a Ph.D. in artificial intelligence.

– I believe there was a case in Belgium where a man took his own life after following advice from a language model. It’s precisely this kind of unintended consequence that we must avoid. This needs regulation, and that’s what makes language models challenging, because communication is highly dynamic. Regulating this today is difficult, and it will continue to be difficult in the future. But people must understand that this is just a language model. It does not have real empathy.

Barua believes that language models can be a valuable tool for both doctors and patients—as long as they are properly understood and used correctly. However, she emphasizes that there are many aspects of their use that we still need to consider, and some we may not even be aware of yet.

– There are many lonely people with no one to talk to who find great comfort in using language models. For example, we are currently unable to fully meet these needs within psychiatry. So, given that this technology is here to stay, we must do what we can to ensure these models are as safe as possible. But again, I don’t know if it’s possible to regulate this completely.

AI Better Than Doctors – Or Not?

The study published in Tidsskriftet is one of many recent studies examining the use of language models in healthcare.

– Several of these studies suggest that AI models perform better than the doctors themselves, says Ishita Barua.

– In the past six months, multiple studies have tested language models on medical questions and cases. Typically, ChatGPT has been tested, and several results indicate that it performs better than doctors alone – even better than doctors who use language models. That last point has been particularly surprising.

However, a Swedish study, recently published in BMJ Open (3), produced opposite findings. Barua emphasizes that this study is more comparable to Norwegian conditions since it was conducted in a neighboring country and used case studies that closely resemble real clinical situations.

– In that study, doctors outperformed language models. So, the findings point in the opposite direction. Perhaps we should give more weight to studies that focus on real clinical scenarios, where language models fail to match doctors’ performance.

– It’s fascinating that we now have studies with conflicting results.

Jumping on the AI Train

Despite the varying study results, there is no doubt that artificial intelligence is here to stay—not just in healthcare but across many areas of society. While some have already embraced AI tools, others remain skeptical. For some, this powerful technology may even feel like a threat.

– I don’t think healthcare professionals will be replaced by AI, but we could be replaced by people who know how to leverage AI tools effectively. I always say that we won’t become obsolete – we just need to find new niches to work in. Resisting AI won’t help. It’s coming. We have a choice: embrace it and jump on the train, or stay behind at the station, says Jammer.

Ishita Barua compares today’s AI revolution to the introduction of the internet.

– This will become deeply integrated into all aspects of society. It’s similar to asking why we needed to learn how to use the internet. We need everyone to have a basic understanding of what AI is. When doctors receive recommendations from AI models, they must uphold their professional integrity. They need to understand what this technology truly is and recognize when it crosses into a gray area that requires human intervention. Anticipating potential errors and pitfalls before they happen is crucial.

According to Barua, the fields that have advanced the most in AI adoption within healthcare are those that rely heavily on imaging and pattern recognition—such as radiology, cardiology, dermatology, and gastroenterology.

– Additionally, there is significant focus on generative AI and language models, she adds.

Pitfalls of AI in Healthcare

Back at Benedicte Wardemann’s general practice at Vest Helse og Trening in Bærum, notes are posted around the office informing patients that AI is used during consultations. According to Wardemann, no patients have reacted negatively, even though they can see their words being transcribed live on the computer screen.

She clarifies that the microphone does not record the conversation but transcribes what is being said. Additionally, the AI system operates independently of the official medical records system.

– It’s not common practice to say a patient’s name or national identity number aloud during a consultation. That way, the system stays completely separate from personal data.

Once the consultation is finished, Wardemann spends about one minute reviewing the journal note to ensure accuracy.

– You always need to check. The AI model is a ‘pleaser’ – it’s always positive and doesn’t recognize the limitations of its knowledge base.

This need for oversight is crucial, according to Ishita Barua.

– I believe automation bias is one of the biggest pitfalls. We tend to over-trust machines, which leads us to rely too heavily on AI-generated decisions and recommendations. The second issue is algorithmic bias – some AI models may not be trained on a sufficiently diverse dataset.


When it comes to saving time, Ib Jammer points out that the effectiveness of AI language models greatly depends on how well instructions are formulated.

– If we hadn’t provided specific instructions in our study – just presented a question – we would have received answers that required significant editing, he explains.

Doctors Are Still in Charge

For general practitioner Benedicte Wardemann, who manages a patient list of 1,150 people, AI-powered technology has given her more breathing room in her daily work.

– How much time do you think you save during an average workday?

– More than an hour. And it hasn’t led me to cram in more patient appointments – it has simply given me more breathing space. I feel better about myself.

She believes that embracing new technology is essential.

– I think we need to dare to use new technology that can help us in our daily work. I feel like I’m getting my time back and spending more of it on my actual role as a doctor. So my advice to colleagues is to give it a try, be open to new technology – but, of course, always remember: we are still in charge.