Voice Dictation for Session Notes: A Therapist's Workflow

You just finished a 50-minute session. Your next client is in 10 minutes. You have exactly enough time to use the bathroom, check your phone, and not write a session note.

So you don't. You tell yourself you'll write it later. And later becomes the end of the day, when you're sitting at your desk trying to reconstruct five sessions from memory. The details blur. The clinical nuance fades. By the time you're writing the note for your 2pm client at 7pm, you're documenting what you remember, not what happened.

Voice dictation breaks this cycle. Instead of writing, you talk. Instead of sitting at your desk, you use your phone. Instead of reconstructing from memory hours later, you capture the session while it's fresh — in the time it takes to walk to the bathroom and back.

How voice dictation works for therapists

The workflow is three steps.

Step one: speak. After a session, you open the app on your phone and talk through what happened. No structure required — just narrate the session as if you were consulting with a colleague. "Client reported increased anxiety around work. We worked on cognitive restructuring for catastrophizing. She mentioned the panic attack from Tuesday. PHQ-9 was 14, up from 11. Plan is to continue CBT focus and add grounding techniques for acute episodes."

Step two: transcribe. Speech-to-text converts your dictation into text. Modern transcription engines generally handle clinical terminology, assessment and note-format abbreviations (PHQ-9, GAD-7, BIRP), proper nouns, and the natural pauses and corrections that happen in spoken language. The transcription happens in seconds.

Step three: structure. AI takes the transcription and organizes it into your chosen note format: SOAP, DAP, BIRP, GIRP, or whatever you use. In a SOAP note, for example, subjective information goes into S. Objective data goes into O. Your clinical reasoning goes into A. Next steps go into P. The draft is ready for your review.

The total time from finishing a session to having a structured note draft: two to three minutes.
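To make step three concrete, here is a toy sketch that routes each dictated sentence into a SOAP section using simple keyword cues. It is only a stand-in for illustration: real tools use an AI model for this step, not keyword matching, and the `CUES` table, the `SoapNote` class, and the `structure_dictation` function are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword cues per section (assumption for illustration only).
# Checked in this order; a sentence with no cue is treated as client report.
CUES = {
    "plan": ("plan is", "next session", "homework", "assign", "refer"),
    "objective": ("phq-9", "gad-7", "affect", "appeared", "observed"),
    "assessment": ("suggests", "indicates", "progress", "consistent with"),
}

@dataclass
class SoapNote:
    subjective: list = field(default_factory=list)
    objective: list = field(default_factory=list)
    assessment: list = field(default_factory=list)
    plan: list = field(default_factory=list)

def structure_dictation(transcript: str) -> SoapNote:
    """Sort sentences into SOAP sections; the input order does not matter."""
    note = SoapNote()
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    for sentence in sentences:
        lowered = sentence.lower()
        section = "subjective"  # default bucket
        for name, cues in CUES.items():
            if any(cue in lowered for cue in cues):
                section = name
                break
        getattr(note, section).append(sentence)
    return note

example = ("Client reported increased anxiety around work. Affect was flat. "
           "PHQ-9 was 14, up from 11. This suggests symptoms are worsening. "
           "Plan is to continue CBT focus.")
note = structure_dictation(example)
print(note.objective)  # ['Affect was flat.', 'PHQ-9 was 14, up from 11.']
```

A keyword router like this will misplace sentences often, which is one reason the drafted note still needs a human review.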

Why speaking is better than typing (for most therapists)

Most therapists are verbal processors. You spend your entire working day listening and talking. Sitting down to type after a day of sessions requires a cognitive gear shift that many therapists find exhausting.

Speaking doesn't require that shift. You're doing what you do all day — talking through a clinical encounter. The difference is that instead of talking to a client, you're talking to your phone. The thinking process is the same.

There's also a completeness advantage. When therapists type notes, they tend to abbreviate. Short phrases, incomplete thoughts, clinical shorthand that makes sense in the moment but loses meaning later. When they speak, they naturally include more context, more clinical reasoning, more of the "why" behind their observations. Spoken notes tend to be richer than typed ones.

And then there's the timing advantage. You can dictate while walking between rooms, sitting in your car between sites, or standing in the hallway with 90 seconds before your next client. You can't type a full note in those moments, but you can speak one.

What good dictation sounds like

You don't need to dictate in a formal clinical voice. The AI handles the formatting. What you need to do is cover the key elements:

What the client reported. Their mood, their week, significant events, symptoms, concerns. This becomes the Subjective section.

What you observed. Affect, engagement, behavioral changes, and assessment scores if the client completed an assessment between sessions. This becomes Objective.

Your clinical thinking. What the data means, how the client is progressing, what's working and what isn't. This becomes Assessment.

What comes next. Interventions to continue, new strategies to introduce, homework, referrals, next assessment schedule. This becomes Plan.

A good dictation covers all four in about 60-90 seconds of talking. It doesn't need to be polished. It doesn't need to be in order. The AI sorts it out.

Here's what a raw dictation might sound like:

"Session with Jordan today. He came in talking about the argument with his partner over the weekend — said it triggered some of the old patterns we've been working on. Affect was flat, more withdrawn than usual. GAD-7 came back at 16, which is up from 12 two weeks ago. We spent most of the session on the argument — used the thought record to identify the catastrophizing pattern. He was able to see it by the end of the session but expressed frustration that he can't catch it in the moment yet. Plan is to continue cognitive restructuring, I'm going to assign a daily thought record for the next two weeks, and we'll revisit the GAD-7 at next session. No safety concerns."

That's under a minute of speaking. The AI turns it into a structured SOAP note. You review, edit the parts that need adjustment, approve, and move on.

The review step matters

Voice dictation saves time. It does not save you from reviewing the output.

Transcription errors happen. A clinical term gets misheard. A number gets transposed. A sentence gets garbled because you were walking and the microphone picked up background noise. These are normal and expected.

AI structuring errors also happen. A subjective statement might get placed in the Objective section. A clinical interpretation might end up in the Plan. The AI is very good at sorting information into the right sections, but it's not perfect.

The review step — reading the drafted note, correcting errors, adjusting language, adding nuance — is where your clinical expertise shows up. It takes two to three minutes and it's non-negotiable. A note that saves 15 minutes of writing but introduces an error you don't catch is worse than a slow note.

Think of the review as editing, not writing. The hard part — getting the content out of your head and into a structured format — is already done. You're just polishing.
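Since transposed numbers are one of the transcription errors to watch for, one way to support the review is to pull every assessment score out of the transcript and flag any that cannot be valid. The sketch below assumes scores are dictated as an instrument name followed by the number in the same sentence. The `find_scores` and `out_of_range` helpers are hypothetical, written for this example; the maximum totals (27 for the PHQ-9, 21 for the GAD-7) follow from the instruments' item counts.

```python
import re

# Instrument name, then any non-digit text within the sentence, then a 1-2 digit score.
SCORE_PATTERN = re.compile(r"\b(PHQ-9|GAD-7)\b[^.\d]*?(\d{1,2})\b", re.IGNORECASE)

# Highest possible total for each instrument.
MAX_SCORE = {"PHQ-9": 27, "GAD-7": 21}

def find_scores(transcript: str) -> list:
    """Return (instrument, score) pairs in the order they were dictated."""
    return [(m.group(1).upper(), int(m.group(2)))
            for m in SCORE_PATTERN.finditer(transcript)]

def out_of_range(scores: list) -> list:
    """Return the pairs whose score exceeds the instrument's maximum."""
    return [(name, value) for name, value in scores if value > MAX_SCORE[name]]

# A PHQ-9 of 14 misheard or transposed as 41 is flagged.
print(out_of_range(find_scores("PHQ-9 was 41, up from 11.")))  # [('PHQ-9', 41)]
```

A range check only catches impossible values. A 16 transcribed as 12 passes it, so reading each extracted score against what you remember is still part of the review.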

When to dictate vs. when to type

Voice dictation isn't always the best choice. Here's when each approach works.

Dictate when:

  • You're between sessions with limited time
  • You're mobile (walking, driving, between locations)
  • The session was straightforward and you can narrate it quickly
  • You process better verbally than in writing
  • You want to capture details while they're fresh

Type when:

  • The session was complex and you need to think carefully about your formulation
  • You're in a shared space and can't speak freely about client information
  • The note requires precise clinical language that you want to craft word by word
  • You're writing a note that will be shared externally (referral summary, legal documentation)

Most therapists end up using a mix. Quick, routine sessions get dictated. Complex cases get typed. The flexibility to choose is the point.

Privacy and confidentiality

Speaking about client sessions out loud raises legitimate privacy concerns. A few practical guidelines:

Don't dictate in public spaces. The waiting room, a coffee shop, a shared office with thin walls — these are not appropriate places to speak about client information. Use a private room, your car with the windows up, or a quiet hallway where you won't be overheard.

Use a HIPAA-compliant tool. The dictation, transcription, and AI processing should all happen within HIPAA-compliant infrastructure. Your phone's built-in voice memo app is not HIPAA compliant. A dedicated clinical documentation tool that offers a business associate agreement (BAA), encryption, and access controls can be.

Delete local recordings. If the tool stores a temporary audio recording on your device during transcription, make sure it's deleted after processing. You don't want client audio sitting on your phone.
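For readers who want to see what "deleted after processing" can look like in code, here is a minimal sketch. It assumes the audio arrives as bytes and that transcription is done by some function that takes a file path. The `transcribe` argument is a hypothetical placeholder, not a real library call.

```python
import os
import tempfile
from typing import Callable

def transcribe_and_delete(audio_bytes: bytes, transcribe: Callable[[str], str]) -> str:
    """Write audio to a temporary file, transcribe it, and always remove the file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        return transcribe(path)
    finally:
        # Runs whether transcription succeeded or raised an error.
        if os.path.exists(path):
            os.remove(path)
```

The `try`/`finally` is the point of the design: the recording is removed even when transcription fails partway. Note that `os.remove` only unlinks the file. It is not a secure erase, and it does nothing about copies held in device backups, so this is a floor rather than a full safeguard.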

The compounding effect

The real value of voice dictation isn't any single note. It's what happens over weeks and months.

A therapist who dictates notes between sessions finishes their documentation during working hours. They don't take notes home. They don't spend Sunday evening catching up on a week's worth of charts. Their notes are written the same day — often within minutes of the session — which means the clinical detail is better, the documentation is more accurate, and the therapist is less burned out.

That's the compounding effect: faster notes lead to timelier documentation, which leads to better clinical records, which leads to better session preparation, which leads to better therapy. And the therapist gets their evenings back.


Theracharts includes voice dictation powered by Whisper, with AI structuring into SOAP, DAP, BIRP, GIRP, and DBT note formats. Speak it, review it, done.
