Doctora turns your spoken exam into a structured clinical note. The better the audio, the better the note. This guide covers practical ways to improve recording quality, work with speaker identification, and get the most accurate AI output with the least editing.
Microphone setup
Your computer's built-in microphone works, but it picks up everything in the room equally--keyboard clicks, the air handler, the autorefractor cycling. A dedicated external microphone makes a noticeable difference.
What works well:
- A USB lapel mic clipped to your coat. These run $20-40 and dramatically improve voice clarity because they stay close to your mouth regardless of where you turn.
- A desktop USB condenser mic positioned 12-18 inches away. Good for doctors who stay at a workstation during most of the exam.
- AirPods or similar earbuds with built-in microphones. Surprisingly effective and convenient.
What to avoid:
- Recording from a laptop across the room. The farther the mic is from your voice, the more ambient noise competes with your speech.
- Bluetooth speakers used as microphones--they often have poor input quality even when output sounds fine.
Before your first recording, do a quick test: record 30 seconds, stop, and check that the transcript captures your words accurately. If it does, your setup is good.
Speaking naturally
You do not need to dictate like you are leaving a voicemail. Talk to the patient as you normally would. Doctora's AI is trained on clinical conversation--it understands context, extracts the relevant findings, and ignores the small talk.
Say things the way you would say them to a colleague:
- "Lids and lashes are clean, conjunctiva is white and quiet."
- "There's a nasal pterygium on the right, maybe grade two. We'll watch it."
- "Cup-to-disc ratio is point three both eyes, nice and healthy."
The AI handles abbreviations, medical shorthand, and conversational phrasing. You do not need to spell out "oculus dexter" instead of "right eye" or avoid contractions. Just be clear.
How speaker identification works
Doctora uses speaker diarization--the system identifies distinct voices in the recording and labels them (doctor, patient, staff, family member). This matters because the AI needs to know who said what. When a patient says "my eyes have been burning for two weeks," that becomes part of the chief complaint. When you say "there's mild blepharitis bilaterally," that becomes an examination finding.
The system identifies speakers by voice characteristics, not by any pre-registration. It works best when:
- Each person's voice is distinct (pitch, cadence, volume)
- Speakers take turns rather than talking over each other
- The microphone position stays consistent throughout the exam
After transcription, the AI uses contextual clues--medical terminology, question-asking patterns, clinical observations--to determine which speaker is the doctor and which is the patient. A confidence score is assigned to each identification.
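As a simplified illustration only (this is a toy sketch, not Doctora's actual implementation), contextual-clue scoring of the kind described above might work like this: each diarized speaker accumulates points for clinical vocabulary and question-asking patterns, and the winning speaker's share of the total serves as a crude confidence score.

```python
# Toy sketch of role inference from contextual clues.
# Assumed names (CLINICAL_TERMS, likely_doctor) are illustrative, not Doctora's API.
CLINICAL_TERMS = {"blepharitis", "conjunctiva", "pterygium", "bilaterally", "macula"}

def score_speakers(utterances):
    """utterances: list of (speaker_label, text) pairs from diarization.
    Returns a score per speaker label."""
    scores = {}
    for speaker, text in utterances:
        words = {w.strip(".,?").lower() for w in text.split()}
        s = scores.setdefault(speaker, 0.0)
        s += len(words & CLINICAL_TERMS)      # clinical vocabulary is a doctor cue
        if text.rstrip().endswith("?"):
            s += 0.5                          # doctors ask more of the questions
        scores[speaker] = s
    return scores

def likely_doctor(utterances):
    """Return (speaker_label, confidence) for the probable doctor."""
    scores = score_speakers(utterances)
    best = max(scores, key=scores.get)
    total = sum(scores.values()) or 1.0
    return best, scores[best] / total         # share of total as crude confidence
```

In practice the production model weighs far richer signals, but the shape is the same: the label is a guess backed by a confidence value, which is why distinct voices and clean turn-taking improve the result.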
What to say during the exam
Think of the recording as your clinical narrator. Anything you want in the note, say it out loud. The AI cannot document what it does not hear.
Findings and observations: Narrate what you see at the slit lamp, what the fundus looks like, what the OCT shows. "Angles are open to the scleral spur, no neovascularization." "Vitreous is clear, disc is flat and pink, macula looks good, vessels are normal caliber."
Assessment and plan: State your diagnoses and what you are recommending. "This is dry eye, probably aqueous deficient. I want to start her on a warm compress routine and Restasis, follow up in six weeks."
Pretesting results: If your tech did not enter readings, narrate them. "Autorefraction right eye is minus two minus one at ninety. Left eye minus one seventy-five minus point seven five at eighty-five."
You do not need to narrate in any special order. The AI maps your words to the correct sections of the note regardless of when you say them.
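To make that mapping concrete, here is a toy sketch (assumed behavior for illustration, not Doctora's extraction code) of how a spoken autorefraction like "minus two minus one at ninety" could be turned into structured sphere, cylinder, and axis values:

```python
# Toy parser for a simple spoken refraction phrase.
# Handles only "minus/plus <number> ... at <number>" forms, for illustration.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "ninety": 90,
}

def parse_refraction(phrase: str):
    """Parse 'minus two minus one at ninety' -> (sphere, cylinder, axis)."""
    values, sign = [], 1
    for tok in phrase.lower().split():
        if tok == "minus":
            sign = -1
        elif tok == "plus":
            sign = 1
        elif tok in NUMBER_WORDS:
            values.append(sign * NUMBER_WORDS[tok])
            sign = 1
        # other tokens ("at", "eye", filler words) are ignored
    sphere, cyl, axis = values                # expects exactly three numbers
    return float(sphere), float(cyl), abs(axis)
```

The real extraction step handles far more variation (decimals like "point seven five", eye laterality, out-of-order narration), but the principle is the same: your spoken words are mapped to structured fields, so clarity matters more than ordering.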
What NOT to say
The AI captures everything. It is good at separating clinical content from casual conversation, but a clean recording produces a cleaner note.
Be mindful of:
- Personal anecdotes unrelated to the exam--the AI may try to contextualize them as history
- Side conversations with staff about scheduling, billing, or other patients
- Mid-sentence corrections ("the cup-to-disc is point four--no wait, point three"), which can sometimes confuse the output
If you need to have a non-clinical conversation during the exam, you can pause the recording and resume when you are ready. The pause button is always visible during an active recording.
Transcription modes
Doctora offers two recording modes:
Real-time transcription streams your audio to the transcription engine as you speak. You see words appearing on screen in near real-time. This is useful when you want to verify the system is capturing your speech, or when you want the note available seconds after you finish the encounter. Real-time mode uses Deepgram's streaming API with speaker diarization active throughout.
Batch transcription records the full audio locally, then sends the complete file for processing after you stop. This mode can be slightly more accurate because the engine processes the entire audio as a single unit, giving it more context for speaker identification and medical terminology. Batch mode is the better choice in environments with inconsistent internet connectivity, since the recording is saved in chunks as you go and will not be lost if the connection drops momentarily.
Both modes produce the same structured clinical note. The difference is timing--real-time gives you the transcript sooner, batch gives the engine more context to work with.
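The chunk-by-chunk saving that makes batch mode resilient can be sketched as follows. This is an assumed design for illustration, not Doctora's code: each audio chunk is kept locally until an upload succeeds, so a momentary connection drop never discards recorded audio.

```python
# Illustrative sketch of chunked recording with retry-on-failure.
# The uploader callable and ChunkedRecorder name are assumptions, not a real API.
class ChunkedRecorder:
    def __init__(self, uploader, max_retries=3):
        self.uploader = uploader      # callable(chunk_bytes) -> bool (True on success)
        self.pending = []             # chunks saved locally, not yet confirmed uploaded
        self.max_retries = max_retries

    def on_chunk(self, chunk: bytes):
        self.pending.append(chunk)    # save locally first, upload second
        self.flush()

    def flush(self):
        still_pending = []
        for chunk in self.pending:
            ok = False
            for _ in range(self.max_retries):
                if self.uploader(chunk):
                    ok = True
                    break
            if not ok:
                still_pending.append(chunk)   # keep locally for the next flush
        self.pending = still_pending
```

Because a chunk leaves the local buffer only after a confirmed upload, a flaky connection delays processing rather than losing audio--which is why batch mode suits offices with inconsistent internet.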
Handling multiple speakers
A typical eye exam involves the doctor, the patient, and often a technician. Doctora handles this well, but a few habits help:
- Let people finish speaking. Overlapping speech is the hardest thing for any transcription system to parse. A half-second pause between speakers makes a big difference.
- Identify roles when helpful. If a tech is reporting pretesting results and the AI might not recognize their role, a quick "go ahead with the readings" establishes context.
- Family members in the room. If a parent is answering questions for a child, or an adult child is providing history for an elderly patient, the AI picks up on those contextual cues and can identify family members as a distinct speaker role.
Recording duration
The average recording across Doctora practices runs about 12 minutes. You do not need to record the entire patient visit--just the clinically relevant portions.
Start recording when the clinical conversation begins (chief complaint, history-taking). Pause or stop during extended non-clinical segments (fitting frames, discussing insurance). Resume for the exam, assessment, and plan discussion.
Longer recordings are not inherently worse, but they increase the amount of non-clinical audio the AI has to filter through. If your notes are coming back with irrelevant content, shorter and more focused recordings usually fix that.
Background noise
The transcription engine (Deepgram Nova-3 Medical) is trained to handle typical clinical environments, but persistent loud noise degrades accuracy. Common culprits:
- HVAC vents directly above the exam chair
- Phoropter and autorefractor motors during operation
- Hallway noise from an open exam room door
- Music playing in the office
You do not need a silent room. Normal office sounds are fine. But if you notice accuracy dropping, check whether something in the environment changed--a new fan, a louder piece of equipment, or a door that used to stay closed.
Common issues and fixes
"The AI missed something I said." The most common cause is speaking too quickly or too quietly during a specific finding. The AI captures the vast majority of natural speech, but mumbled or rapid-fire observations can be lost. Try narrating key findings at your normal conversational volume. If background noise is the issue, repositioning the microphone closer to you usually resolves it.
"It attributed my words to the patient." This is a diarization issue. It happens most often when the microphone is equidistant from both speakers, making it harder to distinguish voices. Consistent microphone positioning helps--if the mic is always closer to you, the system learns the volume and tonal difference between your voice and the patient's. Avoiding overlapping speech is the single most effective fix.
"The output is too verbose" or "too brief." This is not a transcription problem--it is an extraction preference. Use custom instructions to control how much detail the AI includes in each section. For example, you can instruct the anterior segment section to "use brief normal notation, no full sentences" or tell the plan section to "include specific medication dosing and frequency." See the Custom Instructions guide for setup details.
"The transcript looks fine but the note is wrong." If the raw transcript is accurate but the structured note has errors, the issue is in the AI extraction step, not the recording. Custom instructions are the right tool here--they let you correct systematic patterns without changing how you speak.