API Key
VaniScribe uses Sarvam AI's speech-to-text API. You bring your own key. ImpactMojo does not charge for this tool.
How to get your free API key:
1. Go to dashboard.sarvam.ai and create an account
2. Navigate to API Keys and generate a new key
3. Copy the key (starts with sk_) and paste it below
What does it cost?
Every new Sarvam account gets ₹1,000 in free credits (roughly 33 hours of transcription). After that, speech-to-text costs ₹30/hour of audio. A typical 45-minute interview costs about ₹23 (~$0.27). See Sarvam's pricing page for current rates.
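To budget a project before fieldwork, the arithmetic above can be scripted. This is a minimal sketch that uses the rates quoted on this page (₹30/hour, ₹1,000 starting credit). The constants and function names are illustrative and not part of VaniScribe. Update the rates from Sarvam's pricing page before relying on the numbers.

```python
# Rates as quoted on this page (assumption: they may change; check Sarvam's pricing page).
RATE_INR_PER_HOUR = 30.0
FREE_CREDITS_INR = 1000.0


def transcription_cost_inr(minutes: float) -> float:
    """Cost in rupees for `minutes` of audio at the per-hour rate."""
    return minutes / 60.0 * RATE_INR_PER_HOUR


def free_credit_hours() -> float:
    """Hours of audio the starting credit covers."""
    return FREE_CREDITS_INR / RATE_INR_PER_HOUR


def interviews_covered(minutes_each: float) -> int:
    """How many recordings of this length fit inside the free credit."""
    return int(FREE_CREDITS_INR // transcription_cost_inr(minutes_each))


if __name__ == "__main__":
    print(f"45-minute interview: INR {transcription_cost_inr(45):.2f}")  # INR 22.50
    print(f"Free credit covers about {free_credit_hours():.1f} hours")   # ... 33.3 hours
    print(f"45-minute interviews covered: {interviews_covered(45)}")     # 44
```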
Your key stays in your browser and is never sent to ImpactMojo. It is only used to call Sarvam's API directly.
Transcription Settings
Choose how your audio should be transcribed. Each mode produces different output from the same recording.
Translate: English output
Transcribe: native script
Translit: romanized
Codemix: mixed scripts
Verbatim: raw transcript with fillers
✦ Example input: "mujhe lagta hai ki yeh policy bahut important hai for the poor families"
Translate output: I think this policy is very important for the poor families
Audio Input
Upload a recording or use your microphone. Supports MP3, WAV, AAC, OGG, FLAC, M4A, WebM. Files longer than 30 seconds are automatically split into chunks and stitched back together.
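You can think of the 30-second chunking as fixed-length splitting, followed by joining the per-chunk text in order. The sketch below does this for WAV files only, using Python's standard-library `wave` module. The web app does its chunking in the browser and handles more formats, so treat `split_wav` and `stitch` as hypothetical helpers that illustrate the idea, not the app's actual code.

```python
from __future__ import annotations

import io
import wave

# Hypothetical helpers: they illustrate fixed-length chunking, not VaniScribe's own code.
CHUNK_SECONDS = 30  # chunk length stated for the web app


def split_wav(data: bytes, chunk_seconds: int = CHUNK_SECONDS) -> list[bytes]:
    """Cut a WAV file (given as bytes) into consecutive chunks of at most chunk_seconds."""
    chunks: list[bytes] = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of audio
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, frame rate
                dst.writeframes(frames)  # header frame count is patched on write
            chunks.append(out.getvalue())
    return chunks


def stitch(transcripts: list[str]) -> str:
    """Join per-chunk transcripts in order, skipping empty ones."""
    return " ".join(t.strip() for t in transcripts if t.strip())
```

A fixed cut can land in the middle of a word. More careful splitters cut at pauses or overlap neighbouring chunks slightly.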
Note on speaker labels: This web app uses the REST API, which does not support speaker diarization (labelling who said what). For interviews and FGDs where you need speaker turns, use the Colab Notebook below.
When to Use What
VaniScribe has two interfaces. This web app is for quick work; the Colab Notebook is for full-scale transcription with speaker labels.
This Web App
✓ Quick transcription of short recordings
✓ Single-speaker field notes or memos
✓ Recording directly from your microphone
✓ Pilot testing before fieldwork
✓ Sharing with RAs who are not comfortable with code
⚠ No speaker diarization
⚠ Browser chunking is less reliable for very large files
Colab Notebook
✓ Speaker diarization (who said what)
✓ Long interviews and FGDs up to 60 minutes
✓ Batch processing up to 20 files at once
✓ Timestamped, turn-by-turn transcripts
✓ CSV structured for NVivo, Atlas.ti, Dedoose
✓ Handles noisy telephony audio robustly
⚠ Requires Google Colab (runs Python, but no coding needed)
✦ The Colab Notebook
Use this for your main transcription work. It uses Sarvam's Batch API with the Saaras v3 model, which supports speaker diarization (up to 8 speakers), handles files up to 60 minutes, and produces structured output with timestamps. This is what you want for interviews and focus group discussions.
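If you have more recordings than one batch allows, check their lengths and group them before uploading. This is a minimal sketch that assumes you have already measured each file's duration in minutes. `plan_batches` is a hypothetical helper, not a function in the notebook, and the limits are the ones stated above (20 files per batch, 60 minutes per file).

```python
from __future__ import annotations

# Limits as stated for the Colab Notebook.
MAX_FILES_PER_BATCH = 20
MAX_MINUTES_PER_FILE = 60


def plan_batches(durations: dict[str, float]) -> tuple[list[list[str]], list[str]]:
    """Split files into upload batches and flag any that exceed the length limit.

    `durations` maps file name -> length in minutes (measured beforehand).
    Returns (batches, too_long). Hypothetical helper, not part of the notebook.
    """
    too_long = [name for name, mins in durations.items() if mins > MAX_MINUTES_PER_FILE]
    ok = [name for name, mins in durations.items() if mins <= MAX_MINUTES_PER_FILE]
    batches = [
        ok[i : i + MAX_FILES_PER_BATCH] for i in range(0, len(ok), MAX_FILES_PER_BATCH)
    ]
    return batches, too_long
```

Files returned in `too_long` need to be trimmed or split before upload.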
Output looks like this:
[00:00:12] Speaker 1: I think this policy is very important for the poor families
[00:00:18] Speaker 2: But what about the implementation challenges?
[00:00:25] Speaker 1: That is exactly the problem, the block-level officers are not trained
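If you post-process diarized segments yourself, the layout above is easy to reproduce. This sketch assumes that each segment carries a numeric speaker label, a start time in seconds, and its text. The `Segment` class and the helper names are hypothetical. Merging consecutive segments from the same speaker into one turn is an optional step, and the notebook may or may not do it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized segment (hypothetical structure, for illustration)."""
    speaker: int   # e.g. 1 for "Speaker 1"
    start: float   # seconds from the start of the recording
    text: str


def hhmmss(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping any fraction of a second."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Optionally join consecutive segments by the same speaker into one turn."""
    turns: list[Segment] = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            prev = turns[-1]
            turns[-1] = Segment(prev.speaker, prev.start, f"{prev.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


def to_txt(segments: list[Segment]) -> str:
    """Render segments as '[HH:MM:SS] Speaker N: text' lines."""
    return "\n".join(
        f"[{hhmmss(seg.start)}] Speaker {seg.speaker}: {seg.text}" for seg in segments
    )
```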
The notebook produces both readable TXT files and a combined CSV with columns for file, speaker, start_time, end_time, text, and language.
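The CSV has one row per segment, so simple summaries take only a few lines of Python. The sketch below totals speaking time per speaker in each file, which can help you spot a moderator who dominates an FGD. It uses the column names listed above. The time format is an assumption: the parser accepts either plain seconds or HH:MM:SS. `talk_time_by_speaker` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def to_seconds(value: str) -> float:
    """Parse "12.5" or "00:00:12" into seconds (assumed formats; check your CSV)."""
    seconds = 0.0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def talk_time_by_speaker(csv_text: str) -> dict[tuple[str, str], float]:
    """Sum (end_time - start_time) per (file, speaker) from the combined CSV."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        duration = to_seconds(row["end_time"]) - to_seconds(row["start_time"])
        totals[(row["file"], row["speaker"])] += duration
    return dict(totals)
```

For example, call `talk_time_by_speaker(open("transcripts.csv", encoding="utf-8").read())`, where `transcripts.csv` stands in for whatever name the notebook gives its combined file.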
Recommended workflow: run the same audio twice, first in Translate mode for your primary analysis, then in Translit mode as a reference for checking nuance in the original language.
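To read the two runs side by side, you can match each translated segment to the translit segment whose start time is closest. This sketch rests on two assumptions. First, you have loaded each run as a list of (start_seconds, text) pairs sorted by start time. Second, segment boundaries may shift slightly between runs, so a match is accepted only within a `tolerance` window (2 seconds here, an arbitrary choice). `pair_runs` is a hypothetical helper.

```python
from __future__ import annotations

import bisect


def pair_runs(
    primary: list[tuple[float, str]],
    reference: list[tuple[float, str]],
    tolerance: float = 2.0,
) -> list[tuple[float, str, str | None]]:
    """Pair each primary segment with the nearest-starting reference segment.

    Hypothetical helper. Both inputs are (start_seconds, text) lists sorted by
    start time. Returns (start, primary_text, reference_text or None).
    """
    ref_starts = [start for start, _ in reference]
    pairs = []
    for start, text in primary:
        # bisect finds where `start` would sit; the nearest reference start
        # is either just before or just after that position.
        i = bisect.bisect_left(ref_starts, start)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(reference)]
        best = min(candidates, key=lambda j: abs(ref_starts[j] - start), default=None)
        if best is not None and abs(ref_starts[best] - start) <= tolerance:
            pairs.append((start, text, reference[best][1]))
        else:
            pairs.append((start, text, None))
    return pairs
```

Segments that come back with `None` mark places where the two runs split the audio differently. Those passages are worth a listen.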
Open the downloaded file in Google Colab via File → Upload Notebook.