
VaniScribe

Voice to text for Indian language field research
Transcribe interviews, focus group discussions (FGDs), and field recordings in 22 Indian languages. Powered by Sarvam AI's Saaras v3 model, which recognizes code-mixed speech.
API Key
VaniScribe uses Sarvam AI's speech-to-text API. You bring your own key. ImpactMojo does not charge for this tool.
How to get your free API key:
1. Go to dashboard.sarvam.ai and create an account
2. Navigate to API Keys and generate a new key
3. Copy the key (starts with sk_) and paste it below
What does it cost?
Every new Sarvam account gets ₹1,000 in free credits (roughly 33 hours of transcription). After that, speech-to-text costs ₹30/hour of audio. A typical 45-minute interview costs about ₹23 (~$0.27). See Sarvam's pricing page for current rates.
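The arithmetic behind these figures can be sketched as follows. This assumes billing is strictly proportional to audio duration at the flat rate quoted above; the function names are illustrative only, and actual rates or rounding rules may differ, so check Sarvam's pricing page.

```python
RATE_INR_PER_HOUR = 30.0    # rate quoted above; may change
FREE_CREDITS_INR = 1000.0   # free credits quoted above; may change


def transcription_cost_inr(minutes: float,
                           rate_per_hour: float = RATE_INR_PER_HOUR) -> float:
    """Cost in rupees for `minutes` of audio, assuming proportional billing."""
    return minutes / 60.0 * rate_per_hour


def free_hours(credits_inr: float = FREE_CREDITS_INR,
               rate_per_hour: float = RATE_INR_PER_HOUR) -> float:
    """Hours of audio covered by the free credits at the given rate."""
    return credits_inr / rate_per_hour


print(transcription_cost_inr(45))  # 22.5 (about ₹23 for a 45-minute interview)
print(round(free_hours(), 1))      # 33.3 (roughly 33 hours)
```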
Your key stays in your browser and is never sent to ImpactMojo. It is only used to call Sarvam's API directly.
Transcription Settings
Choose how your audio should be transcribed. Each mode produces different output from the same recording.
Translate: English output
Transcribe: Native script
Translit: Romanized
Codemix: Mixed scripts
Verbatim: Raw with fillers
Example input: "mujhe lagta hai ki yeh policy bahut important hai for the poor families"
Output in Translate mode: "I think this policy is very important for the poor families"
Audio Input
Upload a recording or use your microphone. Supports MP3, WAV, AAC, OGG, FLAC, M4A, WebM. Files longer than 30 seconds are automatically split into chunks and stitched back together.
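The split-and-stitch idea can be illustrated with a minimal Python sketch. It is not the web app's actual code (which runs in the browser); it assumes uncompressed WAV input, cuts at fixed 30-second boundaries, and joins per-chunk transcripts with a space. The names `split_wav` and `stitch` are hypothetical.

```python
import io
import wave

CHUNK_SECONDS = 30  # chunk length taken from the description above


def split_wav(wav_bytes: bytes, chunk_seconds: int = CHUNK_SECONDS) -> list[bytes]:
    """Split WAV data into consecutive WAV chunks of at most `chunk_seconds`."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, rate
                dst.writeframes(frames)  # header frame count is fixed on close
            chunks.append(buf.getvalue())
    return chunks


def stitch(transcripts: list[str]) -> str:
    """Join per-chunk transcripts in order, skipping empty chunks."""
    return " ".join(t.strip() for t in transcripts if t.strip())
```

Fixed boundaries can cut a word in half, which is one reason chunked transcription may be less reliable on long recordings than a batch job that processes the whole file.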
Note on speaker labels: This web app uses the REST API, which does not support speaker diarization (labelling who said what). For interviews and FGDs where you need speaker turns, use the Colab Notebook below.
When to Use What
VaniScribe has two interfaces. This web app is for quick work; the Colab Notebook is for full-scale transcription with speaker labels.
🌐
This Web App
Quick transcription of short recordings
Single-speaker field notes or memos
Recording directly from your microphone
Pilot testing before fieldwork
Sharing with research assistants (RAs) who are not comfortable with code
Limitation: no speaker diarization
Limitation: browser chunking is less reliable for very large files
📓
Colab Notebook
Speaker diarization (who said what)
Long interviews and FGDs up to 60 minutes
Batch processing up to 20 files at once
Timestamped, turn-by-turn transcripts
CSV structured for NVivo, Atlas.ti, Dedoose
Handles noisy telephony audio robustly
Limitation: requires Google Colab (runs Python, but no coding needed)
✦ The Colab Notebook
Use this for your main transcription work. It uses Sarvam's Batch API with the Saaras v3 model, which supports speaker diarization (up to 8 speakers), handles files up to 60 minutes, and produces structured output with timestamps. This is what you want for interviews and focus group discussions.
Output looks like this:
[00:00:12] Speaker 1: I think this policy is very important for the poor families
[00:00:18] Speaker 2: But what about the implementation challenges?
[00:00:25] Speaker 1: That is exactly the problem, the block-level officers are not trained
The notebook produces both readable TXT files and a combined CSV with columns for file, speaker, start_time, end_time, text, and language.
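To show how the TXT lines relate to the CSV columns, here is a minimal sketch that parses lines in the format above into rows. It is not the notebook's own code. Two assumptions are made: the TXT lines carry only a start time, so `end_time` is filled with the next turn's start (and left blank for the last turn), and `language` is supplied by the caller. The function names are hypothetical.

```python
import csv
import io
import re

# Matches lines such as "[00:00:12] Speaker 1: some text"
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(Speaker \d+):\s*(.*)$")
COLUMNS = ["file", "speaker", "start_time", "end_time", "text", "language"]


def txt_to_rows(txt: str, file_name: str, language: str = "") -> list[dict]:
    """Parse timestamped speaker turns into dicts keyed by COLUMNS."""
    rows = []
    for line in txt.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not speaker turns
        start, speaker, text = m.groups()
        rows.append({"file": file_name, "speaker": speaker,
                     "start_time": start, "end_time": "",
                     "text": text, "language": language})
    # Assumption: a turn ends when the next one starts.
    for cur, nxt in zip(rows, rows[1:]):
        cur["end_time"] = nxt["start_time"]
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Text containing commas, such as the third sample line, is quoted automatically by the `csv` module, so it stays in a single column when imported into analysis software.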
Recommended workflow: run the same audio twice, first in Translate mode for your primary analysis, then in Translit mode as a reference for checking nuance in the original language.
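One way to line up the two runs for side-by-side reading is sketched below. It assumes each run has been loaded as a list of dicts with `file`, `speaker`, `start_time` (as HH:MM:SS), and `text` keys, and that turn boundaries may differ by a second or two between runs; the function names and the 2-second tolerance are illustrative choices, not part of the notebook.

```python
def to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def pair_runs(translate_rows: list[dict], translit_rows: list[dict],
              tolerance_s: int = 2) -> list[dict]:
    """For each translated turn, attach the closest-in-time translit turn."""
    pairs = []
    for t in translate_rows:
        t_start = to_seconds(t["start_time"])
        candidates = [
            r for r in translit_rows
            if r["file"] == t["file"]
            and abs(to_seconds(r["start_time"]) - t_start) <= tolerance_s
        ]
        best = min(candidates,
                   key=lambda r: abs(to_seconds(r["start_time"]) - t_start),
                   default=None)
        pairs.append({
            "file": t["file"],
            "start_time": t["start_time"],
            "speaker": t["speaker"],
            "translate": t["text"],
            "translit": best["text"] if best else "",  # blank if no match
        })
    return pairs
```

Turns without a match keep an empty `translit` field, which flags places where the two runs segmented the audio differently and a manual check is worthwhile.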
Once you have downloaded the notebook file, open it in Google Colab via File → Upload Notebook.