VaniScribe — ImpactMojo Premium

API Key

VaniScribe uses Sarvam AI's speech-to-text API. You bring your own key. ImpactMojo does not charge for this tool.

How to get your free API key:
1. Go to dashboard.sarvam.ai and create an account
2. Navigate to API Keys and generate a new key
3. Copy the key (starts with sk_) and paste it below
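
Before pasting, a quick format check can catch a truncated or mis-copied key. The sketch below assumes only what step 3 states, that keys start with sk_; the function name is illustrative, and passing the check does not mean Sarvam will accept the key.

```python
def looks_like_sarvam_key(key: str) -> bool:
    """Return True if the trimmed key starts with 'sk_' and has characters after it.

    This is a format check only. It does not contact Sarvam or validate the key.
    """
    trimmed = key.strip()  # copy-paste often adds stray spaces or newlines
    prefix = "sk_"
    return trimmed.startswith(prefix) and len(trimmed) > len(prefix)


if __name__ == "__main__":
    print(looks_like_sarvam_key("  sk_example123 \n"))
    print(looks_like_sarvam_key("example123"))
```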

What does it cost?
Every new Sarvam account gets ₹1,000 in free credits (roughly 33 hours of transcription). After that, speech-to-text costs ₹30/hour of audio. A typical 45-minute interview costs about ₹23 (~$0.27). See Sarvam's pricing page for current rates.
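
The arithmetic behind these figures can be checked with a short sketch. The two constants come from the paragraph above; the function names are illustrative, and rates may have changed since this was written.

```python
FREE_CREDITS_INR = 1000   # free credits on a new account, per the text above
RATE_INR_PER_HOUR = 30    # speech-to-text price per hour of audio, per the text above


def transcription_cost_inr(minutes: float) -> float:
    """Cost in rupees for a recording of the given length in minutes."""
    return minutes / 60 * RATE_INR_PER_HOUR


def free_hours() -> float:
    """Hours of audio covered by the free credits."""
    return FREE_CREDITS_INR / RATE_INR_PER_HOUR


if __name__ == "__main__":
    print(transcription_cost_inr(45))   # 22.5, which rounds to about ₹23
    print(round(free_hours(), 1))       # 33.3
```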

Your Sarvam AI API Key

Your key stays in your browser and is never sent to ImpactMojo. It is only used to call Sarvam's API directly.

Transcription Settings

Choose how your audio should be transcribed. Each mode produces different output from the same recording.

Output Mode

Translate: English output

Transcribe: Native script

Translit: Romanized

Codemix: Mixed scripts

Verbatim: Raw with fillers

✦ Example input: "mujhe lagta hai ki yeh policy bahut important hai for the poor families"

Output in Translate mode: I think this policy is very important for the poor families

Language

Model

Audio Input

Upload a recording or use your microphone. Supported formats: MP3, WAV, AAC, OGG, FLAC, M4A, WebM. Files longer than 30 seconds are automatically split into chunks, and the chunk transcripts are stitched back together into one result.
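
The split-and-stitch step can be pictured with a minimal sketch. It assumes fixed 30-second chunks with no overlap and a plain space-joined stitch; the web app's actual chunking logic is not documented here, so the function names and the joining rule are illustrative only.

```python
def chunk_boundaries(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds that cover the whole recording."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # the last chunk may be shorter
        bounds.append((start, end))
        start = end
    return bounds


def stitch(parts: list[str]) -> str:
    """Join per-chunk transcripts in order, skipping empty chunks."""
    return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    print(chunk_boundaries(75.0))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
    print(stitch(["mujhe lagta hai ", "", " ki yeh policy"]))
```

A cut at a fixed time can land mid-word, which is one reason chunked output may read less smoothly than a single-pass transcript.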

Note on speaker labels: This web app uses the REST API, which does not support speaker diarization (labelling who said what). For interviews and focus group discussions (FGDs) where you need speaker turns, use the Colab Notebook below.

📁

Drag and drop audio files here, or click to browse

or record directly


When to Use What

VaniScribe has two interfaces. This web app is for quick work; the Colab Notebook is for full-scale transcription with speaker labels.

🌐

This Web App

✓ Quick transcription of short recordings

✓ Single-speaker field notes or memos

✓ Recording directly from your microphone

✓ Pilot testing before fieldwork

✓ Sharing with research assistants (RAs) who are not comfortable with code

⚠ No speaker diarization

⚠ Browser chunking is less reliable for very large files

📓

Colab Notebook

✓ Speaker diarization (who said what)

✓ Long interviews and FGDs up to 60 minutes

✓ Batch processing up to 20 files at once

✓ Timestamped, turn-by-turn transcripts

✓ CSV structured for NVivo, Atlas.ti, Dedoose

✓ Handles noisy telephony audio robustly

⚠ Requires Google Colab (runs Python, but no coding needed)
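
The notebook limits in the list above (files up to 60 minutes, up to 20 files per batch) can be checked before uploading. The helper below is a hypothetical pre-flight check, not part of the notebook; it assumes you already know each file's duration in minutes.

```python
MAX_FILES_PER_BATCH = 20    # from the list above
MAX_MINUTES_PER_FILE = 60   # from the list above


def check_batch(durations_min: dict[str, float]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the batch fits."""
    problems = []
    if len(durations_min) > MAX_FILES_PER_BATCH:
        problems.append(
            f"{len(durations_min)} files exceeds the batch limit of {MAX_FILES_PER_BATCH}"
        )
    for name, minutes in durations_min.items():
        if minutes > MAX_MINUTES_PER_FILE:
            problems.append(
                f"{name}: {minutes} min exceeds the {MAX_MINUTES_PER_FILE}-minute limit"
            )
    return problems


if __name__ == "__main__":
    # File names and durations are made up for illustration.
    for problem in check_batch({"interview_01.mp3": 45, "fgd_02.mp3": 75}):
        print(problem)
```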

✦ The Colab Notebook

Use this for your main transcription work. It uses Sarvam's Batch API with the Saaras v3 model, which supports speaker diarization (up to 8 speakers), handles files up to 60 minutes, and produces structured output with timestamps. This is what you want for interviews and focus group discussions.

Output looks like this:

[00:00:12] Speaker 1: I think this policy is very important for the poor families

[00:00:18] Speaker 2: But what about the implementation challenges?

[00:00:25] Speaker 1: That is exactly the problem, the block-level officers are not trained
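
If you need to load the TXT output into your own scripts, lines in this shape can be parsed with a regular expression. This is a sketch that assumes every turn follows the exact [HH:MM:SS] Speaker N: text pattern shown above; the function name and the returned keys are illustrative.

```python
import re

# Matches lines like "[00:00:12] Speaker 1: some text"
TURN_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(Speaker \d+):\s*(.*)$")


def parse_turn(line: str):
    """Parse one transcript line into a dict, or return None if it does not match."""
    match = TURN_RE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, speaker, text = match.groups()
    return {
        "start_seconds": int(hours) * 3600 + int(minutes) * 60 + int(seconds),
        "speaker": speaker,
        "text": text,
    }


if __name__ == "__main__":
    print(parse_turn("[00:00:18] Speaker 2: But what about the implementation challenges?"))
```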

The notebook produces both readable TXT files and a combined CSV with columns for file, speaker, start_time, end_time, text, and language.
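
A combined CSV with these columns can be read with Python's standard csv module, for example to count turns per speaker before importing into a qualitative analysis tool. In the sketch below the column names come from the sentence above, while the sample rows, the end times, the timestamp format, and the language codes are made up for illustration.

```python
import csv
import io
from collections import Counter

# Illustrative sample only: real notebook output may format times and languages differently.
SAMPLE_CSV = """file,speaker,start_time,end_time,text,language
interview_01.mp3,Speaker 1,00:00:12,00:00:17,I think this policy is very important for the poor families,en
interview_01.mp3,Speaker 2,00:00:18,00:00:24,But what about the implementation challenges?,en
interview_01.mp3,Speaker 1,00:00:25,00:00:31,"That is exactly the problem, the block-level officers are not trained",en
"""


def turns_per_speaker(csv_text: str) -> Counter:
    """Count how many rows (turns) each speaker has in the combined CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["speaker"] for row in reader)


if __name__ == "__main__":
    for speaker, count in turns_per_speaker(SAMPLE_CSV).items():
        print(speaker, count)
```

To use it on a real file, read the file's contents with open(path, newline="", encoding="utf-8") and pass the text to turns_per_speaker.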

Recommended workflow: run the same audio twice. First with translate for your primary analysis, then with translit as a reference to check nuance in the original language.

Open the downloaded file in Google Colab via File → Upload Notebook.