I got tired of juggling three or four different sites every time I needed to download a video and grab the transcript. TikTok downloaders are plastered with fake buttons. YouTube converters redirect you through five pages. And actual transcription costs money.
So I built Videolyti over a few months. Paste a URL from YouTube, TikTok, Instagram, Twitter, Facebook, Reddit, or Vimeo, and it gives you the video file plus a text transcript.
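Supporting seven platforms implies some step that looks at a pasted URL and decides which pipeline handles it. The post doesn't describe how that routing works, so here is a minimal sketch of one approach: match the hostname against a table, walking up the domain labels so `www.` or `m.` subdomains still match. The `PLATFORM_HOSTS` table and `detect_platform` name are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical host-to-platform table (not the real service's routing rules).
PLATFORM_HOSTS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "tiktok.com": "tiktok",
    "instagram.com": "instagram",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "facebook.com": "facebook",
    "fb.watch": "facebook",
    "reddit.com": "reddit",
    "redd.it": "reddit",
    "vimeo.com": "vimeo",
}


def detect_platform(url: str) -> str | None:
    """Return a platform key for a pasted URL, or None if it isn't supported."""
    host = (urlparse(url.strip()).hostname or "").lower()
    labels = host.split(".")
    # Try "www.youtube.com", then "youtube.com", and so on. Matching whole
    # labels (not substrings) keeps "notyoutube.com" from matching.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in PLATFORM_HOSTS:
            return PLATFORM_HOSTS[candidate]
    return None
```

yt-dlp already picks an extractor per URL on its own, so a table like this would only be needed to choose between pipelines, such as sending TikTok links down a separate path.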
The transcription runs OpenAI Whisper (large-v3) on my own server. No API calls to OpenAI, no per-minute billing. It handles 90+ languages and does a surprisingly good job with mixed-language audio (I test it regularly with Ukrainian-English conversations).
Tech details for those curious:
- Frontend: Next.js (App Router, server components)
- Backend: Express + Socket.IO for real-time progress
- Downloads: yt-dlp + FFmpeg
- Transcription: Whisper large-v3, running locally
- TikTok: a separate pipeline that tries the TikWM API first and falls back to yt-dlp
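A "try this source first, fall back to that one" pipeline boils down to an ordered chain of backends. Below is a minimal sketch, assuming each backend is a hypothetical callable that takes a URL and returns a local file path, raising `DownloadError` on failure. The actual TikWM HTTP call and yt-dlp invocation are deliberately left out.

```python
from __future__ import annotations

from typing import Callable, Sequence


class DownloadError(Exception):
    """Raised by a backend when it cannot fetch the video."""


# Hypothetical backend interface: takes a URL, returns a local file path.
Backend = Callable[[str], str]


def download_with_fallback(
    url: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each (name, backend) in order; return (name, path) from the first success."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(url)
        except DownloadError as exc:
            # Remember why this backend failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise DownloadError("all backends failed (" + "; ".join(errors) + ")")
```

Called as `download_with_fallback(url, [("tikwm", fetch_via_tikwm), ("yt-dlp", fetch_via_ytdlp)])`, where both fetchers are hypothetical wrappers, the list order is the whole fallback policy. Returning the backend name also makes it easy to log how often the fallback actually fires.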
The limit of 5 downloads per day is a practical constraint, not a growth hack: Whisper on CPU takes real compute time, and I'm paying for the server out of pocket.
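A per-day cap like this can be enforced with a counter keyed by client and UTC date. Here is a minimal in-memory sketch. The choice of client identifier (for example an IP address) and the `DailyQuota` name are assumptions. A deployment with several worker processes would need shared storage instead of a Python dict.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timezone


class DailyQuota:
    """In-memory per-client counter that effectively resets at UTC midnight."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        # Keys are (client_id, "YYYY-MM-DD"); old days are never evicted here.
        self._counts: dict[tuple[str, str], int] = defaultdict(int)

    def _key(self, client_id: str, now: datetime | None) -> tuple[str, str]:
        now = now or datetime.now(timezone.utc)
        return client_id, now.astimezone(timezone.utc).date().isoformat()

    def try_consume(self, client_id: str, now: datetime | None = None) -> bool:
        """Count one download and return True if the client is under today's limit."""
        key = self._key(client_id, now)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True

    def remaining(self, client_id: str, now: datetime | None = None) -> int:
        """How many downloads the client has left today."""
        return max(0, self.limit - self._counts.get(self._key(client_id, now), 0))
```

Checking the quota before queueing the Whisper job (rather than after) means rejected requests cost no compute at all.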
Feedback on the UX is welcome. I know the mobile experience could be better (30% of traffic is mobile right now). I'd also love input on which subtitle export format would be most useful: SRT, VTT, or plain text with timestamps.
Source code isn't open yet but I'm considering it. Happy to discuss the architecture.