This is really cool. I think what I really wanna see though is a full multimodal... | Hacker News

Serenacula 60 days ago | on: Nvidia PersonaPlex 7B on Apple Silicon: Full-Duple...

This is really cool. What I really want to see, though, is a fully multimodal text-and-speech model that can dynamically handle tasks like looking up facts or using text-based tools while maintaining the conversation with you.

sigmoid10 60 days ago

OpenAI has been offering this for a while now, with text and raw audio input and output, and even function calling. Google and xAI also offer similar models by now; only Anthropic still relies on TTS/STT engines as intermediates. Unfortunately, the open-weight front is still lagging behind on this kind of model.
