Papers - Audio
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper
• 2310.00704
Structural Similarities Between Language Models and Neural Response Measurements
Paper
• 2306.01930
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper
• 2006.14941
NU-GAN: High resolution neural upsampling with GAN
Paper
• 2010.11362
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper
• 2403.10493
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper
• 2403.14438
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper
• 1712.05884
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper
• 2403.16973
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper
• 2401.04577
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper
• 2404.00656
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper
• 2404.03204
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper
• 2311.07919
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Paper
• 2311.14836
MuPT: A Generative Symbolic Music Pretrained Transformer
Paper
• 2404.06393
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper
• 2404.07616
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper
• 2404.09956
Long-form music generation with latent diffusion
Paper
• 2404.10301
Paper
• 2404.13358
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Paper
• 2405.00233
LLM-AD: Large Language Model based Audio Description System
Paper
• 2405.00983
Images that Sound: Composing Images and Sounds on a Single Canvas
Paper
• 2405.12221
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper
• 2406.08407
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Paper
• 2407.01494
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper
• 2407.02869
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper
• 2407.04051
Qwen2-Audio Technical Report
Paper
• 2407.10759
Audio Conditioning for Music Generation via Discrete Bottleneck Features
Paper
• 2407.12563
Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation
Paper
• 2408.03588
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper
• 2408.11048
Foundation Models for Music: A Survey
Paper
• 2408.14340
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Paper
• 2408.16532
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper
• 2006.11477