Guide:Audio Formats

From VNDev Wiki

The aim of this guide is to provide information on what audio formats to use for your visual novel. This guide is based on the Visual;Conference talk I gave in 2022. You can watch the full talk here: https://www.youtube.com/watch?v=vUla74Vf-us

Intro - Ren’Py and audio codecs

Note that I mainly used Ren’Py as a reference since it’s a fairly popular VN engine and the one I’ve had the most experience with. These points will likely also apply to other engines since the file types I’ll be mentioning are common, but check your engine’s documentation to be sure.

So, what codecs does Ren’Py support? According to the documentation:[1]

  • Opus
  • Ogg Vorbis
  • MP3
  • MP2
  • FLAC
  • WAV

On a previous project, 24- and 32-bit .wav files did sometimes play back successfully, but I couldn’t get them to work while preparing this essay. They might be fully unsupported now. A bud of mine (shoutout to Alfred) also found old Ren’Py documentation dating back to when SDL_mixer-compliant audio file formats were theoretically supported, which included .midi and .mod files.[2] It would be interesting to see these file formats make a return in Ren’Py, though I can imagine that they might not be relevant for many Ren’Py devs.
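If you want to check the bit depth of a .wav file before blaming the engine, the following is a minimal sketch using Python’s standard `wave` module. The function names `describe_wav` and `make_silent_wav` are my own for illustration, and the idea that 16-bit is the safe choice is only based on my observation above, not on anything the documentation promises.

```python
import io
import wave


def describe_wav(source):
    """Return basic PCM properties of a .wav file.

    `source` can be a file path or a binary file object.
    Files the wave module cannot parse raise wave.Error.
    """
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "seconds": wav.getnframes() / wav.getframerate(),
        }


def make_silent_wav(bit_depth=16, sample_rate=44100, seconds=1):
    """Build a mono, silent .wav file in memory (only used to try the check)."""
    bytes_per_sample = bit_depth // 8
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(sample_rate * seconds * bytes_per_sample))
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav(bit_depth=24))
if info["bit_depth"] != 16:
    print("Not 16-bit, consider re-exporting:", info)
```

In a real project you would pass a file path to `describe_wav` instead of the in-memory test file.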

You can categorize audio files depending on two factors: Whether they’re compressed or uncompressed and whether they’re lossy or lossless.

| | Compressed | Uncompressed |
| Lossy | MP3 (.mp3), Ogg Vorbis (.ogg), Opus (.opus) | (none) |
| Lossless | FLAC (.flac) | PCM (.wav) |

Audio encoding formats with their usual containers/file endings in brackets. For more information see: https://producerhive.com/ask-the-hive/audio-file-formats-explained

Since most people are more familiar with the file endings of audio formats than with the names of the codecs, I’ll use the file endings throughout this essay. This means that I’ll write ".wav files" instead of "PCM-encoded files" or ".opus files" instead of "Opus-encoded files". But keep in mind that Opus-encoded files, for example, can use the Ogg container and therefore carry the .ogg file ending, even though the .opus file ending is recommended.
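Because the file ending alone doesn’t tell you which codec sits inside an Ogg container, here is a small sketch that looks at the first bytes of a file instead. It relies on the Ogg page layout (a 27-byte header starting with "OggS", followed by a segment table) and on the identification headers "OpusHead" for Opus and "\x01vorbis" for Vorbis. The function name `sniff_ogg_codec` is made up for this example.

```python
def sniff_ogg_codec(data: bytes) -> str:
    """Guess the codec inside an Ogg container from the first bytes of the file."""
    # Every Ogg page starts with the capture pattern "OggS" and a 27-byte header.
    if len(data) < 27 or data[:4] != b"OggS":
        return "not ogg"
    # Byte 26 holds the number of entries in the segment table that follows.
    segment_count = data[26]
    payload = data[27 + segment_count:]
    if payload.startswith(b"OpusHead"):
        return "opus"
    if payload.startswith(b"\x01vorbis"):
        return "vorbis"
    return "unknown"


def sniff_file(path) -> str:
    """Read only the beginning of a file and sniff its codec."""
    with open(path, "rb") as handle:
        return sniff_ogg_codec(handle.read(512))
```

Running `sniff_file` on a file called "theme.ogg" would return "opus" if it was actually encoded with Opus, regardless of the file ending.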

PCM (.wav) and FLAC (.flac)

.wav and .flac files are both lossless. The difference is that .flac files are compressed, meaning they take up only about half the space. On top of that it’s easier to add metadata to .flac files, for example the name of the composer or the name of the piece. Sounds like a pretty big advantage, right? Does that mean you should start using .flac files instead of .wav files for your game? Well, it’s unfortunately not quite that easy.

The problem with compressed audio files in general is that they tax the CPU more than uncompressed audio files, because they have to be decompressed during playback. We’ll get to the implications of CPU load later, but for now you should know that .flac files aren’t a good choice for your game’s audio. They’re bigger than lossy file formats and don’t save as much CPU power as .wav files.

What they’re great for, though, is archiving. For example: You might need to change the volume of a file because players are complaining that a sound effect or piece of music is too loud. You can edit your archived .flac file without relying on your audio person to still have the file backed up. Editing a lossy audio file like .ogg or .mp3 and then re-exporting it in a lossy file format lowers its quality. So you generally want to use lossless files if you need to edit an audio file.

| Wav | Flac |
| Takes up more memory/disk space. | Takes up less memory/disk space. |
| Less metadata friendly. | Metadata friendly. |
| More CPU friendly (because uncompressed). | Less CPU friendly (because compressed). |
| Can be edited without losing quality (because lossless). | Can be edited without losing quality (because lossless). |
| -> Can be used for games if CPU load is a factor. | -> Not great for games but great for archiving since they're smaller than .wav files and handle metadata better. |
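To get a feeling for the sizes involved, here is a rough estimate of how much space uncompressed PCM audio takes up. The formula (duration × sample rate × bytes per sample × channels) is standard for PCM. The 0.5 ratio for .flac is only the "about half" rule of thumb from above and varies a lot with the material. The function names are my own.

```python
def pcm_size_bytes(seconds, sample_rate=44100, bit_depth=16, channels=2):
    """Size of the raw audio data in a .wav file (ignores the small file header)."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def estimated_flac_size_bytes(seconds, ratio=0.5, **pcm_settings):
    """Very rough .flac estimate; `ratio` is an assumed rule of thumb, not a guarantee."""
    return int(pcm_size_bytes(seconds, **pcm_settings) * ratio)


three_minutes = 180
print(pcm_size_bytes(three_minutes))             # 31752000 bytes, roughly 31.8 MB
print(estimated_flac_size_bytes(three_minutes))  # 15876000 bytes, roughly 15.9 MB
```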

This is complicated a bit by one article’s mention that .flac files might not be ideal for editing due to their limit on bit depth and sample rate,[3] though many seem to agree that it doesn’t really matter.[4] Another factor is that applying effects, for example Audacity’s "Amplify", which increases or reduces the volume of a selected section, isn’t a lossless procedure. But the loss is apparently (at least in the case of Audacity) negligible.[5]

MP3 (.mp3)

Many people new to game development go straight for .mp3 files since it’s the file format they are used to. You might have heard of licensing fees being a problem. But the MP3 patents have expired and the licensing program was terminated in 2017, meaning that you don’t have to pay any fees anymore.[6] I still think there’s no reason to use .mp3 files unless, for example, .opus or .ogg files aren’t supported.

.mp3s oftentimes don’t loop seamlessly because encoders usually add a bit of silence to the beginning. Even if you load the .mp3 file into the audio editor Audacity, you might not see that silence since Audacity automatically hides it. If you open it in an audio editor that doesn’t ignore the silent gap or if you play it back in Ren’Py, you’ll hear that gap and it will usually not sound great if your piece of music or ambiance is supposed to loop seamlessly.

How the beginning of an .mp3 file is displayed in Audacity (above) vs. how it actually looks in a DAW (Cubase, below).

Some decoders can skip this gap. This is called gapless playback. But as far as I know Ren’Py doesn’t support that feature.
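If you want to measure how long the silent gap at the start of a file actually is, you can count the leading silent samples once the audio has been decoded. The sketch below assumes you already have the decoded samples as a list of numbers (how you decode the .mp3 is up to your tools), and the name `leading_silence_ms` and the `threshold` parameter are made up for this example.

```python
def leading_silence_ms(samples, sample_rate, threshold=0):
    """Return how many milliseconds of silence precede the first audible sample.

    `samples` is a sequence of decoded sample values for one channel.
    Samples whose absolute value is at or below `threshold` count as silent.
    """
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            return index * 1000 / sample_rate
    # The whole clip is silent.
    return len(samples) * 1000 / sample_rate


# 441 silent samples at 44.1 kHz are 10 ms of gap before the sound starts.
decoded = [0] * 441 + [1200, -900, 400]
print(leading_silence_ms(decoded, sample_rate=44100))  # 10.0
```

A gap of even a few milliseconds is enough to make a loop point audible.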

The PS1 had to factor in seek speed for recorded audio (as opposed to its MIDI system), which meant that the laser had to seek the beginning of an audio file as soon as it was done playing to loop it. This introduced some delay (similar to the gap that is inserted at the beginning of mp3 files). Many PS1 soundtracks like Spyro or Megaman X3 faded the audio out at the end since they couldn’t reliably use seamless loops because of that.

Ogg Vorbis (.ogg) and Opus (.opus)

Many of you likely know of .ogg files (which are usually Vorbis encoded, hence the name Ogg Vorbis). In my experience this is the go-to file format for Ren’Py devs when it comes to music. It doesn’t take up much space and supports seamless looping. Why would you not want to use .ogg? Let me introduce Opus.

Opus is essentially the successor of Ogg Vorbis. While it can use an Ogg container, meaning it can have the .ogg file ending, the recommended file ending for Opus-encoded files is .opus. This is also the file ending you’ll see if you export the file in Audacity.

.opus files are quite a bit smaller than .ogg files and preserve quality better at lower settings. That makes them sound like the better choice overall, which is also what the FAQ on the Xiph.Org Foundation wiki bluntly states (the Xiph.Org Foundation is the developer of both formats).[7]

The problem is CPU usage, the factor I mentioned earlier in the .flac section. .opus files are usually more CPU intensive than .ogg files.[8]

You can check out Wwise’s audio format documentation for information on WEM Opus, which can make use of hardware acceleration to reduce CPU strain. This might be a Wwise-only format, at least I couldn’t find any information on it outside their site. Ren’Py doesn’t support the audio middleware Wwise yet, so this doesn’t really concern people using Ren’Py.[9]

In the end choosing between .wav, .ogg and .opus is a balancing act between CPU and memory/disk space.

Delay (the time gap between the sound being triggered and the sound being played) can also be a factor. A more complicated format like .opus takes longer to decompress, which can cause a bigger delay compared to .ogg files, while .wav files don’t need to be decompressed at all. Since .wav files are bigger, though, it might take longer to transfer them into memory, again leading to delay, depending on how big the file is. In practice many developers load audio into memory before it is needed and that way reduce the delay between the sound being triggered and the sound being played. Ren’Py can kind of do that by queueing audio, as far as I know, but it’s pretty finicky. So for engines that play back audio on the fly this can lead to a balancing act between the loading times of big files and the decompression times of compressed files.
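To illustrate the "load it before you need it" idea in general terms, here is a minimal preloading cache. It is a generic Python sketch and not Ren’Py’s API; the class `AudioPreloader` and its methods are made up for this example. Note that it only removes the disk read from the critical moment. The encoded data still has to be decompressed when it is played.

```python
from pathlib import Path


def read_file(path):
    """Default loader: read the whole file from disk into memory."""
    return Path(path).read_bytes()


class AudioPreloader:
    """Keeps audio file contents in memory so playback doesn't wait for the disk."""

    def __init__(self, loader=read_file):
        self._loader = loader
        self._cache = {}

    def preload(self, path):
        """Load a file ahead of time, e.g. while a scene is being set up."""
        key = str(path)
        if key not in self._cache:
            self._cache[key] = self._loader(path)

    def get(self, path):
        """Return the file contents, loading them now if they weren't preloaded."""
        self.preload(path)
        return self._cache[str(path)]
```

You would call `preload` for the sound effects of an upcoming scene and `get` at the moment the sound is triggered.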

| Ogg | Opus |
| Small file size. | Even smaller file size. |
| Uses up CPU for decompression. | Uses up more CPU for decompression. |
| Has a slight delay. | Can have a bigger delay. |
| -> Good if .opus files tax the CPU too much or if the delay is too big. | -> Great if the left column doesn't apply. |

Simplification

All of that probably sounded complicated, which is why we’re now getting to the simplification part of this guide. A lot of visual novels made in Ren’Py aren’t necessarily CPU intensive and barely any of them use a complicated dynamic audio setup. There oftentimes aren’t multiple SFX playing at the same time. There oftentimes aren’t five or more instrument layers playing simultaneously while the reverb amount is being manipulated in real time. There oftentimes are no audio snippets that are rearranged on the fly to create an ever-changing audio texture. So unless your game is otherwise really taxing the CPU, there’s usually no reason not to use .opus files for all your audio, SFX included, even if .wav files are more common for SFX in other kinds of games. You can of course take old PCs into account. Supporting them is one reason why uncompressed audio files made up almost 3/4 of Titanfall’s size.[10] But that game is likely more CPU intensive than many visual novels made with Ren’Py.

Uncompressed audio files were sometimes also used out of necessity. RPG Maker 2003, for example, only supported .mp3, .wav and .midi files.[11] For a game like Yume Nikki, which relied on its music looping seamlessly, this meant that .wav files had to be used to create the necessary seamless loops.


Delay is usually not a big factor either since it doesn’t make that much of a difference whether the SFX are played with a 20 ms delay or a 40 ms delay. At 60 fps a frame lasts about 16.7 ms, so this is the difference between a little over one frame and a little over two frames. For action games like shooters this might be noticeable, for typical visual novels probably not. A visual novel counterexample is Danganronpa’s class trial scenes. Its “normal” visual novel sequences probably wouldn’t have required uncompressed SFX. The class trials, on the other hand, are far more action heavy: all the visual effects might tax the CPU more and many different sound sources are layered on top of each other, which is likely why Danganronpa uses lossless audio files for its SFX.
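The frame arithmetic behind that comparison is simple enough to check yourself. This is just a small helper (the name `delay_in_frames` is my own) that converts a delay in milliseconds into frames at a given frame rate.

```python
def delay_in_frames(delay_ms, fps=60):
    """Convert an audio delay in milliseconds into a number of frames."""
    return delay_ms * fps / 1000


for delay_ms in (20, 40):
    print(f"{delay_ms} ms = {delay_in_frames(delay_ms):.1f} frames at 60 fps")
# 20 ms = 1.2 frames at 60 fps
# 40 ms = 2.4 frames at 60 fps
```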

This essentially means that unless you’re running into CPU problems or you’re having delay problems, .opus is the way to go for pretty much anything. Music, ambiance, SFX and voice acting. And if you’re running into any problems then .ogg is a safe alternative.
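Written out as a tiny decision helper, the recommendations of this guide look roughly like this. This is only my rule of thumb put into code, not a hard rule, and the function and parameter names are invented for this example.

```python
def suggest_format(purpose="in_game", cpu_or_delay_problems=False,
                   ogg_still_too_heavy=False):
    """Suggest a file ending based on the rules of thumb in this guide.

    purpose: "in_game" for anything the player hears, "archive" for masters
             you might want to edit again later.
    cpu_or_delay_problems: True if .opus files cause CPU or delay issues.
    ogg_still_too_heavy: True if even .ogg files cause those issues.
    """
    if purpose == "archive":
        return ".flac"  # lossless, smaller than .wav, handles metadata well
    if purpose != "in_game":
        raise ValueError(f"unknown purpose: {purpose!r}")
    if not cpu_or_delay_problems:
        return ".opus"  # smallest files, fine for most visual novels
    if not ogg_still_too_heavy:
        return ".ogg"   # the safe alternative
    return ".wav"       # uncompressed: biggest files, no decompression needed


print(suggest_format())                            # .opus
print(suggest_format(cpu_or_delay_problems=True))  # .ogg
print(suggest_format(purpose="archive"))           # .flac
```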

I, of course, ran some experiments with the game I’m working on at the moment: Mycorrhiza, a horror-manga-inspired visual novel. You might remember me mentioning elaborate dynamic audio systems. We’re using those for this game. Using .ogg files didn’t result in any particular problems except for the fact that playing a randomized sequence of eight-bar music snippets plus three layers led to the audio layers desyncing. Using .opus files made that problem worse, and using .wav files didn’t make it much better. We settled on abusing the queue audio function in combination with .ogg files. Had we been able to use .opus files instead, we would have saved about 50% of disk space. Either way I’d advise you to get playtesters with low-end machines to playtest your game to be on the safe side.

When I looked into other visual novels like The Letter, DDLC, Shinrai, VA11 Hall-A, How to Date a Magical Girl, I found that they also all used .ogg files for all their audio, including SFX. Many Nintendo Switch games on the other hand make use of .opus files. I unfortunately couldn’t look into newer, big PC VNs since I only owned older ones. But it’d be interesting to see if the developers had switched to .opus files. If anyone has any info on that, let me know!
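If you want to do the same kind of survey on your own project (or on a game whose files aren’t packed into an archive), a few lines of Python are enough to count audio files by file ending. This is a sketch; the name `count_audio_files` is my own, the list of endings is taken from the codecs listed at the top of this guide, and as mentioned before the file ending doesn’t always tell you the actual codec.

```python
from collections import Counter
from pathlib import Path

# File endings of the formats Ren'Py lists as supported (see the intro).
AUDIO_EXTENSIONS = {".opus", ".ogg", ".mp3", ".mp2", ".flac", ".wav"}


def count_audio_files(root):
    """Count audio files below `root`, grouped by lowercase file ending."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        extension = path.suffix.lower()
        if path.is_file() and extension in AUDIO_EXTENSIONS:
            counts[extension] += 1
    return counts
```

Calling `count_audio_files("game/audio")` on a project folder returns something like a dictionary mapping each file ending to the number of files found.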

Outro

To sum things up:

| Wav | Flac | Ogg | Opus |
| Good if CPU load or decompression delay are factors. | Good audio format for archival/editing since it’s lossless while still being smaller than .wav files and being able to handle metadata better. | Good audio format for all in-game audio. | Good audio format for all in-game audio. |
| Big file size. | Bigger than .ogg and .opus, smaller than .wav. | Small file size. | Even smaller than .ogg files. |
| CPU load is not a factor. | | Taxes the CPU less than .opus files. | Taxes the CPU more than .ogg files. |
| Doesn't need to be decompressed but can take more time to transfer to memory. | | Less decompression delay compared to .opus files. | Bigger decompression delay compared to .ogg files. |

As I mentioned before, most people are familiar with .ogg files. But if you want to be more efficient with the game’s disk space while preserving the audio quality, then upgrading to .opus might be something to think about.
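If you keep lossless masters around, converting them to .opus can be scripted. The sketch below only builds the command line for ffmpeg, which is my own choice of tool here (Audacity’s export works just as well). `-c:a libopus` and `-b:a` are existing ffmpeg options, while the 96 kbps default and the function name `build_opus_command` are assumptions for illustration. Listen to the result before settling on a bitrate.

```python
from pathlib import Path


def build_opus_command(source, bitrate_kbps=96):
    """Build an ffmpeg command that encodes `source` to an .opus file next to it.

    Always convert from a lossless master (.wav or .flac); re-encoding a lossy
    file into another lossy format lowers its quality.
    """
    source = Path(source)
    target = source.with_suffix(".opus")
    return [
        "ffmpeg",
        "-i", str(source),            # input file
        "-c:a", "libopus",            # encode the audio with the Opus encoder
        "-b:a", f"{bitrate_kbps}k",   # target bitrate (assumed default)
        str(target),
    ]


print(" ".join(build_opus_command("theme.flac")))
# ffmpeg -i theme.flac -c:a libopus -b:a 96k theme.opus
```

To actually run it you could pass the list to `subprocess.run(command, check=True)`, provided ffmpeg is installed.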

There are a lot of other audio file formats out there, like .mod or .xm, which are both tracker file formats. Newer retro-inspired games might use them. Some also speculate that .midi files will make a comeback. There’s also .hca, which is associated with CriWare’s middleware and which Danganronpa 3 uses. As you can probably tell, the world of video game audio formats is more complicated than one might think. But I hope I was able to simplify it to some degree. If you have any questions, don’t hesitate to contact me!