Get the duration of audio input in torchaudio
When working with torchaudio for audio-oriented neural network implementations, one of the very first things we do is to get the metadata of the audio input using torchaudio.info.
For example, here's how to find the metadata of an audio input:
import torchaudio
metadata = torchaudio.info(audio_path_or_url_or_file_like_obj)
Let's check an example output:
print(metadata)
AudioMetaData(sample_rate=44100, num_frames=12700800, num_channels=2, bits_per_sample=0, encoding=MP3)
So this is MP3-encoded (lossy compression) audio with 2 channels (stereo), 12700800 frames (samples) per channel, and a sample rate of 44100 Hz, i.e. 44.1 kHz (CD quality).
To get the duration of this audio input (in seconds), we can divide the number of frames/samples (for a channel) by the sample rate i.e.:
import math
math.ceil(metadata.num_frames / metadata.sample_rate)
Returns:
288
To get a minutes/seconds answer, we can get the quotient and remainder by dividing the duration in seconds by 60, a perfect use case for divmod:
divmod(math.ceil(metadata.num_frames / metadata.sample_rate), 60)
Returns:
(4, 48)
So, the audio input has a duration of 4 minutes and 48 seconds.
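If we need this in more than one place, the two steps above can be wrapped in a small helper. The sketch below is only an illustration: duration_min_sec is a hypothetical name (not part of torchaudio), and it takes the num_frames and sample_rate values directly, so it runs without an audio file. Like the snippets above, it rounds the duration up to the next whole second.

```python
import math


def duration_min_sec(num_frames, sample_rate):
    """Return (minutes, seconds) for an audio clip.

    Hypothetical helper for illustration; the duration is rounded up
    to the next whole second, as in the math.ceil call above.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    total_seconds = math.ceil(num_frames / sample_rate)
    # divmod gives the quotient (minutes) and remainder (seconds).
    return divmod(total_seconds, 60)


# Values copied from the example metadata in this post.
minutes, seconds = duration_min_sec(12700800, 44100)
print(f"{minutes}:{seconds:02d}")
```

With a real file, we would pass metadata.num_frames and metadata.sample_rate from torchaudio.info instead of the hard-coded numbers.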