Analysis of MP3 Bitrate on the Accuracy of Academic Audio Transcription Using Whisper large-v3

Selta Jaya Putra; Ardi Wijaya; RG. Guntur Alam

doi:10.37396/jsc.v8i2.528

Selta Jaya Putra, Muhammadiyah University of Bengkulu
Ardi Wijaya, Muhammadiyah University of Bengkulu
RG. Guntur Alam, Muhammadiyah University of Bengkulu

Keywords: Whisper, bitrate, MP3, audio transcription, WER

Abstract

Automatic transcription is a practical way to convert audio content into text, especially for academic documentation. The main challenge is transcription accuracy, which can be affected by the quality of the audio file, including its bitrate and file size. This study analyzes the impact of MP3 bitrate and file size on transcription accuracy using the Whisper large-v3 model. Five academic audio files were converted to five different bitrate levels, ranging from 64 kbps to 320 kbps, and then transcribed automatically with Whisper. Accuracy was evaluated by calculating the Word Error Rate (WER). Processing time and file size were also recorded to assess transcription efficiency. The results show that increasing the bitrate does not always lead to higher accuracy. Bitrates of 128–192 kbps provided the best balance between transcription accuracy, processing efficiency, and file size. These findings offer a technical reference for developing efficient and accurate ASR-based audio documentation systems in academic environments, particularly in educational institutions.
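
To make the two quantities in the abstract concrete, the sketch below shows one common way to compute WER (word-level edit distance divided by the number of reference words) and to estimate the size of a constant-bitrate MP3 from its bitrate and duration. It is an illustration only, not the authors' evaluation code: the function names, the lowercase and whitespace normalization, the sample sentences, and the 10-minute duration are all assumptions, and the abstract does not state whether the files were encoded at constant bitrate.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: text is normalized only by lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def estimated_mp3_size_mb(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate size in MB (10^6 bytes) of a constant-bitrate MP3.

    Ignores container overhead and metadata tags.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * duration_s / 1_000_000


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's recordings.
    reference = "the model transcribes lecture audio"
    hypothesis = "the model transcribe lecture"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")

    # Bitrates named in the abstract; the duration is a hypothetical 10 minutes.
    for kbps in (64, 128, 192, 320):
        print(f"{kbps} kbps: {estimated_mp3_size_mb(kbps, 600):.1f} MB")
```

Under the constant-bitrate assumption, file size grows linearly with bitrate, so a 320 kbps file is five times the size of a 64 kbps file of the same duration. That is the storage cost being weighed against any change in WER.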
