
I have been working on an automatic subtitle evaluation project recently and have had to compare the output of different speech recognition and machine translation providers.
When time, effort or volume does not allow for an in-depth evaluation of the output by a professional subtitler, but you simply want a quick check of which output is best, what do you do? You use an edit distance metric, of course, comparing the automated subtitles to a professionally created reference file and counting how many differences there are between the two.
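To make that concrete, here is a minimal sketch of a word-level edit distance (the idea behind WER) in plain Python. It is only my own illustration of the general principle, not the SubER code, and the example sentences are made up:

    def word_edit_distance(reference: str, hypothesis: str) -> int:
        """Minimum number of word insertions, deletions and substitutions
        needed to turn the hypothesis into the reference (Levenshtein)."""
        ref, hyp = reference.split(), hypothesis.split()
        prev = list(range(len(hyp) + 1))  # distances against an empty reference
        for i, r in enumerate(ref, 1):
            curr = [i] + [0] * len(hyp)
            for j, h in enumerate(hyp, 1):
                cost = 0 if r == h else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution
            prev = curr
        return prev[-1]

    reference = "subtitles need to be readable"
    hypothesis = "subtitle needs to be readable"
    print(word_edit_distance(reference, hypothesis) / len(reference.split()))  # 2 errors / 5 words = 0.4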
There are several edit distance metrics available for plain text, but subtitles are more than that. The subtitle text also needs to be presented in a certain format, segmented in a way that makes sense and improves readability, and timed in sync with the video and audio it complements.
SubER is the only metric that takes all of that into account. It was developed by two brilliant AppTek scientists, Patrick Wilken and Evgeny Matusov, with a bit of prompting from me about the special nature of subtitles and feedback from several professional subtitlers. It was first presented in a paper at the 2022 IWSLT conference and has since been adopted as the conference’s primary metric for its subtitling track.
The code for SubER is released as part of an open-source subtitle evaluation toolkit to encourage its use by both the research community and the media industry, and to promote further research in automatic subtitling systems.

Read more about it in this blog post: https://www.apptek.ai/post/understanding-the-suber-metric-a-standard-for-automatically-evaluating-subtitle-quality

Or read the full paper: https://arxiv.org/abs/2205.05805

Access the code: https://github.com/apptek/SubER
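
If you want to try it on your own subtitle files, basic usage looks roughly like this (quoting the README from memory, so double-check the repository for the exact package name and options):

    pip install subtitle-edit-rate
    suber -H hypothesis.srt -R reference.srt

It compares the automatic subtitle file against the professionally created reference and reports the score.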

And tell me how much you like it in the comments!