Hardcoded to soft subtitles

Learning Chinese - Cantonese

Hello, all!

Anyone learning Cantonese who wants to use YouTube as a source of materials is undoubtedly aware that many (if not most) of the videos have hardcoded subtitles burned into the video, rather than soft (CC) subtitles, which can be easily extracted.

After a bit of research, I found a way to extract the hardcoded subtitles, put them in a .txt file, and use them for reading along while listening to the audio here on LanguageCrush. Here is my method:

I used the following apps:

https://cobalt.tools/ (to download the video; of course, any other video downloader would work)

VideoSubFinder (to rip the subtitles as separate image files, as described in this video: https://youtu.be/y7gbDMMsLTg?si=jROzUBGoxKzDYSZE)

ABBYY FineReader (to convert the image files into text and combine the text into a single .txt file)
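For anyone who would rather script the final "combine into a single .txt" step, here is a rough Python sketch. It is not what I did (FineReader handled that for me), and it rests on assumptions: that each subtitle image has already been OCR'd into a string, and that the image file names start with the subtitle's start time in the form `h_mm_ss_mmm`. Check the names your own extraction produces before relying on it. The function names `start_ms` and `combine` are made up for this example.

```python
import re

# Assumption: file names begin with the start time as h_mm_ss_mmm,
# e.g. "0_00_05_000__0_00_07_000.jpeg". Adjust the pattern if yours differ.
START_TIME = re.compile(r"^(\d+)_(\d{2})_(\d{2})_(\d{3})")


def start_ms(name: str) -> int:
    """Return the start time in milliseconds encoded at the start of a file name."""
    match = START_TIME.match(name)
    if match is None:
        raise ValueError(f"no timestamp found in {name!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def combine(ocr_results: dict[str, str]) -> str:
    """Join OCR'd subtitle lines in time order.

    Empty results and consecutive duplicates (the same subtitle caught
    in two neighbouring images) are skipped.
    """
    lines: list[str] = []
    for name in sorted(ocr_results, key=start_ms):
        text = ocr_results[name].strip()
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return "\n".join(lines)
```

You would call `combine` with a dictionary mapping each image file name to its recognized text, then write the returned string to a .txt file.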

As the video I used has both Cantonese and English subtitles, I did two separate extractions so that I could include a translation in the passage as well. You can find the final result here: https://languagecrush.com/reading/course/1888
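If you want to line up the two extractions automatically, one possible approach is to match each Cantonese line with the English line that starts closest to it in time. The sketch below is only an illustration under assumptions: each extraction is available as a list of `(start_ms, text)` tuples, and the helper `pair_lines` and its `tolerance_ms` parameter are hypothetical names, not part of any of the apps above.

```python
def pair_lines(cantonese, english, tolerance_ms=500):
    """Pair each Cantonese line with the English line nearest in start time.

    Both inputs are lists of (start_ms, text) tuples. A Cantonese line with
    no English line within tolerance_ms gets an empty translation.
    """
    pairs = []
    for start, text in cantonese:
        # Nearest English line by start time; None if the list is empty.
        best = min(english, key=lambda item: abs(item[0] - start), default=None)
        if best is not None and abs(best[0] - start) <= tolerance_ms:
            pairs.append((text, best[1]))
        else:
            pairs.append((text, ""))
    return pairs
```

The tolerance is there because the two extractions will rarely detect exactly the same start time for the same subtitle; half a second is just a guess and may need tuning per video.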

The series has a total of 10 videos, which I will eventually turn into passages and upload here as part of the Easy Cantonese course I created.

Overall, I will be doing my best to help increase the Cantonese database here on the site :)

Happy learning!

Posts: 1726 | Likes: 1136 | Joined: 18/3/2018 | Location: Bellingham / US
Learning Italian
Other Chinese - Mandarin, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Swahili, Tagalog, Thai

Interesting technique - I wondered when extracting hard subs would finally become "trivial". Depending on your definition of trivial, that day may be here already. Incidentally, if having the video embedded is important to you, I think you can first embed the video into a passage using the YouTube tool here, then replace the bad/missing subtitles with the ones you are creating.

Edit - I take that back. I see he's completely disabled subtitles, which foils our tool.

Learning German every day!
