Hello, all!
Anyone who is learning Cantonese and wants to use YouTube as a source of materials is undoubtedly aware that many (if not most) of the videos have hardcoded subtitles burned into the video, rather than soft (CC) subtitles, which can be easily extracted.
After a bit of research, I managed to find a way to extract the hardcoded subtitles, put them in a .txt file, and use that for reading along while listening to the audio here on LanguageCrush. Here is my method:
I used the following apps:
https://cobalt.tools/ (to download the video; any other video downloader would work just as well)
VideoSubFinder (to rip the subtitles as separate image files as described in this video - https://youtu.be/y7gbDMMsLTg?si=jROzUBGoxKzDYSZE)
ABBYY FineReader (to convert the image files into text and combine the text into a single .txt file)
As the video I used has both Cantonese and English subtitles, I did two separate extractions so that I could include a translation in the passage as well. You can find the final result here: https://languagecrush.com/reading/course/1888
The series has a total of 10 videos, which I will eventually turn into passages and upload here as part of the Easy Cantonese course I created.
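For anyone who would rather script the step of combining the two extractions, here is a minimal Python sketch. It is not part of the workflow above, just one possible way to do it. It assumes each extraction was saved as its own UTF-8 text file with one subtitle per line, in the same order; the file names cantonese.txt, english.txt and passage.txt are hypothetical. If the OCR skipped or split a subtitle in one language, the line counts will differ and the script stops rather than pairing the wrong lines.

```python
from pathlib import Path


def clean_lines(text):
    """Split OCR output into non-empty, whitespace-trimmed lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def interleave(cantonese_text, english_text):
    """Put each Cantonese line directly above its English translation.

    Assumes both texts contain the same subtitles in the same order.
    """
    yue = clean_lines(cantonese_text)
    eng = clean_lines(english_text)
    if len(yue) != len(eng):
        raise ValueError(
            f"line counts differ: {len(yue)} Cantonese vs {len(eng)} English"
        )
    # One blank line between subtitle pairs keeps the passage readable.
    blocks = [f"{c}\n{e}" for c, e in zip(yue, eng)]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; change them to match your own exports.
    yue_path = Path("cantonese.txt")
    eng_path = Path("english.txt")
    if yue_path.exists() and eng_path.exists():
        merged = interleave(
            yue_path.read_text(encoding="utf-8"),
            eng_path.read_text(encoding="utf-8"),
        )
        Path("passage.txt").write_text(merged, encoding="utf-8")
```

If the script reports that the line counts differ, it is probably easiest to compare the two files side by side and fix the missing or extra line by hand before running it again.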
Overall, I will be doing my best to help increase the Cantonese database here on the site :)
Happy learning!