How do I extract text from subtitles in Python?

I want to convert this:

1
00:00:01,710 --> 00:00:03,830
Now react came out in 2013.

2
00:00:03,840 --> 00:00:07,890
But what do we have before then before we act.

3
00:00:07,890 --> 00:00:15,040
Well the front fronting landscape was very different initially back in the 90s and early 2000s.

into something like this:

thisdict = {
  "1": "Now react came out in 2013.","1time": '00:00:01,830'
}

Can anyone help?

Solution

Do you mean something like this?

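# Read every line of the subtitle file.
with open('subtitle.srt') as file:
    subtitle = file.readlines()

# Each cue is assumed to take exactly 4 lines:
# number, time range, one line of text, blank separator.
sub_list = [subtitle[i:i + 4] for i in range(0, len(subtitle), 4)]

this_dict = {}

for item in sub_list:
    if len(item) < 3:
        continue  # skip an incomplete trailing chunk
    number = item[0].strip()
    this_dict[number] = item[2].strip()
    this_dict[f"{number}time"] = item[1].strip()

print(this_dict)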

Output:

{'1': 'Now react came out in 2013.', '1time': '00:00:01,710 --> 00:00:03,830', '2': 'But what do we have before then before we act.', '2time': '00:00:03,840 --> 00:00:07,890', '3': 'Well the front fronting landscape was very different initially back in the 90s and early 2000s.', '3time': '00:00:07,890 --> 00:00:15,040'}
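
The answer above relies on every cue occupying exactly four lines, which breaks as soon as a subtitle wraps onto a second line. A possible alternative, sketched here under a few assumptions, is to split the file on blank lines instead. The function name `parse_srt` and the extra `"<n>end"` key are hypothetical and not part of the original question. The question's sample shows a single timestamp under `"1time"`, so this sketch guesses that the start time is wanted there and stores the end time separately.

```python
import re

def parse_srt(text):
    """Parse SRT-formatted text into a flat dict keyed by cue number."""
    cues = {}
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks that have no text line
        number = lines[0].strip()
        # The second line looks like "00:00:01,710 --> 00:00:03,830".
        start, _, end = lines[1].partition(" --> ")
        # Join wrapped subtitle text into a single line.
        cues[number] = " ".join(line.strip() for line in lines[2:])
        cues[f"{number}time"] = start.strip()
        cues[f"{number}end"] = end.strip()  # hypothetical extra key
    return cues

sample = """1
00:00:01,710 --> 00:00:03,830
Now react came out in 2013.

2
00:00:03,840 --> 00:00:07,890
But what do we have before then
before we act.
"""

result = parse_srt(sample)
print(result["1"], result["1time"])
```

To use it on a file, read the whole content first, for example `parse_srt(open('subtitle.srt', encoding='utf-8-sig').read())`. The `utf-8-sig` codec is a guess about the file's encoding. It drops a leading byte-order mark if one is present, which would otherwise end up inside the first cue number.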
