I am trying to create an object from each repeating set in the text below (an .srt subtitle file):
1
00:02:12,446 --> 00:02:14,406
The Hovitos are near.

2
00:02:15,740 --> 00:02:18,076
The poison is still fresh,
three days.

3
00:02:18,076 --> 00:02:19,744
They're following us.
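For example, I could take each group of three or four lines and assign them to attributes of a new object. So for the first set, I could call Sentence.create(number: 1, time_marker: '00:02:12', content: "The Hovitos are near.")
Starting from script.each_line, what other general structures might get me on the right track? I'm having a hard time with this, and any help would be great!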
Edit
Here is the messy, unfinished code I have so far. It does work (I think). Would you take a completely different route? I have no experience with this.
number = nil
time_marker = nil
content = []
script = script.strip
script.each_line do |line|
  line = line.strip
  if line =~ /^\d+$/
    number = line.to_i
  elsif line =~ /-->/
    time_marker = line[0..7]
  elsif line =~ /^\b\D/
    content << line
  else
    if content.size > 1
      content = content.join("\n")
    else
      content = content[0]
    end
    Sentence.create(movie: @movie, number: number,
                    time_marker: time_marker, content: content)
    content = []
  end
end
Answer 0 (score: 1)
Assuming the subtitles are in the following variable:
subtitles = %q{1
00:02:12,446 --> 00:02:14,406
The Hovitos are near.

2
00:02:15,740 --> 00:02:18,076
The poison is still fresh,
three days.

3
00:02:18,076 --> 00:02:19,744
They're following us.}
Then you can do this:
def split_subs(subtitles)
  grouped, splitted = [], []
  # Append a blank sentinel line so the final group is flushed too.
  subtitles.split("\n").push("\n").each do |sub|
    if sub.strip.empty?
      next if grouped.empty? # ignore consecutive blank lines
      splitted.push({
        number: grouped[0],
        time_marker: grouped[1].split(",").first,
        content: grouped[2..-1].join(" ")
      })
      grouped = []
    else
      grouped.push sub.strip
    end
  end
  splitted
end
puts split_subs(subtitles)
# output:
# $ ruby 23025546.rb
# {:number=>"1", :time_marker=>"00:02:12", :content=>"The Hovitos are near."}
# {:number=>"2", :time_marker=>"00:02:15", :content=>"The poison is still fresh, three days."}
# {:number=>"3", :time_marker=>"00:02:18", :content=>"They're following us."}
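If objects are wanted rather than hashes, each hash can be passed on to a constructor. The following is only a minimal sketch: `Sentence` here is a stand-in Struct (an assumption for illustration, not the ActiveRecord model from the question), `rows` is a hand-written sample in the shape split_subs returns, and `keyword_init` assumes Ruby 2.5 or later.

```ruby
# Stand-in for the question's Sentence model (hypothetical: a plain Struct, not ActiveRecord).
Sentence = Struct.new(:number, :time_marker, :content, keyword_init: true)

# Sample hashes in the shape that split_subs returns.
rows = [
  { number: "1", time_marker: "00:02:12", content: "The Hovitos are near." },
  { number: "2", time_marker: "00:02:15", content: "The poison is still fresh, three days." }
]

# Build one object per hash, converting the number from String to Integer.
sentences = rows.map do |row|
  attrs = row.merge(number: row[:number].to_i)
  Sentence.new(**attrs)
end

sentences.each { |s| puts "#{s.number} [#{s.time_marker}] #{s.content}" }
```

In a Rails app the same loop could presumably call Sentence.create with the merged attributes instead of Sentence.new.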
Answer 1 (score: 1)
Here is one way to do it:
File.read('subtitles.srt').split(/^\s*$/).each do |entry| # Read in the entire text and split on empty lines
  sentence = entry.strip.split("\n")
  number = sentence[0]                 # First element after empty line is 'number'
  time_marker = sentence[1][0..7]      # Second element is 'time_marker'
  content = sentence[2..-1].join("\n") # Everything after that is 'content'
  # Use the values here, e.g. pass them to Sentence.create
end
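The same split-on-blank-lines idea can be wrapped in a method that returns the parsed entries instead of discarding them. This is a rough sketch under a few assumptions: `parse_srt` and `SubtitleEntry` are hypothetical names introduced here, Windows line endings are normalized first, and any block with fewer than three lines is skipped as malformed.

```ruby
# Hypothetical value object for one subtitle entry (not the question's model).
SubtitleEntry = Struct.new(:number, :time_marker, :content)

# Parse SRT text into an array of SubtitleEntry objects.
def parse_srt(text)
  # Normalize CRLF line endings and drop leading/trailing whitespace.
  normalized = text.gsub("\r\n", "\n").strip
  # One or more blank lines separate the entries.
  entries = normalized.split(/\n\s*\n/).map do |entry|
    lines = entry.split("\n").map(&:strip)
    next if lines.size < 3 # skip malformed blocks (yields nil, removed below)
    SubtitleEntry.new(lines[0].to_i, lines[1][0..7], lines[2..-1].join("\n"))
  end
  entries.compact
end

sample = "1\r\n00:02:12,446 --> 00:02:14,406\r\nThe Hovitos are near.\r\n\r\n" \
         "2\r\n00:02:15,740 --> 00:02:18,076\r\nThe poison is still fresh,\r\nthree days.\r\n"

cues = parse_srt(sample)
cues.each { |cue| puts cue.inspect }
```

Returning an array keeps the parsing separate from whatever is done with the entries afterwards, such as saving each one as a record.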