I want to convert an array of tokens produced by a PTB-style tokenizer:
["The", "house", "is", "n't", "on", "fire", "."]
into a sentence:
"The house isn't on fire."
What is a reasonable way to accomplish this?
Answer 0: (score: 2)
If we take @sawa's advice about the apostrophe and make your array:
["The", "house", "isn't", "on", "fire", "."]
you can get what you're looking for (with punctuation support!) with:
def sentence(array)
  str = ""
  array.each_with_index do |w, i|
    case w
    when '.', '!', '?' # Sentence enders; also insert a space if more words follow.
      str << w
      str << ' ' unless i == array.length - 1
    when ',', ';' # Inline separators
      str << w
      str << ' '
    when '--' # Dash
      str << ' -- '
    else # It's a word: separate it from the previous token with a space.
      str << ' ' unless str.empty? || str.end_with?(' ')
      str << w
    end
  end
  str
end
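
The method above assumes the contraction has already been merged into "isn't". If you want to start from the raw tokenizer output in the question, here is a minimal sketch of a pre-processing step. The name `merge_clitics` and the `CLITICS` list are my own assumptions, not part of any tokenizer's API, and the list only covers the common English clitics, so extend it for your data:

```ruby
# Hypothetical helper (not from the original answer): glue PTB-style clitic
# tokens onto the token that precedes them.
# CLITICS is an assumed, non-exhaustive list of common English clitics.
CLITICS = %w[n't 's 're 've 'm 'll 'd].freeze

def merge_clitics(tokens)
  tokens.each_with_object([]) do |tok, out|
    if out.any? && CLITICS.include?(tok)
      # Build a new string rather than mutating the caller's token.
      out[-1] = out[-1] + tok
    else
      out << tok
    end
  end
end

p merge_clitics(["The", "house", "is", "n't", "on", "fire", "."])
# => ["The", "house", "isn't", "on", "fire", "."]
```

Passing the result to `sentence` should then give "The house isn't on fire." for the array in the question. A clitic that appears as the very first token is left alone, since there is nothing to attach it to.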