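I have a JSON file produced by a speech-to-text service. It returns every detected word, with punctuation marks as separate entries. I now want to build sentences from it.
I can run a while loop until a dot is detected, append all the words to a list, and return a sentence from that list. However, this while loop stops at the first dot. How can I make the loop continue to the end of the JSON file?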
import json

sentenceList = []
i = 0
with open(json_file) as f:
    data = json.load(f)

for word in data['words']:
    # Problem: this inner loop ends at the first '.', and i never moves past it
    while not data['words'][i]['name'] == '.':
        sentenceList.append(data['words'][i]['name'])
        i += 1
sentence = ' '.join(word for word in sentenceList)
print(sentence)
JSON sample:
"words": [
{
"duration": "0.18",
"confidence": "0.990",
"name": "Is",
"time": "0.80"
},
{
"duration": "0.27",
"confidence": "1.000",
"name": "dit",
"time": "0.99"
},
{
"duration": "0.24",
"confidence": "1.000",
"name": "met",
"time": "1.50"
},
{
"duration": "0.54",
"confidence": "0.990",
"name": "vaart",
"time": "1.86"
},
{
"duration": "0.33",
"confidence": "0.990",
"name": ".",
"time": "2.40"
},
{
"duration": "0.06",
"confidence": "0.910",
"name": "We",
"time": "2.73"
},
{
"duration": "0.21",
"confidence": "1.000",
"name": "hebben",
"time": "2.79"
},
{
"duration": "0.09",
"confidence": "1.000",
"name": "het",
"time": "3.00"
},
{
"duration": "0.42",
"confidence": "1.000",
"name": "vandaag",
"time": "3.09"
},
{
"duration": "0.30",
"confidence": "1.000",
"name": "over",
"time": "3.51"
},
{
"duration": "0.60",
"confidence": "1.000",
"name": "België",
"time": "3.81"
},
{
"duration": "0.18",
"confidence": "1.000",
"name": ".",
"time": "4.50"
}
]
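Answer 0 (score: 2)
I think the solution is straightforward. You said: "However, this while loop stops at the first dot." That is exactly what a while loop does: it runs only as long as its condition holds. So just replace it with an if statement, and let the for loop walk through every entry.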
import json

with open(json_file) as f:
    data = json.load(f)

sentenceList = []
for word in data['words']:
    # Check if it's a word or a dot
    if word['name'] != '.':
        # If it's a word, add it to the list
        sentenceList.append(word['name'])
# All words are appended, now join them.
# Note: this yields one string with the dots dropped, not one string per sentence.
sentence = ' '.join(sentenceList)
print(sentence)
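If you would rather keep the index-based while loop from the question, a minimal sketch could look like the following. It wraps the original inner loop in an outer loop that runs until the index reaches the end of the list, and steps over each dot. The inline `raw` string is a hypothetical, shortened stand-in for the real file contents.

```python
import json

# Hypothetical, shortened sample standing in for the contents of json_file.
raw = ('{"words": [{"name": "Is"}, {"name": "dit"}, {"name": "."}, '
       '{"name": "We"}, {"name": "hebben"}, {"name": "."}]}')
words = json.loads(raw)["words"]

sentences = []
i = 0
# Outer loop: keep going until every entry has been consumed.
while i < len(words):
    current = []
    # Inner loop: collect words up to the next dot (or the end of the list).
    while i < len(words) and words[i]["name"] != ".":
        current.append(words[i]["name"])
        i += 1
    i += 1  # step over the dot itself
    if current:
        sentences.append(" ".join(current))

print(sentences)
```

The bounds check `i < len(words)` in the inner loop is what keeps the index from running past the end when the input does not finish with a dot.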
Answer 1 (score: 1)
In your case, a simple if statement is enough to detect the end of a sentence (since every sequence of words in the input structure ends with "name": "."):
sentenceList = []
for word in data['words']:
    if word['name'] == '.':
        # End of sentence: join the collected words, print, and reset
        sentence = ' '.join(sentenceList)
        sentenceList = []
        print(sentence)
    else:
        sentenceList.append(word['name'])
Output:
Is dit met vaart
We hebben het vandaag over België
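The same idea can be wrapped in a generator so the sentences can be reused instead of only printed. This is a sketch, not part of the original answer: the function name `iter_sentences` and its `terminator` parameter are made up for illustration, and it additionally assumes you want to keep any trailing words that are not followed by a dot.

```python
def iter_sentences(words, terminator="."):
    """Yield one sentence string per run of words ending in `terminator`."""
    current = []
    for word in words:
        if word["name"] == terminator:
            if current:
                # Re-attach the terminator to the finished sentence.
                yield " ".join(current) + terminator
            current = []
        else:
            current.append(word["name"])
    # Flush words that were never followed by a terminator.
    if current:
        yield " ".join(current)


# Hypothetical sample whose last sentence has no closing dot.
sample = [{"name": n} for n in ["Is", "dit", ".", "We", "hebben", "het"]]
result = list(iter_sentences(sample))
print(result)
```

With the parsed file from the question you would call it as `list(iter_sentences(data['words']))`.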
Answer 2 (score: 1)
Using itertools.groupby:
data = '''{"words": [
{
"duration": "0.18",
"confidence": "0.990",
"name": "Is",
"time": "0.80"
},
{
"duration": "0.27",
"confidence": "1.000",
"name": "dit",
"time": "0.99"
},
{
"duration": "0.24",
"confidence": "1.000",
"name": "met",
"time": "1.50"
},
{
"duration": "0.54",
"confidence": "0.990",
"name": "vaart",
"time": "1.86"
},
{
"duration": "0.33",
"confidence": "0.990",
"name": ".",
"time": "2.40"
},
{
"duration": "0.06",
"confidence": "0.910",
"name": "We",
"time": "2.73"
},
{
"duration": "0.21",
"confidence": "1.000",
"name": "hebben",
"time": "2.79"
},
{
"duration": "0.09",
"confidence": "1.000",
"name": "het",
"time": "3.00"
},
{
"duration": "0.42",
"confidence": "1.000",
"name": "vandaag",
"time": "3.09"
},
{
"duration": "0.30",
"confidence": "1.000",
"name": "over",
"time": "3.51"
},
{
"duration": "0.60",
"confidence": "1.000",
"name": "België",
"time": "3.81"
},
{
"duration": "0.18",
"confidence": "1.000",
"name": ".",
"time": "4.50"
}
]}'''
import json
from itertools import groupby
d = json.loads(data)
# Group consecutive entries by "is not a dot"; keep only the word runs (v is True)
lst = [
    ' '.join(i['name'] for i in g) + '.'
    for v, g in groupby(d['words'], lambda w: w['name'] != '.')
    if v
]
print(lst)
Prints:
['Is dit met vaart.', 'We hebben het vandaag over België.']
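To see why this works, it may help to look at what groupby produces before the joining step. The sketch below uses a plain list of names (a simplified, hypothetical stand-in for the `name` fields) and collects each key together with its group.

```python
from itertools import groupby

# Simplified stand-in for the "name" values in the JSON.
names = ["Is", "dit", ".", "We", "hebben", "."]

# groupby starts a new group every time the key function's result changes.
runs = [(is_word, list(group))
        for is_word, group in groupby(names, lambda n: n != ".")]
print(runs)
# [(True, ['Is', 'dit']), (False, ['.']), (True, ['We', 'hebben']), (False, ['.'])]
```

Each run of words arrives as one group with key True, and the dots form the False groups in between, which the `if v` filter discards. Note that groupby only merges consecutive items, so two dots in a row would end up in a single False group.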