I have a list of lists. I was able to generate bigrams within the inner lists, as shown below:
[[('bacteria','agricultur'),('agricultur','soil'),('soil','presenc'),('presenc','sampl')],[('bacteria','agricultur'),('agricultur','soil'),('soil','presenc'),('presenc','sampl')],[('nodul','uragensi')],[('nodul','stem'),('stem','nodul')],[('deform','morphoid')]]
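Now I need to replace the comma inside each bigram tuple with an underscore, which I have not been able to do. The result should look like this: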
[[(bacteria_agricultur),(agricultur_soil),(soil_presenc),(presenc_sampl)],[(bacteria_agricultur),(agricultur_soil),(soil_presenc),(presenc_sampl)],[(nodul_uragensi)],[(nodul_stem),(stem_nodul)],[(deform_morphoid)]]
When I use join, I get an error:
texts = ["_".join(word) for word in texts]
Error:
TypeError: sequence item 0: expected str instance, tuple found
How can I produce the output above? Thanks.
Answer 0: (score: 1)
You can use a nested list comprehension:
In [446]: [['_'.join(y) for y in x] for x in lst]
Out[446]:
[['bacteria_agricultur', 'agricultur_soil', 'soil_presenc', 'presenc_sampl'],
['bacteria_agricultur', 'agricultur_soil', 'soil_presenc', 'presenc_sampl'],
['nodul_uragensi'],
['nodul_stem', 'stem_nodul'],
['deform_morphoid']]
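As a self-contained sketch of why the one-level join from the question fails while the nested comprehension works: iterating once over the outer list hands join() an inner list of tuples, not a tuple of strings. The sample data below is shortened, and the helper name `join_bigrams` is made up for illustration.

```python
# Shortened sample data in the same shape as the question: a list of lists of bigram tuples.
lst = [
    [("bacteria", "agricultur"), ("agricultur", "soil")],
    [("nodul", "uragensi")],
    [("deform", "morphoid")],
]

# One level of iteration: each `word` is an inner LIST of tuples,
# so "_".join(word) is asked to join tuples and raises TypeError.
try:
    ["_".join(word) for word in lst]
    flat_error = None
except TypeError as exc:
    flat_error = str(exc)


def join_bigrams(nested):
    """Join each (a, b) tuple into 'a_b' while keeping the outer nesting.

    `join_bigrams` is a hypothetical helper name, not part of any library.
    """
    return [["_".join(pair) for pair in inner] for inner in nested]


joined = join_bigrams(lst)
print(flat_error)
print(joined)
```

The inner comprehension is what reaches the tuples themselves, and that is the level at which `"_".join(...)` receives strings.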
If you insist on keeping the parentheses, you can create single-element tuples instead:
In [447]: [[('_'.join(y), ) for y in x] for x in lst]
Out[447]:
[[('bacteria_agricultur',),
('agricultur_soil',),
('soil_presenc',),
('presenc_sampl',)],
[('bacteria_agricultur',),
('agricultur_soil',),
('soil_presenc',),
('presenc_sampl',)],
[('nodul_uragensi',)],
[('nodul_stem',), ('stem_nodul',)],
[('deform_morphoid',)]]
Answer 1: (score: 0)
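NewData = []
for bigrams in lists:
    for grams in bigrams:
        # str(grams) looks like "('a', 'b')": drop the quotes, then turn ", " into "_"
        NewData.append(str(grams).replace("'", "").replace(", ", "_"))
# NewData is one flat list of strings such as "(bacteria_agricultur)"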
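Note that the loop above puts every bigram into one flat list, and it edits the tuple's string representation, so a token that itself contains an apostrophe or ", " would be altered. A possible variant, sketched here under the assumption that the parenthesised string form and the original nesting are both wanted, builds each string directly with join. The name `format_bigrams` and the shortened sample data are made up for illustration.

```python
# Shortened, hypothetical sample data in the same shape as the question.
lists = [
    [("nodul", "stem"), ("stem", "nodul")],
    [("deform", "morphoid")],
]


def format_bigrams(nested):
    """Render each bigram as the string '(a_b)' and keep the inner lists.

    Uses "_".join on the tuple itself instead of editing str(tuple),
    so characters inside the tokens are left untouched.
    """
    return [["(" + "_".join(gram) + ")" for gram in bigrams] for bigrams in nested]


formatted = format_bigrams(lists)
print(formatted)
```

The elements are plain strings that happen to contain parentheses, not tuples, which matches the display form shown in the question.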