I have a text file in the following format:
1 1089874 108992 PCCW's chief operating officer. Current Chief Operating Officer Mike.
1 3019446 3019327 The world's two largest. late summer sales frenzy caused more of an industry backlash than expected.
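To be clear, each line consists of: a label (1), then a tab, then id1 (1089874), a space, id2 (1089925), a space, text1, then a tab, then text2.
I want to read the text file and extract label, text1, and text2 into separate lists in Python. How can I do this? Thanks.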
Answer 0 (score: 1)
Assuming each line is in the variable line, do the following:
cols = line.split() # Splits by any white space
label = cols[0]
text1 = cols[1]
text2 = ' '.join(cols[2:])
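Alternatively, rereading your requirements (the fields are tab-separated, and the two ids precede text1 inside the middle field), I think what you actually want is: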
cols = line.rstrip('\n').split('\t')  # drop the trailing newline so text2 doesn't end with it
label = cols[0]
text1 = ' '.join(cols[1].split()[2:])
text2 = cols[2]
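
To collect the values for a whole file into three lists, the tab-based approach above can be wrapped in a loop. This is only a minimal sketch: the function name parse_lines and the file name data.txt are hypothetical, and it assumes every non-blank line contains exactly two tabs, as described in the question.

```python
def parse_lines(lines):
    """Split 'label<TAB>id1 id2 text1<TAB>text2' lines into three lists."""
    labels, text1s, text2s = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        if len(cols) != 3:
            # Assumption: exactly two tabs per line, as in the question
            raise ValueError("expected 3 tab-separated fields: %r" % line)
        label, middle, text2 = cols
        # maxsplit=2 peels off id1 and id2 and leaves the rest as text1
        parts = middle.split(None, 2)
        text1 = parts[2] if len(parts) > 2 else ""
        labels.append(label)
        text1s.append(text1)
        text2s.append(text2)
    return labels, text1s, text2s


# Usage with a file (data.txt is a placeholder name):
# with open("data.txt", encoding="utf-8") as f:
#     labels, text1s, text2s = parse_lines(f)
```

Passing the open file object works because iterating over it yields one line at a time, so the file is never loaded into memory all at once.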