I have a file containing a list of words, one word and its tag per line:
My O
name O
is O
Alex B
. O
I O
am O
from O
London B
. O
This is my code:
import re
def read_file(filename):
    file = open(filename).read().strip().split("\n\n")
    lines = []
    for line in file:
        lines.append(re.split(r'\t|\n', line))
    return lines
train_sents = read_file("train.txt")
train_sents[0]
The output is:
['My',
 'O',
 'name',
 'O',
 'is',
 'O',
 'Alex',
 'B',
 '.',
 'O']
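My question: is it possible to split on \t without also flattening the lines together, so that each line stays its own item? For example, the output would look like this: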
[('My', 'O'),
 ('name', 'O'),
 ('is', 'O'),
 ('Alex', 'B'),
 ('.', 'O')]
Answer 0 (score: 3)
Split each line separately:
with open(filename) as f:
    print([tuple(line.split()) for line in f])
[('My', 'O'), ('name', 'O'), ('is', 'O'), ('Alex', 'B'), ('.', 'O')]
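To separate the sentences at the blank lines, append each line to the last sublist, and start a new sublist whenever an empty line is encountered: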
with open(infile) as f:
    l = [[]]
    for line in f:
        if line.strip():
            l[-1].append(tuple(line.split()))
        else:
            l.append([])
print(l[0])
print(l[1])
[('My', 'O'), ('name', 'O'), ('is', 'O'), ('Alex', 'B'), ('.', 'O')]
[('I', 'O'), ('am', 'O'), ('from', 'O'), ('London', 'B'), ('.', 'O')]
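One caveat with the loop above: a trailing blank line, or two blank lines in a row, leaves empty sublists in the result. If that is unwanted, a minimal sketch along the following lines may help. The helper name `split_sentences` is hypothetical, and it takes any iterable of lines (an open file or a list of strings) rather than a filename.

```python
def split_sentences(lines):
    """Group lines into sentences of (word, tag) tuples, split at blank lines.

    Hypothetical helper for illustration; not part of the original answer.
    """
    sentences = []
    current = []
    for line in lines:
        if line.strip():
            # Non-blank line: split on whitespace into a (word, tag) tuple
            current.append(tuple(line.split()))
        elif current:
            # Blank line: close the current sentence, but only if it has content
            sentences.append(current)
            current = []
    if current:
        # Keep the last sentence when the input does not end with a blank line
        sentences.append(current)
    return sentences


sample = ["My\tO\n", "name\tO\n", "\n", "\n", "I\tO\n", "am\tO\n", "\n"]
print(split_sentences(sample))
```

Because a sentence is only stored when it is non-empty, repeated or trailing blank lines do not produce `[]` entries.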
You can also group the lines with itertools.groupby, using the empty lines as delimiters:
from itertools import groupby
with open(infile) as f:
    print([list(map(str.split, v))
           for k, v in groupby(f, key=lambda x: x.strip() != "") if k])
[[['My', 'O'], ['name', 'O'], ['is', 'O'], ['Alex', 'B'], ['.', 'O']], [['I', 'O'], ['am', 'O'], ['from', 'O'], ['London', 'B'], ['.', 'O']]]
If necessary, you can map the inner lists to tuples.
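Mapping to tuples could look like the following minimal sketch. The function name `read_sentences` is hypothetical, and the example feeds groupby an in-memory list of lines instead of an open file; any iterable of lines should work the same way.

```python
from itertools import groupby


def read_sentences(lines):
    """Group non-blank lines into sentences of (word, tag) tuples.

    Hypothetical helper for illustration; not part of the original answer.
    """
    return [
        [tuple(line.split()) for line in group]
        for is_text, group in groupby(lines, key=lambda x: x.strip() != "")
        if is_text  # skip the runs of blank lines
    ]


sample = ["My\tO\n", "name\tO\n", "\n", "I\tO\n", "am\tO\n"]
print(read_sentences(sample))
# [[('My', 'O'), ('name', 'O')], [('I', 'O'), ('am', 'O')]]
```

Since groupby merges consecutive blank lines into a single group, several blank lines in a row do not create empty sentences.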
Answer 1 (score: 1)
You can try this:
import re
def read_file(filename):
    fil = open(filename).read().strip().split("\n\n")
    lines = []
    for line in fil:
        s = []
        m = line.split('\n')
        for i in m:
            # split each line on the tab into a (word, tag) tuple
            s.append(tuple(re.split(r'\t', i)))
        lines.append(s)
    return lines
train_sents = read_file("file")
print(train_sents[0])
Output:
[('My', 'O'), ('name', 'O'), ('is', 'O'), ('Alex', 'B'), ('.', 'O')]
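As a variation on this answer, the same idea can be sketched without re, since str.split("\t") is enough for a single-character delimiter, and with a with block so the file is closed after reading. The function name `read_tagged_file` and the temporary sample file are assumptions made for illustration only.

```python
import os
import tempfile


def read_tagged_file(filename):
    """Return a list of sentences, each a list of (word, tag) tuples.

    Hypothetical helper for illustration; assumes tab-separated columns
    and exactly one blank line between sentences.
    """
    with open(filename) as f:
        blocks = f.read().strip().split("\n\n")
    return [
        [tuple(line.split("\t")) for line in block.split("\n")]
        for block in blocks
    ]


# Write a small sample file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("My\tO\nname\tO\n\nI\tO\nam\tO\n")
    path = tmp.name

sents = read_tagged_file(path)
os.remove(path)
print(sents[0])
# [('My', 'O'), ('name', 'O')]
```

Unlike line.split(), splitting on "\t" keeps any spaces inside a column intact, which may matter if a token itself contains a space.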