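I hope you can help with my question. I'm new to Python. I want to open a file that contains the lines below, read each line, and store each of its whitespace-separated fields as a string.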
A B 2
A E 2
A W 1
B D 5
B W 4
B C 2
B F 3
C F 7
C V 9
D E 1
D J 7
E K 3
F L 2
F M 7
F R 3
F Y 1
G K 8
G J 5
I want to store the information for each line like this: [A B 2], [A E 2] should become ['A','B','2'], ['A','E','2']
Answer 0 (score: 3)
You can do the following:
with open('testfile.txt') as fp:
    # Split every line on whitespace and keep only non-empty results.
    content = [elem
               for line in fp.readlines()
               for elem in [line.split()]
               if elem]
print(content)
This produces
[['A', 'B', '2'], ['A', 'E', '2'], ['A', 'W', '1'], ['B', 'D', '5'], ['B', 'W', '4'], ['B', 'C', '2'], ['B', 'F', '3'], ['C', 'F', '7'], ['C', 'V', '9'], ['D', 'E', '1'], ['D', 'J', '7'], ['E', 'K', '3'], ['F', 'L', '2'], ['F', 'M', '7'], ['F', 'R', '3'], ['F', 'Y', '1'], ['G', 'K', '8'], ['G', 'J', '5']]
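As a small illustration of why the if elem filter in the comprehension above matters: calling split() with no arguments on a blank line returns an empty list, which is falsy and therefore dropped. The sketch below uses an in-memory io.StringIO object as a hypothetical stand-in for testfile.txt, so the sample text is an assumption and not the asker's actual file.

```python
import io

# Hypothetical in-memory stand-in for testfile.txt, with one blank line.
sample = io.StringIO("A B 2\nA E 2\n\nA W 1\n")

# split() with no arguments splits on any run of whitespace and
# returns [] for a blank line, so the filter removes that entry.
rows = [fields for fields in (line.split() for line in sample) if fields]
print(rows)
```

Every field stays a string, including the trailing number, which matches the format requested in the question.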
Answer 1 (score: 2)
Alternatively, as an explicit loop:
data = []
with open(filename) as f:  # filename: path to the input file
    for line in f:
        line = line.rstrip()
        if line == '':
            continue  # skip blank lines
        data.append(line.split())
Answer 2 (score: 2)
Here I compared the suggestions (3 using a list comprehension and 3 using a for loop that appends to a list):
def f_jan(filename):
    with open(filename) as f:
        return [
            elem
            for line in f.readlines()
            for elem in [line.split()]
            if elem]

def f_mateen_ulhaq_1(filename):
    with open(filename) as f:
        return [
            elem.split()
            for elem in map(str.rstrip, f)
            if elem]

def f_ralf_1(filename):
    with open(filename) as f:
        return [
            line.split()
            for line in f
            if line != '\n']

def f_mateen_ulhaq_2(filename):
    data = []
    with open(filename) as f:
        for line in f:
            line = line.rstrip()
            if line == '':
                continue
            data.append(line.split())
    return data

def f_mateen_ulhaq_3(filename):
    data = []
    with open(filename) as f:
        for line in f:
            if line == '\n':
                continue
            data.append(line.split())
    return data

def f_ralf_2(filename):
    data = []
    with open(filename) as f:
        for line in f:
            if line != '\n':
                data.append(line.split())
    return data
I created 2 files: one containing 100 lines of the sample input from the question, and another containing 100,000 lines of the same input.
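The answer does not show how the two files were built. One possible way, sketched here as an assumption, is to cycle through the 18 sample lines until the requested number of lines is reached. The helper name write_test_file and the SAMPLE_LINES constant are hypothetical and not part of the original answer.

```python
import os
import tempfile
from itertools import cycle, islice

# The 18 sample lines from the question.
SAMPLE_LINES = [
    "A B 2", "A E 2", "A W 1", "B D 5", "B W 4", "B C 2",
    "B F 3", "C F 7", "C V 9", "D E 1", "D J 7", "E K 3",
    "F L 2", "F M 7", "F R 3", "F Y 1", "G K 8", "G J 5",
]

def write_test_file(path, n_lines):
    """Write n_lines lines to path, repeating SAMPLE_LINES as needed."""
    with open(path, "w") as f:
        for line in islice(cycle(SAMPLE_LINES), n_lines):
            f.write(line + "\n")

# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "test_100_lines.txt")
    write_test_file(small, 100)
    with open(small) as f:
        line_count = sum(1 for _ in f)

print(line_count)
```

Under the same assumption, the larger file could be produced with write_test_file('test_100000_lines.txt', 100000).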
I checked that they all return the same data:
filename_1 = 'test_100_lines.txt'
assert (f_jan(filename_1)
        == f_mateen_ulhaq_1(filename_1)
        == f_ralf_1(filename_1)
        == f_mateen_ulhaq_2(filename_1)
        == f_mateen_ulhaq_3(filename_1)
        == f_ralf_2(filename_1))
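The equality check above holds for this input, but the two filtering styles are not interchangeable for every file. Comparing a line with '\n' skips only lines that are exactly a newline, so a line containing just spaces still gets split and yields an empty list. The sketch below shows the difference on a small temporary file; the function names split_if_nonempty and split_unless_newline are hypothetical stand-ins for the two styles, and the file contents are made up for illustration.

```python
import os
import tempfile

def split_if_nonempty(filename):
    # Keeps only lines whose split result is non-empty.
    with open(filename) as f:
        return [fields for fields in map(str.split, f) if fields]

def split_unless_newline(filename):
    # Skips only lines that are exactly '\n'.
    with open(filename) as f:
        return [line.split() for line in f if line != '\n']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edge_case.txt")
    with open(path, "w") as f:
        # Second line holds only spaces, third line is truly empty.
        f.write("A B 2\n   \n\nA E 2\n")
    strict = split_if_nonempty(path)
    loose = split_unless_newline(path)

print(strict)
print(loose)
```

If the input may contain whitespace-only lines, the stricter filter is the safer choice, at the cost of the extra work measured in the timings.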
Then I compared their speed using timeit (with fewer repetitions for the large text file):
import timeit

# Run as a script: the setup string imports fn and func from __main__.
for fn, number in [
    ('test_100_lines.txt', 10000),
    ('test_100000_lines.txt', 100),
]:
    for func in [
        f_jan,
        f_mateen_ulhaq_1,
        f_ralf_1,
        f_mateen_ulhaq_2,
        f_mateen_ulhaq_3,
        f_ralf_2,
    ]:
        t = timeit.timeit('func(fn)', 'from __main__ import fn, func', number=number)
        print('{:25s} {:20s} {:10.4f} seconds'.format(fn, func.__name__, t))
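A possible variation on the timing loop above, shown only as a sketch: timeit also accepts a callable, which avoids the setup string and works outside a script's __main__ module. Using timeit.repeat and taking the minimum is a common way to reduce noise from other processes. The function read_rows and the temporary sample file are assumptions made for this example and are not the files used in the answer.

```python
import os
import tempfile
import timeit

def read_rows(filename):
    # Same filtering style as f_ralf_1.
    with open(filename) as f:
        return [line.split() for line in f if line != '\n']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write("A B 2\nA E 2\nA W 1\n" * 100)
    # Three independent measurements of 100 calls each.
    timings = timeit.repeat(lambda: read_rows(path), number=100, repeat=3)

best = min(timings)
print('{:20s} {:10.4f} seconds'.format(read_rows.__name__, best))
```

The absolute numbers depend on the machine and disk cache, so only comparisons made in the same run are meaningful.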
Regardless of input size, the fastest solution is f_ralf_1 (a list comprehension that does not strip each line and only compares it with '\n'):
test_100_lines.txt        f_jan                    0.5019 seconds
test_100_lines.txt        f_mateen_ulhaq_1         0.4483 seconds
test_100_lines.txt        f_ralf_1                 0.3657 seconds
test_100_lines.txt        f_mateen_ulhaq_2         0.4523 seconds
test_100_lines.txt        f_mateen_ulhaq_3         0.3854 seconds
test_100_lines.txt        f_ralf_2                 0.3886 seconds
test_100000_lines.txt     f_jan                    3.1178 seconds
test_100000_lines.txt     f_mateen_ulhaq_1         2.6396 seconds
test_100000_lines.txt     f_ralf_1                 1.8084 seconds
test_100000_lines.txt     f_mateen_ulhaq_2         2.7143 seconds
test_100000_lines.txt     f_mateen_ulhaq_3         2.0398 seconds
test_100000_lines.txt     f_ralf_2                 2.0246 seconds