Here is my text file; each line contains a tuple:
(1, 2)
(3, 4)
(5, 6)
What is the most Pythonic and efficient way to read the file above and produce a list with the following structure:
[[1,2],[3,4],[5,6]]
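This is how I'm currently doing it, which doesn't give me what I want:
with open("agentListFile.txt") as f:
    # Produces strings such as '(1, 2)', not lists of ints
    agentList = [line.rstrip('\n') for line in f.readlines()]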
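For reference, even with `line` (rather than `agentList`) inside the comprehension, this approach only strips the newline and leaves each tuple as text. A minimal sketch, using an in-memory `io.StringIO` as an assumed stand-in for agentListFile.txt:

```python
import io

# In-memory stand-in for agentListFile.txt (an assumption for illustration)
f = io.StringIO("(1, 2)\n(3, 4)\n(5, 6)\n")

# Stripping the newline leaves each tuple as a plain string
lines = [line.rstrip('\n') for line in f]
print(lines)  # ['(1, 2)', '(3, 4)', '(5, 6)']
```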
Answer 0 (score: 3)
You can use ast.literal_eval to safely evaluate each tuple, and convert the tuples to lists inside a list comprehension, for example:
import ast

with open("agentListFile.txt") as f:
    agent_list = [list(ast.literal_eval(line)) for line in f]
For details, read the documentation of ast.literal_eval and this thread.
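A small sketch of the same idea, assuming the data may contain blank lines (which `ast.literal_eval` would reject with a SyntaxError). `read_pairs` is a hypothetical helper name and `io.StringIO` stands in for the real file. It also shows why `literal_eval` is the "safe" choice: anything that is not a plain literal, such as a function call, raises ValueError instead of being executed.

```python
import ast
import io

def read_pairs(f):
    """Parse one tuple literal per line into a list, skipping blank lines."""
    return [list(ast.literal_eval(line)) for line in f if line.strip()]

# In-memory stand-in for agentListFile.txt, with a trailing blank line
f = io.StringIO("(1, 2)\n(3, 4)\n(5, 6)\n\n")
print(read_pairs(f))  # [[1, 2], [3, 4], [5, 6]]

# literal_eval only accepts literals, so arbitrary expressions are rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```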
Answer 1 (score: 3)
This is the fastest solution I've been able to come up with so far.
import re

def re_sol1():
    ''' re.findall on whole file w/ capture groups '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(numstr)
                     for numstr in numpair]
                    for numpair in re.findall(r'(\d+), (\d+)', f.read())]
    return numpairs
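It uses re.findall and relies on the fact that all of the values are positive integers. Combining capture groups in the regular expression with re.findall lets you efficiently extract pairs of positive-integer strings and map them to ints in a list comprehension.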
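To make the capture-group behavior concrete: with two groups in the pattern, `re.findall` returns a list of 2-tuples of strings, one per match, which the comprehension then converts to ints. A minimal sketch, where `parse_pairs_re` is a hypothetical variant of re_sol1 that takes any file-like object (here an `io.StringIO`) instead of a hard-coded path:

```python
import io
import re

def parse_pairs_re(f):
    """Same approach as re_sol1, but reading from any file-like object."""
    # With two capture groups, findall yields tuples like ('1', '2')
    pairs = re.findall(r'(\d+), (\d+)', f.read())
    return [[int(numstr) for numstr in pair] for pair in pairs]

print(re.findall(r'(\d+), (\d+)', "(1, 2)\n(3, 4)\n"))  # [('1', '2'), ('3', '4')]
print(parse_pairs_re(io.StringIO("(1, 2)\n(3, 4)\n(5, 6)\n")))  # [[1, 2], [3, 4], [5, 6]]
```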
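To handle negative integers as well, allow an optional minus sign: for example r'(-?\d+), (-?\d+)' in the capture-group version, or r'-?\d+' where a single integer is matched at a time.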
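A quick sketch of both negative-integer variants on some made-up sample text (the negative values are an assumption; the original file only contains positive ones):

```python
import re

text = "(1, -2)\n(-3, 4)\n(5, 6)\n"

# Optional leading minus sign inside each capture group
pairs = [[int(a), int(b)] for a, b in re.findall(r'(-?\d+), (-?\d+)', text)]
print(pairs)  # [[1, -2], [-3, 4], [5, 6]]

# Without capture groups: find every integer, then regroup them into pairs
nums = [int(s) for s in re.findall(r'-?\d+', text)]
print([nums[i:i+2] for i in range(0, len(nums), 2)])  # [[1, -2], [-3, 4], [5, 6]]
```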
When I run the following code with the default Python 2.7.6 on Linux, the results suggest that re_sol1 is the fastest:
with open('agentListFile.txt', 'w') as f:
    for tup in zip(range(1, 1001), range(1, 1001)):
        f.write('{}\n'.format(tup))

funcs = []

def test(func):
    # Register each decorated solution so it can be checked and timed
    funcs.append(func)
    return func

import re, ast

@test
def re_sol1():
    ''' re.findall on whole file w/ capture groups '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(numstr)
                     for numstr in numpair]
                    for numpair in re.findall(r'(\d+), (\d+)', f.read())]
    return numpairs

@test
def re_sol2():
    ''' naive re.findall on whole file '''
    with open('agentListFile.txt') as f:
        nums = [int(numstr) for numstr in re.findall(r'\d+', f.read())]
    numpairs = [nums[i:i+2] for i in range(0, len(nums), 2)]
    return numpairs

@test
def re_sol3():
    ''' re.findall on whole file w/ str.split '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(numstr)
                     for numstr in numpair.split(', ')]
                    for numpair in re.findall(r'\d+, \d+', f.read())]
    return numpairs

@test
def re_sol4():
    ''' re.finditer on whole file '''
    with open('agentListFile.txt') as f:
        match_iterator = re.finditer(r'(\d+), (\d+)', f.read())
        numpairs = [[int(ns) for ns in m.groups()] for m in match_iterator]
    return numpairs

@test
def re_sol5():
    ''' re.match line by line '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(ns)
                     for ns in re.match(r'\((\d+), (\d+)', line).groups()]
                    for line in f]
    return numpairs

@test
def re_sol6():
    ''' re.search line by line '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(ns)
                     for ns in re.search(r'(\d+), (\d+)', line).groups()]
                    for line in f]
    return numpairs

@test
def sss_sol1():
    ''' strip, slice, split line by line '''
    with open("agentListFile.txt") as f:
        # list() is needed on Python 3, where map returns an iterator
        agentList = [list(map(int, line.strip()[1:-1].split(', '))) for line in f]
    return agentList

@test
def ast_sol1():
    ''' ast.literal_eval line by line '''
    with open("agentListFile.txt") as f:
        agent_list = [list(ast.literal_eval(line)) for line in f]
    return agent_list

### Begin tests ###

def all_equal(iterable):
    try:
        iterator = iter(iterable)
        first = next(iterator)
        return all(first == rest for rest in iterator)
    except StopIteration:
        return True

if all_equal(func() for func in funcs):
    from timeit import Timer

    def print_timeit(func, cnfg={'number': 1000}):
        print('{}{}'.format(Timer(func).timeit(**cnfg), func.__doc__))

    for func in funcs:
        print_timeit(func)
else:
    print('At least one of the solutions is incorrect.')
Sample output from a single run (each number is the total time in seconds for 1000 calls):
1.50156712532 re.findall on whole file w/ capture groups
1.53699707985 naive re.findall on whole file
1.71362090111 re.findall on whole file w/ str.split
1.97333717346 re.finditer on whole file
3.36241197586 re.match line by line
3.59856200218 re.search line by line
1.71777415276 strip, slice, split line by line
12.8218641281 ast.literal_eval line by line
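The harness above was written for Python 2.7. A minimal Python 3 sketch of the same comparison for three of the approaches might look like the following. It assumes a temporary file (via `tempfile`) instead of agentListFile.txt, and the `number`/`repeat` values are arbitrary choices; it reports the best of several runs with `timeit.repeat`. Absolute timings depend on the machine and Python version, so no results are claimed here.

```python
import ast
import os
import re
import tempfile
import timeit

# Write a sample file with 1000 lines like "(1, 1)" to a temporary location
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    for i in range(1, 1001):
        f.write("{}\n".format((i, i)))

def re_findall_groups():
    with open(path) as f:
        return [[int(a), int(b)] for a, b in re.findall(r'(\d+), (\d+)', f.read())]

def strip_slice_split():
    with open(path) as f:
        # list() is needed on Python 3, where map returns an iterator
        return [list(map(int, line.strip()[1:-1].split(', '))) for line in f]

def literal_eval_lines():
    with open(path) as f:
        return [list(ast.literal_eval(line)) for line in f]

funcs = [re_findall_groups, strip_slice_split, literal_eval_lines]

# Check that every approach agrees before timing anything
results = [func() for func in funcs]
assert all(r == results[0] for r in results[1:]), "solutions disagree"

for func in funcs:
    # Best of 3 repeats, 20 calls each
    best = min(timeit.repeat(func, number=20, repeat=3))
    print("{:.4f}s  {}".format(best, func.__name__))

os.remove(path)
```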
Answer 2 (score: 2)
The following code assumes that every line follows the same format, (number1, number2):
def strip_slice_split_solution():
    with open("agentListFile.txt") as f:
        # list() keeps this working on Python 3, where map returns an iterator
        agentList = [list(map(int, line.strip()[1:-1].split(', '))) for line in f]
    return agentList
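s[1:-1] omits the first and last characters of s; here, once line.strip() has removed the trailing newline, those are the parentheses.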
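Step by step, on a single sample line, the strip/slice/split pipeline looks like this. Note the assumption baked into `split(', ')`: the separator must be exactly a comma followed by one space, so a line such as "(1,2)" would leave "1,2" unsplit and `int()` would raise ValueError.

```python
line = "(1, 2)\n"

stripped = line.strip()          # '(1, 2)'  : trailing newline removed
inner = stripped[1:-1]           # '1, 2'    : parentheses dropped
parts = inner.split(', ')        # ['1', '2']
pair = list(map(int, parts))     # [1, 2]
print(pair)
```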
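I took Shashank's solution (with the import moved out of the function) and Jon's solution, put them in a file, and decided to run some tests. I generated some files with 5000-1000 lines to test with.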
Excerpt from the tests:
In [3]: %timeit re_solution()
100 loops, best of 3: 2.3 ms per loop
In [4]: %timeit strip_slice_split_solution()
100 loops, best of 3: 2.28 ms per loop
In [5]: %timeit ast_solution()
100 loops, best of 3: 14.1 ms per loop
All 3 functions produce the same result:
In [6]: ast_solution() == re_solution() == strip_slice_split_solution()
Out[6]: True
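One caveat when reproducing this check: it holds on Python 2, where `map` returns a list. On Python 3, `map` returns a lazy iterator that never compares equal to a list, so a version of strip_slice_split_solution that omits `list(...)` would make the comparison above come out False. A minimal illustration:

```python
parts = ['1', '2']

# On Python 3, a map object is an iterator, not a list
print(map(int, parts) == [1, 2])        # False
print(list(map(int, parts)) == [1, 2])  # True
```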