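I want to read all the integers in a file into a list. The numbers are separated by one or more spaces or by one or more end-of-line characters. What is the most efficient and/or elegant way to do this? I have two solutions, but I don't know whether they are any good.
Checking for digits: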
my_list = []
for line in open("foo.txt", "r"):
    for i in line.strip().split(' '):
        if i.isdigit():
            my_list.append(int(i))
Handling exceptions:
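my_list = []
for line in open("foo.txt", "r"):
    # Split into tokens; iterating over `line` itself would yield single characters.
    for i in line.split():
        try:
            my_list.append(int(i))
        except ValueError:
            pass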
Sample data:
1 2 3
4 56
789
9 91 56
10
11
Answer 0 (score: 6)
An efficient way to do this is your first method with one small change: open the file with a with statement. Example:
my_list = []
with open("foo.txt", "r") as f:
    for line in f:
        for i in line.split():
            if i.isdigit():
                my_list.append(int(i))
Timing tests comparing it with the other methods follow.
Functions:
import csv  # needed by func3()

def func1():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                my_list.append(int(i))
    return my_list

def func1_1():
    return [int(i) for line in open("foo.txt", "r") for i in line.strip().split(' ') if i.isdigit()]

def func1_3():
    my_list = []
    with open("foo.txt", "r") as f:
        for line in f:
            for i in line.split():
                if i.isdigit():
                    my_list.append(int(i))
    return my_list

def func2():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.split():
            try:
                my_list.append(int(i))
            except ValueError:
                pass
    return my_list

def func3():
    my_list = []
    with open("foo.txt","r") as f:
        cf = csv.reader(f, delimiter=' ')
        for row in cf:
            my_list.extend([int(i) for i in row if i.isdigit()])
    return my_list
Timing results:
In [25]: timeit func1()
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 204 µs per loop
In [26]: timeit func1_1()
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 207 µs per loop
In [27]: timeit func1_3()
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 191 µs per loop
In [28]: timeit func2()
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 µs per loop
In [34]: timeit func3()
The slowest run took 4.38 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 202 µs per loop
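Among the approaches that store the data in a list, I believe func1_3() above is the fastest (as the timings show).
That said, if you are really dealing with very large files, you would be better off using a generator instead of storing the complete list in memory.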
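Update: it was said in the comments that func2() is faster than func1_3() (although on my system, with an integers-only file, the timings above do not show it beating func1_3()). I updated foo.txt to contain things other than numbers and ran the timing tests again.
foo.txt: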
1 2 10 11
asd dd
dds asda
22 44 32 11 23
dd dsa dds
21 12
12
33
45
dds
asdas
dasdasd dasd das d asda sda
Tests:
In [13]: %timeit func1_3()
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 210 µs per loop
In [14]: %timeit func2()
1000 loops, best of 3: 279 µs per loop
In [15]: %timeit func1_3()
1000 loops, best of 3: 213 µs per loop
In [16]: %timeit func2()
1000 loops, best of 3: 273 µs per loop
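The timings above come from IPython's %timeit magic. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The sample text, the helper names read_with_isdigit and read_with_try, and the repeat counts below are assumptions chosen for illustration, and absolute numbers will differ from machine to machine.

```python
import os
import tempfile
import timeit

# Hypothetical mixed content, similar in spirit to the updated foo.txt.
SAMPLE = "1 2 10 11\nasd dd\n22 44 32 11 23\ndds\n21 12\n"

def read_with_isdigit(path):
    """Keep tokens made only of digits (the func1_3 approach)."""
    result = []
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                if token.isdigit():
                    result.append(int(token))
    return result

def read_with_try(path):
    """Attempt int() on every token and skip failures (the func2 approach)."""
    result = []
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                try:
                    result.append(int(token))
                except ValueError:
                    pass
    return result

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

NUMBER = 200  # calls per timing run (arbitrary choice)
try:
    for func in (read_with_isdigit, read_with_try):
        # Taking the minimum of several runs reduces timing noise.
        best = min(timeit.repeat(lambda: func(path), number=NUMBER, repeat=3))
        print(f"{func.__name__}: {best / NUMBER * 1e6:.1f} us per call")
    isdigit_result = read_with_isdigit(path)
    try_result = read_with_try(path)
finally:
    os.remove(path)
```

Both helpers return the same list for this sample; only the cost of rejecting the non-numeric tokens differs.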
Answer 1 (score: 5)
If you can read the whole file as one string (i.e. it is not too large), this is very simple.
fileStr = open('foo.txt').read().split()
integers = [int(x) for x in fileStr if x.isdigit()]
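read() turns the file into one long string, and split() breaks it on whitespace (i.e. spaces and newlines) into a list of strings. You can then combine that with a list comprehension that converts each item to an integer if it consists of digits.
As Bakuriu pointed out, if the file is guaranteed to contain only whitespace and numbers, you do not need the isdigit() check. In that case, list(map(int, open('foo.txt').read().split())) is enough. If anything in the file is not a valid integer, that approach raises an error, whereas the other one skips anything that is not a recognizable number.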
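The difference between the two behaviours can be seen in a minimal sketch. The function names parse_strict and parse_lenient and the sample strings are made up for illustration; the strings stand in for the result of read().

```python
def parse_strict(text):
    """Convert every whitespace-separated token; raises ValueError on bad input."""
    return list(map(int, text.split()))

def parse_lenient(text):
    """Keep only tokens consisting entirely of digits; silently skip the rest."""
    return [int(tok) for tok in text.split() if tok.isdigit()]

clean = "1 2 3\n4 56\n\n789\n"   # only numbers and whitespace
dirty = "1 2 abc\n4 x5 6\n"      # contains non-numeric tokens

print(parse_strict(clean))    # [1, 2, 3, 4, 56, 789]
print(parse_lenient(dirty))   # [1, 2, 4, 6]

try:
    parse_strict(dirty)
except ValueError as exc:
    print("strict parsing failed:", exc)
```

Which one is preferable depends on whether bad tokens should be reported or ignored.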
Answer 2 (score: 4)
Thanks, everyone. I combined some of the solutions you posted. This seems to work very well for me:
with open("foo.txt","r") as f:
    my_list = [int(i) for line in f for i in line.split() if i.isdigit()]
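Answer 3 (score: 3)
You can do this with a list comprehension:
my_list = [int(i) for j in open("1.txt","r") for i in j.strip().split(" ") if i.isdigit()]
or using with open():
with open("1.txt","r") as f:
    my_list = [int(i) for j in f for i in j.strip().split(" ") if i.isdigit()]
Process:
1. First, iterate over the lines of the file.
2. Then iterate over the words in each line and check whether each one is a digit string; if so, add it to the list.
Edit:
You need to call strip() on each line, because every line (except possibly the last) ends with a newline character ("\n"), and isdigit() returns False for a string such as "number\n". For example:
>>> "1\n".isdigit()
False
Edit 2:
Input:
1
qw 2
23 we 32
File data when read:
>>> a = open("1.txt","r")
>>> repr(a.read())
"'1\\nqw 2\\n23 we 32'"
You can see the newline "\n", which affects the process.
When I run the comprehension without strip(), it does not treat 1 and 2 as digits, because those tokens include the newline character:
>>> my_list = [int(i) for j in open("1.txt","r") for i in j.split(" ") if i.isdigit()]
>>> my_list
[23, 32]
From the output it is clear that 1 and 2 are missing. Using strip() avoids this.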
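An alternative to stripping each line is to call split() with no argument, as some other answers on this page do: with no separator it splits on any run of whitespace, so the trailing newline never ends up inside a token. The sketch below uses the three input lines from this answer as an in-memory list instead of reading 1.txt; the variable names are illustrative.

```python
# The lines of 1.txt as the file iterator would yield them.
lines = ["1\n", "qw 2\n", "23 we 32"]

for line in lines:
    # split(" ") keeps the newline attached to the last token; split() drops it.
    print(repr(line), line.split(" "), line.split())

with_space_sep = [int(t) for line in lines for t in line.split(" ") if t.isdigit()]
with_whitespace = [int(t) for line in lines for t in line.split() if t.isdigit()]

print(with_space_sep)    # [23, 32]
print(with_whitespace)   # [1, 2, 23, 32]
```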
Answer 4 (score: 3)
Why not use the yield keyword? The code would look like this:
def readInt():
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                yield int(i)
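Then you can read the numbers like this:
my_list = []  # a real list; calling append on the built-in `list` type itself would fail
for num in readInt():
    my_list.append(num)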
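A generator also lets you consume the numbers without ever building the list. The sketch below is one possible variant, not this answer's exact code: the name iter_ints is hypothetical, it takes any iterable of lines rather than a hard-coded filename, and io.StringIO stands in for an open file so the example runs without foo.txt.

```python
import io

def iter_ints(lines):
    """Yield integers lazily from any iterable of text lines (e.g. an open file)."""
    for line in lines:
        for token in line.split():
            if token.isdigit():
                yield int(token)

# io.StringIO behaves like an open text file for iteration purposes.
sample = io.StringIO("1 2 3\n4 56\n\n789\n9 91 56\n")

total = sum(iter_ints(sample))       # numbers are consumed one at a time
print(total)                         # 1011

sample.seek(0)                       # rewind to read the same data again
as_list = list(iter_ints(sample))    # materialize only when a list is needed
print(as_list)                       # [1, 2, 3, 4, 56, 789, 9, 91, 56]
```

With a real file the same function would be used as `with open("foo.txt") as f: total = sum(iter_ints(f))`, which also closes the file when the block ends.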
Answer 5 (score: 3)
my_list = []
with open('foo.txt') as f:
    for line in f:
        for s in line.split():
            try:
                my_list.append(int(s))
            except ValueError:
                pass
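The try/except form and the isdigit() filter used elsewhere on this page do not accept exactly the same tokens. A short sketch of the difference follows; the token list is invented for illustration, and whether signed numbers can occur in the real file is an assumption (the sample data in the question contains none).

```python
tokens = ["42", "-5", "+7", "3.5", "abc", ""]

# isdigit() is True only for non-empty strings made up entirely of digit characters.
by_isdigit = [int(t) for t in tokens if t.isdigit()]

# int() additionally accepts a leading sign, and raises ValueError otherwise.
by_try = []
for t in tokens:
    try:
        by_try.append(int(t))
    except ValueError:
        pass

print(by_isdigit)   # [42]
print(by_try)       # [42, -5, 7]
```

If the file may contain negative numbers, the try/except version keeps them while the isdigit() version silently drops them.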
Answer 6 (score: 3)
Try this:
with open('file.txt') as f:
    nums = []
    for l in f:
        l = l.strip()
        nums.extend([int(i) for i in l.split() if i.isdigit() and l])
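l.strip() removes the trailing newline ('\n') if there is one, because '6\n'.isdigit() returns False. (Strictly speaking, l.split() with no argument already discards the newline, so the strip is a safeguard here.)
list.extend comes in handy here.
The final and l ensures that results from empty lines are discarded.
str.split splits on whitespace, and the with block automatically closes the file once the code has run. I also used a list comprehension.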
Answer 7 (score: 0)
This is the fastest way I have found:
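import re
regex = re.compile(r"\D+")
with open("foo.txt", "r") as f:
    # `if s` skips the empty strings that split() yields when the text
    # starts or ends with non-digits (e.g. a trailing newline).
    my_list = [int(s) for s in regex.split(f.read()) if s]
The result may depend on the size of the file, though.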
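One detail to watch when splitting on \D+: if the text begins or ends with non-digit characters, such as the newline that ends most text files, the split result contains an empty string, and int('') raises ValueError. The sketch below shows that and an alternative using re.findall, which returns only the digit runs. This is a swapped-in variant rather than the method timed in this answer, and the sample string is made up.

```python
import re

text = "1 2 3\n4 56\n"   # ends with a newline, as most text files do

pieces = re.split(r"\D+", text)
print(pieces)            # ['1', '2', '3', '4', '56', '']

# findall returns just the runs of digits, so there is nothing to filter out.
numbers = [int(m) for m in re.findall(r"\d+", text)]
print(numbers)           # [1, 2, 3, 4, 56]
```

Both regex forms pull digits out of mixed tokens (for example "x5" yields 5), unlike the isdigit() filter, which skips such a token entirely.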