Question

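I want to read all the integers in a file into a list. The numbers are separated by one or more spaces or by one or more end-of-line characters. What is the most efficient and/or elegant way to do this? I have two solutions, but I don't know whether they are any good.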

Checking for digits:

for line in open("foo.txt", "r"):
    for i in line.strip().split(' '):
        if i.isdigit():
            my_list.append(int(i))

Handling exceptions:

for line in open("foo.txt", "r"):
    for i in line:
        try:
            my_list.append(int(i))
        except ValueError:
            pass

Sample data:

1   2     3
 4 56
    789         
9          91 56   

 10 
11

Answer 1

An efficient way to do this is your first method with a small change, namely opening the file with a with statement. Example -

with open("foo.txt", "r") as f:
    for line in f:
        for i in line.split():
            if i.isdigit():
                my_list.append(int(i))

Timing tests, compared against the other methods -

Functions -

import csv  # required by func3() below

def func1():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                my_list.append(int(i))
    return my_list

def func1_1():
    return [int(i) for line in open("foo.txt", "r") for i in line.strip().split(' ') if i.isdigit()]

def func1_3():
    my_list = []
    with open("foo.txt", "r") as f:
        for line in f:
            for i in line.split():
                if i.isdigit():
                    my_list.append(int(i))
    return my_list

def func2():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.split():
            try:
                my_list.append(int(i))
            except ValueError:
                pass
    return my_list

def func3():
    my_list = []
    with open("foo.txt","r") as f:
        cf = csv.reader(f, delimiter=' ')
        for row in cf:
            my_list.extend([int(i) for i in row if i.isdigit()])
    return my_list

Timing results -

In [25]: timeit func1()
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 204 µs per loop

In [26]: timeit func1_1()
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 207 µs per loop

In [27]: timeit func1_3()
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 191 µs per loop

In [28]: timeit func2()
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 µs per loop

In [34]: timeit func3()
The slowest run took 4.38 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 202 µs per loop

Among the approaches that store the data in a list, I believe func1_3() above is the fastest (as the timings show).

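That said, if you are really dealing with very large files, you would be better off using a generator rather than storing the complete list in memory.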
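A minimal sketch of the generator idea, assuming the same whitespace-separated format as the sample data. The name iter_ints and the temporary demo file are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile
from itertools import islice


def iter_ints(path):
    """Yield integers from a whitespace-separated text file, one at a time."""
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                # Same filter as func1_3(): keep tokens made only of digits.
                if token.isdigit():
                    yield int(token)


# Hypothetical demo file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1   2     3\n 4 56\n    789\n\n 10 \n11\n")
    demo_path = tmp.name

# Values are consumed lazily, so the full list is never held in memory.
total = sum(iter_ints(demo_path))

# Only as many tokens as needed are parsed when the consumer stops early.
first_three = list(islice(iter_ints(demo_path), 3))

os.remove(demo_path)
```

If a list is needed after all, list(iter_ints(path)) produces the same result as func1_3().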

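Update: It was said in the comments that func2() is faster than func1_3() (although on my system it was not faster than func1_3(), even with integers only), so I updated foo.txt to contain things other than numbers and ran the timing tests again -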

foo.txt -

1 2 10 11
asd dd
 dds asda
22 44 32 11   23
dd dsa dds
21 12
12
33
45
dds
asdas
dasdasd dasd das d asda sda

Timing tests -

In [13]: %timeit func1_3()
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 210 µs per loop

In [14]: %timeit func2()
1000 loops, best of 3: 279 µs per loop

In [15]: %timeit func1_3()
1000 loops, best of 3: 213 µs per loop

In [16]: %timeit func2()
1000 loops, best of 3: 273 µs per loop
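For readers without IPython, a comparison of this kind can be sketched with the standard timeit module. The function names, the demo file and the loop counts below are assumptions chosen for illustration; absolute numbers will differ between machines, so no expected timings are shown.

```python
import os
import tempfile
import timeit


def with_isdigit(path):
    """isdigit() filter, as in func1_3()."""
    result = []
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                if token.isdigit():
                    result.append(int(token))
    return result


def with_try(path):
    """try/except filter, as in func2()."""
    result = []
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                try:
                    result.append(int(token))
                except ValueError:
                    pass
    return result


# Hypothetical demo file mixing numbers and words.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 10 11\nasd dd\n22 44\ndds asda\n")
    demo_path = tmp.name

result_isdigit = with_isdigit(demo_path)
result_try = with_try(demo_path)

# Best of 3 runs, 200 calls each; the minimum is the least noisy figure.
t_isdigit = min(timeit.repeat(lambda: with_isdigit(demo_path), number=200, repeat=3))
t_try = min(timeit.repeat(lambda: with_try(demo_path), number=200, repeat=3))
print(f"isdigit: {t_isdigit:.4f}s  try/except: {t_try:.4f}s")

os.remove(demo_path)
```

A plausible explanation for the gap measured above is that raising and catching ValueError for every non-numeric token costs more than a failed isdigit() check, but this should be confirmed on your own data.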

Answer 2

If you can read the whole file as a single string, this is very simple. (That is, if the file is not too large.)

fileStr = open('foo.txt').read().split() 
integers = [int(x) for x in fileStr if x.isdigit()]

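read() turns the file into one long string, and split() splits it on whitespace (i.e. spaces and newlines) into a list of strings. You can then combine that with a list comprehension that converts the strings to integers if they consist of digits.

As Bakuriu pointed out, if the file is guaranteed to contain only whitespace and numbers, you do not need to check isdigit(). In that case, list(map(int, open('foo.txt').read().split())) is enough. If anything is not a valid integer, that approach raises an error, whereas the other approach skips anything that is not recognized as a number.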
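The difference between the filters can be seen on a small in-memory example. The sample string is made up for illustration. Note that isdigit() also rejects signed numbers such as "-7", which int() would accept.

```python
text = "12 -7 +3 4.5 abc 8\n"  # illustrative input, not from the question
tokens = text.split()

# isdigit() keeps only tokens made entirely of digits, so signs are rejected.
with_isdigit = [int(t) for t in tokens if t.isdigit()]

# try/except keeps everything int() can parse, including signed values.
with_try = []
for t in tokens:
    try:
        with_try.append(int(t))
    except ValueError:
        pass

# map(int, ...) with no filter fails on the first invalid token ("4.5").
try:
    strict = list(map(int, tokens))
    strict_failed = False
except ValueError:
    strict_failed = True

print(with_isdigit)   # [12, 8]
print(with_try)       # [12, -7, 3, 8]
print(strict_failed)  # True
```

Which behavior is right depends on whether the file may contain negative numbers and on whether invalid tokens should be ignored or reported.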

Answer 3

Thanks, everyone. I combined some of the solutions you posted. This seems to work very well for me:

with open("foo.txt","r") as f:
    my_list = [int(i)  for line in f for i in line.split() if i.isdigit()]

Answer 4

You can use a list comprehension to do this:

my_list = [int(i)  for j in open("1.txt","r") for i in j.strip().split(" ") if i.isdigit()]

Or, using with open():

with open("1.txt","r") as f:
    my_list = [int(i)  for j in f for i in j.strip().split(" ") if i.isdigit()]

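Process:

1. First, you iterate over the lines of the file.

2. Then you iterate over the words in each line and check whether they are digits; if so, they are added to the list.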

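Edit:

You need to add strip() to the line, because every line except the last ends with a newline character ("\n"), and isdigit() then returns False for the last token on that line.

For example, calling isdigit() on a string of the form "number\n" gives False.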

Edit 2:

Input:

>>> "1\n".isdigit()
False

File data as read:

1
qw 2
23 we 32

You can see the newlines that affect the process:

>>> a = open("1.txt", "r")
>>> repr(a.read())
"'1\\nqw 2\\n23 we 32'"

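When I run the code without strip(), it does not treat "1\n" and "2\n" as digits, because they contain the newline character:

>>> my_list = [int(i) for j in open("1.txt","r") for i in j.split(" ") if i.isdigit()]
>>> my_list
[23, 32]

From the output it is clear that 1 and 2 are missing. This can be avoided by using strip().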
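The effect can be reproduced without a file, assuming the three lines from the repr above. The helper name parse is an illustrative assumption. The last variant shows that split() with no argument splits on any run of whitespace, newlines included, so it needs no strip().

```python
# The lines of 1.txt as a file iterator would yield them.
lines = ["1\n", "qw 2\n", "23 we 32"]


def parse(lines, splitter):
    """Apply `splitter` to each line and keep the all-digit tokens."""
    return [int(tok) for line in lines for tok in splitter(line) if tok.isdigit()]


# split(" ") alone leaves "\n" attached to the last token of each line.
no_strip = parse(lines, lambda s: s.split(" "))

# strip() removes the trailing newline before splitting.
with_strip = parse(lines, lambda s: s.strip().split(" "))

# split() with no argument discards all whitespace, including "\n".
plain_split = parse(lines, lambda s: s.split())

print(no_strip)     # [23, 32]
print(with_strip)   # [1, 2, 23, 32]
print(plain_split)  # [1, 2, 23, 32]
```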

Answer 5

Why not use the yield keyword? The code would look like this...

def readInt():
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                yield int(i)

Then you can read the numbers:

my_list = []
for num in readInt():
    my_list.append(num)  # append to a list instance, not to the built-in list type

Answer 6

my_list = []
with open('foo.txt') as f:
    for line in f:
        for s in line.split():
            try:
                my_list.append(int(s))
            except ValueError:
                pass

Answer 7

Try this:

with open('file.txt') as f:
    nums = []
    for l in f:
        l = l.strip()
        nums.extend([int(i) for i in l.split() if i.isdigit() and l])

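l.strip() is used in case a newline ('\n') is present, because '6\n'.isdigit() returns False. (With a bare l.split() the newline is already discarded, so the strip() here is a safeguard rather than a requirement.)

list.extend comes in handy here.

The and l at the end ensures that any empty-list results are discarded.

By default, str.split splits on whitespace, and the with block automatically closes the file after its code has run. I have also used list comprehensions.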

Answer 8

This is the fastest way I have found:

import re
regex = re.compile(r"\D+")

with open("foo.txt", "r") as f:
    my_list = list(map(int, regex.split(f.read())))

The results may depend on the size of the file, though.
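One edge case is worth checking before relying on the split-based version: if the text starts or ends with a non-digit character (for example a trailing newline), re.split returns an empty string at that end, and int('') raises ValueError. The sketch below uses re.findall as an alternative. This is a swapped-in technique rather than the answer's own code, and its speed relative to the original has not been measured here.

```python
import re

# Illustrative text that ends with a newline, as most text files do.
text = "1   2     3\n 4 56\n 10 \n11\n"

# Splitting on runs of non-digits leaves an empty string after the last "\n".
pieces = re.compile(r"\D+").split(text)
try:
    list(map(int, pieces))
    split_ok = True
except ValueError:
    split_ok = False

# findall returns only the digit runs, so there are no empty strings.
found = [int(s) for s in re.findall(r"\d+", text)]

print(pieces[-1] == "")  # True
print(split_ok)          # False
print(found)             # [1, 2, 3, 4, 56, 10, 11]
```

Both regex variants treat any non-digit as a separator, so a token such as "abc12" would contribute 12, whereas the isdigit() approaches would skip it entirely.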
