Question

I have a tab-delimited file:

this is a sentence.	abb
what is this foo bar.	bev
hello foo bar blah black sheep.	abb

In a Unix terminal I can use cut -f1 and cut -f2 to split it into two files:

this is a sentence.
what is this foo bar.
hello foo bar blah black sheep.

and

abb
bev
abb

But is it possible to do the same in Python? Would it be faster?

I have been doing it like this:

[i.split('\t')[0] for i in open('in.txt', 'r')]

Answer 1

But is it possible to do the same in Python?

Yes, you can:

l1, l2 = [], []

with open('in.txt', 'r') as f:
    for line in f:
        # strip the trailing newline so the second column prints cleanly;
        # raises ValueError unless the line has exactly two columns
        left, right = line.rstrip('\n').split('\t')
        l1.append(left)
        l2.append(right)

print("\n".join(l1))
print("\n".join(l2))
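If you want two output files, as you get from redirecting the two cut commands, rather than printed output, the same loop can write each column as it goes. This is only a minimal sketch: the function name split_columns and the output file names are made up for illustration, and it assumes every line has exactly two tab-separated columns.

```python
def split_columns(in_path, left_path, right_path):
    """Write column 1 and column 2 of a two-column, tab-separated
    file to two separate files, reading the input only once."""
    with open(in_path, 'r') as src, \
         open(left_path, 'w') as left_out, \
         open(right_path, 'w') as right_out:
        for line in src:
            # assumption: exactly two columns per line, otherwise
            # the unpacking below raises ValueError
            left, right = line.rstrip('\n').split('\t')
            left_out.write(left + '\n')
            right_out.write(right + '\n')
```

You would call it as split_columns('in.txt', 'col1.txt', 'col2.txt'). Nothing is accumulated in memory, which may matter for large files.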

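Would it be faster?

Probably not. cut is a C program optimized for exactly this kind of processing, while Python is a general-purpose language that is very flexible but not necessarily fast.

That said, the one advantage you may get from the algorithm I wrote is that it reads the file only once, whereas with cut you read it twice. That could make a difference.

We would need to run some benchmarks to be 100% sure, though.

Here is a small benchmark on my laptop, for what it's worth: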

>>> timeit.timeit(stmt=lambda: t("file_of_606251_lines"), number=1)
1.393364901014138

versus

% time cut -d' ' -f1 file_of_606251_lines > /dev/null
cut -d' ' -f1 file_of_606251_lines > /dev/null  0.74s user 0.02s system 98% cpu 0.775 total
% time cut -d' ' -f2 file_of_606251_lines > /dev/null
cut -d' ' -f2 file_of_606251_lines > /dev/null  1.18s user 0.02s system 99% cpu 1.215 total

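which adds up to 1.990 seconds for the two cut runs (0.775 + 1.215).

So the Python version does turn out to be faster than expected ;-)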
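The function t in the timeit call is not shown above. As a rough sketch, it could simply wrap the splitting loop as below. Both the body of t and the helper bench are assumptions for illustration, including the choice of a two-column file and the sep parameter.

```python
import timeit

def t(path, sep='\t'):
    """Hypothetical stand-in for the timed function: split a
    two-column file into two lists in a single pass."""
    l1, l2 = [], []
    with open(path, 'r') as f:
        for line in f:
            # assumption: exactly two columns per line
            left, right = line.rstrip('\n').split(sep)
            l1.append(left)
            l2.append(right)
    return l1, l2

def bench(path, sep='\t'):
    """Return the wall-clock seconds taken by one run of t(path)."""
    return timeit.timeit(stmt=lambda: t(path, sep), number=1)
```

Calling bench("file_of_606251_lines") gives a number comparable to the one above, although a single run is noisy, so repeating it a few times and taking the minimum is usually more informative.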
