Question

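I have a text file with 5,000,000 lines, and I want to extract one line out of every 1,000 and write them to a new text file. The new file should have 5,000 lines.

Can you help me?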

Answer 1

I would use a Python script for this, though the same logic also works in your shell. Here is the Python code.

input_file = 'path/file.txt'
output_file = 'path/output.txt'
n = 0  # lines read since the last write

with open(input_file, 'r') as f:
    with open(output_file, 'w') as o:
        for line in f:
            n += 1
            # Keep the 1000th line of each block, then start counting again
            if n == 1000:
                o.write(line)
                n = 0

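Basically, you initialize a counter and iterate over the file line by line, incrementing the counter on each line. When the counter reaches 1000, you write that line to the new file and reset the counter.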
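The same counter logic can also be written with enumerate, which avoids resetting the counter by hand. This is only a sketch: the function names every_nth and sample_file are made up for illustration, and the step of 1000 is taken from the question.

```python
from typing import Iterable, Iterator


def every_nth(lines: Iterable[str], step: int = 1000) -> Iterator[str]:
    """Yield the last line of each complete block of `step` lines."""
    # enumerate(..., start=1) gives 1-based line numbers
    for number, line in enumerate(lines, start=1):
        if number % step == 0:
            yield line


def sample_file(input_path: str, output_path: str, step: int = 1000) -> int:
    """Copy every `step`-th line of input_path to output_path.

    Returns the number of lines written.
    """
    written = 0
    with open(input_path, "r") as src, open(output_path, "w") as dst:
        for line in every_nth(src, step):
            dst.write(line)
            written += 1
    return written
```

Calling sample_file('path/file.txt', 'path/output.txt') on a 5,000,000-line file should write 5,000 lines, reading the input one line at a time rather than loading it into memory.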

Here is how to iterate over the lines of a file with the Bash shell.

Answer 2

Try:

awk 'NR%1000==1' infile > outfile
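Note that NR%1000==1 keeps lines 1, 1001, 2001, and so on, that is, the first line of each block of 1000, whereas the Python code in Answer 1 keeps the last one (1000, 2000, ...). Either way a 5,000,000-line input yields 5,000 lines. For comparison, a rough Python equivalent of the awk selection could use itertools.islice; the function name first_of_each_block is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_of_each_block(lines: Iterable[str], step: int = 1000) -> Iterator[str]:
    """Yield the first line of each block of `step` lines (like awk 'NR%step==1')."""
    # islice(lines, 0, None, step) yields the items at indexes 0, step, 2*step, ...
    return islice(lines, 0, None, step)
```

With an open file object f, o.writelines(first_of_each_block(f)) would write the selected lines to an output file o.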

See this link for more options: remove odd or even lines from text file in terminal in linux

Answer 3

You can use head or tail, depending on which line you want to extract.

To extract the first line from each file (for example, *.txt files):

head -n1 *.txt | grep -ve ^= -e ^$ > first.txt

To extract the last line from each file, simply use tail instead of head.
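If you would rather stay in Python, as in Answer 1, the head/tail idea can be sketched roughly as follows. The function name first_and_last_lines is made up for illustration, and skipping empty files is an assumption about the desired behavior.

```python
import glob
from collections import deque
from typing import Dict, Tuple


def first_and_last_lines(pattern: str) -> Dict[str, Tuple[str, str]]:
    """Map each file matching `pattern` to its (first line, last line)."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r") as f:
            first = f.readline()
            if not first:
                # Empty file: nothing to extract (assumed behavior)
                continue
            # deque(maxlen=1) keeps only the final remaining line, if any
            tail = deque(f, maxlen=1)
            last = tail[0] if tail else first
        result[path] = (first, last)
    return result
```

For example, first_and_last_lines('*.txt') covers the same files as the head command above; a file with a single line reports that line as both first and last.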

For information on extracting specific lines, see: How do I use Head and Tail to print specific lines of a file.

Extracting lines from a file in a shell script

3 answers: