Question

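I have a large file named file.txt that contains data like the following:

1  1.1
2  1.2
3  1.3
4  1.4
5  1.5
1  2.1
2  2.2
3  2.3
4  2.4
1  2.5
2  2.8
3  3.1

I want output like this: whenever 1 repeats in the first column, the file should be split as follows: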

a.txt:

1  1.1
2  1.2
3  1.3
4  1.4
5  1.5

b.txt:

1  2.1
2  2.2
3  2.3
4  2.4

c.txt:

1  2.5
2  2.8
3  3.1

Answer 1

Solution to OP's question: Could you please try the following (OP mentioned in the post that the output files should be named a.txt, b.txt, and so on). Since OP did not say what should happen once all the letter-named output files have been created, I wrote the program so that on the 27th occurrence of 1 it starts again from a.txt and keeps appending to the already existing files.

awk '
BEGIN{
  split("a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z",array,",")
}
$1==1{
  close(file)
  file=array[++count]".txt"
  count=count==26?0:count
}
{
  print >> file
}
' Input_file

EDIT (per OP's comments, OP wants the output files named 1.txt, 2.txt, and so on): To write the output to files such as 1.txt and 2.txt, try the following. Whenever 1 appears in the first field, it starts writing output to a new output file.

awk '$1==1{close(file);file=++count".txt"}  {print > file}'  Input_file

The above command will create 3 output files (based on your sample), as follows:

cat 1.txt
1 1.1
2 1.2
3 1.3
4 1.4
5 1.5
cat 2.txt
1 2.1
2 2.2
3 2.3
4 2.4
cat 3.txt
1 2.5
2 2.8
3 3.1

Explanation of the above command:

awk '                        ##Starting the awk program here.
$1==1{                       ##If $1 (the first field) of the current line equals 1, do the following.
  close(file)                ##Using the awk close function to close the output file whose name is stored in the variable file.
  file=++count".txt"         ##Setting the variable file to the incremented value of count followed by the string .txt.
}                            ##Closing the BLOCK for the condition here.
{
  print > file               ##Printing every line to the output file whose name is stored in the variable file.
}
'   Input_file               ##Mentioning the Input_file name here.

PS: I have taken care of the "too many open files" error by using the close(file) command in the program.
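
The wrap-around naming used by the letter-based awk program (a.txt through z.txt, then back to a.txt on the 27th occurrence of 1) can be illustrated with a small Python sketch. The function name letter_file_name is hypothetical and only mirrors the naming rule; it does not read or write any files.

```python
import string


def letter_file_name(occurrence):
    """Return the output file name for the n-th (1-based) line whose first column is 1.

    Hypothetical helper: the names cycle through a.txt .. z.txt and then start over.
    """
    letters = string.ascii_lowercase
    return letters[(occurrence - 1) % len(letters)] + ".txt"
```

For example, letter_file_name(26) gives z.txt and letter_file_name(27) gives a.txt again, which is why the awk program appends (>>) instead of overwriting.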

Answer 2

If you don't care much about the file names, they can simply be numbers:

 awk '(NR==1)||($1<t) { close(f); f=sprintf("%0.5d",i++)}{print > f; t=$1}'
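
The one-liner above starts a new file on the first line and whenever the first column is smaller than on the previous line, so it does not depend on the restart value being exactly 1. A minimal Python sketch of that grouping rule follows. The function names group_on_decrease and write_groups are hypothetical, and the sketch assumes the first column is numeric and that blank lines can be skipped, neither of which the original answer states.

```python
def group_on_decrease(lines):
    """Group lines; a new group starts when the first column drops below the previous value."""
    groups = []
    previous = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        value = float(fields[0])  # assumption: the first column is numeric
        if previous is None or value < previous:
            groups.append([])  # first line or a decrease: start a new group
        groups[-1].append(line)
        previous = value
    return groups


def write_groups(groups, width=5):
    """Write each group to a zero-padded numeric file name such as 00000, 00001."""
    names = []
    for index, group in enumerate(groups):
        name = f"{index:0{width}d}"
        with open(name, "w") as out:
            out.writelines(group)
        names.append(name)
    return names
```

A possible use would be to open file.txt, pass the file object to group_on_decrease, and hand the result to write_groups. Unlike the awk version, this sketch holds all groups in memory, so it suits illustration more than very large files.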

Answer 3

Assuming you can use Python, try the following:

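counter = 1
output = None
with open('file.txt') as infile:
    for line in infile:
        fields = line.split()
        # A first column of exactly 1 starts a new group, so close the current file.
        # Comparing the whole field avoids matching values such as 10 or 12.
        if fields and fields[0] == '1' and output is not None:
            output.close()
            output = None
        # Open the next numbered file (1.txt, 2.txt, ...) when none is open.
        if output is None:
            output = open(str(counter) + '.txt', 'w')
            counter += 1
        output.write(line)
# Close the last output file.
if output is not None:
    output.close()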

Answer 4

This might work for you (GNU csplit and parallel):

csplit -sz file '/^1 /' '{*}'
parallel mv ::: xx?? :::+ {a..z}.txt

Answer 5

Here is an alternative in bash:

#!/bin/bash
count=96                                                 # char code before 'a'
while IFS= read -r line; do                              # loop over all lines, keeping each line verbatim
   read -r tag _ <<< "$line"                             # get the first field of the line
   if [ "$tag" == "1" ]; then                            # group change on 1
       count=$((count + 1))                              # count file
       filename="$(printf "\\$(printf %o "$count")").txt"  # create filename
       : > "$filename"                                   # create or truncate the file
   fi
   echo "$line" >> "$filename"                           # append to file
done < file.txt                                          # input from file.txt

How can I split a file into multiple files whenever a repeated term appears in column 1?

5 answers: