Question

I'm trying to do something like this:

Input file

Output file 1

123 09
355 07
765 01

Output file 2

123 10
765 03
765 05

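What I mean is: if a value in column 1 is repeated, I want to remove the repeated lines (the whole line), but instead of discarding them I want to put them in another file.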

I know I can get output 1 with

awk '!a[$1]++' file

But is it possible to get output 2 as well?

I'm open to Python scripts too.
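To pin down the requirement, here is a minimal Python sketch of the split described above: the first line for each column-1 value goes to output 1, and every later line with the same value goes to output 2. The function name `split_by_first_column` and the sample input order are assumptions for illustration (the original input file is not shown); it works on lists of lines rather than files so the behaviour is easy to check.

```python
def split_by_first_column(lines):
    """Return (firsts, repeats) for an iterable of text lines.

    firsts  -- the first line seen for each column-1 value (output 1)
    repeats -- every later line whose column-1 value was already seen (output 2)
    """
    seen = set()
    firsts, repeats = [], []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # ignore blank lines
        key = fields[0]
        if key in seen:
            repeats.append(line)
        else:
            seen.add(key)
            firsts.append(line)
    return firsts, repeats


# Hypothetical input order, chosen only to be consistent with the outputs above
sample = ["123 09", "355 07", "765 01", "123 10", "765 03", "765 05"]
firsts, repeats = split_by_first_column(sample)
print(firsts)   # ['123 09', '355 07', '765 01']
print(repeats)  # ['123 10', '765 03', '765 05']
```

Using a set keeps each membership check constant-time on average, so the split stays a single linear pass over the input.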

Answer 1

One way, using awk:

awk '{print >("file"(!a[$1]++?1:2))}' file

or

awk '{print >("file"(a[$1]++?2:1))}' file
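Both commands rely on the fact that an awk array element starts out empty (numeric 0) and that a[$1]++ evaluates to the value before the increment, so it is 0 (false) the first time a key appears and nonzero afterwards. The sketch below emulates that selection in Python; `awk_style_target` is a hypothetical name, and it assumes every line is non-blank.

```python
from collections import defaultdict


def awk_style_target(lines):
    """Mimic awk '{print > ("file" (a[$1]++ ? 2 : 1))}' and return (target, line) pairs."""
    a = defaultdict(int)  # like an awk array: missing keys start at 0
    result = []
    for line in lines:
        key = line.split()[0]  # awk's $1 (assumes the line is non-blank)
        old = a[key]           # a[$1]++ yields the value *before* incrementing
        a[key] += 1
        target = "file" + str(2 if old else 1)  # seen before -> file2, else file1
        result.append((target, line))
    return result


pairs = awk_style_target(["123 09", "123 10", "355 07"])
print(pairs)  # [('file1', '123 09'), ('file2', '123 10'), ('file1', '355 07')]
```

The two awk variants are equivalent: !a[$1]++?1:2 and a[$1]++?2:1 just swap the negation for the order of the ternary branches.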

Answer 2

Here is a simple, readable Python script that does the job. Leave a comment if you have any questions.

# open all the files
with open('output_1.txt', 'w') as out_1:
    with open('output_2.txt', 'w') as out_2:
        with open('input.txt', 'r') as f:
            # set that stores the column-1 values seen so far
            seen = set()
            # iterate over each row of the input file
            for row in f:
                # split on any run of whitespace, not just a double space
                fields = row.split()
                if not fields:
                    continue  # skip blank lines
                col_1 = fields[0]

                # check if you have met col_1 before
                # if not, write the row in output_1
                if col_1 not in seen:
                    seen.add(col_1)
                    out_1.write(row)
                # otherwise write the row in output_2
                else:
                    out_2.write(row)

Answer 3

For the first and second outputs, you can use this awk command:

awk '!seen[$1]++{print > "output1"; next} {print > "output2"}' file

cat output1
123  09
355  07
765  01

cat output2
123  10
765  03
765  05

Answer 4

Using Python:

seen = set()
with open('data.txt') as fin, open('f1.txt', 'w') as fout1, open('f2.txt', 'w') as fout2:
    for line in fin:
        col = line.split()[0]
        if col in seen:
            fout2.write(line)
        else:
            seen.add(col)
            fout1.write(line)

Answer 5

Try

awk '{if($1 in a){ print > "Output2" }else{ print > "Output1"} a[$1]=1}' input

The Output1 file then contains:

123  09
355  07
765  01

The Output2 file then contains:

123  10
765  03
765  05

If you only want output 2, just remove the ! from your original command:

awk 'a[$1]++' input
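Without the !, the pattern a[$1]++ is true only when the old count is nonzero, so awk prints just the lines whose first field has already appeared. A rough Python counterpart as a streaming filter might look like this; the name `repeats_only` is hypothetical, and it assumes non-blank lines.

```python
def repeats_only(lines):
    """Yield only lines whose first field was already seen (like awk 'a[$1]++')."""
    counts = {}
    for line in lines:
        key = line.split()[0]       # awk's $1 (assumes the line is non-blank)
        old = counts.get(key, 0)    # value of a[$1] before the increment
        counts[key] = old + 1
        if old:                     # nonzero old count -> pattern is true -> print
            yield line


result = list(repeats_only(["123 09", "355 07", "765 01", "123 10", "765 03", "765 05"]))
print(result)  # ['123 10', '765 03', '765 05']
```

Because it is a generator, it can be fed a file object directly and never holds more than the set of keys in memory.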

Answer 6

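You can do this job directly in bash. For example:

#!/bin/bash

# file names
file=input.in
dupes=dupes.out
uniques=uniques.out

# an associative array to track seen keys (declare -A requires bash 4+)
declare -A keys

# extracts a key from an input line via shell word splitting
get_key() {
    key=$1
}

# remove old output files
[ -e "$dupes" ] && rm "$dupes"
[ -e "$uniques" ] && rm "$uniques"

# process the input line by line
while read -r line; do
    get_key $line                 # unquoted on purpose: split the line into words
    [ -n "$key" ] || continue     # skip blank lines
    if [ -n "${keys[$key]}" ]; then
        # a duplicate
        echo "$line" >> "$dupes"
    else
        # not a duplicate
        keys[$key]=1
        echo "$line" >> "$uniques"
    fi
done < "$file"

It could be shortened in various ways; rather than aiming for brevity, I wrote it with some clarity and a bit of flexibility in mind.

It is important to understand that bash is itself a very powerful programming environment. One thing that slows down many shell scripts is heavy use of external commands. Using external commands is not inherently bad, and sometimes it is the best or only way to get the job done, but where that is not the case you should seriously consider avoiding them.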
