I'm new to Unix. Can you help me find duplicate records,
where a duplicate is defined by Name, EmpId, and designation?
Input file:
"Name" , "Address", "EmpId"," designation", "office location"
"NameValue","AddressValue","EmpIdValue","designationValue","office locationValue"
"NameValue1","AddressValue1","EmpIdValue1","designationValue1","office locationValue1"
"NameValue","AddressValue1","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressValue2","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressVal4ue","EmpIdValue1","designationValue","office locationValue"
Output file:
"NameValue","AddressValue","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressValue1","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressValue2","EmpIdValue","designationValue","office locationValue"
Answer 0 (score: 0)
A Python script is probably the best fit for this:
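import fileinput

# key -> first line seen with that key, or "" once that line has been printed
seen = {}
for line in fileinput.input():
    tokens = line.split(",")
    # build the key from Name, EmpId and designation (columns 0, 2 and 3)
    key = tokens[0] + "###" + tokens[2] + "###" + tokens[3]
    if key in seen:
        # print the first occurrence, if it wasn't printed yet
        if seen[key]:
            print(seen[key], end="")
            seen[key] = ""
        print(line, end="")
    else:
        seen[key] = line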
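For production use you may want a more robust way of building a unique key (joining fields with "###" and splitting on every comma can break on unusual values), but the general idea is the same.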
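As one possible way to make the key more robust, here is a minimal sketch that parses each line with Python's standard `csv` module and uses a tuple of field values as the key, so quoted fields containing commas are not split by mistake. The function name `find_duplicates` and the `key_columns` parameter are my own names for illustration, not part of the script above, and the sketch assumes each record fits on a single line.

```python
import csv
from collections import defaultdict


def find_duplicates(lines, key_columns=(0, 2, 3)):
    """Return the raw lines whose key columns match at least one other line.

    Lines come back in their original order. The default key_columns
    (0, 2, 3) correspond to Name, EmpId and designation in the question's
    layout; that mapping is an assumption about the file.
    """
    groups = defaultdict(list)  # key tuple -> indices of rows with that key
    kept = []                   # (raw line, key) for every usable row
    for raw in lines:
        # csv.reader understands quoted fields, including embedded commas
        row = next(csv.reader([raw], skipinitialspace=True), None)
        if row is None or len(row) <= max(key_columns):
            continue  # skip blank or too-short lines
        key = tuple(row[i].strip() for i in key_columns)
        groups[key].append(len(kept))
        kept.append((raw, key))
    # keep only rows whose key occurs more than once
    return [raw for raw, key in kept if len(groups[key]) > 1]
```

To use it on a file, something like `sys.stdout.writelines(find_duplicates(open("employees.csv")))` should work, where `employees.csv` is a placeholder file name. Unlike the streaming script, this version holds every line in memory, which is fine for small files but may not suit very large ones.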