Question

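I need to identify duplicates between column A of CSV1 and column A of CSV2. If a duplicate first name is found, the entire row from CSV2 needs to be copied to a new CSV3. Can someone help with this in Python?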

CSV1

Adam                 
Eve                    
John     
George

CSV2

Steve
Mark
Adam Smith 
John Smith

CSV3

Adam Smith
John Smith

Answer 1

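Here is a quick answer. It is O(n^2), where n is the number of lines in each CSV, assuming the two CSVs are of equal length. If you need an O(n) solution (clearly optimal), let me know. The trick would be to build a set from the elements of the first column of csv1.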

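# Adjust delim to match your files (the sample data in the question is space-separated)
delim = ', '
with open('csv1.txt') as f1:
    # Split every non-empty line into its fields
    fields1 = [line.strip().split(delim) for line in f1 if line.strip()]
with open('csv2.txt') as f2:
    fields2 = [line.strip().split(delim) for line in f2 if line.strip()]
duplicates = []
for line1 in fields1:
    for line2 in fields2:
        # Compare the first field (column A) of each row
        if line1[0] == line2[0]:
            duplicates.append(line2)

print(duplicates)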
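
For reference, the O(n) variant this answer alludes to (building a set from the first column of csv1) might look roughly like the sketch below. The function name `find_duplicates` and its parameters are made up for illustration and are not part of the original answer; it works on lists of lines, so you would read the files yourself first.

```python
def find_duplicates(lines1, lines2, delim=", "):
    """Return rows of lines2 whose first field also appears as a first field in lines1."""
    # Build the lookup set once: O(len(lines1)).
    first_column = {line.strip().split(delim)[0] for line in lines1 if line.strip()}
    duplicates = []
    for line in lines2:
        row = line.strip()
        # Set membership is O(1) on average, so this pass is O(len(lines2)).
        if row and row.split(delim)[0] in first_column:
            duplicates.append(row)
    return duplicates


if __name__ == "__main__":
    # Sample data from the question, which is space-separated.
    csv1 = ["Adam", "Eve", "John", "George"]
    csv2 = ["Steve", "Mark", "Adam Smith", "John Smith"]
    print(find_duplicates(csv1, csv2, delim=" "))  # ['Adam Smith', 'John Smith']
```

Writing the result to csv3 is then just a matter of opening the output file and writing each returned row followed by a newline.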

Answer 2

Use any of these 3 one-liners:

Option 1: Parse file1 in the BEGIN block

perl -lane 'BEGIN {$csv2 = pop; $seen{(split)[0]}++ while <>; @ARGV = $csv2 } print if $seen{$F[0]}' csv1 csv2

Option 2: Using a ternary

perl -lane 'BEGIN {($csv1) = @ARGV } $ARGV eq $csv1 ? $seen{$F[0]}++ : ($seen{$F[0]} && print)' csv1 csv2

Option 3: Using a single if

perl -lane 'BEGIN {($csv1) = @ARGV } print if $seen{$F[0]} += $ARGV eq $csv1 and $ARGV ne $csv1' csv1 csv2

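Explanation:

Switches:

-l: Enables line-ending processing
-a: Splits each line on whitespace and loads the fields into the array @F
-n: Creates a while(<>){..} loop over each line of the input files.
-e: Tells perl to execute the code given on the command line.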
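
Since the question asked for Python, here is a rough Python analogue of the ternary one-liner (option 2), in case it helps to see what the switches are doing. It uses the standard `fileinput` module, which loops over several files in sequence much like perl's `while(<>)`. The function name `matching_rows` is invented for this sketch, and unlike the perl version it skips blank lines.

```python
import fileinput


def matching_rows(csv1_path, csv2_path):
    """Loop once over both files, like perl -n does with while(<>)."""
    seen = set()   # first fields found in csv1 (plays the role of %seen)
    matches = []
    with fileinput.input(files=[csv1_path, csv2_path]) as stream:
        for line in stream:
            fields = line.split()            # like -a: split on whitespace into @F
            if not fields:
                continue                     # blank line (the perl version does not skip these)
            if stream.filename() == csv1_path:   # like $ARGV eq $csv1
                seen.add(fields[0])
            elif fields[0] in seen:
                matches.append(line.rstrip("\n"))  # like -l removing the newline
    return matches
```

Called as `matching_rows('csv1', 'csv2')`, it returns the csv2 lines whose first field was seen in csv1.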

Answer 3

A clean and Pythonic way to solve the problem

# Collect every space-separated word in csv1
with open('csv1') as a:
    words_a = {w.strip()
               for l in a
               for w in l.split(" ")
               if w.strip()}

# Collect every space-separated word in csv2
with open('csv2') as b:
    words_b = {w.strip()
               for l in b
               for w in l.split(" ")
               if w.strip()}

# Write the words common to both files, one per line
# (note: this outputs the shared words only, not the full rows of csv2)
with open('csv3', 'w') as wf:
    for w in words_a.intersection(words_b):
        wf.write(w)
        wf.write('\n')

Identifying duplicates from two different CSV rows
