Python: how to compare two csv files and add a classifier to the first file if a value exists in the second

Time: 2016-12-19 13:50:12

Tags: python csv

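I'm trying to compare two csv files. The first file (movements.csv) has 14 columns and the second (LCC.csv) has one column. I want to check whether the entry (a string) in column 8 of movements.csv appears anywhere in column 1 of LCC.csv. If it does, 'Yes' should be written in column 14; if not, 'No'. Below are the code I have tried so far and the error message I get: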

import csv

f1 = file('LCC.csv', 'rb') 
f2 = file('movements.csv', 'rb')
f3 = ('output.csv', 'wb') 

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

movements = list(c2)

for LCC_row in c1:
    row = 0
    found = False
    for movements_row in movements:
        output_row = movements_row
        if movements_row[7] == LCC_row[0]
            output_row.append('Yes')
            found = True
            break
        row += 1
    if not found:
        output_row.append('No')
    c3.writerow(output_row)

f1.close()
f2.close()
f3.close()


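I'm a complete Python beginner, so any advice is appreciated! Ideally, the check between the two columns would also ignore whether the strings are written in upper case.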

The error appears after the line
c3.writerow(output_row)

and reads:

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
>>> 

LCC.csv (no header):

Air Ab  
Jamb  
Sw  
AIRF  
EURO   

movements.csv (with a header):

ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC  
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,   
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A,   
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu,   
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,   

As mentioned, the last column (LCC) is currently completely empty.

3 answers:

Answer 0 (score: 1)

There are quite a few issues with it. Here are the ones I found after a quick look at the code:

  1. There is an invalid quote in your line:

    f2 = file('movements.csv', ,rb')
    #                          ^
    

    It should be:

    f2 = file('movements.csv', 'rb')
    
  2. In the code you shared, you have backquotes in various places instead of single quotes '. For example, your lines should be:

    f1 = file('LCC.csv', 'rb') 
    f3 = file('output.csv', 'wb')    
    #     ^ also missing file here
    
  3. The colon : is missing after the if. It should be:

    if movements_row[7] == LCC_row[0]:
    #                           Here ^
    
  4. Also, you don't need parentheses to set a string value. Just assign it:

    output_row[13] = 'Yes'
    #                ^ As simple string
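
Item 4 works only if each data row already has a 14th field to overwrite. A minimal check, assuming Python 3 and one sample row copied from the question: the trailing comma makes csv.reader return an empty last field, so index 13 exists and can be assigned directly.

```python
import csv

# One data row from the question's movements.csv; note the trailing comma.
line = 'Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,'

row = next(csv.reader([line]))
field_count = len(row)   # the trailing comma yields an empty 14th field
row[13] = 'Yes'          # overwrite the empty LCC field in place
```

If a row were shorter than 14 fields, the assignment would raise an IndexError, in which case append would be the safer choice.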
    

Answer 1 (score: 0)

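You're trying to do too many things at once. Split it into separate tasks. First, we read the contents of LCC.csv into a set (we could use a list, but a set is better suited to membership tests). Then we go through movements.csv and write it back out.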

import csv

with open('LCC.csv', 'rb') as lcc:
    lcc_set = set()
    lcc_r = csv.reader(lcc)
    for l in lcc_r:
        for i in l:
            lcc_set.add(i)

with open('movements.csv', 'rb') as movements:
    mov_r = csv.reader(movements)
    with open('output.csv', 'wb') as output:
        out_w = csv.writer(output)
        for l in mov_r:
            #l.pop()
            if l[7] in lcc_set:
                l.append('Yes')
            else:
                l.append('No')
            out_w.writerow(l)

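It's not clear to me whether you want to add a column or replace the last one. I've commented out the line that would cause the last column to be replaced with Yes/No.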
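
The question also asks for the comparison to ignore capitalisation, which the code above does not do. One way to sketch it, assuming Python 3: normalise both sides before the set lookup. The helper names `normalize` and `classify` are hypothetical, not part of the csv module. Stripping whitespace also copes with the trailing spaces visible in the LCC.csv sample.

```python
def normalize(name):
    """Strip surrounding whitespace and fold case so 'Sw  ' matches 'SW'."""
    return name.strip().casefold()


def classify(airline, lcc_set):
    """Return 'Yes' if the airline is in lcc_set (already normalised), else 'No'."""
    return 'Yes' if normalize(airline) in lcc_set else 'No'


# Values as they appear in the LCC.csv sample, trailing spaces included.
lcc_names = ['Air Ab  ', 'Jamb  ', 'Sw  ', 'AIRF  ', 'EURO   ']
lcc_set = {normalize(n) for n in lcc_names}  # build the lookup set once

results = [classify(a, lcc_set) for a in ['euro', 'SW', 'THA']]
```

In the loop above, this would amount to adding normalize(i) to the set and testing normalize(l[7]) instead of l[7].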

Answer 2 (score: 0)

There are quite a few errors in your code. They are pointed out here: https://stackoverflow.com/a/41224147/3027854

One issue with movements.csv:

ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC 
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu, 
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A, 
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu, 
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,

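Except for the header row, every row ends with a ",", which csv.reader parses as one more, empty field after cont (the still-empty LCC column). I've added handling for that in my code:
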
import csv

f1 = open('LCC.csv', 'rU') 
f2 = open('movements.csv', 'rU')
f3 = open('output.csv', 'w') 

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

# first we will read all LCC values into a set.
LCC_row_values = set()
for LCC_row in c1:
    LCC_row_values.add(LCC_row[0].strip())

row = 0
for movements_row in c2:
    row += 1
    if row == 1:
        # movements_row.append('is_present')
        # c3.writerow(movements_row)
        # skip header of movements.csv file
        continue
    # Remove last extra column from output row
    output_row = movements_row[:-1]
    if movements_row[7] in LCC_row_values:
        output_row.append('Yes')
    else:
        output_row.append('No')
    c3.writerow(output_row)

f1.close()
f2.close()
f3.close()

The sample files here are:

LCC.csv

Air Ab 
Jamb 
Sw 
AIRF 
EURO

movements.csv

ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC 
Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu, 
Zue,LSZH,2005,200501,204,1/1/2005,Arr,"Sw",SWR,"Dar",HA,Tans,A, 
Ba,LSZM,2005,200501,191,1/1/2005,Arr,"AIRF",AFR,"PG",LG,Fr,Eu, 
Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,

output.csv

Zue,LSZH,2005,200501,25,1/1/2005,Dep,EURO,EUJ,Mans C,EG,Gb,Eu,Yes
Zue,LSZH,2005,200501,204,1/1/2005,Arr,Sw,SWR,Dar,HA,Tans,A,Yes
Ba,LSZM,2005,200501,191,1/1/2005,Arr,AIRF,AFR,PG,LG,Fr,Eu,Yes
Zue,LSZH,2005,200501,228,1/1/2005,Dep,THA,THA,Bang,VD,Th,As,No
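
The code in this answer targets Python 2 (the 'rU' mode was removed in Python 3.11). A rough Python 3 sketch of the same approach follows. It assumes the files are opened with newline='' as the csv module documentation recommends, and unlike the code above it keeps the header row. The function name `add_lcc_column` is made up for illustration, and io.StringIO stands in for real files so the example is self-contained.

```python
import csv
import io


def add_lcc_column(lcc_file, movements_file, output_file):
    """Copy movements rows to output, replacing the last field with Yes/No."""
    # Read every LCC name into a set, ignoring blank lines and stray spaces.
    lcc_values = {row[0].strip() for row in csv.reader(lcc_file) if row}

    reader = csv.reader(movements_file)
    writer = csv.writer(output_file)

    # Pass the header through unchanged apart from stray whitespace.
    header = next(reader)
    writer.writerow([h.strip() for h in header])

    for row in reader:
        if not row:
            continue  # skip empty lines
        flag = 'Yes' if row[7].strip() in lcc_values else 'No'
        # Drop the empty trailing LCC field and put the flag in its place.
        writer.writerow(row[:-1] + [flag])


# In-memory stand-ins for open('LCC.csv', newline='') and friends.
lcc = io.StringIO('Air Ab\nJamb\nSw\nAIRF\nEURO\n', newline='')
movements = io.StringIO(
    'ap,ic,year,y_m,pas,da,ty,airl,ic_a,dest_orig,ic_d,coun,cont,LCC \n'
    'Zue,LSZH,2005,200501,25,1/1/2005,Dep,"EURO",EUJ,"Mans C",EG,Gb,Eu,\n'
    'Zue,LSZH,2005,200501,228,1/1/2005,Dep,"THA",THA,Bang,VD,Th,As,\n',
    newline='',
)
output = io.StringIO(newline='')

add_lcc_column(lcc, movements, output)
lines = output.getvalue().splitlines()
```

With real files, the three StringIO objects would be replaced by open('LCC.csv', newline=''), open('movements.csv', newline='') and open('output.csv', 'w', newline=''), ideally inside a with statement.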