Similarities between two CSV files

Time: 2014-07-09 16:11:22

Tags: python csv

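I'm very new to Python and am having a lot of trouble with a program that looks for matches between two CSV files. I have two CSV files: the first is called "list" and the second is called "example".

The file "list" contains this in its first row: leg, knee, thigh, shin, ankle, hip, foot, toe, calf, feet, patella, tibia, fibula

The file "example" contains: Student broke leg yesterday, Student broke arm today, Student hurt thigh today, Student twisted elbow, Student rolled ankle today

So basically, if the CSV file "example" contains any of the words in the CSV file "list", the program should output a new CSV file containing the matching sentences from "example", but it doesn't.

Here is my code so far: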

import csv

with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("finalOutput.csv", "wb") as outputfile:
    reader1 = csv.reader(file1, delimiter=';')
    reader2 = csv.reader(file2, delimiter='|')
    writer = csv.writer(outputfile, delimiter='|')

    rows2 = [row for row in reader2]
    for row1 in reader2:
        for row2 in rows2:
            if row1[0] == row2[0]:
                data = [row1[0], row2[0]]
                print data
                writer.writerow(data)

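2 answers:

Answer 0 (score: 1)

Why not try something like this, assuming you want to write out the whole row if any word matches a word in the second file? Basically, you take the row from the second file as a single string and then check whether any word from the first file appears in that string. If so, write it out.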

import csv

with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("output.csv", "wb+") as file3:
    reader1 = csv.reader(file1)
    reader2 = csv.reader(file2)
    writer = csv.writer(file3)

    # Read both files fully so that rows can be looked up by index.
    reader1_rows = [row for row in reader1]
    reader2_rows = [row for row in reader2]

    for num, row in enumerate(reader1_rows):
        # Join the same-numbered row of example.csv into one string
        # and check whether any word from list.csv occurs in it.
        if any(word in ' '.join(reader2_rows[num]) for word in row):
            writer.writerow([row, reader2_rows[num]])

Adjusted based on your comment; I believe this should give you the output you want:

with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("output.csv", "wb+") as file3:
    reader1 = csv.reader(file1)
    reader2 = csv.reader(file2)
    writer = csv.writer(file3)

    reader1_rows = [row for row in reader1]
    reader2_rows = [row for row in reader2]

    for num, row in enumerate(reader1_rows):
        for word in reader2_rows[num]:
            for item in row:
                if item in word:
                    writer.writerow([item, word])

A more 'pythonic' way might be the following:

with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("output.csv", "wb+") as file3:
    reader1 = csv.reader(file1)
    reader2 = csv.reader(file2)
    writer = csv.writer(file3)

    reader1_rows = [row for row in reader1]
    reader2_rows = [row for row in reader2]

    for rowA, rowB in zip(reader1_rows, reader2_rows):
        for word in rowA:
            for item in (item for item in rowB if word in item):
                writer.writerow([word, item])
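
For readers on Python 3, the row-by-row pairing above can be sketched roughly as follows. This is only a sketch under a few assumptions: the function name `find_matches` and the temporary demo files are made up for illustration, both files are taken to be comma-separated as in the question's sample data, and the files are opened in text mode with `newline=""` (which the Python 3 `csv` module expects) instead of the `"rb"`/`"wb"` modes used in the Python 2 code.

```python
import csv
import os
import tempfile


def find_matches(list_path, example_path, output_path):
    """Write one [word, sentence] row for every keyword found in a sentence."""
    # Python 3: csv files are opened in text mode with newline="".
    with open(list_path, newline="") as f_list, \
            open(example_path, newline="") as f_example, \
            open(output_path, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        # zip() pairs row 1 with row 1, row 2 with row 2, and so on,
        # stopping at the end of the shorter file.
        for words, sentences in zip(csv.reader(f_list), csv.reader(f_example)):
            words = [w.strip() for w in words if w.strip()]
            for sentence in sentences:
                for word in words:
                    if word in sentence:  # plain substring test
                        writer.writerow([word, sentence.strip()])


# Demo with throwaway files (hypothetical data taken from the question).
workdir = tempfile.mkdtemp()
list_path = os.path.join(workdir, "list.csv")
example_path = os.path.join(workdir, "example.csv")
output_path = os.path.join(workdir, "output.csv")

with open(list_path, "w", newline="") as f:
    f.write("leg,knee,thigh,shin,ankle\n")
with open(example_path, "w", newline="") as f:
    f.write("Student broke leg yesterday,Student broke arm today,"
            "Student hurt thigh today,Student twisted elbow,"
            "Student rolled ankle today\n")

find_matches(list_path, example_path, output_path)

with open(output_path, newline="") as f:
    matches = list(csv.reader(f))
print(matches)
```

With this sample data, the output file should hold three rows, one each for "leg", "thigh" and "ankle".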

If you would rather arrange all of the data in a single column (which sounds like what you should do), so that the data looks like this:

leg
knee
thigh
shin
ankle
hip
foot
toe
calf
feet
patella
tibia
fibula

...and

Student broke leg yesterday
Student broke arm today
Student hurt thigh today
Student twisted elbow
Student rolled ankle today

...then you could do this:

with open("example.csv") as file1, open("list.csv") as file2, open("output.csv", "wb+") as file3:
    writer = csv.writer(file3)
    key_words = [word.strip() for word in file2.readlines()]
    for row in file1:
        row = row.strip()
        for key in (key for key in key_words if key in row):
            writer.writerow([key, row])
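
One caveat with `key in row`: it is a substring test, so a keyword such as "hip" would also match a sentence containing "ship". If whole-word matches are wanted, one possible approach (a sketch, not part of the original answer) is to split each sentence into words and intersect them with the keyword set. The helper name `whole_word_matches` is hypothetical, and the sketch assumes the keywords are lower-case and purely alphabetic.

```python
import re


def whole_word_matches(keywords, sentence):
    """Return the keywords that occur in the sentence as complete words."""
    # Lower-case the sentence and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Keep only the tokens that are also keywords, in sorted order.
    return sorted(tokens & set(keywords))


keywords = ["leg", "hip", "shin"]

print(whole_word_matches(keywords, "Student broke leg yesterday"))  # ['leg']
print(whole_word_matches(keywords, "Student boarded the ship"))     # []
print("hip" in "Student boarded the ship")                          # True
```

The last line shows the difference: the substring test reports a match for "hip" inside "ship", while the token-based check does not.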

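Answer 1 (score: 0)

From what I understand of the structure of your CSV files, I don't think you should use the csv reader to load your examples file and your word list...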

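import csv

with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("finalOutput.csv", "wb") as outputfile:

    writer = csv.writer(outputfile, delimiter='|')

    # words are split by "," so read the whole file and split it by ",";
    # strip whitespace/newlines and drop empty entries
    words = set(word.strip() for word in file1.read().split(',') if word.strip())

    # examples are split by "," so read the whole file and split it by ","
    examples = [example.strip() for example in file2.read().split(',')]

    # loop over the words, not over file1: file1 has already been read to the end
    for word in words:
        for example in examples:
            # if the word happens to be within the example
            if word in example:
                # add it to your output file
                data = [word, example]
                print data
                writer.writerow(data)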
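
A related point that may explain why the code in the question writes nothing: a `csv.reader`, like the file object underneath it, can only be consumed once. In the question, `rows2 = [row for row in reader2]` reads `reader2` to the end, so the following `for row1 in reader2:` loop has nothing left to iterate over. The sketch below shows this with an in-memory file; it is written for Python 3, and `io.StringIO` with its sample data stands in for a real CSV file purely for illustration.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample data).
data = io.StringIO("leg,knee\nthigh,shin\n")
reader = csv.reader(data)

first_pass = [row for row in reader]    # consumes the reader
second_pass = [row for row in reader]   # nothing left to read

print(first_pass)    # [['leg', 'knee'], ['thigh', 'shin']]
print(second_pass)   # []

# To iterate again, rewind the underlying file and create a new reader,
# or simply reuse the list that was built on the first pass.
data.seek(0)
third_pass = list(csv.reader(data))
print(third_pass == first_pass)  # True
```

Keeping the rows in a list, as the first answer does, avoids the problem because a list can be iterated as many times as needed.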