Python: read rows from one CSV file and, if an element matches, look up the corresponding row in another CSV

Date: 2017-09-13 15:00:45

Tags: python python-3.x list csv file-read

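I want to read through file1.csv and, for each policy that also appears in file2.csv, get that policy's ID, then get the count for that ID from file3.csv. I have three CSV files, file1.csv, file2.csv and file3.csv, shown below. Each has thousands of similar rows.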

file1.csv
Name   Policies
Raj    12345, 676, 909
Sam    786
Lucy   899, 7676, 09

file2.csv
Policies       ID
676, 8787      212
909,898,707    342
89, 98,09      345

file3.csv
ID  Count
212 56
342 23
345 07

In the end, I want the final output to look like this, stored in a file or CSV. I can use pandas, numpy or anything else.

Final.csv
Name  tuple of [Policies, ID, Count]
Raj     [676,212,56]
Raj     [909, 342, 23]
Lucy    [09, 345, 07]

I'm stuck with the following code:

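import csv

# csv.reader/DictReader need an open file object, not a file name string
with open('file2.csv', newline='') as f2:
    policyid = list(csv.DictReader(f2))  # read once so it can be scanned repeatedly

with open('file1.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data = [p.strip() for p in row['Policies'].split(',')]
        for policy in data:
            for entry in policyid:  # don't reuse the name `policy` here
                data2 = [p.strip() for p in entry['Policies'].split(',')]
                if policy in data2:
                    print(entry['ID'])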

1 Answer:

Answer 0 (score: 1)

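One way to do this is to read in all three CSV files, take the values from file1, and then scan file2 and file3 for those values. There is an extra difficulty here. Storing a comma-separated list inside a single field is an anti-pattern, and it forces us to do extra work to parse the text.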
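A minimal sketch of the parsing step, under two assumptions. First, the real files quote any field that contains commas (for example `"12345, 676, 909"`), since otherwise the csv module would split the list into separate columns. Second, the sample text and the `split_policies` helper are hypothetical. Splitting on bare commas and stripping whitespace copes with both `676, 8787` and `909,898,707`. Keeping the values as strings preserves leading zeros such as `09`.

```python
import csv
import io

# Hypothetical sample of file1.csv; fields containing commas are assumed to be quoted.
FILE1_TEXT = '''Name,Policies
Raj,"12345, 676, 909"
Sam,786
Lucy,"899, 7676, 09"
'''


def split_policies(field):
    """Turn '12345, 676, 909' or '909,898,707' into a list of policy strings."""
    # Split on bare commas, then strip spaces; strings keep leading zeros like '09'.
    return [p.strip() for p in field.split(',') if p.strip()]


rows = list(csv.DictReader(io.StringIO(FILE1_TEXT)))
policies_by_name = {row['Name']: split_policies(row['Policies']) for row in rows}
print(policies_by_name)
# {'Raj': ['12345', '676', '909'], 'Sam': ['786'], 'Lucy': ['899', '7676', '09']}
```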

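Another approach is to load all three CSV files into SQL tables or data frames and perform JOINs. The comma-separated lists would still make this difficult, because they have to be split into one row per policy before any join can match on them.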
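A rough sketch of the SQL route, using Python's built-in `sqlite3` with an in-memory database. The table and column names and the sample tuples are hypothetical stand-ins for the three files. The key step is normalizing each comma-separated list into one row per policy before joining. Inner joins then drop policies that have no ID in file2.

```python
import sqlite3

# Hypothetical sample rows mirroring file1.csv, file2.csv and file3.csv;
# with real files these tuples would come from csv.reader.
FILE1 = [('Raj', '12345, 676, 909'), ('Sam', '786'), ('Lucy', '899, 7676, 09')]
FILE2 = [('676, 8787', '212'), ('909,898,707', '342'), ('89, 98,09', '345')]
FILE3 = [('212', '56'), ('342', '23'), ('345', '07')]


def split_policies(field):
    """Split '676, 8787' or '909,898,707' into individual policy strings."""
    return [p.strip() for p in field.split(',') if p.strip()]


con = sqlite3.connect(':memory:')
# TEXT columns keep values such as '09' and '07' exactly as written.
con.execute('CREATE TABLE person_policy (name TEXT, policy TEXT)')
con.execute('CREATE TABLE policy_id (policy TEXT, id TEXT)')
con.execute('CREATE TABLE id_count (id TEXT, policy_count TEXT)')

# Normalize: one row per (name, policy) and one row per (policy, id),
# so the comma-separated lists are gone before the JOIN.
con.executemany('INSERT INTO person_policy VALUES (?, ?)',
                [(name, p) for name, field in FILE1 for p in split_policies(field)])
con.executemany('INSERT INTO policy_id VALUES (?, ?)',
                [(p, pid) for field, pid in FILE2 for p in split_policies(field)])
con.executemany('INSERT INTO id_count VALUES (?, ?)', FILE3)

# Inner joins drop policies with no ID in file2 (12345, 786, 899, 7676).
result = con.execute('''
    SELECT pp.name, pp.policy, pi.id, ic.policy_count
    FROM person_policy AS pp
    JOIN policy_id AS pi ON pi.policy = pp.policy
    JOIN id_count AS ic ON ic.id = pi.id
    ORDER BY pp.rowid
''').fetchall()
print(result)
# [('Raj', '676', '212', '56'), ('Raj', '909', '342', '23'), ('Lucy', '09', '345', '07')]
con.close()
```

The same normalize-then-join idea carries over to data frames. Only the splitting step is specific to this data.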

Here is an example of what I described, although it is admittedly messy:

import csv


def read_rows(path):
    # DictReader consumes the header row itself, so no next(reader) is needed;
    # calling next() here would silently drop the first data row.
    with open(path, newline='') as f:
        return list(csv.DictReader(f))


# Dictionary keys below must match the header names exactly (Name, Policies, ID, Count)
file1 = read_rows('file1.csv')
file2 = read_rows('file2.csv')
file3 = read_rows('file3.csv')


def split_policies(field):
    # Handles both "676, 8787" and "909,898,707"
    return [p.strip() for p in field.split(',') if p.strip()]


def get_policy_id(policy):
    for line in file2:
        if policy in split_policies(line['Policies']):
            return line['ID']
    return None


def get_id_count(policy_id):
    for line in file3:
        if policy_id == line['ID']:
            return line['Count']
    return None


output = []
for line in file1:
    for policy in split_policies(line['Policies']):
        policy_id = get_policy_id(policy)
        if policy_id is None:
            continue  # policy not listed in file2.csv, so skip it
        output.append({'name': line['Name'],
                       'policy': policy,
                       'id': policy_id,
                       'count': get_id_count(policy_id)})
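The code above builds `output` but never writes Final.csv. It also rescans file2 and file3 for every policy, which gets slow with thousands of rows. Below is a possible variant that swaps in a different technique: it indexes file2 and file3 into dictionaries once, then writes the matches as CSV. The helper names (`build_final_rows`, `write_final`) and the inline sample data are hypothetical. It also assumes quoted comma-containing fields and flattens each `[Policies, ID, Count]` tuple into its own columns.

```python
import csv
import io

# Hypothetical in-memory stand-ins for file1.csv, file2.csv and file3.csv.
# With real files, pass open('file1.csv', newline='') etc. instead.
FILE1 = 'Name,Policies\nRaj,"12345, 676, 909"\nSam,786\nLucy,"899, 7676, 09"\n'
FILE2 = 'Policies,ID\n"676, 8787",212\n"909,898,707",342\n"89, 98,09",345\n'
FILE3 = 'ID,Count\n212,56\n342,23\n345,07\n'


def split_policies(field):
    """Split '676, 8787' or '909,898,707' into individual policy strings."""
    return [p.strip() for p in field.split(',') if p.strip()]


def build_final_rows(file1, file2, file3):
    # Build the lookups once: each policy check becomes a dict access
    # instead of a linear scan over file2/file3.
    policy_to_id = {}
    for row in csv.DictReader(file2):
        for policy in split_policies(row['Policies']):
            policy_to_id.setdefault(policy, row['ID'])  # keep the first match
    id_to_count = {row['ID']: row['Count'] for row in csv.DictReader(file3)}

    rows = []
    for row in csv.DictReader(file1):
        for policy in split_policies(row['Policies']):
            policy_id = policy_to_id.get(policy)
            if policy_id is None:
                continue  # no match in file2.csv
            rows.append((row['Name'], policy, policy_id, id_to_count.get(policy_id)))
    return rows


def write_final(rows, out):
    """Write one CSV line per matched policy: Name, Policies, ID, Count."""
    writer = csv.writer(out)
    writer.writerow(['Name', 'Policies', 'ID', 'Count'])
    writer.writerows(rows)


rows = build_final_rows(io.StringIO(FILE1), io.StringIO(FILE2), io.StringIO(FILE3))
buffer = io.StringIO()
write_final(rows, buffer)
print(buffer.getvalue())
```

To write a real file, something like `with open('Final.csv', 'w', newline='') as out: write_final(rows, out)` should work. `newline=''` stops the csv module's `\r\n` line endings from being doubled on Windows.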