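I want to read through the first file, file1.csv, and for each policy that also exists in file2.csv, get the ID for that policy and then get the count for that ID from file3.csv. I have three CSV files, file1.csv, file2.csv and file3.csv, as shown below, each with thousands of similar rows.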
file1.csv
Name Policies
Raj 12345, 676, 909
Sam 786
Lucy 899, 7676, 09
file2.csv
Policies ID
676, 8787 212
909,898,707 342
89, 98,09 345
file3.csv
ID Count
212 56
342 23
345 07
In the end, my final output should look like the following, stored in a file or CSV. I can use pandas, numpy, or anything else.
Final.csv
Name tuple of [Policies, ID, Count]
Raj [676,212,56]
Raj [909, 342, 23]
Lucy [09, 345, 07]
I am stuck with the following code:
policyid = csv.reader('file2.csv', delimiter=',')
with open('file1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data = row['Policies'].split(",")
        if data:
            for policy in data:
                for policy, id in policyid:
                    data2 = policy.split(",")
                    if policy in data2:
                        print id
Answer 0 (score: 1)
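One way to do this is to read in all three CSV files, take the values from file1, and then scan file2 and file3 for those values. The extra difficulty here is that a comma-separated list inside a field is an anti-pattern, which forces us to do some additional work to parse the text.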
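To illustrate the extra parsing work, here is a minimal sketch that splits the list field and builds a policy-to-ID lookup. The helper `split_policies`, the name `policy_to_id`, and the in-memory sample rows are assumptions made for illustration; they stand in for rows as `csv.DictReader` might return them for file2.csv.

```python
def split_policies(field):
    """Split a comma-separated field into clean policy strings."""
    return [part.strip() for part in field.split(",") if part.strip()]

# Sample rows copied from the question's file2.csv (hypothetical in-memory form)
file2_rows = [
    {"Policies": "676, 8787", "ID": "212"},
    {"Policies": "909,898,707", "ID": "342"},
    {"Policies": "89, 98,09", "ID": "345"},
]

# Map each individual policy to its ID so lookups do not rescan file2
policy_to_id = {}
for row in file2_rows:
    for policy in split_policies(row["Policies"]):
        # If a policy appears in several rows, the first occurrence wins
        policy_to_id.setdefault(policy, row["ID"])

print(policy_to_id.get("909"))  # 342
print(policy_to_id.get("786"))  # None, because 786 is not in file2
```

Stripping each part handles the inconsistent spacing in the sample data, and keeping the values as strings preserves entries such as 09.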
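Another way would be to load all three CSV files into SQL tables or data frames and perform some JOINs, but the comma-separated lists would still make this difficult.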
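As a rough sketch of the JOIN idea, the lists can be split into one row per policy before loading, after which two plain joins produce the result. This uses an in-memory SQLite database; the table and column names (`names`, `policy_ids`, `counts`, `cnt`) and the hard-coded sample rows are assumptions for illustration, not part of the original files.

```python
import sqlite3

def split_list(field):
    """Split a comma-separated field into clean strings."""
    return [part.strip() for part in field.split(",")]

# Sample rows copied from the question; real code would read them from the CSV files
file1 = [("Raj", "12345, 676, 909"), ("Sam", "786"), ("Lucy", "899, 7676, 09")]
file2 = [("676, 8787", "212"), ("909,898,707", "342"), ("89, 98,09", "345")]
file3 = [("212", "56"), ("342", "23"), ("345", "07")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (name TEXT, policy TEXT);
    CREATE TABLE policy_ids (policy TEXT, id TEXT);
    CREATE TABLE counts (id TEXT, cnt TEXT);
""")

# Normalise: one row per individual policy instead of one list per row
conn.executemany("INSERT INTO names VALUES (?, ?)",
                 [(name, p) for name, field in file1 for p in split_list(field)])
conn.executemany("INSERT INTO policy_ids VALUES (?, ?)",
                 [(p, id_) for field, id_ in file2 for p in split_list(field)])
conn.executemany("INSERT INTO counts VALUES (?, ?)", file3)

# Inner joins keep only policies found in file2 and IDs found in file3
rows = conn.execute("""
    SELECT n.name, n.policy, p.id, c.cnt
    FROM names AS n
    JOIN policy_ids AS p ON p.policy = n.policy
    JOIN counts AS c ON c.id = p.id
    ORDER BY n.rowid
""").fetchall()

for row in rows:
    print(row)
```

All values are stored as TEXT so that entries such as 09 and 07 keep their leading zeros.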
Here is an example of the first approach, scanning the files, though it is admittedly messy:
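import csv

# csv.DictReader consumes the header row itself, so no next(reader) is needed;
# calling it would silently drop the first data row.
# Assumes each file is comma-delimited and each Policies list is quoted as one field.
with open('file1.csv', newline='') as f:
    file1 = list(csv.DictReader(f))

with open('file2.csv', newline='') as f:
    file2 = list(csv.DictReader(f))

with open('file3.csv', newline='') as f:
    file3 = list(csv.DictReader(f))

def split_policies(field):
    # Tolerate inconsistent spacing such as "909,898,707" and "89, 98,09"
    return [p.strip() for p in field.split(',')]

def get_policy_id(policy):
    # Return the ID of the first file2 row that lists this policy, or None
    for line in file2:
        if policy in split_policies(line['Policies']):
            return line['ID']
    return None

def get_id_count(policy_id):
    # Return the count for this ID from file3, or None
    for line in file3:
        if policy_id == line['ID']:
            return line['Count']
    return None

output = []
for line in file1:
    for policy in split_policies(line['Policies']):
        policy_id = get_policy_id(policy)
        if policy_id is None:
            continue  # this policy does not appear in file2
        output.append({'name': line['Name'],
                       'policy': policy,
                       'id': policy_id,
                       'count': get_id_count(policy_id)})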
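To store the result as the question asks, the collected rows can be written with `csv.DictWriter`. The following is a minimal sketch: the helper name `write_rows` and the sample `output` rows are assumptions for illustration, and an in-memory buffer stands in for the real file so the example is self-contained.

```python
import csv
import io

# Hypothetical rows in the shape built by the loop above (sample values from the question)
output = [
    {"name": "Raj", "policy": "676", "id": "212", "count": "56"},
    {"name": "Raj", "policy": "909", "id": "342", "count": "23"},
    {"name": "Lucy", "policy": "09", "id": "345", "count": "07"},
]

def write_rows(rows, fileobj):
    """Write the result rows, with a header line, to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=["name", "policy", "id", "count"])
    writer.writeheader()
    writer.writerows(rows)

# An in-memory buffer is used here instead of a file on disk
buffer = io.StringIO()
write_rows(output, buffer)
print(buffer.getvalue())
```

To write an actual file, the same helper could be called inside `with open('Final.csv', 'w', newline='') as f:`.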