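I have a CSV with three main columns that I need to populate.
One is the product name, called "Material"; one is the group name, called "Serial"; and the last is "Related", which matches each Material with the others in its Serial.
Currently the CSV looks like this (simplified; the real file has more fields and different data):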
Martial | Serial | Related
ExOne | GroupOne |
ExTwo | GroupOne |
ExThree | GroupOne |
ExFour | GroupTwo |
ExFive | GroupTwo |
ExSix | GroupThree |
I need to match each Material with every Material in the same Serial, limited to five, and separated by "///".
The result should look like this:
Martial | Serial | Related
ExOne | GroupOne | ExOne///ExTwo///ExThree
ExTwo | GroupOne | ExOne///ExTwo///ExThree
ExThree | GroupOne | ExOne///ExTwo///ExThree
ExFour | GroupTwo | ExFour///ExFive
ExFive | GroupTwo | ExFour///ExFive
ExSix | GroupThree | ExSix
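This is my first attempt at Python, and the code I have tried so far only covers part of what I described. I am building the code bit by bit; the first goal is to match up the Serial groups and list all the Material items in each one, for example: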
GroupOne
    ExOne
    ExTwo
    ExThree
GroupTwo
    ExFour
    ExFive
GroupThree
    ExSix
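From there I can handle the individual cases and combine the items accordingly (for example, when a group has more than five).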
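A minimal sketch of that first step, using only the standard library. It assumes the Material name is in the first column and the Serial in the second, as in the sample table (the real file may use other indices, e.g. row[4] in the attempt below), and it reads the sample from an in-memory string instead of EGLOINDOORCSV.csv. The helper names group_by_serial and related_column are hypothetical, and reading "limited to five" as "keep the first five Materials of each group in file order" is an assumption.

```python
import csv
import io

# Hypothetical sample mirroring the table above (comma-separated, with a header).
SAMPLE = """Martial,Serial,Related
ExOne,GroupOne,
ExTwo,GroupOne,
ExThree,GroupOne,
ExFour,GroupTwo,
ExFive,GroupTwo,
ExSix,GroupThree,
"""


def group_by_serial(rows, material_col=0, serial_col=1):
    """Map each Serial to its Materials, in the order they first appear."""
    groups = {}
    for row in rows:
        groups.setdefault(row[serial_col], []).append(row[material_col])
    return groups


def related_column(groups, serial, limit=5):
    """Join at most `limit` Materials of one group with '///' (limit is an assumption)."""
    return "///".join(groups[serial][:limit])


reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
rows = list(reader)
groups = group_by_serial(rows)

# Print each Serial followed by its Materials, like the listing above.
for serial, materials in groups.items():
    print(serial)
    for material in materials:
        print("    " + material)
```

Once the groups exist, the Related value for a row is related_column(groups, row[1]).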
import csv
import sys

with open('EGLOINDOORCSV.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    Materials = []
    Serials = []
    for row in readCSV:
        Material = row[0]
        Serial = row[4]
        Materials.append(Material)
        Serials.append(Serial)
        if Serial == Serial:  # always True: compares Serial with itself
            print(Serial)
            print(Material, end = "///")
            print("\n")
            break  # stops after the first row
print("Done")
Answer 0 (score: 2)
First, let's recreate a sample file:
data = '''\
Martial|Serial|Related
ExOne|GroupOne|
ExTwo|GroupOne|
ExThree|GroupOne|
ExFour|GroupTwo|
ExFive|GroupTwo|
ExSix|GroupThree|'''
with open('test.csv', 'w') as f:
    f.write(data)
Now the actual code, using Pandas (Pandas ships with the Anaconda distribution; without Anaconda, install it with pip install pandas):
import pandas as pd
df = pd.read_csv('test.csv', sep='|')
df['Related'] = df['Serial'].map(df.groupby('Serial')['Martial']
                                   .apply(lambda x: '///'.join(x)))
df.to_csv('output.csv', index=False)
Result:
Martial Serial Related
0 ExOne GroupOne ExOne///ExTwo///ExThree
1 ExTwo GroupOne ExOne///ExTwo///ExThree
2 ExThree GroupOne ExOne///ExTwo///ExThree
3 ExFour GroupTwo ExFour///ExFive
4 ExFive GroupTwo ExFour///ExFive
5 ExSix GroupThree ExSix
Answer 1 (score: 1)
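Here is an approach using the built-in itertools module, so you don't need to install any extra packages. It also shows how to write this in a Pythonic way with dictionary and list comprehensions.
Step by step: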
# read the whole file at once
import csv
with open('EGLOINDOORCSV.csv') as csvfile:
    l = [r for r in csv.reader(csvfile, delimiter=',')][1:]  # skip header
# itertools.groupby requires data sorted by the grouping key; sort by the second field
key = lambda x: x[1]
l = sorted(l, key=key)
# group into an auxiliary dictionary
from itertools import groupby
d = {k: "///".join(x[0] for x in g) for k, g in groupby(l, key)}
# update the third column from the auxiliary dictionary
for x in l:
    x[2] = d[x[1]]
Et voilà!
#this is the content of l, ready to go back to a new csv
[
['ExOne', 'GroupOne', 'ExOne///ExTwo///ExThree'],
['ExTwo', 'GroupOne', 'ExOne///ExTwo///ExThree'],
['ExThree', 'GroupOne', 'ExOne///ExTwo///ExThree'],
['ExSix', 'GroupThree', 'ExSix'],
['ExFour', 'GroupTwo', 'ExFour///ExFive'],
['ExFive', 'GroupTwo', 'ExFour///ExFive'],
]
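Since l is ready to go back to a new CSV, here is a minimal sketch of that last step with the standard csv.writer. Re-adding the header row and the file name output.csv are assumptions, and the helper name write_rows is hypothetical; it accepts any text file object, so it can also be tried against an in-memory buffer.

```python
import csv
import io


def write_rows(rows, f, header=("Martial", "Serial", "Related")):
    """Write a header row followed by the processed rows as comma-separated CSV."""
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)


# The list l as shown above.
l = [
    ['ExOne', 'GroupOne', 'ExOne///ExTwo///ExThree'],
    ['ExTwo', 'GroupOne', 'ExOne///ExTwo///ExThree'],
    ['ExThree', 'GroupOne', 'ExOne///ExTwo///ExThree'],
    ['ExSix', 'GroupThree', 'ExSix'],
    ['ExFour', 'GroupTwo', 'ExFour///ExFive'],
    ['ExFive', 'GroupTwo', 'ExFour///ExFive'],
]

buf = io.StringIO()
write_rows(l, buf)
print(buf.getvalue())

# To write a real file instead (the csv docs recommend newline=''):
# with open('output.csv', 'w', newline='') as out:
#     write_rows(l, out)
```

The rows come out in the order of l, which was sorted by Serial, not in the original file order.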
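Disclaimer: this is a complete solution, but remember that pandas is your friend for data processing: install it and switch to the pandas solution if you need to handle large amounts of data.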
Original data:
$ cat EGLOINDOORCSV.csv
Martial,Serial,Related
ExOne,GroupOne,
ExTwo,GroupOne,
ExThree,GroupOne,
ExFour,GroupTwo,
ExFive,GroupTwo,
ExSix,GroupThree,
Answer 2 (score: 1)
My approach is to read the CSV twice: in the first pass I gather the related materials, and in the second pass I print the output:
import csv

# Pass 1: gather related materials
with open('EGLOINDOORCSV.csv') as csvfile:
    reader = csv.reader(csvfile)
    related = {}
    for row in reader:
        material = row[0]
        serial = row[1]
        related.setdefault(serial, set()).add(material)

# print(related)  # for debugging

# Pass 2: print
with open('EGLOINDOORCSV.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        material = row[0]
        serial = row[1]
        print('%s | %s | %s' % (material, serial, '///'.join(sorted(related[serial]))))
Output:
ExOne | GroupOne | ExOne///ExThree///ExTwo
ExTwo | GroupOne | ExOne///ExThree///ExTwo
ExThree | GroupOne | ExOne///ExThree///ExTwo
ExFour | GroupTwo | ExFive///ExFour
ExFive | GroupTwo | ExFive///ExFour
ExSix | GroupThree | ExSix
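Because the materials are collected in a set and then sorted, each Related value comes out alphabetically (ExOne///ExThree///ExTwo) rather than in the file order the question shows (ExOne///ExTwo///ExThree). If file order matters, one option (a swapped-in technique, not part of this answer) is to use a dict as an insertion-ordered set, since dicts keep insertion order in Python 3.7+. A sketch of pass 1 with that change, reading from a hard-coded list of rows instead of the file; the helper name gather_related is hypothetical:

```python
def gather_related(rows):
    """Pass 1 variant: Serial -> Materials in first-seen order, without duplicates."""
    related = {}
    for row in rows:
        material, serial = row[0], row[1]
        # The inner dict's keys act as an ordered set; the value None is unused.
        related.setdefault(serial, {})[material] = None
    return related


rows = [
    ["ExOne", "GroupOne", ""],
    ["ExTwo", "GroupOne", ""],
    ["ExThree", "GroupOne", ""],
    ["ExFour", "GroupTwo", ""],
    ["ExFive", "GroupTwo", ""],
    ["ExSix", "GroupThree", ""],
]
related = gather_related(rows)

# Pass 2 unchanged except that sorted() is dropped; joining a dict joins its keys.
for row in rows:
    material, serial = row[0], row[1]
    print('%s | %s | %s' % (material, serial, '///'.join(related[serial])))
```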
I'm assuming your CSV file has no header. If it does, you will need to skip it:
reader = csv.reader(csvfile)
next(reader) # Skip the header, then move on
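I assign row[0] to material and row[1] to serial; adjust the index numbers to match your file.
The related dictionary is where I keep the relationships. It looks like this: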
{
"GroupTwo": set(["ExFour", "ExFive"]),
"GroupOne": set(["ExOne", "ExThree", "ExTwo"]),
"GroupThree": set(["ExSix"])
}
In my code, the statement:
related.setdefault(serial, set()).add(material)
is shorthand for:
if serial not in related:
    related[serial] = set()
related[serial].add(material)
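The standard library also offers collections.defaultdict, which creates the missing set automatically on first access; it is an alternative to setdefault, not what this answer uses. A minimal sketch building the same related dictionary from a few hard-coded (material, serial) pairs:

```python
from collections import defaultdict

# Hard-coded sample pairs standing in for (row[0], row[1]) from the CSV.
rows = [
    ("ExOne", "GroupOne"),
    ("ExTwo", "GroupOne"),
    ("ExThree", "GroupOne"),
    ("ExFour", "GroupTwo"),
    ("ExFive", "GroupTwo"),
    ("ExSix", "GroupThree"),
]

# defaultdict(set) calls set() the first time an unknown key is looked up.
related = defaultdict(set)
for material, serial in rows:
    related[serial].add(material)

print(dict(related))
```

One caveat of this design: merely reading related[some_serial] for a missing key also inserts an empty set, which setdefault only does when you call it.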