I need to convert this:
Name | Org
a | 5
a | 6
b | 5
c | 7
into this:
Name | Org
a | 5,6
b | 5
c | 7
My first attempt used this code:
while i < len(nameColumn):
    if nameColumn[i] not in resultC1:
        resultC1.append(nameColumn[i])
        while l < len(nameColumn):
            if nameColumn[l] == nameColumn[i]:
                tempdata += organizationColumn[l] + ','
            l += 1
        resultC2.append(tempdata[:-1])
        tempdata = ''
        k += 1
    i += 1
and ended up with this result:
Name | Org
a |
b |
c |
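Any help is greatly appreciated; I haven't found anything relevant yet. I'm reading the data from a .CSV file into lists, working with those lists, and storing the results in resultC1 and resultC2.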
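One visible problem in the loop above, assuming the indentation shown is what was intended: the inner index l is never reset to 0, so once it reaches the end for the first name, the inner loop never runs again and every later name gets no Orgs. The question does not show the initial values of i, l and k, so the minimal corrected sketch below assumes they start at 0, drops the unused k, and uses the sample table as hypothetical list contents:

```python
# Hypothetical sample data matching the table in the question
nameColumn = ['a', 'a', 'b', 'c']
organizationColumn = ['5', '6', '5', '7']

resultC1 = []  # unique names, in first-seen order
resultC2 = []  # comma-joined Orgs for each name

i = 0
while i < len(nameColumn):
    if nameColumn[i] not in resultC1:
        resultC1.append(nameColumn[i])
        tempdata = ''
        l = 0  # restart the inner scan for every new name
        while l < len(nameColumn):
            if nameColumn[l] == nameColumn[i]:
                tempdata += organizationColumn[l] + ','
            l += 1
        resultC2.append(tempdata[:-1])  # drop the trailing comma
    i += 1

for name, orgs in zip(resultC1, resultC2):
    print(name, '|', orgs)
```

This rescans the whole list for each new name, so it is quadratic; the dictionary-based answers do the same job in a single pass.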
Answer 0 (score: 1)
Here is a solution using collections.OrderedDict:
import csv
from collections import OrderedDict

data = OrderedDict()
with open('test.csv') as f:
    reader = csv.reader(f)
    for i, line in enumerate(reader):
        if i == 0:
            continue  # skip the header row
        if line[0] not in data:
            data[line[0]] = []
        data[line[0]].append(line[1])

for k, v in data.items():
    print(k, '|', ', '.join(v))
The OrderedDict preserves insertion order. Its keys are the Names, and its values are lists of all the Orgs associated with each Name.
Output:
a | 5, 6
b | 5
c | 7
If your CSV uses a delimiter other than a comma, you have to pass it to csv.reader with the delimiter argument; my example assumes commas.
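For a file laid out like the table in the question, the delimiter would be '|'. Two further details, both assumptions about the real file: csv.reader does not strip the spaces around '|' by itself (skipinitialspace=True only removes spaces after a delimiter, so 'a ' would keep its trailing space), and since Python 3.7 a plain dict already preserves insertion order, so OrderedDict is optional there. A sketch with io.StringIO standing in for test.csv:

```python
import csv
import io

# Stand-in for open('test.csv'); assumes ' | '-separated columns and a header row
f = io.StringIO("Name | Org\na | 5\na | 6\nb | 5\nc | 7\n")

reader = csv.reader(f, delimiter='|')  # pass the file's actual delimiter
next(reader)  # skip the header row

data = {}  # a plain dict keeps insertion order on Python 3.7+
for line in reader:
    name, org = (field.strip() for field in line)  # drop the spaces around '|'
    data.setdefault(name, []).append(org)

for k, v in data.items():
    print(k, '|', ', '.join(v))
```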
Here is a simpler pandas solution:
In [443]: df.head()
Out[443]:
Name Org
0 a 5
1 a 6
2 b 5
3 c 7
In [445]: for k, v in df.groupby('Name')['Org'].apply(list).items():
     ...:     print(k, '|', ', '.join(map(str, v)))
...:
a | 5, 6
b | 5
c | 7
Answer 1 (score: 0)
Assuming you start from the two lists implied by your sample code, I would go with the following:
from collections import defaultdict

nameColumn = ['a', 'a', 'b', 'c']
organizationColumn = ["5", "6", "5", "7"]

merged = defaultdict(list)
for name, org in zip(nameColumn, organizationColumn):
    merged[name].append(org)

for k, v in merged.items():
    print(f"{k} | {','.join(v)}")
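Because the question wants the results in two lists, resultC1 and resultC2, the grouped dictionary can be split back into them. A sketch assuming the same sample columns; defaultdict is a dict subclass, so on Python 3.7+ it keeps the names in first-seen order:

```python
from collections import defaultdict

nameColumn = ['a', 'a', 'b', 'c']
organizationColumn = ['5', '6', '5', '7']

merged = defaultdict(list)
for name, org in zip(nameColumn, organizationColumn):
    merged[name].append(org)

# Unique names in first-seen order, and their Orgs joined with commas
resultC1 = list(merged)
resultC2 = [','.join(orgs) for orgs in merged.values()]
print(resultC1)
print(resultC2)
```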
Answer 2 (score: 0)
Use an OrderedDict, call setdefault with an empty list, and use the csv module:
import csv
from collections import OrderedDict

organizations = OrderedDict()
with open(filename, newline='') as infile:
    for name, org in csv.reader(infile, delimiter='|'):
        organizations.setdefault(name, []).append(org)
Then you can write the dictionary back out:
with open(filename, 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='|')
    for name, orgs in organizations.items():
        writer.writerow([name, ','.join(orgs)])
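Note that this approach does not skip the header: 'Name|Org' simply becomes the key 'Name' with ['Org'] and is written back out unchanged. A round-trip sketch using io.StringIO in place of the real files, assuming there are no spaces around '|'; lineterminator='\n' is set only so the result is easy to compare, since csv.writer defaults to '\r\n':

```python
import csv
import io
from collections import OrderedDict

# Stand-ins for the input and output files; assumes no spaces around '|'
infile = io.StringIO("Name|Org\na|5\na|6\nb|5\nc|7\n")
outfile = io.StringIO()

organizations = OrderedDict()
for name, org in csv.reader(infile, delimiter='|'):
    organizations.setdefault(name, []).append(org)

writer = csv.writer(outfile, delimiter='|', lineterminator='\n')
for name, orgs in organizations.items():
    # ',' is not the delimiter here, so '5,6' is written without quotes
    writer.writerow([name, ','.join(orgs)])

print(outfile.getvalue())
```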
Answer 3 (score: 0)
A solution using the itertools.groupby() function:
import csv, itertools

with open('yourfile.csv', 'r') as f:
    reader = csv.reader(f, delimiter='|', skipinitialspace=True)
    head = next(reader)  # header line
    items = [list(g) for k, g in itertools.groupby(sorted(reader), key=lambda x: x[0])]

fmt = '{0[0]:<5} | {0[1]:^5}'  # format spec
print(fmt.format(head))
for item in items:
    print(fmt.format([item[0][0], ','.join(i[1] for i in item)] if len(item) > 1 else item[0]))
Output:
Name | Org
a | 5,6
b | 5
c | 7
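itertools.groupby only merges consecutive items with equal keys, which is why this answer sorts the rows first. sorted(reader) sorts whole rows, so the Orgs within each name also come out in string order; sorting on the name alone keeps each name's Orgs in file order, because sorted is stable. A sketch on hypothetical in-memory rows:

```python
import itertools
from operator import itemgetter

rows = [['a', '6'], ['b', '5'], ['a', '5'], ['c', '7']]

# Without sorting, the two 'a' rows are not adjacent and form separate groups
unsorted_keys = [k for k, _ in itertools.groupby(rows, key=itemgetter(0))]
print(unsorted_keys)  # ['a', 'b', 'a', 'c']

# Sort by name only; the stable sort keeps 'a' rows in file order (6, then 5)
grouped = {
    name: ','.join(row[1] for row in group)
    for name, group in itertools.groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
}
print(grouped)  # {'a': '6,5', 'b': '5', 'c': '7'}
```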
Answer 4 (score: 0)
Here is another, more generic solution: the delimiters for the input and output files are parameters.
def parseData(fileName, delimiter):
    dictionary = {}
    with open(fileName, 'r') as iFile:
        for line in iFile:
            row = line.rstrip('\n').split(delimiter)
            if row[0] in dictionary:
                dictionary[row[0]].append(row[1])
            else:
                dictionary[row[0]] = [row[1]]
    ## print for debugging purpose
    print(dictionary)
    return dictionary

def writeData(fileName, odelimiter, idelimiter, dictionary):
    with open(fileName, 'w') as oFile:
        for key, values in dictionary.items():
            # join the values with the inner delimiter (',' in the example)
            data = idelimiter.join(values)
            ## print for debugging purpose
            print(key, data)
            oFile.write(key + odelimiter + data + "\n")

## main
dictionary = parseData('inputPipe.txt', "|")
writeData('output.txt', "|", ",", dictionary)
inputPipe.txt
a|5
a|6
b|5
c|7
output.txt
a|5,6
b|5
c|7
Sample run
{'a': ['5', '6'], 'b': ['5'], 'c': ['7']}
a 5,6
b 5
c 7
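Separating the grouping from the file handling makes the same idea easy to test without files. A sketch with a hypothetical regroup() helper (not part of the answer above) that takes any iterable of lines plus the three delimiters the answer uses: input, output, and the one joining the values:

```python
def regroup(lines, in_delim='|', out_delim='|', join_delim=','):
    """Hypothetical helper: group 'key<in_delim>value' lines by key."""
    groups = {}  # insertion-ordered on Python 3.7+
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines instead of failing on a missing value
        key, value = line.split(in_delim, 1)
        groups.setdefault(key, []).append(value)
    return [key + out_delim + join_delim.join(values) for key, values in groups.items()]


result = regroup(['a|5\n', 'a|6\n', 'b|5\n', 'c|7\n'])
print('\n'.join(result))
```

Because an open text file is itself an iterable of lines, the same helper can be fed the file object from open('inputPipe.txt') directly.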