Joining CSV fields

Time: 2017-07-08 20:10:04

Tags: python python-3.x csv

I need to convert

Name | Org
a    | 5
a    | 6
b    | 5
c    | 7

into

Name | Org
a    | 5,6
b    | 5
c    | 7

My first attempt was this code:

    while i < len(nameColumn):
        if nameColumn[i] not in resultC1:
            resultC1.append(nameColumn[i])
            while l < len(nameColumn):
                if nameColumn[l] == nameColumn[i]:
                    tempdata += organizationColumn[l] + ','
                l += 1
            resultC2.append(tempdata[:-1])
            tempdata = ''
            k += 1
        i += 1

and I ended up with this result:

Name | Org
a    |
b    |
c    |

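Any help is much appreciated. I haven't found anything related. I am reading the data from a .CSV file into lists, working with that data, and storing the results in resultC1 and resultC2.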

5 answers:

Answer 0: (score: 1)

Here is a solution using collections.OrderedDict:

import csv
from collections import OrderedDict

data = OrderedDict()
with open('test.csv') as f:
    reader = csv.reader(f)
    for i, line in enumerate(reader):
        if i == 0:
            continue

        if line[0] not in data:
            data[line[0]] = []

        data[line[0]].append(line[1])

for k, v in data.items():
    print(k, '|', ', '.join(v))

OrderedDict preserves insertion order. The keys are the Names, and each value is a list of all the Orgs associated with that name.

Output:

a | 5, 6
b | 5
c | 7

If your CSV uses a delimiter other than a comma, you have to specify that delimiter. I assumed commas in my example.

Here is a simpler pandas solution:
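
As a rough sketch of the delimiter point, assuming the file looks like the pipe-separated table in the question: the sample text and the io.StringIO stand-in for a real file are illustrative assumptions, not part of the original data.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical stand-in for a file on disk that uses '|' instead of ','.
sample = io.StringIO("Name | Org\na    | 5\na    | 6\nb    | 5\nc    | 7\n")

data = OrderedDict()
reader = csv.reader(sample, delimiter='|')
next(reader)  # skip the header row
for row in reader:
    # Strip the padding that surrounds the '|' separator.
    name, org = (field.strip() for field in row)
    data.setdefault(name, []).append(org)

for name, orgs in data.items():
    print(name, '|', ', '.join(orgs))
```

The only change from the comma case is the delimiter argument; stripping each field is needed here because the table pads its columns with spaces.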

In [443]: df.head()
Out[443]: 
  Name  Org
0    a    5
1    a    6
2    b    5
3    c    7

In [445]: for k, v in df.groupby('Name').apply(lambda x: list(x['Org'])).items():
     ...:     print(k, '|', ', '.join(map(str, v)))
     ...:        
a | 5, 6
b | 5
c | 7

Answer 1: (score: 0)

Assuming you start with the two arrays implied by your example code, I would go with the following:

from collections import defaultdict 

nameColumn = ['a', 'a', 'b', 'c']
organizationColumn = ["5", "6", "5", "7"]

merged = defaultdict(list)
for name, org in zip(nameColumn, organizationColumn):
    merged[name].append(org)

for k, v in merged.items():
    print(f"{k} | {','.join(v)}")
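
If the goal is the two lists named in the question, a minimal sketch along the same lines might look like this. It assumes Python 3.7 or later, where plain dicts (and defaultdict) keep insertion order; resultC1 and resultC2 are the asker's names, reused here for illustration.

```python
from collections import defaultdict

nameColumn = ['a', 'a', 'b', 'c']
organizationColumn = ['5', '6', '5', '7']

merged = defaultdict(list)
for name, org in zip(nameColumn, organizationColumn):
    merged[name].append(org)

# Keys come back in first-seen order on Python 3.7+.
resultC1 = list(merged)
# One comma-joined string per name, aligned with resultC1.
resultC2 = [','.join(orgs) for orgs in merged.values()]

print(resultC1)  # ['a', 'b', 'c']
print(resultC2)  # ['5,6', '5', '7']
```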

Answer 2: (score: 0)

Use an OrderedDict with setdefault, a list, and the csv module:

import csv
from collections import OrderedDict

organizations = OrderedDict()
with open(filename) as infile:
    for name, org in csv.reader(infile, delimiter='|'):
        organizations.setdefault(name, []).append(org)

Then you can write the dictionary out:

with open(filename, 'w') as outfile:
    writer = csv.writer(outfile, delimiter='|')
    for name, orgs in organizations.items():
        writer.writerow([name, ','.join(orgs)])
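
For a file that starts with a header row, the read and write steps could be combined roughly as follows. This is only a sketch: the in-memory io.StringIO buffers stand in for real files, and the explicit lineterminator is my choice to keep the output easy to inspect.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical in-memory input; a real script would open files instead.
infile = io.StringIO("Name|Org\na|5\na|6\nb|5\nc|7\n")
outfile = io.StringIO()

reader = csv.reader(infile, delimiter='|')
header = next(reader)  # keep the header out of the grouping

organizations = OrderedDict()
for name, org in reader:
    organizations.setdefault(name, []).append(org)

writer = csv.writer(outfile, delimiter='|', lineterminator='\n')
writer.writerow(header)
for name, orgs in organizations.items():
    writer.writerow([name, ','.join(orgs)])

print(outfile.getvalue())
```

Because the output delimiter is '|', the comma inside a joined value such as 5,6 does not trigger quoting by csv.writer.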

Answer 3: (score: 0)

A solution using the itertools.groupby() function:

import csv, itertools

with open('yourfile.csv', 'r') as f:
    reader = csv.reader(f, delimiter='|', skipinitialspace=True)
    head = next(reader)   # header line
    items = [list(g) for k,g in itertools.groupby(sorted(reader), key=lambda x: x[0])]

    fmt = '{0[0]:<5} | {0[1]:^5}'  # format spec
    print(fmt.format(head))
    for item in items:
        print(fmt.format([item[0][0], ','.join(i[1] for i in item)] if len(item) > 1 else item[0]))

Output:

Name  |  Org 
a     |  5,6 
b     |   5  
c     |   7  
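
The sorted() call matters because itertools.groupby() only groups adjacent items with equal keys. A small sketch with made-up rows in which the 'a' entries are not adjacent (join_groups is a hypothetical helper, not part of the answer):

```python
from itertools import groupby

# Hypothetical rows where the two 'a' entries are not next to each other.
rows = [['a', '5'], ['b', '5'], ['a', '6'], ['c', '7']]

def join_groups(rows):
    """Join the second field of each run of consecutive rows sharing a name."""
    return [(name, ','.join(r[1] for r in group))
            for name, group in groupby(rows, key=lambda r: r[0])]

unsorted_result = join_groups(rows)
sorted_result = join_groups(sorted(rows))

print(unsorted_result)  # 'a' appears twice, once per run
print(sorted_result)    # 'a' is merged into a single group
```

Note that sorting reorders the output by name, so first-seen order is not kept the way it is with the dictionary-based answers.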

Answer 4: (score: 0)

Here is another solution, which can be made generic by taking the delimiters for the input and output files as parameters.

def parseData(fileName, delimiter):
    dictionary = {}
    with open(fileName, 'r') as iFile:
        for line in iFile:
            row = line.rstrip('\n').split(delimiter)
            # Append the value to this key's list, creating the list on first use.
            dictionary.setdefault(row[0], []).append(row[1])
    ## print for debugging purpose
    print(dictionary)
    return dictionary

def writeData(fileName, odelimiter, idelimiter, dictionary):
    with open(fileName, 'w') as oFile:
        for key, values in dictionary.items():
            # join() also works when idelimiter is longer than one character
            data = idelimiter.join(values)
            ## print for debugging purpose
            print(key, data)
            oFile.write(key + odelimiter + data + "\n")

## main
dictionary=parseData('inputPipe.txt', "|")
writeData('output.txt', "|", ",", dictionary)

inputPipe.txt

a|5
a|6
b|5
c|7

output.txt

a|5,6
b|5
c|7

Sample run

{'a': ['5', '6'], 'b': ['5'], 'c': ['7']}
a 5,6
b 5
c 7