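How to group consecutive rows with the same key in a CSV file

Time: 2016-03-23 20:22:13

Tags: python string csv

I am trying to concatenate col3 whenever col1 has the same value as in the previous row, and then write the result to a new file. I have a CSV file that looks like this: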

col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"

The output I want:

col1,col3
a,"hello good day nice weather"
b,"cat dog and cat"
c,"animals are cute"

This is what I tried:

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    for row in reader:
        while row[0]==row[0]:
            concat_text=" ".join(row[2])
        print concat_text
        writer.writerow((row[0],concat_text))

It runs, but I get no output. Any help is appreciated.

3 answers:

Answer 0 (score: 3)

If you are interested in using pandas, you can load the file into a DataFrame, group it, and then output the unique values:

import pandas as pd

df = pd.read_csv('test.txt')
print(df)

Your original DataFrame:

  col1  col2              col3
0    a    12            hello 
1    a    13          good day
2    a    14      nice weather
3    b     1               cat
4    b     2       dog and cat
5    c     2  animals are cute

The second DataFrame:

df2 = df.groupby(df['col1'])
df2 = df2['col3'].unique()
df2 = df2.reset_index()

print(df2)

will result in:

  col1                              col3
0    a  [hello , good day, nice weather]
1    b                [cat, dog and cat]
2    c                [animals are cute]

To concatenate the third column, you need to use apply:

df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))

  col1                          col3
0    a   hello good day nice weather
1    b               cat dog and cat
2    c              animals are cute
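
As a variation on the pandas approach above, the grouping and the join can be done in one aggregation step. This is only a sketch in Python 3, not the answerer's code: the helper name join_col3 and the inline sample string are assumptions made for illustration. Note that unique() drops repeated strings within a group, whereas this version keeps them, and that groupby collects all rows sharing a key, not only adjacent ones.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the real file.
CSV_TEXT = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''


def join_col3(df):
    """Join the stripped col3 strings of each col1 group with single spaces.

    join_col3 is a hypothetical helper name, not part of pandas.
    """
    return (
        df.groupby('col1', sort=False)['col3']
        .agg(lambda s: ' '.join(part.strip() for part in s))
        .reset_index()
    )


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = join_col3(df)
print(result.to_csv(index=False))
```

Passing sort=False keeps the groups in order of first appearance instead of sorting them by key.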

Answer 1 (score: 1)

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv', 'wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    prior_val = None
    text = []
    for line in reader:
        if line[0] == prior_val:
            text.append(line[2])
        else:
            if text:
                writer.writerow([prior_val, " ".join(text)])
            prior_val = line[0]
            text = [line[2]]
    if text:
        writer.writerow([prior_val, " ".join(text)])

>>> !cat outputfile.csv
col1,col3
a,hello  good day nice weather
b,cat dog and cat
c,animals are cute

>>> pd.read_csv('outputfile.csv', index_col=0)
                          col3
col1                              
a     hello  good day nice weather
b                  cat dog and cat
c                 animals are cute
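
The code above targets Python 2, where csv files are opened in 'rb' and 'wb' mode. Under Python 3 the csv module expects text-mode files opened with newline=''. The following is a rough Python 3 sketch of the same previous-key idea. The function names group_rows and group_csv are hypothetical, and it assumes the file has a header row and at least three columns. Unlike the code above, it strips each text fragment, so the double space after "hello" disappears.

```python
import csv


def group_rows(rows):
    """Yield (key, joined_text) for each run of consecutive rows sharing row[0]."""
    prior_key = None
    parts = []
    for row in rows:
        if not row:
            continue  # skip blank lines, which csv.reader returns as []
        if row[0] == prior_key:
            parts.append(row[2].strip())
        else:
            if parts:
                yield prior_key, " ".join(parts)
            prior_key = row[0]
            parts = [row[2].strip()]
    if parts:
        # flush the final group
        yield prior_key, " ".join(parts)


def group_csv(in_path, out_path):
    """Read in_path, write one row per run of equal keys to out_path."""
    # Python 3: text mode with newline='' replaces the Python 2 'rb'/'wb' modes.
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)  # assumes a header row is present
        writer.writerow([header[0], header[2]])
        writer.writerows(group_rows(reader))
```

A call such as group_csv('myfile.csv', 'outputfile.csv') would then produce the grouped file, using the file names from the question.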

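Answer 2 (score: 0)

The problem is that you are comparing the row with itself, so the while condition is always true and the loop never finishes. This version compares the previous row with the current row. The output is not quote-delimited, but it is correct. Contents of script.py: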

#!/usr/bin/env python

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    lastRow = None
    # assumes data is in order on first column
    for row in reader:
        if not lastRow:
            # start processing line with the first column and third column
            concat_text = row[2].strip()
            lastRow = row
            print concat_text
        else:
            if lastRow[0]==row[0]:
                # add to line
                concat_text = concat_text + ' ' + row[2].strip()
                print concat_text
            else:
                # end processing
                print concat_text
                writer.writerow((lastRow[0],concat_text))
                # start processing
                concat_text = row[2].strip()
                print concat_text
            lastRow = row
    # write out last element
    print concat_text
    writer.writerow((lastRow[0],concat_text))

Contents of outputfile.csv after running ./script.py:

a,hello good day nice weather
b,cat dog and cat
c,animals are cute
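
Comparing each row with the previous one, as the answers above do by hand, is what itertools.groupby from the standard library does: it groups consecutive items that share a key. Below is a minimal Python 3 sketch of that alternative. The function name concat_by_key and the in-memory sample are assumptions for illustration, and like the answers it only merges rows that are adjacent in the file.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''


def concat_by_key(rows):
    """Yield [key, joined_text] for each run of consecutive rows with equal row[0]."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield [key, " ".join(row[2].strip() for row in group)]


reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
grouped = list(concat_by_key(reader))

# Write the result to an in-memory buffer; a real script would open a file
# with open('outputfile.csv', 'w', newline='') instead.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['col1', 'col3'])
writer.writerows(grouped)
print(out.getvalue())
```

If the same key can reappear later in the file and should be merged as well, the rows would have to be sorted by the first column before grouping.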