Question

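I'm trying to concatenate col3 whenever col1 has the same value as in the previous row, and then write the output to a new file. I have a CSV file that looks like this: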

col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"

This is the output I want:

col1,col3
a,"hello good day nice weather"
b,"cat dog and cat"
c,"animals are cute"

This is my attempt:

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    for row in reader:
        while row[0]==row[0]:
            concat_text=" ".join(row[2])
        print concat_text
        writer.writerow((row[0],concat_text))

It runs, but I get no output. Any help is appreciated.

Answer 1

If you're interested in using pandas, you can group by col1 and then output the unique values:

Your original DataFrame:

import pandas as pd

df = pd.read_csv('test.txt')
print(df)

This prints:

  col1  col2              col3
0    a    12            hello 
1    a    13          good day
2    a    14      nice weather
3    b     1               cat
4    b     2       dog and cat
5    c     2  animals are cute

The second DataFrame:

df2 = df.groupby(df['col1'])
df2 = df2['col3'].unique()
df2 = df2.reset_index()

print(df2)

This results in:

  col1                              col3
0    a  [hello , good day, nice weather]
1    b                [cat, dog and cat]
2    c                [animals are cute]

To concatenate the third column, use apply:

df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))
print(df2)

  col1                          col3
0    a   hello good day nice weather
1    b               cat dog and cat
2    c              animals are cute
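Note that unique() drops strings that repeat within a group, and the snippets above stop before producing the CSV the question asked for. A minimal sketch that keeps every value and renders the result as CSV, assuming the question's sample data (inlined here instead of being read from test.txt). Be aware that a pandas groupby merges all rows with the same key, not only consecutive ones.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Strip stray whitespace, then join every col3 value within each col1 group.
# Unlike unique(), this keeps strings that occur more than once.
df["col3"] = df["col3"].str.strip()
result = df.groupby("col1", sort=False)["col3"].agg(" ".join).reset_index()

# With no path, to_csv returns the CSV text; index=False omits the 0, 1, 2 labels.
csv_out = result.to_csv(index=False)
print(csv_out)
```

Passing a file name as the first argument, for example result.to_csv('outputfile.csv', index=False), should write the same text to disk.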

Answer 2

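import csv

# Python 2 file modes; on Python 3 open both files in text mode with newline=''.
with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv', 'wb') as outputfile:
    reader = csv.reader(inputfile)
    writer = csv.writer(outputfile)
    prior_val = None  # key of the group currently being collected
    text = []         # col3 values collected for that key
    # The header row is not skipped: it forms its own group and is
    # written out as col1,col3.
    for line in reader:
        if line[0] == prior_val:
            # same key as the previous row: keep collecting
            text.append(line[2])
        else:
            # key changed: flush the finished group, then start a new one
            if text:
                writer.writerow([prior_val, " ".join(text)])
            prior_val = line[0]
            text = [line[2]]
    # flush the final group
    if text:
        writer.writerow([prior_val, " ".join(text)])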

>>> !cat outputfile.csv
col1,col3
a,hello  good day nice weather
b,cat dog and cat
c,animals are cute

>>> pd.read_csv('outputfile.csv', index_col=0)
                          col3
col1                              
a     hello  good day nice weather
b                  cat dog and cat
c                 animals are cute
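The loop above tracks the previous key by hand. A minimal Python 3 sketch of the same idea using itertools.groupby, which groups consecutive items that share a key. The helper name concat_runs and the in-memory sample are assumptions for illustration, and this version strips each value, so the double space after "hello" does not appear.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def concat_runs(infile, outfile):
    """Join col3 across consecutive rows that share the same col1 value."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader)  # col1, col2, col3
    writer.writerow([header[0], header[2]])
    # groupby only merges adjacent rows, so the input must already be
    # ordered (or at least clustered) by the first column.
    for key, rows in groupby(reader, key=itemgetter(0)):
        writer.writerow([key, " ".join(row[2].strip() for row in rows)])


# Sample data from the question, held in memory instead of myfile.csv.
sample = io.StringIO(
    'col1,col2,col3\n'
    'a,12,"hello "\n'
    'a,13,"good day"\n'
    'a,14,"nice weather"\n'
    'b,1,"cat"\n'
    'b,2,"dog and cat"\n'
    'c,2,"animals are cute"\n'
)
out = io.StringIO()
concat_runs(sample, out)
print(out.getvalue())
```

With real files on Python 3, the csv module documentation recommends opening them with newline='', for example open('myfile.csv', newline='') and open('outputfile.csv', 'w', newline='').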

Answer 3

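The problem is that you are comparing the row with itself. This version compares the previous row with the current row. The output is not quote-delimited, but it is correct. Contents of script.py: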

#!/usr/bin/env python

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    lastRow = None
    # assumes data is in order on first column
    for row in reader:
        if not lastRow:
            # start processing line with the first column and third column
            concat_text = row[2].strip()
            lastRow = row
            print concat_text
        else:
            if lastRow[0]==row[0]:
                # add to line
                concat_text = concat_text + ' ' + row[2].strip()
                print concat_text
            else:
                # end processing
                print concat_text
                writer.writerow((lastRow[0],concat_text))
                # start processing
                concat_text = row[2].strip()
                print concat_text
            lastRow = row
    # write out last element
    print concat_text
    writer.writerow((lastRow[0],concat_text))

Contents of outputfile.csv after running ./script.py:

a,hello good day nice weather
b,cat dog and cat
c,animals are cute
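To see in isolation why the attempt in the question misbehaves, here is a short Python 3 sketch. The row value is just one sample line from the question, and the variable names are illustrative.

```python
# One parsed row from the sample file, as csv.reader would yield it.
row = ["b", "2", "dog and cat"]

# A value always equals itself, so `while row[0] == row[0]` can never exit.
always_true = (row[0] == row[0])

# str.join iterates over its argument. Given a single string, it joins the
# individual characters rather than whole fields.
joined_chars = " ".join("cat")

# Joining a list of fields is what the question actually needs.
joined_fields = " ".join(["cat", "dog and cat"])

print(always_true)          # True
print(repr(joined_chars))   # 'c a t'
print(repr(joined_fields))  # 'cat dog and cat'
```

This is why the fix is to remember the previous row's key and collect the col3 values in a list (or a running string) before joining them.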

How to group consecutive rows with the same key in a CSV file

3 answers: