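I am trying to concatenate col3 whenever col1 equals the value in the previous row, and then write the output to a new file. I have a CSV file that looks like this: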
col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
I want this output:
col1,col3
a,"hello good day nice weather"
b,"cat dog and cat"
c,"animals are cute"
This is what I tried:
import csv
with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    for row in reader:
        while row[0]==row[0]:
            concat_text=" ".join(row[2])
            print concat_text
        writer.writerow((row[0],concat_text))
It runs, but I get no output. Any help is appreciated.
Answer 0 (score: 3)
If you are interested in using pandas, you can group the DataFrame by col1 and then output the unique values:
import pandas as pd
df = pd.read_csv('test.txt')
print(df)
Your original DataFrame:
  col1  col2              col3
0    a    12             hello
1    a    13          good day
2    a    14      nice weather
3    b     1               cat
4    b     2       dog and cat
5    c     2  animals are cute
The second DataFrame:
df2 = df.groupby(df['col1'])
df2 = df2['col3'].unique()
df2 = df2.reset_index()
print(df2)
will result in:
  col1                              col3
0    a  [hello , good day, nice weather]
1    b                [cat, dog and cat]
2    c                [animals are cute]
To concatenate the third column, you need to use apply:
df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))
This gives:
  col1                         col3
0    a  hello good day nice weather
1    b              cat dog and cat
2    c             animals are cute
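The grouping and joining steps in this answer can also be folded into a single aggregation. The following is only a sketch, assuming Python 3 and a reasonably recent pandas; the helper name `concat_col3` and the inlined `CSV_TEXT` sample are made up for illustration. Unlike `unique()`, `agg` keeps repeated strings within a group, which may or may not be what you want.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''


def concat_col3(df):
    """Join the stripped col3 values of each col1 group with single spaces."""
    return (
        df.groupby('col1', sort=False)['col3']
        .agg(lambda values: ' '.join(s.strip() for s in values))
        .reset_index()
    )


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = concat_col3(df)
print(result)
```

To write the result to a file, `result.to_csv('outputfile.csv', index=False)` should do it.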
Answer 1 (score: 1)
import csv
with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv', 'wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    prior_val = None
    text = []
    for line in reader:
        if line[0] == prior_val:
            # same key as the previous row: keep collecting text
            text.append(line[2])
        else:
            # key changed: flush the finished group, then start a new one
            if text:
                writer.writerow([prior_val, " ".join(text)])
            prior_val = line[0]
            text = [line[2]]
    # flush the last group
    if text:
        writer.writerow([prior_val, " ".join(text)])
>>> !cat outputfile.csv
col1,col3
a,hello good day nice weather
b,cat dog and cat
c,animals are cute
>>> pd.read_csv('outputfile.csv', index_col=0)
col3
col1
a hello good day nice weather
b cat dog and cat
c animals are cute
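The same "flush when the key changes" logic can be expressed with `itertools.groupby`, which groups consecutive items that share a key. This is a sketch assuming Python 3, not the answerer's code; `merge_consecutive` and the inlined `SAMPLE` text are illustrative names, and in-memory buffers stand in for the real files (which in Python 3 would be opened in text mode with `newline=''`).

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''


def merge_consecutive(rows):
    """Yield (key, joined_text) for each run of consecutive rows sharing column 0."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, ' '.join(row[2].strip() for row in group)


reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # consume the header row so it is not grouped as data

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow([header[0], header[2]])
writer.writerows(merge_consecutive(reader))
print(out.getvalue())
```

Like the loop above, this only merges adjacent rows, so the input has to be ordered by the first column.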
Answer 2 (score: 0)
The problem is that you are comparing each row with itself. This version compares the previous row with the current row. The output is not quote-delimited, but it is correct.
Contents of script.py:
#!/usr/bin/env python
import csv
with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    lastRow = None
    # assumes data is in order on first column
    for row in reader:
        if not lastRow:
            # start processing line with the first column and third column
            concat_text = row[2].strip()
            lastRow = row
            print concat_text
        else:
            if lastRow[0]==row[0]:
                # add to line
                concat_text = concat_text + ' ' + row[2].strip()
                print concat_text
            else:
                # end processing
                print concat_text
                writer.writerow((lastRow[0],concat_text))
                # start processing
                concat_text = row[2].strip()
                print concat_text
        lastRow = row
    # write out last element
    print concat_text
    writer.writerow((lastRow[0],concat_text))
Contents of outputfile.csv after running ./script.py:
a,hello good day nice weather
b,cat dog and cat
c,animals are cute
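If quoted output matters, the `csv` writer accepts a `quoting` argument. The sketch below assumes Python 3, and `to_quoted_csv` is a hypothetical helper name. Quoting is configured per writer rather than per column, so `csv.QUOTE_ALL` quotes the first field as well, which is close to, but not exactly, the layout the question asked for.

```python
import csv
import io

# Merged rows as produced by the script above.
rows = [
    ('a', 'hello good day nice weather'),
    ('b', 'cat dog and cat'),
    ('c', 'animals are cute'),
]


def to_quoted_csv(rows):
    """Serialize rows to CSV text with every field wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


quoted = to_quoted_csv(rows)
print(quoted)
```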