Combine data from specific columns across all csv rows that share a duplicate ID number

Time: 2019-04-23 14:02:35

Tags: python python-3.x pandas import-csv

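I wrote a Python script that takes a large export of student data from our SIS, which lists each student's full class schedule. Each class has its own row, so every student appears on multiple rows because they have multiple classes. The script writes a new csv file containing only the data I need, keeping only the rows whose class name matches a list defined in the script.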

This all works as expected. However, in the final csv file, instead of multiple rows like this:

jane doe, 123456, Language arts, Teacherone@ourdomain.org
jane doe, 123456, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, English 101, Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org

I would like the final csv file to look like this:

Jane doe, 123456, Language Arts; Math, Teacherone@ourdomain.org; Teachertwo@ourdomain.org

Suzie Que, 321256, Math; English 101, Teachertwo@ourdomain.org; Teacherthree@ourdomain.org

Johnny Appleseed, 321321, Language Arts; Math, Teacherone@ourdomain.org

I have looked into pandas, but I can't figure out how to do this with it.

Any help would be greatly appreciated.

Code below:

import csv

def ixl():
    with open(r'C:\Users\sftp\PS\IMPORTED\pythonscripts\ixl\IXL CSV\IXL_DATA2.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        with open(r'C:\Users\sftp\PS\IMPORTED\pythonscripts\ixl\IXL CSV\NEW_studentexport.csv', mode='w', newline='') as output_file:
            write = csv.writer(output_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)
            for row in csv_reader:
                Title = row[6]
                coursename = row[9]
                firstname = row[13]
                lastname = row[16]
                grade = row[14]
                studentnumber = row[17]
                studentidnumber = row[18]
                teacheremail = row[19]
                teacherfirst = row[20]
                teacherlast = row[21]
                stud_username = studentidnumber + "@highpointaca"
                password = int(studentnumber) + int(studentidnumber)

                # Keep only rows whose class title is in this list
                if Title in ('Math 7', 'Albebra 1', 'Algebra 1 Honors',
                             'Algebra 2', 'Algebra 2 Honors',
                             'Dual Enrollment College Algebra (MAT 110',
                             'Dual Enrollment English Comp. (ENG 102)', 'Reading 5',
                             'Pre-Calculus Honors', 'Pre-Algebra8', 'Pre-Algebra',
                             'Mathematics', 'Math K', 'Math 7', 'Math 6 Honors',
                             'Math 6', 'Math 5', 'Math 4', 'Math 3', 'Math 2', 'Math 1',
                             'Language Arts 5', 'Language Arts 4', 'Language Arts 3',
                             'Language Arts 2', 'Language Arts K', 'Language Arts 1',
                             'Language Arts', 'Geometry Honors', 'Geometry',
                             'Essentials of Math I', 'English 4', 'English 3',
                             'English 2', 'English 1 Honors', 'English 1',
                             'ELA 7 Honors', 'ELA 6 Honors', 'ELA 8', 'ELA 7', 'ELA 6',
                             'Dual Enrollment English Comp. (ENG 101)'):

                    write.writerow([firstname, lastname, studentidnumber,
                                    grade, teacheremail, stud_username, password, Title])


if __name__ == '__main__':
    ixl()

1 answer:

Answer 0 (score: 0)

Use the csv module and collections.defaultdict.

Demo:

import csv
from collections import defaultdict

result = defaultdict(list)

with open("input.csv", newline="") as infile:     #Read csv
    reader = csv.reader(infile)
    for row in reader:
        result[row[0]].append(row)     #Group by name

final_result = []    
for k, v in  result.items():
    temp = v[0]
    for i in v[1:]:
        temp[2] += ", " + i[2]         #Concatenate subject names
    final_result.append(temp)

with open("output.csv", "w", newline="") as outfile:   # newline="" avoids blank rows on Windows
    writer = csv.writer(outfile)
    writer.writerows(final_result)         #Write back to csv
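
The demo above groups by name and joins only the subject column. A minimal sketch of the same grouping idea that also merges the teacher emails, uses "; " as the separator, and skips repeated values might look like the following. It assumes the four-column layout from the question's sample (name, student number, class, teacher email) and that the student number identifies a student; merge_by_student is a hypothetical helper name, and the rows are read from an in-memory string rather than the real files.

```python
import csv
import io


def merge_by_student(rows, sep="; "):
    """Collapse rows that share a student number into one row per student.

    Assumes each row is [name, student number, class, teacher email].
    Classes and emails are joined with `sep`; repeated values are kept once,
    in first-seen order. The name is taken from the student's first row.
    """
    merged = {}  # student number -> [name, number, [classes], [emails]]
    for row in rows:
        if not row:
            continue  # skip blank lines
        name, number, subject, email = (cell.strip() for cell in row)
        entry = merged.setdefault(number, [name, number, [], []])
        if subject not in entry[2]:
            entry[2].append(subject)
        if email not in entry[3]:
            entry[3].append(email)
    return [
        [name, number, sep.join(subjects), sep.join(emails)]
        for name, number, subjects, emails in merged.values()
    ]


# Sample rows in the shape shown in the question
sample = """\
jane doe, 123456, Language arts, Teacherone@ourdomain.org
jane doe, 123456, Math, Teachertwo@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org
"""

merged_rows = merge_by_student(csv.reader(io.StringIO(sample)))

# Write the merged rows as csv text (an in-memory buffer stands in for a file)
out = io.StringIO()
csv.writer(out).writerows(merged_rows)
print(out.getvalue())
```

Keying on the student number rather than the name sidesteps differences in capitalization such as "jane doe" and "Jane doe". In the question's script, the same function could probably be applied to the filtered rows before they are written, with the column positions adjusted to match that file.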