Suppose I have the following CSV file (subjects.csv):
subjects,name1,name2,name3
Chemistry,Tom,Will,Rob
Biology,Megan,Sam,Tim
Physics,Tim,Will,Bob
Maths,Will,Tim,Joe
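I want to find which pairs of students share the same class, looking only at Tim, Tom and Will. How would I go about pairing these up in Python?
That is:
Tim and Will take two classes together. Tom and Will take one class together. I would also like to lay this out in a table like the one I have written below, with the names on both axes and, for each pair of students, the number of classes they share (names sorted alphabetically, ascending or descending). I have read about how to generate a table for a whole CSV file, but I cannot work out how to build a table from scratch while also dropping columns and rows from the CSV file.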
      Tim   Tom   Will
Tim   0     0     0
Tom   0     0     1
Will  2     0     0
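This is beyond my current skill level, but I would still like to know how to do it and try to understand it.
Answer 0 (score: 4)
You can create a dictionary that maps each student to the set of classes they are taking (the session below is Python 2):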
>>> import csv
>>> import collections
>>> D = collections.defaultdict(set)
>>> with open('subjects.csv','rb') as f:
...     subject_reader = csv.reader(f)
...     header = subject_reader.next()
...     for row in subject_reader:
...         for name in row[1:]:
...             D[name].add(row[0])
...
>>> import pprint
>>> pprint.pprint(dict(D))
{'Bob': set(['Physics']),
 'Joe': set(['Maths']),
 'Megan': set(['Biology']),
 'Rob': set(['Chemistry']),
 'Sam': set(['Biology']),
 'Tim': set(['Biology', 'Maths', 'Physics']),
 'Tom': set(['Chemistry']),
 'Will': set(['Chemistry', 'Maths', 'Physics'])}
>>>
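For anyone on Python 3, here is a rough equivalent of the session above. It is only a sketch: the CSV text is inlined through `io.StringIO` so it runs without `subjects.csv` on disk, and the function name `classes_by_student` is made up for illustration.

```python
import collections
import csv
import io

# Assumption: the CSV contents are inlined here so the sketch is self-contained.
# With a real file, pass open('subjects.csv', newline='') instead.
CSV_TEXT = """subjects,name1,name2,name3
Chemistry,Tom,Will,Rob
Biology,Megan,Sam,Tim
Physics,Tim,Will,Bob
Maths,Will,Tim,Joe
"""


def classes_by_student(csv_file):
    """Map each student name to the set of subjects they take."""
    result = collections.defaultdict(set)
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (Python 3 spelling of reader.next())
    for row in reader:
        if not row:
            continue  # ignore blank lines
        subject, names = row[0], row[1:]
        for name in names:
            result[name].add(subject)
    return dict(result)


students = classes_by_student(io.StringIO(CSV_TEXT))
print(sorted(students["Tim"]))  # ['Biology', 'Maths', 'Physics']
```

In Python 3 the sets print as `{'Physics'}` rather than `set(['Physics'])`, but the structure is the same.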
To see how many classes two people have together, you can use the set intersection method:
>>> D['Tom'].intersection(D['Will'])
set(['Chemistry'])
>>> len(_)
1
>>> D['Tim'].intersection(D['Will'])
set(['Maths', 'Physics'])
>>> len(_)
2
>>>
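If you want every pair at once instead of intersecting by hand, `itertools.combinations` can enumerate the pairs for you. A minimal sketch (Python 3), assuming the mapping for the three students is written out literally, copied from the pprint output above; `shared_counts` is a hypothetical helper name:

```python
import itertools

# Assumption: mapping typed in by hand from the pprint output shown earlier.
classes = {
    "Tim": {"Biology", "Maths", "Physics"},
    "Tom": {"Chemistry"},
    "Will": {"Chemistry", "Maths", "Physics"},
}


def shared_counts(classes, names):
    """Return {(a, b): number of shared classes} for every unordered pair."""
    return {
        (a, b): len(classes[a] & classes[b])  # & is set intersection
        for a, b in itertools.combinations(sorted(names), 2)
    }


counts = shared_counts(classes, ["Tim", "Tom", "Will"])
for pair, n in counts.items():
    print(pair, n)
# ('Tim', 'Tom') 0
# ('Tim', 'Will') 2
# ('Tom', 'Will') 1
```

Each pair appears once, which matches the "one entry per pair" layout in the question's table.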
To print out the table from your example, you could do something like this:
>>> EXAMPLE_NAMES = ['Tom','Tim','Will']
>>> for y_name in EXAMPLE_NAMES:
...     print '{0:{width}}'.format(y_name,width=5),
...     for x_name in EXAMPLE_NAMES:
...         if y_name==x_name:
...             print '{0:{width}}'.format('-'*5, width=5),
...         else:
...             print '{0:{width}}'.format(len(D[y_name].intersection(D[x_name])), width=5),
...     print
...
Tom   -----     0     1
Tim       0 -----     2
Will      1     2 -----
The header for the table could look like this:
>>> for x_name in [' ']+EXAMPLE_NAMES:
...     print '{0:{width}}'.format(x_name, width=5),
...
      Tom   Tim   Will
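The header and body loops can also be combined into one function that returns the lines of the table. This is a Python 3 sketch rather than a drop-in replacement for the session above: `format_table` is a hypothetical name, the mapping is typed in by hand, and the names are sorted alphabetically as the question asked.

```python
# Assumption: mapping typed in by hand for the three students of interest.
classes = {
    "Tim": {"Biology", "Maths", "Physics"},
    "Tom": {"Chemistry"},
    "Will": {"Chemistry", "Maths", "Physics"},
}


def format_table(classes, names, width=5):
    """Return the table as a list of text lines, names sorted alphabetically."""
    names = sorted(names)
    # Header: an empty corner cell followed by the left-aligned names.
    header = " ".join(f"{n:<{width}}" for n in [""] + names).rstrip()
    lines = [header]
    for y in names:
        cells = [f"{y:<{width}}"]
        for x in names:
            if x == y:
                cells.append("-" * width)  # diagonal: a student paired with themselves
            else:
                shared = len(classes[y] & classes[x])
                cells.append(f"{shared:>{width}}")
        lines.append(" ".join(cells))
    return lines


table_lines = format_table(classes, ["Tom", "Tim", "Will"])
print("\n".join(table_lines))
```

Returning the lines instead of printing inside the loops makes it easier to write the table to a file or check it in a test.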
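As John mentioned in the comments, I hard-coded the names into a list to mimic the example you gave above. To see the whole table, you can use .iterkeys() or .keys() to iterate over, or get, the keys of the dictionary you created: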
>>> import csv
>>> import collections
>>>
>>> my_d = collections.defaultdict(set)
>>> with open('subjects.csv','rb') as f:
...     subject_reader = csv.reader(f)
...     header = subject_reader.next()
...     for row in subject_reader:
...         for name in row[1:]:
...             my_d[name].add(row[0])
...
>>> def display_header(D):
...     for x_name in [' ']+D.keys():
...         print '{0:{width}}'.format(x_name, width=5),
...     print
...
>>> def display_body(D):
...     for y_name in D.iterkeys():
...         print '{0:{width}}'.format(y_name,width=5),
...         for x_name in D.iterkeys():
...             if y_name==x_name:
...                 print '{0:{width}}'.format('-'*5, width=5),
...             else:
...                 print '{0:{width}}'.format(len(D[y_name].intersection(D[x_name])), width=5),
...         print
...
>>> def display_table(D):
...     display_header(D)
...     display_body(D)
...
>>> display_table(my_d)
      Sam   Rob   Megan Will  Tim   Joe   Tom   Bob
Sam   -----     0     1     0     1     0     0     0
Rob       0 -----     0     1     0     0     1     0
Megan     1     0 -----     0     1     0     0     0
Will      0     1     0 -----     2     1     1     1
Tim       1     0     1     2 -----     1     0     1
Joe       0     0     0     1     1 -----     0     0
Tom       0     1     0     1     0     0 -----     0
Bob       0     0     0     1     1     0     0 -----
>>>
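Note that the row and column order above is whatever order the Python 2 dictionary happens to iterate in, and `.iterkeys()` does not exist in Python 3. If you want the alphabetical order asked for in the question, one option is to iterate over `sorted(D)` instead. A minimal Python 3 sketch that builds the counts as a list of lists; `shared_matrix` is a hypothetical name and the mapping is typed in by hand from the CSV:

```python
# Assumption: the full student -> classes mapping, written out by hand.
classes = {
    "Bob": {"Physics"},
    "Joe": {"Maths"},
    "Megan": {"Biology"},
    "Rob": {"Chemistry"},
    "Sam": {"Biology"},
    "Tim": {"Biology", "Maths", "Physics"},
    "Tom": {"Chemistry"},
    "Will": {"Chemistry", "Maths", "Physics"},
}


def shared_matrix(classes):
    """Return (names, matrix) with names sorted alphabetically.

    matrix[i][j] is the number of classes names[i] and names[j] share,
    or None on the diagonal.
    """
    names = sorted(classes)  # sorting a dict sorts its keys
    matrix = [
        [None if x == y else len(classes[y] & classes[x]) for x in names]
        for y in names
    ]
    return names, matrix


names, matrix = shared_matrix(classes)
tim, will = names.index("Tim"), names.index("Will")
print(names)
print(matrix[tim][will])  # 2
```

The matrix is symmetric, so if you only want each pair once you can read just the entries below (or above) the diagonal.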