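I have a list of schools and the classes each one offers. I also have a list of unique classes, of which each school offers only some. I wrote a loop that prints the missing classes for each school along with the school's name, but I cannot write the full result of the for loop to a csv.
I have been able to write one school's classes to csv, but not the complete result of the for loop covering all schools.
I know I need to get the results of the for loop into a dataframe. The next step would be to iterate over the dataframe and send the results to csv row by row, but first I need to get the results out of the for loop and into a dataframe.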
import pandas as pd

schools = {'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C', 'School C', 'School D'], 'Class': ['Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'Physics']}
dfSchool = pd.DataFrame(data=schools)
dfSchool
classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}
dfClasses = pd.DataFrame(data=classes)
dfClasses
grouped = dfSchool.groupby('School')
for name, group in grouped:
    print(name)
    print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])
listFinal = []
for name, group in grouped:
    print(name)
    print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])
    listFinal.append(name)
    listFinal.append(dfClasses[~(dfClasses.Class.isin(group["Class"]))])
dfOutput = pd.DataFrame(listFinal)
dfOutput.to_csv('SchoolClasses.csv', index=True)
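Actual result: the console shows the output below, but when I write to csv I only get School A in the file. I want all of the output below (all schools) written to the csv file.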
School A
Class
3 History
4 Physics
School B
Class
3 History
4 Physics
School C
Class
2 English
3 History
4 Physics
School D
Class
0 Math
1 Chemistry
2 English
3 History
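Desired result: the output above, but in a single csv file. Bonus points if you can put the school name on every row next to its class, rather than only as a heading.
When I try to put the results of the for loop into a dataframe, I get: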
listFinal
['School A', Class
3 History
4 Physics, 'School B', Class
3 History
4 Physics, 'School C', Class
2 English
3 History
4 Physics, 'School D', Class
0 Math
1 Chemistry
2 English
3 History]
Answer 0 (score: 1)
Create the schools dataframe:
import pandas as pd

schools = {
    "School": [
        "School A",
        "School A",
        "School A",
        "School B",
        "School B",
        "School B",
        "School C",
        "School C",
        "School D",
    ],
    "Class": [
        "Math",
        "Chemistry",
        "English",
        "Math",
        "Chemistry",
        "English",
        "Math",
        "Chemistry",
        "Physics",
    ],
}
dfSchool = pd.DataFrame(data=schools)
print(dfSchool)
School Class
0 School A Math
1 School A Chemistry
2 School A English
3 School B Math
4 School B Chemistry
5 School B English
6 School C Math
7 School C Chemistry
8 School D Physics
Create a dataframe that represents the case where every school offers every class. Call it df_tot.
# c has to be defined first, because s uses len(c)
c = ['Math', 'Chemistry', 'English', 'History', 'Physics']
s = ['School A'] * len(c) + ['School B'] * len(c) + ['School C'] * len(c) + ['School D'] * len(c)
df_tot = pd.DataFrame([s, c * 4], index=['School', 'Class']).T
print(df_tot)
School Class
0 School A Math
1 School A Chemistry
2 School A English
3 School A History
4 School A Physics
5 School B Math
6 School B Chemistry
7 School B English
8 School B History
9 School B Physics
10 School C Math
11 School C Chemistry
12 School C English
13 School C History
14 School C Physics
15 School D Math
16 School D Chemistry
17 School D English
18 School D History
19 School D Physics
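Merge df_tot with dfSchool with indicator=True, then keep the rows where _merge == 'left_only'. A left merge is used here instead of an outer one: it preserves the row order of df_tot, so the index of the result matches the output shown below, and the left_only rows are the same either way. The filter is applied to the merged frame itself, not to df_tot.
merged = df_tot.merge(dfSchool, how='left', indicator=True)
df_tot = merged.loc[merged['_merge'] == 'left_only', ['School', 'Class']]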
print(df_tot)
School Class
3 School A History
4 School A Physics
8 School B History
9 School B Physics
12 School C English
13 School C History
14 School C Physics
15 School D Math
16 School D Chemistry
17 School D English
18 School D History
Save to csv ...
df_tot.to_csv('anyfile.csv')
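The same merge idea can be sketched without typing the school names by hand. This is only an illustration under a few assumptions: the names full, merged, missing and all_classes are made up for this example, and the full grid is built with pd.MultiIndex.from_product rather than the list arithmetic used above.

```python
import pandas as pd

# Same sample data as in the question.
dfSchool = pd.DataFrame({
    "School": ["School A"] * 3 + ["School B"] * 3 + ["School C"] * 2 + ["School D"],
    "Class": ["Math", "Chemistry", "English"] * 2 + ["Math", "Chemistry", "Physics"],
})
all_classes = ["Math", "Chemistry", "English", "History", "Physics"]

# Every (School, Class) combination, derived from the data itself.
full = pd.MultiIndex.from_product(
    [dfSchool["School"].unique(), all_classes], names=["School", "Class"]
).to_frame(index=False)

# A left merge keeps the row order of `full`; `_merge` flags pairs with no match.
merged = full.merge(dfSchool, on=["School", "Class"], how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", ["School", "Class"]]

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = missing.to_csv(index=False)
print(csv_text)
```

Because the grid is derived from dfSchool, adding a fifth school to the data would not require touching the code.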
An alternative answer that does not use dataframes
I wonder whether it wouldn't simply be easier to use a dictionary and json?
School = [
    "School A",
    "School A",
    "School A",
    "School B",
    "School B",
    "School B",
    "School C",
    "School C",
    "School D",
]
Class = [
    "Math",
    "Chemistry",
    "English",
    "Math",
    "Chemistry",
    "English",
    "Math",
    "Chemistry",
    "Physics",
]
List the classes that currently exist at each school.
A = list(zip(School, Class))
for item in A:
    print(item)
('School A', 'Math')
('School A', 'Chemistry')
('School A', 'English')
('School B', 'Math')
('School B', 'Chemistry')
('School B', 'English')
('School C', 'Math')
('School C', 'Chemistry')
('School D', 'Physics')
Put that into a dictionary:
d1 = {}
for item in A:
    d1.setdefault(item[0], []).append(item[1])
print(d1)
{'School A': ['Math', 'Chemistry', 'English'],
'School B': ['Math', 'Chemistry', 'English'],
'School C': ['Math', 'Chemistry'],
'School D': ['Physics']}
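Build a new dictionary from the items that are not in d1:
d2 = {}
for s in set(School):
    for c in set(Class):
        # set(Class) only holds classes offered by at least one school,
        # so a class that no school offers (e.g. History) is not reported.
        if c not in d1[s]:
            d2.setdefault(s, []).append(c)
print(d2)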
{'School C': ['Physics', 'English'],
'School A': ['Physics'],
'School B': ['Physics'],
'School D': ['Math', 'Chemistry', 'English']}
Then I would consider using a json file:
import json
with open('data.json', 'w') as fp:
    json.dump(d2, fp)
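Since the question asks for a csv, the dictionary could also be flattened with the standard csv module. This is a rough sketch, not part of the answer above: the helper write_missing is a made-up name, d2 is copied from the printed output, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# d2 as printed above: school -> list of missing classes.
d2 = {
    "School C": ["Physics", "English"],
    "School A": ["Physics"],
    "School B": ["Physics"],
    "School D": ["Math", "Chemistry", "English"],
}

def write_missing(d, fp):
    """Write one 'School,Class' row per missing class to the open file fp."""
    writer = csv.writer(fp, lineterminator="\n")
    writer.writerow(["School", "Class"])
    for school in sorted(d):  # sorted so the row order is reproducible
        for cls in d[school]:
            writer.writerow([school, cls])

# In-memory stand-in for open("missing.csv", "w", newline="").
buffer = io.StringIO()
write_missing(d2, buffer)
print(buffer.getvalue())
```

This puts the school name on every row, which is the "bonus" layout the question asks for.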
Answer 1 (score: 1)
The following code aggregates all the missing classes of each school into a set.
import pandas as pd

schools = {'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C', 'School C', 'School D'], 'Class': ['Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'Physics']}
dfSchool = pd.DataFrame(schools)
classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}
set_classes = set(classes["Class"])
# Select the Class column so the result is a Series holding one set per school.
df = dfSchool.groupby('School')['Class'].agg(lambda c: set_classes.difference(c))
# Assigning df.name on a DataFrame does not rename a column, so rename the Series instead.
df = df.rename("MissingClasses")
df.to_csv("SchoolClasses.csv")
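If one row per school and class is preferred over one set per school, the sets can be expanded with Series.explode. The following is a sketch under assumptions: missing_sets and long_form are names invented here, and each set is sorted first so the row order is stable.

```python
import pandas as pd

dfSchool = pd.DataFrame({
    "School": ["School A"] * 3 + ["School B"] * 3 + ["School C"] * 2 + ["School D"],
    "Class": ["Math", "Chemistry", "English"] * 2 + ["Math", "Chemistry", "Physics"],
})
set_classes = {"Math", "Chemistry", "English", "History", "Physics"}

# One set of missing classes per school, as in the answer above.
missing_sets = dfSchool.groupby("School")["Class"].agg(
    lambda c: set_classes.difference(c)
)

# sorted() turns each set into an ordered list; explode() then gives every
# list element its own row and repeats the School index value.
long_form = (
    missing_sets.apply(sorted)
    .explode()
    .rename("Class")
    .reset_index()
)

# With no path, to_csv returns the CSV text; pass a file name to write a file.
print(long_form.to_csv(index=False))
```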
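Answer 2 (score: 1)
This is just a direct answer to how to write what you already print to a csv file. So I kept your algorithm and only slightly changed the contents of the listFinal list: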
listFinal = []
for name, group in grouped:
    print(name)
    print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])
    # add a new column with the school name to the dataframe appended to the list
    listFinal.append(dfClasses[~(dfClasses.Class.isin(group["Class"]))]
                     .assign(School=name))
Then a simple pd.concat lets us write everything to a csv file:
dfOutput = pd.concat(listFinal)
dfOutput.to_csv('SchoolClasses.csv', index=True)
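A compact variant of the same loop, shown only as a sketch: it builds the list with a comprehension, puts School before Class, and drops the original row index from the csv. The names frames and csv_text are assumptions made for this example.

```python
import pandas as pd

dfSchool = pd.DataFrame({
    "School": ["School A"] * 3 + ["School B"] * 3 + ["School C"] * 2 + ["School D"],
    "Class": ["Math", "Chemistry", "English"] * 2 + ["Math", "Chemistry", "Physics"],
})
dfClasses = pd.DataFrame(
    {"Class": ["Math", "Chemistry", "English", "History", "Physics"]}
)

# One small dataframe of missing classes per school, tagged with the school name.
frames = [
    dfClasses[~dfClasses["Class"].isin(group["Class"])].assign(School=name)
    for name, group in dfSchool.groupby("School")
]

# Stack the pieces, renumber the rows, and put School first.
dfOutput = pd.concat(frames, ignore_index=True)[["School", "Class"]]

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = dfOutput.to_csv(index=False)
print(csv_text)
```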
Answer 3 (score: 1)
One option is to use pandas.DataFrame.groupby.apply:
import pandas as pd
schools = {'School': ['School A', 'School A', 'School A',
                      'School B', 'School B', 'School B',
                      'School C', 'School C', 'School D'],
           'Class': ['Math', 'Chemistry', 'English',
                     'Math', 'Chemistry', 'English',
                     'Math', 'Chemistry', 'Physics']
           }
classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}
df_school = pd.DataFrame(data=schools)
df_classes = pd.DataFrame(data=classes)
missing = (df_school.groupby('School')
           .apply(lambda group: df_classes[~(df_classes["Class"].isin(group["Class"]))])
           .droplevel(-1)
           )
missing.to_csv("missing_classes.csv")
Result:
>>> missing
Class
School
School A History
School A Physics
School B History
School B Physics
School C English
School C History
School C Physics
School D Math
School D Chemistry
School D English
School D History
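School,Class
School A,History
School A,Physics
School B,History
School B,Physics
School C,English
School C,History
School C,Physics
School D,Math
School D,Chemistry
School D,English
School D,History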