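I have two dataframes:
df_GB is a list of the students in a course along with student data. df_EV is a set of survey responses from a portion of the students in the class.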
import pandas as pd
import numpy as np
# Create the two dataframes
In [82]: gradebook=[['Jim','T'],['Susan','F'],['Bob','F'],['Ellen','T']]
In [83]: df_GB=pd.DataFrame(gradebook,columns=['Name','Attend'])
In [84]: survey=[['Jim',1,3,4,'Awesome'],['Ellen',1,4,3,'Splendid'],['Fred',0,1,2,'Passable']]
In [85]: df_EV=pd.DataFrame(survey,columns=['Name','Q1','Q2','Q3','Comment'])
#Display the two dataframes
In [86]: df_GB
Out[86]:
    Name Attend
0    Jim      T
1  Susan      F
2    Bob      F
3  Ellen      T
In [87]: df_EV
Out[87]:
    Name  Q1  Q2  Q3   Comment
0    Jim   1   3   4   Awesome
1  Ellen   1   4   3  Splendid
2   Fred   0   1   2  Passable
I want to add the survey responses for each student listed in df_EV to the correct row in df_GB, to get the following:
In [90]: df_result
Out[90]:
    Name Attend   Q1   Q2   Q3   Comment
0    Jim      T  1.0  3.0  4.0   Awesome
1  Susan      F  NaN  NaN  NaN       NaN
2    Bob      F  NaN  NaN  NaN       NaN
3  Ellen      T  1.0  4.0  3.0  Splendid
4   Fred    NaN  0.0  1.0  2.0  Passable
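I tried creating df_result from the list of names in df_GB, then iterating through the names in df_result, searching for each name in df_EV, and using loc to "paste" in the data. It doesn't work because I am trying to paste a DataFrame into a DataFrame as if it were a Series, which raises a "ValueError: Incompatible indexer with Series" error.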
df_result['Name']=pd.DataFrame({'Name' : df_GB['Name']})
i=0
while i<df_result.shape[0]:
    name=df_result.at[i,'Name']
    df_result.loc[i,'Q1':'Comment']=df_EV.loc[lambda df_EV: df_EV['Name']==name,['Q1','Q2','Q3','Comment']]
    i +=1
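I've been banging my head on the keyboard trying to figure out how to do this. Any hints? I'm new to Python: after years of using Matlab, I downloaded Python only yesterday. This seems like it should be too simple to be this hard to figure out.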
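The similar questions I have read through seem to be solved by appending rows or columns to a dataframe. In this case, because the lists of names in the two dataframes don't match, I don't think that approach works, unless I'm missing an obvious trick.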
Answer 0 (score: 0)
@sacul's solution is exactly right. Use:
df_GB.merge(df_EV, how='outer')
That's all there is to it. The merge method does a lot of the work for you.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
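To make the behavior of that one-liner easier to see, here is a minimal, self-contained sketch. It is an illustration rather than part of the original answer: passing on='Name' and indicator=True is my addition. Without on, merge joins on the columns the two frames have in common (only Name here); indicator=True adds a _merge column recording where each row came from.

```python
import pandas as pd

df_GB = pd.DataFrame(
    [['Jim', 'T'], ['Susan', 'F'], ['Bob', 'F'], ['Ellen', 'T']],
    columns=['Name', 'Attend'],
)
df_EV = pd.DataFrame(
    [['Jim', 1, 3, 4, 'Awesome'],
     ['Ellen', 1, 4, 3, 'Splendid'],
     ['Fred', 0, 1, 2, 'Passable']],
    columns=['Name', 'Q1', 'Q2', 'Q3', 'Comment'],
)

# how='outer' keeps every name that appears in either frame.
# indicator=True (an addition for illustration) labels each row as
# 'left_only', 'right_only', or 'both'.
df_result = df_GB.merge(df_EV, on='Name', how='outer', indicator=True)
print(df_result)
```

The row order of an outer merge can differ between pandas versions, so it is safer not to rely on it.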
Answer 1 (score: 0)
import pandas as pd
df_GB = pd.DataFrame([['Jim','T'],['Susan','F'],['Bob','F'],['Ellen','T']],columns = ['Name','Attend'])
df_EV = pd.DataFrame([[ 'Jim',1,3,4,'Awesome'],['Ellen',1,4,3,'Splendid'], ['Fred',0,1,2,'Passable']],columns = ['Name','Q1','Q2','Q3','Comment'])
df_result = pd.merge(df_EV,df_GB,on = 'Name',how = 'outer')
df_result
Out[33]:
    Name   Q1   Q2   Q3   Comment Attend
0    Jim  1.0  3.0  4.0   Awesome      T
1  Ellen  1.0  4.0  3.0  Splendid      T
2   Fred  0.0  1.0  2.0  Passable    NaN
3  Susan  NaN  NaN  NaN       NaN      F
4    Bob  NaN  NaN  NaN       NaN      F
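The merged frame above lists the survey columns first because df_EV is the left operand, and the scores turn into floats because NaN forces a float dtype. As a sketch (my addition, not something the original answer does), swapping the operands puts the gradebook columns first, and casting to the nullable Int64 dtype keeps the scores as integers while still allowing missing values:

```python
import pandas as pd

df_GB = pd.DataFrame(
    [['Jim', 'T'], ['Susan', 'F'], ['Bob', 'F'], ['Ellen', 'T']],
    columns=['Name', 'Attend'],
)
df_EV = pd.DataFrame(
    [['Jim', 1, 3, 4, 'Awesome'],
     ['Ellen', 1, 4, 3, 'Splendid'],
     ['Fred', 0, 1, 2, 'Passable']],
    columns=['Name', 'Q1', 'Q2', 'Q3', 'Comment'],
)

# Gradebook on the left, so its columns come first in the result.
df_result = pd.merge(df_GB, df_EV, on='Name', how='outer')

# Optional: store the scores as nullable integers (missing values show as <NA>).
df_result = df_result.astype({'Q1': 'Int64', 'Q2': 'Int64', 'Q3': 'Int64'})
print(df_result)
```

Whether the cast is worth it depends on what happens to the scores afterwards; plain floats with NaN work fine for most numeric summaries.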
df_GB.join(df_EV.set_index('Name'), on='Name',how ='outer')
Out[45]:
    Name Attend   Q1   Q2   Q3   Comment
0    Jim      T  1.0  3.0  4.0   Awesome
1  Susan      F  NaN  NaN  NaN       NaN
2    Bob      F  NaN  NaN  NaN       NaN
3  Ellen      T  1.0  4.0  3.0  Splendid
3   Fred    NaN  0.0  1.0  2.0  Passable
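The index labels in the join output above come from the join itself and need not be a clean 0..n-1 sequence. If a fresh index is wanted, one option is to append reset_index(drop=True) to the same call. This is a small sketch of that variation, not part of the original answer:

```python
import pandas as pd

df_GB = pd.DataFrame(
    [['Jim', 'T'], ['Susan', 'F'], ['Bob', 'F'], ['Ellen', 'T']],
    columns=['Name', 'Attend'],
)
df_EV = pd.DataFrame(
    [['Jim', 1, 3, 4, 'Awesome'],
     ['Ellen', 1, 4, 3, 'Splendid'],
     ['Fred', 0, 1, 2, 'Passable']],
    columns=['Name', 'Q1', 'Q2', 'Q3', 'Comment'],
)

# Same join as above: match df_GB's 'Name' column against df_EV's index.
# reset_index(drop=True) then discards the resulting labels and renumbers the rows.
df_result = (
    df_GB.join(df_EV.set_index('Name'), on='Name', how='outer')
    .reset_index(drop=True)
)
print(df_result)
```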