Error "Can only compare identically-labeled Series objects" and sort_index

Time: 2017-06-27 05:48:59

Tags: python pandas indexing boolean

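I have two DataFrames, df1 and df2, with the same number of rows, the same columns and the same variables. I'm trying to compare the boolean variable choice between the two, then use if/else to manipulate the data. But something seems to go wrong when I compare the boolean variable.

Here is a sample of my DataFrames and my code: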

#df1
v_100     choice #boolean
7          True
0          True
7          False
2          True

#df2
v_100     choice #boolean
1          False
2          True
74         True
6          True

def lastTwoTrials_outcome():
     df1 = df.iloc[5::6, :] #df1 and df2 are extracted from the same dataframe first
     df2 = df.iloc[4::6, :]

     if df1['choice'] != df2['choice']:  # if "choice" is different in the two dataframes
         df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5

Here is the error:

if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects

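I found the same error reported here, where an answer suggested calling sort_index first, but I don't understand why that would work. Could anyone explain it in detail (if that is the correct solution)?

Thanks!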

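2 Answers:

Answer 0 (score: 4)

I think you need reset_index so that both frames have the same index values; only then can you compare them. For creating the new column, it is better to use mask or numpy.where.

Also, use | instead of +, because you are working with boolean values.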

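import numpy as np

# Drop the original row labels so both frames are indexed 0..n-1
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Option 1: Series.mask replaces the values where the condition is True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

# Option 2: numpy.where chooses between two alternatives element-wise
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                       (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])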

Sample:

print (df1)
   v_100  choice
5      7    True
6      0    True
7      7   False
8      2    True

print (df2)
   v_100  choice
4      1   False
5      2    True
6     74    True
7      6    True

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
   v_100  choice
0      7    True
1      0    True
2      7   False
3      2    True

print (df2)
   v_100  choice
0      1   False
1      2    True
2     74    True
3      6    True

df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

print (df1)
   v_100  choice
0    0.5    True
1    1.0    True
2    0.5   False
3    1.0    True
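
For anyone who wants to run the sample end to end, here is a minimal self-contained sketch. The data is copied from the sample frames above. The helper name blend_choice is hypothetical, and casting choice to float in the "unchanged" branch is my own assumption so that the resulting column has a single float dtype.

```python
import numpy as np
import pandas as pd

# Sample data from above; the index labels mimic slices of one larger frame.
df1 = pd.DataFrame({"v_100": [7, 0, 7, 2],
                    "choice": [True, True, False, True]},
                   index=[5, 6, 7, 8])
df2 = pd.DataFrame({"v_100": [1, 2, 74, 6],
                    "choice": [False, True, True, True]},
                   index=[4, 5, 6, 7])


def blend_choice(a, b):
    """Hypothetical helper: compare 'choice' row by row, by position."""
    # reset_index returns new frames, so the inputs are left untouched.
    a = a.reset_index(drop=True)
    b = b.reset_index(drop=True)
    differs = a["choice"] != b["choice"]
    a["v_100"] = np.where(differs,
                          (a["choice"] | b["choice"]) * 0.5,
                          a["choice"].astype(float))
    return a


result = blend_choice(df1, df2)
print(result)
```

Because the helper works on copies, df1 keeps its original labels and values; whether that is desirable depends on how the rest of the analysis uses df1.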

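Answer 1 (score: 0)

The error occurs because you are comparing two pandas.Series objects that have different indexes. A simple solution is to compare only the values of the Series. Try: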

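if (df1['choice'].values != df2['choice'].values).any():  # reduce the element-wise result to one bool
    ...  # handle the case where at least one row differs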
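
To make the cause concrete, the sketch below rebuilds two Series with the kind of row labels that the iloc slices in the question would carry (5..8 and 4..7, matching the sample in the first answer; the variable names are illustrative only). It also suggests why sort_index is not enough here: sorting reorders labels but does not change them, so it only helps when both Series hold the same set of labels in a different order.

```python
import pandas as pd

s1 = pd.Series([True, True, False, True], index=[5, 6, 7, 8])
s2 = pd.Series([False, True, True, True], index=[4, 5, 6, 7])

# Different label sets: the comparison refuses to guess how rows pair up.
try:
    s1 != s2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Sorting keeps the same labels, so the comparison still fails.
try:
    s1.sort_index() != s2.sort_index()
    still_raises = False
except ValueError:
    still_raises = True

# Same labels in a different order: here sort_index is sufficient.
s3 = pd.Series([True, False], index=["a", "b"])
s4 = pd.Series([True, False], index=["b", "a"])
sorted_cmp = s3.sort_index() != s4.sort_index()

# Positional comparison: drop the labels, or bypass them with .values.
by_position = s1.reset_index(drop=True) != s2.reset_index(drop=True)
by_values = s1.values != s2.values

print(by_position.tolist())
print(bool(by_values.any()))
```

Both positional variants give one boolean per row, so an if statement still needs a reduction such as .any() or .all().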