Checking uniqueness of a Series within a pandas groupby object

Time: 2019-02-05 03:30:01

Tags: python python-3.x pandas dataframe pandas-groupby

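I'm struggling to get transform() to return the result I want. For each group, I want to check whether 'missed' is the only type present in that group.

Consider the following: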

import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4], 'type': ['correct', 'incorrect', 'missed', 'incorrect', 'missed', 'missed', 'correct', 'pass']})
df

  key   type
0   1   correct
1   1   incorrect
2   2   missed
3   2   incorrect
4   3   missed
5   3   missed
6   2   correct
7   4   pass

I'm trying to get the original dataframe to look like this, where only_missed is yes if 'missed' is the only type in the group:

    key type    only_missed
0   1   correct     no
1   1   incorrect   no
2   2   missed      no
3   2   incorrect   no
4   3   missed      yes
5   3   missed      yes
6   2   correct     no
7   4   pass        pass

I tried this, but the output is not what I expected:

a = ['correct', 'incorrect']
m = ['missed']
df['only_missed'] = df.groupby('key')['type'].transform(lambda x: 'no' if all(x.isin(a)) else ('yes' if all(x.isin(m)) else 'pass'))
df
   key  type    only_missed
0   1   correct     no
1   1   incorrect   no
2   2   missed      pass
3   2   incorrect   pass
4   3   missed      yes
5   3   missed      yes
6   2   correct     pass
7   4   pass        pass

I've gone through several iterations trying to figure out what is going on, and this one really has me stumped.

Any help is greatly appreciated.

3 answers:

Answer 0: (score: 0)

Try:

df.groupby('key')['type'].transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))

Output:

0    False
1    False
2    False
3    False
4     True
5     True
6    False
7    False
Name: type, dtype: bool

You can also mask the 'pass' rows:

df.groupby('key')['type']\
  .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
  .mask(df.type == 'pass','pass')

Output:

0    False
1    False
2    False
3    False
4     True
5     True
6    False
7     pass
Name: type, dtype: object

Then replace True/False with 'Yes'/'No':

df.groupby('key')['type']\
  .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
  .replace({False:'No',True:'Yes'})\
  .mask(df.type == 'pass','pass')

Output:

0      No
1      No
2      No
3      No
4     Yes
5     Yes
6      No
7    pass
Name: type, dtype: object

Assign the result to a dataframe column:

df['only_misses'] = df.groupby('key')['type']\
                      .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
                      .replace({False:'No',True:'Yes'})\
                      .mask(df.type == 'pass','pass')
df

Output:

   key       type only_misses
0    1    correct          No
1    1  incorrect          No
2    2     missed          No
3    2  incorrect          No
4    3     missed         Yes
5    3     missed         Yes
6    2    correct          No
7    4       pass        pass
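
Note that `.mask(df.type == 'pass', 'pass')` is applied row by row, not per group. For the sample data that makes no difference, because the only 'pass' row sits alone in its group. The sketch below uses a small hypothetical frame (`df2`, which is not part of the question) to show what the same chain does when a group mixes 'pass' with another type. Whether that labelling is what you want depends on how mixed groups should be treated.

```python
import pandas as pd

# Hypothetical data (not from the question): key 5 mixes 'pass' and 'missed'
df2 = pd.DataFrame({'key': [5, 5], 'type': ['pass', 'missed']})

result = (
    df2.groupby('key')['type']
       # True only when the group holds a single distinct value and it is 'missed'
       .transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))
       .replace({False: 'No', True: 'Yes'})
       # Row-wise: only the rows whose own type is 'pass' are overwritten
       .mask(df2['type'] == 'pass', 'pass')
)
print(result.tolist())
```

Here only the first row becomes 'pass'; the 'missed' row in the same group is labelled 'No'.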

Answer 1: (score: 0)

df.groupby('key')['type'].transform(
    lambda x: 'yes'
              if (x == 'missed').all() else
              ('pass' if (x == 'pass').all() else 'no')
)
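
For completeness, here is a minimal self-contained sketch of the same logic assigned back to the dataframe. The helper name `classify` is only illustrative; it is the lambda above written out as a named function.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

def classify(group):
    # `group` is the 'type' Series for a single key
    if (group == 'missed').all():
        return 'yes'
    if (group == 'pass').all():
        return 'pass'
    return 'no'

# transform broadcasts the scalar returned for each group to all of its rows
df['only_missed'] = df.groupby('key')['type'].transform(classify)
print(df)
```

Group 2 now comes out as 'no' because it is neither all 'missed' nor all 'pass'.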

Answer 2: (score: 0)

One approach is to build boolean Series and add them up to create a Categorical:

In [11]: a = pd.Series(df.type.str.match('correct|incorrect').values, df.key).groupby(level=0).transform('all')

In [12]: m = pd.Series((df.type == 'missed').values, df.key).groupby(level=0).transform('all')

In [13]: pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])
Out[13]:
[no, no, pass, pass, yes, yes, pass, pass]
Categories (3, object): [pass, no, yes]

In [14]: df["only_missed"] = pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])

In [15]: df
Out[15]:
   key       type only_missed
0    1    correct          no
1    1  incorrect          no
2    2     missed        pass
3    2  incorrect        pass
4    3     missed         yes
5    3     missed         yes
6    2    correct        pass
7    4       pass        pass

Using .values (to avoid reindexing) feels a bit hacky, but it should be quite efficient...


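Looking at this again, that is the "incorrect" output from the question, but I'll leave it there since the approach is essentially the same. To get the correct answer, you should check for groups that are all 'pass' instead: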

In [21]: p = pd.Series((df.type == 'pass').values, df.key).groupby(level=0).transform('all')

In [22]: pd.Categorical.from_codes(m + 2 * p, ['no', 'yes', 'pass'])
Out[22]:
[no, no, no, no, yes, yes, no, pass]
Categories (3, object): [no, yes, pass]

In [23]: df['only_missed'] = pd.Categorical.from_codes(m + 2 * p, ['no', 'yes', 'pass'])

In [24]: df
Out[24]:
   key       type only_missed
0    1    correct          no
1    1  incorrect          no
2    2     missed          no
3    2  incorrect          no
4    3     missed         yes
5    3     missed         yes
6    2    correct          no
7    4       pass        pass
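
If the .values round-trip bothers you, a possible variant (a sketch, not what the session above ran) is to group each boolean Series by `df['key']` directly, so the index stays aligned with `df`. The variable name `codes` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

# Group the boolean Series by the 'key' column; the result keeps df's index
m = df['type'].eq('missed').groupby(df['key']).transform('all')
p = df['type'].eq('pass').groupby(df['key']).transform('all')

# m and p cannot both be True for a group, so codes are 0 (no), 1 (yes) or 2 (pass)
codes = (m.astype(int) + 2 * p.astype(int)).to_numpy()
df['only_missed'] = pd.Categorical.from_codes(codes, ['no', 'yes', 'pass'])
print(df)
```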