I'm struggling to get transform() to return the result I want. I want to check whether 'missed' is the only type within each group.
Consider the following:
import pandas as pd
df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3, 2, 4], 'type': ['correct', 'incorrect', 'missed', 'incorrect', 'missed', 'missed', 'correct', 'pass']})
df
key type
0 1 correct
1 1 incorrect
2 2 missed
3 2 incorrect
4 3 missed
5 3 missed
6 2 correct
7 4 pass
I'm trying to get the original dataframe to look like this. only_missed should be yes if missed is the only type in the group.
key type only_missed
0 1 correct no
1 1 incorrect no
2 2 missed no
3 2 incorrect no
4 3 missed yes
5 3 missed yes
6 2 correct no
7 4 pass pass
I tried this, but the output is not what I expected:
a = ['correct', 'incorrect']
m = ['missed']
df['only_missed'] = df.groupby('key')['type'].transform(lambda x: 'no' if all(x.isin(a)) else ('yes' if all(x.isin(m)) else 'pass'))
df
key type only_missed
0 1 correct no
1 1 incorrect no
2 2 missed pass
3 2 incorrect pass
4 3 missed yes
5 3 missed yes
6 2 correct pass
7 4 pass pass
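I've gone through several iterations trying to figure out what is going on, and this one really has me stumped.
Any help is greatly appreciated.
Answer 0 (score: 0)
Try: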
df.groupby('key')['type'].transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))
Output:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 False
Name: type, dtype: bool
You can also mask the 'pass' rows:
df.groupby('key')['type']\
.transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
.mask(df.type == 'pass','pass')
Output:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 pass
Name: type, dtype: object
Then replace True/False with 'Yes'/'No':
df.groupby('key')['type']\
.transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
.replace({False:'No',True:'Yes'})\
.mask(df.type == 'pass','pass')
Output:
0 No
1 No
2 No
3 No
4 Yes
5 Yes
6 No
7 pass
Name: type, dtype: object
Assign the result to a dataframe column:
df['only_misses'] = df.groupby('key')['type']\
.transform(lambda x: (x.nunique() == 1) & (x.iloc[0] == 'missed'))\
.replace({False:'No',True:'Yes'})\
.mask(df.type == 'pass','pass')
df
Output:
key type only_misses
0 1 correct No
1 1 incorrect No
2 2 missed No
3 2 incorrect No
4 3 missed Yes
5 3 missed Yes
6 2 correct No
7 4 pass pass
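The steps above can be put together as a self-contained sketch. It swaps the lambda for a boolean column reduced with transform('all'), which should give the same flags on this data, and it uses the lowercase labels and the only_missed column name from the question rather than the ones in this answer. The variable names is_missed and only_missed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

# Per-row flag: True where the row's type is 'missed'.
is_missed = df['type'].eq('missed')

# Broadcast "every row in this key's group is 'missed'" back to each row.
only_missed = is_missed.groupby(df['key']).transform('all')

# Map the booleans to labels, then overwrite rows whose own type is 'pass'.
df['only_missed'] = (
    only_missed.map({True: 'yes', False: 'no'})
    .mask(df['type'].eq('pass'), 'pass')
)
print(df)
```

Note that the mask is applied per row, as in the answer: any row whose type is 'pass' is labelled 'pass', whatever else is in its group.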
Answer 1 (score: 0)
df.groupby('key')['type'].transform(
    lambda x: 'yes'
    if (x == 'missed').all() else
    ('pass' if (x == 'pass').all() else 'no')
)
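As a rough illustration of why the question's attempt labels key 2 as 'pass': that group mixes 'missed' with 'correct' and 'incorrect', so neither all(...) test is true and the lambda falls through to its final else. The sketch below checks that on the sample data and then applies the same ordering of tests as the expression above, written as a named function (label is just a hypothetical name).

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

a = ['correct', 'incorrect']
m = ['missed']

# Key 2 holds 'missed', 'incorrect' and 'correct', so both tests from the
# question's lambda fail for it and the final else ('pass') is returned.
group2 = df.loc[df['key'] == 2, 'type']
all_in_a = bool(group2.isin(a).all())
all_in_m = bool(group2.isin(m).all())

def label(x):
    # Check the two special cases first; every other group is 'no'.
    if (x == 'missed').all():
        return 'yes'
    if (x == 'pass').all():
        return 'pass'
    return 'no'

df['only_missed'] = df.groupby('key')['type'].transform(label)
print(df)
```

Making 'no' the fallback instead of 'pass' is what lets mixed groups such as key 2 come out as 'no'.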
Answer 2 (score: 0)
One approach is to use booleans and add them up to create a Categorical:
In [11]: a = pd.Series(df.type.str.match('correct|incorrect').values, df.key).groupby(level=0).transform('all')
In [12]: m = pd.Series((df.type == 'missed').values, df.key).groupby(level=0).transform('all')
In [13]: pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])
Out[13]:
[no, no, pass, pass, yes, yes, pass, pass]
Categories (3, object): [pass, no, yes]
In [14]: df["only_missed"] = pd.Categorical.from_codes(a + 2 * m, ['pass', 'no', 'yes'])
In [15]: df
Out[15]:
key type only_missed
0 1 correct no
1 1 incorrect no
2 2 missed pass
3 2 incorrect pass
4 3 missed yes
5 3 missed yes
6 2 correct pass
7 4 pass pass
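Using .values (to avoid reindexing) feels a bit hacky, but it should be quite efficient...
Looking at it again, this is the "incorrect" output, but I'll leave it there since the approach is essentially the same. To get the correct answer, you should check for groups that are all 'pass' instead: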
In [21]: p = pd.Series((df.type == 'pass').values, df.key).groupby(level=0).transform('all')
In [22]: pd.Categorical.from_codes(m + 2 * p, ['no', 'yes', 'pass'])
Out[22]:
[no, no, no, no, yes, yes, no, pass]
Categories (3, object): [no, yes, pass]
In [23]: df['only_missed'] = pd.Categorical.from_codes(m + 2 * p, ['no', 'yes', 'pass'])
In [24]: df
Out[24]:
key type only_missed
0 1 correct no
1 1 incorrect no
2 2 missed no
3 2 incorrect no
4 3 missed yes
5 3 missed yes
6 2 correct no
7 4 pass pass
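To make the arithmetic behind the category codes explicit, here is a minimal sketch on the same data. It is a variation on the code above, not the exact code: it groups by the key column directly rather than rebuilding a Series with .values, and it casts the flags to integers before summing. The name codes is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3, 2, 4],
    'type': ['correct', 'incorrect', 'missed', 'incorrect',
             'missed', 'missed', 'correct', 'pass'],
})

# Group-wise flags, aligned to the original row index.
m = df['type'].eq('missed').groupby(df['key']).transform('all')
p = df['type'].eq('pass').groupby(df['key']).transform('all')

# A group cannot be all 'missed' and all 'pass' at the same time, so the
# sum is 0 (neither), 1 (all 'missed') or 2 (all 'pass').
codes = m.astype(int) + 2 * p.astype(int)

# Each code indexes into the category list: 0 -> 'no', 1 -> 'yes', 2 -> 'pass'.
df['only_missed'] = pd.Categorical.from_codes(
    codes.to_numpy(), categories=['no', 'yes', 'pass']
)
print(df)
```

Because the two flags are mutually exclusive, the code 3 never occurs, so three categories are enough.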