I have a dataset in the following format:
time color height weight value
1 t1 red hr1 wr1 vr1
2 t1 red hr1 wr1 vr1
3 t1 blue hb1 wb1 vb1
4 t1 blue hb1 wb1 vb1
5 t1 green hg1 wg1 vg1
6 t1 green hg1 wg1 vg1
7 t2 blue hb2 wb2 vb2
8 t2 green hg2 wg2 vg2
9 t2 red hr2 wr2 vr2
10 t2 red hr2 wr2 vr2
11 t3 red hr3 wr3 vr3
12 t3 red hr3 wr3 vr3
13 t3 green hg3 wg3 vg3
14 t3 green hg3 wg3 vg3
15 t3 blue hb3 wb3 vb3
16 t3 blue hb3 wb3 vb3
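I want to drop every time measurement for which the color counts are not the same across red, blue and green. In the snippet above, t1 and t3 should be kept, and all rows for the t2 measurement should be removed.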
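To make the criterion concrete, here is a minimal sketch that rebuilds the sample and counts rows per time and color. It keeps only the time and color columns (an assumption for brevity, since height, weight and value play no part in the count). In the resulting table, t2 has one blue, one green and two red rows, while t1 and t3 have two rows of each color.

```python
import pandas as pd

# Hypothetical reconstruction of the sample: only the time and color columns,
# because height/weight/value do not affect the count. Index runs 1..16 as above.
df = pd.DataFrame(
    {
        "time": ["t1"] * 6 + ["t2"] * 4 + ["t3"] * 6,
        "color": [
            "red", "red", "blue", "blue", "green", "green",  # t1
            "blue", "green", "red", "red",                   # t2
            "red", "red", "green", "green", "blue", "blue",  # t3
        ],
    },
    index=range(1, 17),
)

# Rows per (time, color), laid out as one row per time and one column per color
counts = df.groupby(["time", "color"]).size().unstack(fill_value=0)
print(counts)
```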
The result should be:
time color height weight value
1 t1 red hr1 wr1 vr1
2 t1 red hr1 wr1 vr1
3 t1 blue hb1 wb1 vb1
4 t1 blue hb1 wb1 vb1
5 t1 green hg1 wg1 vg1
6 t1 green hg1 wg1 vg1
7 t3 red hr3 wr3 vr3
8 t3 red hr3 wr3 vr3
9 t3 green hg3 wg3 vg3
10 t3 green hg3 wg3 vg3
11 t3 blue hb3 wb3 vb3
12 t3 blue hb3 wb3 vb3
Thanks
Answer 0 (score: 1)
How about:
s = df.groupby(['time', 'color']).size()
s = s.unstack(0).eq(2).all()
valid_times = s.index[s]
print(df[df.time.isin(valid_times)])
time color height weight value
1 t1 red hr1 wr1 vr1
2 t1 red hr1 wr1 vr1
3 t1 blue hb1 wb1 vb1
4 t1 blue hb1 wb1 vb1
5 t1 green hg1 wg1 vg1
6 t1 green hg1 wg1 vg1
11 t3 red hr3 wr3 vr3
12 t3 red hr3 wr3 vr3
13 t3 green hg3 wg3 vg3
14 t3 green hg3 wg3 vg3
15 t3 blue hb3 wb3 vb3
16 t3 blue hb3 wb3 vb3
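The answer above hard-codes a count of 2. If the requirement is only that every color appears equally often within a time (as the question states), one possible variant checks the number of distinct counts per time instead. This is a sketch under that assumption, reusing the same reconstructed time/color columns. With `fill_value=0`, a time that is missing a color gets a 0 for it, which differs from its other counts, so that time fails the check.

```python
import pandas as pd

# Same hypothetical reconstruction of the sample (time and color only)
df = pd.DataFrame(
    {
        "time": ["t1"] * 6 + ["t2"] * 4 + ["t3"] * 6,
        "color": [
            "red", "red", "blue", "blue", "green", "green",
            "blue", "green", "red", "red",
            "red", "red", "green", "green", "blue", "blue",
        ],
    },
    index=range(1, 17),
)

# Rows per (time, color); unstack(0) puts each time in its own column
counts = df.groupby(["time", "color"]).size().unstack(0, fill_value=0)

# A time is valid when all of its color counts are identical (one distinct value)
valid = counts.nunique().eq(1)
valid_times = valid[valid].index
result = df[df["time"].isin(valid_times)]
print(result)
```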
Answer 1 (score: 0)
Use GroupBy.transform twice. It returns a Series the same size as the original DataFrame, so boolean indexing can be used:
df1 = df[df.groupby(['time', 'color'])['color']
.transform('size')
.eq(2)
.groupby(df['time'])
.transform('all')]
print(df1)
time color height weight value
1 t1 red hr1 wr1 vr1
2 t1 red hr1 wr1 vr1
3 t1 blue hb1 wb1 vb1
4 t1 blue hb1 wb1 vb1
5 t1 green hg1 wg1 vg1
6 t1 green hg1 wg1 vg1
11 t3 red hr3 wr3 vr3
12 t3 red hr3 wr3 vr3
13 t3 green hg3 wg3 vg3
14 t3 green hg3 wg3 vg3
15 t3 blue hb3 wb3 vb3
16 t3 blue hb3 wb3 vb3
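Another option, not used in either answer, is `groupby(...).filter`, which keeps or drops whole groups depending on a function that returns a boolean. The sketch below assumes the rule is "all colors seen in the data are present and each appears equally often", and it also renumbers the rows from 1, as in the expected output in the question. Because `filter` calls a Python function once per group, it is typically slower than the `transform` approach when there are many time values.

```python
import pandas as pd

# Same hypothetical reconstruction of the sample (time and color only)
df = pd.DataFrame(
    {
        "time": ["t1"] * 6 + ["t2"] * 4 + ["t3"] * 6,
        "color": [
            "red", "red", "blue", "blue", "green", "green",
            "blue", "green", "red", "red",
            "red", "red", "green", "green", "blue", "blue",
        ],
    },
    index=range(1, 17),
)

n_colors = df["color"].nunique()  # distinct colors in the whole data (3 here)

# Keep a time group only if every color is present and all counts are equal
result = df.groupby("time").filter(
    lambda g: g["color"].nunique() == n_colors
    and g["color"].value_counts().nunique() == 1
)

# Renumber rows 1..N to match the expected output in the question
result = result.reset_index(drop=True)
result.index = pd.RangeIndex(start=1, stop=len(result) + 1)
print(result)
```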