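I have a set of cells, each of which can have several antennas mounted at different heights. I need to build a DataFrame containing only the cells that have more than one antenna and where those antennas sit at different heights.
I tried the groupby function, which gives me the number of rows per cell, but I can't figure out how to use it for filtering.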
import pandas as pd
df1 = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "DDDD", "DDDD"],
    "antenna": ["A1", "A1", "A1", "A2", "A4", "A1", "A2"],
    "height": ["5", "30", "30", "45", "45", "30", "15"],
    "function": ["LTE1800", "LTE700", "LTE700", "LTE700", "LTE700", "LTE2100", "LTE2100"],
})
df1['count'] = df1.groupby('Cell')['Cell'].transform('count')
This returns:
Cell antenna height function count
0 AAAA A1 5 LTE1800 1
1 BBBB A1 30 LTE700 2
2 BBBB A1 30 LTE700 2
3 CCCC A2 45 LTE700 2
4 CCCC A4 45 LTE700 2
5 DDDD A1 30 LTE2100 2
6 DDDD A2 15 LTE2100 2
The output I want is:
Cell antenna height function count
1 DDDD A1 30 LTE2100 2
2 DDDD A2 15 LTE2100 2
Or the opposite:
Cell antenna height function count
0 AAAA A1 5 LTE1800 1
1 BBBB A1 30 LTE700 2
2 BBBB A1 30 LTE700 2
3 CCCC A2 45 LTE700 2
4 CCCC A4 45 LTE700 2
I have limited experience with groupby queries, so I don't know how to get there.
Answer 0 (score: 0)
filter runs a function on each group of a groupby and keeps only the groups for which it returns True:
df1[df1['count']>1].groupby('Cell').filter(lambda x: x.height.nunique() > 1)
Cell antenna height function count
5 DDDD A1 30 LTE2100 2
6 DDDD A2 15 LTE2100 2
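To make the per-group behaviour of filter more concrete, here is a minimal sketch on a smaller, made-up frame. The helper name has_varied_heights is hypothetical, and the row-count check is folded into the predicate so that no separate count column is needed.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "DDDD", "DDDD"],
    "height": [5, 30, 30, 30, 15],
})

def has_varied_heights(group):
    # Called once per cell; must return a single True/False for the whole group.
    return len(group) > 1 and group["height"].nunique() > 1

# All rows of a group are kept or dropped together.
result = df.groupby("Cell").filter(has_varied_heights)
kept_cells = result["Cell"].unique().tolist()
print(result)
```

Because the predicate is evaluated per group rather than per row, a cell either survives with all of its rows or disappears entirely.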
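Answer 1 (score: 0)
Another approach is to compute the standard deviation of the antenna heights within each cell (height_std) as a measure of how much the heights vary, and then select only the rows where it is greater than zero (if all heights in a cell are the same, the standard deviation is zero):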
import pandas as pd
df1 = pd.DataFrame({
"Cell": ["AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "DDDD", "DDDD"],
"antenna": ["A1", "A1", "A1", "A2", "A4", "A1", "A2"],
"height": ["5", "30", "30", "45", "45", "30", "15"],
"function":
["LTE1800", "LTE700", "LTE700", "LTE700", "LTE700", "LTE2100", "LTE2100"]})
df1.height = df1.height.astype(int)
df1['height_std'] = df1.groupby('Cell').height.transform('std')
print(df1[df1['height_std'] > 0])
#    Cell antenna  height function  height_std
#5  DDDD      A1      30  LTE2100   10.606602
#6  DDDD      A2      15  LTE2100   10.606602
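One detail of the standard-deviation approach that may be worth spelling out: for a cell with a single row, the sample standard deviation is NaN rather than 0, and NaN > 0 evaluates to False, so single-antenna cells drop out as well. The sketch below uses a small made-up frame and also shows that negating the mask gives the opposite selection.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "DDDD", "DDDD"],
    "height": [5, 30, 30, 30, 15],
})

# Per-row copy of each cell's sample standard deviation of height.
height_std = df.groupby("Cell")["height"].transform("std")

# NaN (single-row cell) and 0.0 (identical heights) both compare False here.
mask = height_std > 0

varied = df[mask]     # cells whose antennas sit at different heights
uniform = df[~mask]   # everything else, including single-antenna cells
print(varied)
print(uniform)
```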
Answer 2 (score: 0)
You can try transform with nunique:
g=df1.groupby('Cell')
df1[g.antenna.transform('nunique').eq(2)&g.height.transform('nunique').eq(2)]
Cell antenna height function
5 DDDD A1 30 LTE2100
6 DDDD A2 15 LTE2100
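Note that eq(2) matches cells with exactly two distinct antennas and exactly two distinct heights, which fits the sample data. If a cell could carry three or more antennas, gt(1) is probably closer to "more than one". A sketch, assuming a hypothetical extra cell EEEE with three antennas:

```python
import pandas as pd

# Illustrative frame; the EEEE cell is made up to show the difference.
df = pd.DataFrame({
    "Cell": ["BBBB", "BBBB", "DDDD", "DDDD", "EEEE", "EEEE", "EEEE"],
    "antenna": ["A1", "A1", "A1", "A2", "A1", "A2", "A3"],
    "height": [30, 30, 30, 15, 10, 20, 30],
})

g = df.groupby("Cell")
n_antennas = g["antenna"].transform("nunique")
n_heights = g["height"].transform("nunique")

# Exactly two distinct antennas and heights: misses the three-antenna cell.
exactly_two = df[n_antennas.eq(2) & n_heights.eq(2)]

# More than one distinct antenna and height: keeps it.
more_than_one = df[n_antennas.gt(1) & n_heights.gt(1)]

print(exactly_two["Cell"].unique().tolist())
print(more_than_one["Cell"].unique().tolist())
```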
Answer 3 (score: 0)
So you essentially want the equivalent of GROUP BY and HAVING (if this were SQL). You can do that like this:
df1.groupby(['Cell'], as_index=False).filter(lambda g: g['height'].nunique() >= 2)
Cell antenna height function
5 DDDD A1 30 LTE2100
6 DDDD A2 15 LTE2100
And for the opposite selection:
df1.groupby(['Cell'], as_index=False).filter(lambda g: g['height'].nunique() < 2)
Cell antenna height function
0 AAAA A1 5 LTE1800
1 BBBB A1 30 LTE700
2 BBBB A1 30 LTE700
3 CCCC A2 45 LTE700
4 CCCC A4 45 LTE700
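For readers who think in SQL terms, the GROUP BY / HAVING idea can also be written as two explicit steps: aggregate per cell, then select the rows whose cell passed the condition. This is only a sketch on a small made-up frame, and the variable names are illustrative.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "DDDD", "DDDD"],
    "height": [5, 30, 30, 30, 15],
})

# GROUP BY Cell: one number per cell, the count of distinct heights.
heights_per_cell = df.groupby("Cell")["height"].nunique()

# HAVING COUNT(DISTINCT height) >= 2: the cells that pass the condition.
cells_with_varied_heights = heights_per_cell[heights_per_cell >= 2].index

# Select the original rows belonging to those cells, or to the others.
in_selected = df["Cell"].isin(cells_with_varied_heights)
selected = df[in_selected]
rest = df[~in_selected]
print(selected)
print(rest)
```

Compared with filter and a lambda, this evaluates the condition once on the aggregated result and yields both selections from a single mask.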