I have a DataFrame of the form
import pandas as pd
data = [{'Energy': 2, 'spin': 1}, {'Energy': 6, 'spin': 2}, {'Energy': 5, 'spin': 2},
        {'Energy': 15, 'spin': 5}, {'Energy': 4, 'spin': 1}, {'Energy': 10, 'spin': 5}]
df = pd.DataFrame(data, index=['Particle 1', 'Particle 2', 'Particle 3',
                               'Particle 4', 'Particle 5', 'Particle 6'])
print(df)
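For each group of particles with the same spin, I want to keep only the particle with the lowest energy and discard the rest. That is, the resulting DataFrame should look like this: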
dataclean = [{'Energy': 2,'spin': 1},{'Energy': 5,'spin': 2},{'Energy': 10,'spin': 5}]
df2=pd.DataFrame(dataclean, index=['Particle 1','Particle 3','Particle 6'])
print(df2)
            Energy  spin
Particle 1       2     1
Particle 3       5     2
Particle 6      10     5
I have tried different approaches without success. What is the simplest way to do this?
Answer 0 (score: 3)
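You can use idxmin() to find, for each spin, the index label of the row with the smallest Energy, and then use those labels to subset the original DataFrame (this assumes the index has no duplicates):
df.loc[df.groupby("spin").Energy.idxmin()]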
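As a minimal runnable sketch of the idxmin() approach, using the question's data (the names min_labels and lowest are illustrative only, not from the answer):

```python
import pandas as pd

data = [
    {"Energy": 2, "spin": 1}, {"Energy": 6, "spin": 2}, {"Energy": 5, "spin": 2},
    {"Energy": 15, "spin": 5}, {"Energy": 4, "spin": 1}, {"Energy": 10, "spin": 5},
]
df = pd.DataFrame(data, index=[f"Particle {i}" for i in range(1, 7)])

# For each spin group, the index label of the row with the smallest Energy.
min_labels = df.groupby("spin")["Energy"].idxmin()

# Select those rows from the original frame by label.
# This relies on the index labels being unique, as the answer notes.
lowest = df.loc[min_labels]
print(lowest)
```

Because idxmin() returns a single label per group, only the first row is kept when several rows in a group tie for the minimum.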
Another option is to use nsmallest:
df.groupby('spin').Energy.nsmallest(1).reset_index(level=0)
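The following sketch runs the nsmallest variant on the question's data. One thing to be aware of: reset_index(level=0) turns the spin level of the grouped result into a column, so spin comes before Energy; the final reordering step is an optional addition here, not part of the answer.

```python
import pandas as pd

data = [
    {"Energy": 2, "spin": 1}, {"Energy": 6, "spin": 2}, {"Energy": 5, "spin": 2},
    {"Energy": 15, "spin": 5}, {"Energy": 4, "spin": 1}, {"Energy": 10, "spin": 5},
]
df = pd.DataFrame(data, index=[f"Particle {i}" for i in range(1, 7)])

# nsmallest(1) yields a Series indexed by (spin, original label);
# reset_index(level=0) moves the spin level back into a column.
result = df.groupby("spin")["Energy"].nsmallest(1).reset_index(level=0)

# Optional: restore the original column order.
result = result[["Energy", "spin"]]
print(result)
```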
Answer 1 (score: 0)
UPDATE:
Source DF:
In [70]: df
Out[70]:
            Energy  spin
Particle 1       2     1
Particle 2       6     2
Particle 3       5     2
Particle 4      15     5
Particle 5       4     1
Particle 6      10     5
Particle 7      10     5   # I've added this row
Solution using the GroupBy.rank(method='dense') method:
In [71]: df.loc[df.groupby('spin').Energy.rank(method='dense').le(1)]
Out[71]:
            Energy  spin
Particle 1       2     1
Particle 3       5     2
Particle 6      10     5
Particle 7      10     5
Explanation:
In [72]: df.groupby('spin').Energy.rank(method='dense')
Out[72]:
Particle 1    1.0
Particle 2    2.0
Particle 3    1.0
Particle 4    2.0
Particle 5    2.0
Particle 6    1.0
Particle 7    1.0
Name: Energy, dtype: float64
In [73]: df.groupby('spin').Energy.rank(method='dense').le(1)
Out[73]:
Particle 1     True
Particle 2    False
Particle 3     True
Particle 4    False
Particle 5    False
Particle 6     True
Particle 7     True
Name: Energy, dtype: bool
OLD answer (alternative solution):
PS: note that @Psidom's solution, df.groupby('spin').Energy.nsmallest(1).reset_index(level=0), is more idiomatic and should perform better.
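The rank-based selection and idxmin() behave differently when several rows share a group's minimum Energy. Here is a small sketch using the seven-row frame from this answer (the names with_ties and first_only are illustrative only):

```python
import pandas as pd

data = [
    {"Energy": 2, "spin": 1}, {"Energy": 6, "spin": 2}, {"Energy": 5, "spin": 2},
    {"Energy": 15, "spin": 5}, {"Energy": 4, "spin": 1}, {"Energy": 10, "spin": 5},
    {"Energy": 10, "spin": 5},  # extra row that ties with Particle 6
]
df = pd.DataFrame(data, index=[f"Particle {i}" for i in range(1, 8)])

# Dense rank within each spin group: every row at the group minimum gets rank 1.
ranks = df.groupby("spin")["Energy"].rank(method="dense")
with_ties = df.loc[ranks.le(1)]

# idxmin() returns one label per group, so only the first tied row survives.
first_only = df.loc[df.groupby("spin")["Energy"].idxmin()]

print(with_ties)
print(first_only)
```

Which behaviour is wanted depends on whether tied particles should all be kept or reduced to one per spin.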