Produce a new pandas DataFrame from an old one by keeping the minimum value within each subset of rows

Time: 2017-03-11 03:37:31

Tags: python pandas dataframe

I have a DataFrame of the following form:
import pandas as pd

data = [{'Energy': 2,'spin': 1},{'Energy': 6,'spin': 2},{'Energy':5,'spin':2},
        {'Energy': 15,'spin': 5},{'Energy': 4,'spin': 1},  {'Energy': 10,'spin': 5}]

df=pd.DataFrame(data, index=['Particle 1', 'Particle 2','Particle 3',
                             'Particle 4','Particle 5','Particle 6'])
print(df)

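For each group of particles with the same spin, I want to keep only the particle with the lowest energy and discard the rest. That is, the resulting DataFrame should look like this: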

dataclean = [{'Energy': 2,'spin': 1},{'Energy': 5,'spin': 2},{'Energy': 10,'spin': 5}]

df2=pd.DataFrame(dataclean, index=['Particle 1','Particle 3','Particle 6'])
print(df2)

            Energy  spin
Particle 1       2     1
Particle 3       5     2
Particle 6      10     5

I have tried different approaches without success. What is the simplest way to do this?

2 answers:

Answer 0 (score: 3)

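You can use idxmin() to find the index label of the row with the smallest Energy for each spin, then use those labels to subset the original DataFrame (this assumes the index has no duplicate labels):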

df.loc[df.groupby("spin").Energy.idxmin()]
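As a minimal, self-contained sketch of the idxmin() approach above, the following rebuilds the question's DataFrame and splits the one-liner into two steps. The intermediate names min_labels and result are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

data = [{'Energy': 2, 'spin': 1}, {'Energy': 6, 'spin': 2}, {'Energy': 5, 'spin': 2},
        {'Energy': 15, 'spin': 5}, {'Energy': 4, 'spin': 1}, {'Energy': 10, 'spin': 5}]
df = pd.DataFrame(data, index=['Particle %d' % i for i in range(1, 7)])

# Step 1: for each spin group, get the index label of the lowest-Energy row
min_labels = df.groupby('spin')['Energy'].idxmin()

# Step 2: select those rows from the original frame; all columns are kept
result = df.loc[min_labels]
print(result)
```

With the question's data this should select Particle 1, Particle 3 and Particle 6. If the index contained duplicate labels, df.loc could return more rows than intended, which is why the answer states that assumption.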


Another option: use nsmallest:

df.groupby('spin').Energy.nsmallest(1).reset_index(level=0)
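To make the shape of the nsmallest result easier to see, here is a small sketch on the question's data that splits the expression into two steps. The names smallest and result are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

data = [{'Energy': 2, 'spin': 1}, {'Energy': 6, 'spin': 2}, {'Energy': 5, 'spin': 2},
        {'Energy': 15, 'spin': 5}, {'Energy': 4, 'spin': 1}, {'Energy': 10, 'spin': 5}]
df = pd.DataFrame(data, index=['Particle %d' % i for i in range(1, 7)])

# Per-group smallest Energy; the result is a Series whose index has
# two levels: the spin value and the original particle label
smallest = df.groupby('spin')['Energy'].nsmallest(1)

# Move the spin level out of the index and back into a column
result = smallest.reset_index(level=0)
print(result)
```

Note that only the Energy column is selected before nsmallest is called, so in a frame with additional columns those would not be carried over by this approach.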


Answer 1 (score: 0)

UPDATE:

Source DF:

In [70]: df
Out[70]:
            Energy  spin
Particle 1       2     1
Particle 2       6     2
Particle 3       5     2
Particle 4      15     5
Particle 5       4     1
Particle 6      10     5
Particle 7      10     5   # I've added this row

Solution using the GroupBy.rank(method='dense') method:

In [71]: df.loc[df.groupby('spin').Energy.rank(method='dense').le(1)]
Out[71]:
            Energy  spin
Particle 1       2     1
Particle 3       5     2
Particle 6      10     5
Particle 7      10     5

Explanation:

In [72]: df.groupby('spin').Energy.rank(method='dense')
Out[72]:
Particle 1    1.0
Particle 2    2.0
Particle 3    1.0
Particle 4    2.0
Particle 5    2.0
Particle 6    1.0
Particle 7    1.0
Name: Energy, dtype: float64

In [73]: df.groupby('spin').Energy.rank(method='dense').le(1)
Out[73]:
Particle 1     True
Particle 2    False
Particle 3     True
Particle 4    False
Particle 5    False
Particle 6     True
Particle 7     True
Name: Energy, dtype: bool

OLD answer:

Alternative solution:

PS: please note that @Psidom's solution, df.groupby('spin').Energy.nsmallest(1).reset_index(level=0), is more idiomatic and should perform better.
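The practical difference between the two answers shows up when several rows tie for the minimum within a group. The sketch below uses the seven-row frame from this answer (with the added Particle 7) to compare the idxmin() and rank(method='dense') approaches. The names one_per_group and with_ties are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {'Energy': [2, 6, 5, 15, 4, 10, 10], 'spin': [1, 2, 2, 5, 1, 5, 5]},
    index=['Particle %d' % i for i in range(1, 8)],
)

# idxmin yields a single label per group (the first occurrence of the minimum),
# so exactly one row per spin is kept
one_per_group = df.loc[df.groupby('spin')['Energy'].idxmin()]

# A dense rank assigns 1.0 to every row tied at the group minimum,
# so all tied rows pass the <= 1 filter
with_ties = df.loc[df.groupby('spin')['Energy'].rank(method='dense').le(1)]

print(one_per_group.index.tolist())
print(with_ties.index.tolist())
```

Which behavior is preferable depends on whether tied minima should all be retained or reduced to one row per group; the question as asked does not include ties.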