Question

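Hi everyone, I have the following problem. I need to filter my data based on an equation. Here is what I mean.

For example, I have a DataFrame like this: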

    tonnage period_year
5   2,462,297.5 2014
13  2,274,912.9 2015
19  2,181,492.2 2015
20  2,173,654.8 2016
21  2,158,043.7 2016
... ... ...
92885   5.0 2016
92886   5.0 2016
92901   5.0 2016
94814   0.0 2016
94861   0.0 2013

I get:

data[data.tonnage > 0.02e6]['tonnage'].sum()/data.tonnage.sum() * 100.0

97.08690080799717

data[data.tonnage > 5e6]['tonnage'].sum()/data.tonnage.sum() * 100.0

18.541547916532426

So I need to find the largest x for which

data[data.tonnage > x]['tonnage'].sum()/data.tonnage.sum() * 100.0

gives a result equal to or greater than 40.

What is the best way to do this?

Answer 1

Try this:

import pandas as pd

# Your sample input
df = pd.DataFrame({
    'tonnage': [100,100,100,200,5,5,5,5,5]
})

# Get the sum of each unique value in `tonnage`
t = df.groupby('tonnage')['tonnage'].sum().sort_index(ascending=False)

# Since your requirement is "> x", we have to subtract the current value from the cumsum
ratio = (t.cumsum() - t) / t.sum() * 100

# And voila!
x = ratio[ratio >= 40].index[0]
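
To sanity-check the result, the same steps can be wrapped in a function and compared against the direct formula from the question. This is only a sketch: the names `largest_threshold` and `share_above` are made up for illustration, and it assumes `tonnage` is a numeric column.

```python
import pandas as pd


def share_above(tonnage: pd.Series, x: float) -> float:
    """Percentage of total tonnage held by rows strictly greater than x."""
    return tonnage[tonnage > x].sum() / tonnage.sum() * 100.0


def largest_threshold(tonnage: pd.Series, target_pct: float = 40.0):
    """Largest observed value x such that share_above(tonnage, x) >= target_pct.

    Returns None when no observed value reaches the target.
    """
    # Sum per distinct value, largest values first
    t = tonnage.groupby(tonnage).sum().sort_index(ascending=False)
    # Share of everything strictly above each distinct value
    ratio = (t.cumsum() - t) / t.sum() * 100
    hits = ratio[ratio >= target_pct]
    return hits.index[0] if len(hits) else None


df = pd.DataFrame({'tonnage': [100, 100, 100, 200, 5, 5, 5, 5, 5]})
x = largest_threshold(df['tonnage'])

# Compare against the formula from the question
print(x, share_above(df['tonnage'], x))
```

Note that the share only changes at values that actually occur in the data, so any x between the returned value and the next larger distinct value gives the same percentage. The function returns the largest value present in the column that satisfies the condition.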
