Question

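What is a fast way to compute, for each numeric column that has at least one negative value, the percentage of negative values in a DataFrame?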

Currently I can compute it for each numeric column, one at a time, with a loop:

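import numpy as np
import pandas as pd

# loop over every column (starting at 0 so the first column is not skipped)
for i in range(len(df.columns)):
  if df.dtypes[df.columns[i]] in ['float64','int64','float32','int32']:
    negatives = np.sum(np.array(df.iloc[:,[i]] < 0))  # count of values < 0
    negativesP= "{:.2%}".format(negatives/len(df))   # share of all rows
    print(df.columns[i], ": \n", negatives, "\t", negativesP)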

Answer 1

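You can use select_dtypes with np.number to keep only the numeric columns, compare them to 0 with lt, and take the mean (the mean of a boolean column is the fraction of True values). Then use apply with a percentage format string.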

import numpy as np
import pandas as pd

# toy sample
np.random.seed(10)
n = 100
d = {'a': np.random.randint(-10, 10, n),
     'b': np.random.choice(list('abcdefgh'), size=n),
     'c': np.random.random(size=n) - 0.5}
df = pd.DataFrame(d)

# this gives you only the percentage
print(df.select_dtypes(include=[np.number]).lt(0).mean().apply("{:.2%}".format))
#a    37.00%
#c    56.00%
#dtype: object
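The one-liner above still lists numeric columns that contain no negatives (they would show 0.00%). Since the question asks only for columns with at least one negative value, here is a minimal sketch that filters the result first. The small DataFrame and its column names (x, y, z, label) are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: 'y' has no negative values, 'label' is non-numeric
df = pd.DataFrame({
    'x': [-1, 2, -3, 4],
    'y': [1.0, 2.0, 3.0, 4.0],
    'z': [-0.5, 0.5, 0.5, 0.5],
    'label': list('abcd'),
})

# Fraction of negative values per numeric column (mean of booleans = share of True)
frac = df.select_dtypes(include=[np.number]).lt(0).mean()

# Keep only the columns that contain at least one negative value
frac = frac[frac > 0]
print(frac.apply("{:.2%}".format))
```

Here column y is dropped, while x (2 of 4 values negative) and z (1 of 4) remain, formatted as 50.00% and 25.00%.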

If you also want the count, you can use sum, something like:

_df = df.select_dtypes(include=[np.number]).lt(0)
print(
    pd.concat([_df.sum().astype(int), 
               _df.mean().apply("{:.2%}".format), ], 
              keys=['Count','Percentage'], axis=1)
      .loc[lambda x: x['Count']>0] #keep only columns with at least one negative value
)
#   Count Percentage
#a     37     37.00%
#c     56     56.00%
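One detail worth checking if the data has missing values: lt(0) returns False for NaN, so mean() divides by the total number of rows, NaN included. If you instead want the percentage relative to non-missing values only (an assumption about the desired definition), divide the negative count by count(). A sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in column 'a'; 'c' is non-numeric
df = pd.DataFrame({
    'a': [-1.0, np.nan, 2.0, -3.0],
    'b': [1, -2, 3, 4],
    'c': ['p', 'q', 'r', 's'],
})

num = df.select_dtypes(include=[np.number])
neg_count = num.lt(0).sum()               # negatives per column (NaN compares as False)
share_all_rows = num.lt(0).mean()         # denominator: all rows, NaN included
share_non_null = neg_count / num.count()  # denominator: non-missing values only

summary = pd.DataFrame({
    'Count': neg_count,
    'PctOfRows': share_all_rows,
    'PctOfNonNull': share_non_null,
}).loc[lambda x: x['Count'] > 0]  # keep only columns with at least one negative value
print(summary)
```

For column a, the two definitions differ (2/4 versus 2/3), while for column b, which has no missing values, they agree. Keeping the percentages numeric rather than formatted strings also makes further sorting or filtering easier; format them only for display.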

Computing the share of negative values in each numeric column of df
