Find the percentage of each unique value for every column in pandas

Time: 2020-11-03 17:00:34

Tags: python python-3.x pandas

I know that to count each unique value of a column and convert the counts to percentages, I can use:

df['name_of_the_column'].value_counts(normalize=True)*100

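I would like to know how to do this for all columns, and then drop every column in which a single unique value accounts for more than 95% of all values. Note that the function should also count NaN values.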

2 answers:

Answer 0: (score: 1)

You can try the following:

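As a minimal pandas sketch of one way to do what the question asks (not necessarily this answer's original code): compute, for each column, the share of its most frequent value with NaN counted as a value, and keep only the columns where that share does not exceed the threshold. The helper name drop_dominated_columns, its threshold parameter, and the example column names are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd


def drop_dominated_columns(df, threshold=0.95):
    """Hypothetical helper: drop columns where one value's share exceeds threshold."""
    # Share of the most frequent value per column; dropna=False counts NaN too.
    top_share = df.apply(
        lambda s: s.value_counts(normalize=True, dropna=False).max()
    )
    # Keep only columns whose dominant value covers at most `threshold` of the rows.
    return df.loc[:, top_share <= threshold]


example = pd.DataFrame({
    "constant": [1] * 20,                    # one value covers 100% of rows
    "all_nan": [np.nan] * 20,                # NaN covers 100% of rows
    "mostly_zero": [0] * 18 + [np.nan] * 2,  # 0 covers 90% of rows
    "balanced": [1, 0] * 10,                 # each value covers 50% of rows
})

print(drop_dominated_columns(example).columns.tolist())
```

With this data the call keeps only mostly_zero and balanced. The comparison uses <= so that a column is dropped only when one value covers strictly more than the threshold, matching the "more than 95%" wording of the question.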

Answer 1: (score: 1)

You can write a small wrapper around value_counts that returns False if the share of any single value reaches a given threshold, and True if the counts look fine:

Sample data

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "A": [1] * 20,                   # should NOT survive
    "B": [1, 0] * 10,                # should survive
    "C": [np.nan] * 20,              # should NOT survive
    "D": [1,2,3,4] * 5,              # should survive
    "E": [0] * 18 + [np.nan, np.nan] # should survive
})

print(df.head())

Implementation

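def threshold_counts(s, threshold=0):
    # Share of each unique value in the column, counting NaN as a value.
    counts = s.value_counts(normalize=True, dropna=False)
    # Keep the column (True) only if no single value reaches the threshold.
    return not (counts >= threshold).any()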

column_mask = df.apply(threshold_counts, threshold=0.95)
clean_df = df.loc[:, column_mask]

print(clean_df.head())
   B  D    E
0  1  1  0.0
1  0  2  0.0
2  1  3  0.0
3  0  4  0.0
4  1  1  0.0
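
To inspect the percentages themselves for every column, which is the first half of the question, the same value_counts call can be collected per column. This is a minimal sketch that rebuilds the sample data from this answer; the names percentages and top_share are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1] * 20,
    "B": [1, 0] * 10,
    "C": [np.nan] * 20,
    "D": [1, 2, 3, 4] * 5,
    "E": [0] * 18 + [np.nan, np.nan],
})

# Percentage of each unique value per column, with NaN counted as a value.
percentages = {
    col: df[col].value_counts(normalize=True, dropna=False) * 100
    for col in df.columns
}

for col, pct in percentages.items():
    print(f"--- {col} ---")
    print(pct)

# Largest share per column, handy for comparing against the 95% cutoff.
top_share = pd.Series({col: pct.max() for col, pct in percentages.items()})
print(top_share)
```

Printing top_share next to the filtered frame makes it easy to check why a column was kept or dropped, and to tune the threshold before applying it.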