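I know that to count each unique value in a column and convert the counts to percentages, I can use:
df['name_of_the_column'].value_counts(normalize=True) * 100
How can I do this for all columns, and then drop every column in which a single unique value accounts for more than 95% of all values? Note that the function should also count NaN values.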
Answer 0 (score: 1)
You can try the following:
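For the task in the question, one possible pandas approach can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name `drop_dominated_columns`, the `example` frame, and its column names are illustrative assumptions. It takes the share of the most frequent value in each column, counting NaN as a value via `dropna=False`, and keeps only the columns whose top share does not exceed the threshold.

```python
import numpy as np
import pandas as pd


def drop_dominated_columns(df, threshold=0.95):
    """Drop columns where one value (NaN included) makes up more than `threshold`."""
    # Largest relative frequency per column; dropna=False counts NaN as a value.
    top_share = df.apply(
        lambda s: s.value_counts(normalize=True, dropna=False).max()
    )
    # Keep columns whose most frequent value stays at or below the threshold.
    return df.loc[:, top_share <= threshold]


# Hypothetical data, for illustration only.
example = pd.DataFrame({
    "constant": [7] * 40,                  # one value, share 100%
    "mostly_nan": [np.nan] * 39 + [1.0],   # NaN share 97.5%
    "mixed": [0, 1] * 20,                  # two values, 50% each
})

result = drop_dominated_columns(example)
print(list(result.columns))
```

With the default threshold only the `mixed` column should remain; raising the threshold to 0.99 would also keep `mostly_nan`.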
Answer 1 (score: 1)
You can write a small wrapper around value_counts that returns False if any value reaches a given threshold, and True if the counts look fine:
Sample data
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1] * 20,                      # should NOT survive
    "B": [1, 0] * 10,                   # should survive
    "C": [np.nan] * 20,                 # should NOT survive
    "D": [1, 2, 3, 4] * 5,              # should survive
    "E": [0] * 18 + [np.nan, np.nan]    # should survive
})
print(df.head())
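Implementation
def threshold_counts(s, threshold=0.95):
    # Relative frequency of each value in the column; dropna=False counts NaN too.
    counts = s.value_counts(normalize=True, dropna=False)
    # Reject the column (False) if any single value reaches the threshold.
    if (counts >= threshold).any():
        return False
    return True

# Extra keyword arguments passed to apply are forwarded to threshold_counts.
column_mask = df.apply(threshold_counts, threshold=0.95)
clean_df = df.loc[:, column_mask]
print(clean_df.head())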
   B  D    E
0  1  1  0.0
1  0  2  0.0
2  1  3  0.0
3  0  4  0.0
4  1  1  0.0
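To see why each column is kept or dropped, it can help to look at the share of its most frequent value before applying the mask. The snippet below is a small sketch that rebuilds the sample data from this answer so it runs on its own; the name `top_share` is an illustrative assumption, not part of the answer's code.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "A": [1] * 20,
    "B": [1, 0] * 10,
    "C": [np.nan] * 20,
    "D": [1, 2, 3, 4] * 5,
    "E": [0] * 18 + [np.nan, np.nan],
})

# Share of the most frequent value in each column, with NaN counted as a value.
top_share = df.apply(lambda s: s.value_counts(normalize=True, dropna=False).max())
print(top_share * 100)

# Columns whose dominant value stays below the 95% threshold.
kept = list(top_share[top_share < 0.95].index)
print(kept)
```

Columns A and C consist of a single value (100%), while the dominant value in E covers 90% of the rows, so E stays under the 95% threshold and is kept along with B and D.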