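Remove records whose value count falls below 95%

Time: 2018-06-26 23:44:57

Tags: python

I want to remove all records whose value count is below 95%. At the moment I do this statically for value counts <= 5, but I want to keep only the top 95% of value counts. How can I do that? Also, PART_NO is categorical here.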

vc = repair['PART_NO'].value_counts()
u  = [i not in set(vc[vc<=5].index) for i in repair['PART_NO']]
repair = repair[u]
repair.describe(include="all")
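
For reference, the static filter above can also be written without the Python-level list comprehension. This is a minimal sketch, assuming a small made-up `repair` frame (the data is hypothetical) and the same `<= 5` cut-off as in the question:

```python
import pandas as pd

# Hypothetical stand-in for the real `repair` frame; PART_NO is categorical.
repair = pd.DataFrame({
    "PART_NO": pd.Categorical(["A"] * 8 + ["B"] * 6 + ["C"] * 3 + ["D"] * 1)
})

# Count how often each part number occurs.
vc = repair["PART_NO"].value_counts()

# Part numbers that occur more than 5 times (same cut-off as the question).
frequent_parts = vc[vc > 5].index.tolist()

# Vectorized membership test instead of a per-row `not in set(...)` check.
filtered = repair[repair["PART_NO"].isin(frequent_parts)]

print(len(filtered))
```

`isin` does the membership check in one call, which tends to matter once the frame has many rows.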

2 answers:

Answer 0: (score: 0)

I may have misunderstood your question, so let's work through an example. Consider the following code:

# in the list below, the value 1 accounts for more than 95% of the items
lst = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
count = len(lst) * 1.0

# compute the share of the list taken up by each distinct value
print([[x, lst.count(x) / count] for x in set(lst)])
# [[1, 0.9545454545454546], [2, 0.045454545454545456]]

# keep only the values whose share is at least 95%
ret_lst = [[x, lst.count(x) / count] for x in set(lst) if lst.count(x) / count >= 0.95]
print(ret_lst)
# [[1, 0.9545454545454546]]

# extract the values whose share is at least 95%
lst_final = [item[0] for item in ret_lst]
print(lst_final)
# [1]

The elements of lst above can be str or any other type. For example:

# lst can mix types, e.g. int and str as below
# in the list below, 'a' accounts for more than 95% of the items
lst = ['a', 'b', 1, 0, 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a']

count = len(lst) * 1.0

# compute the share of the list taken up by each distinct element
print([[x, lst.count(x) / count] for x in set(lst)])
# [['a', 0.9508196721311475], [1, 0.01639344262295082], ['b', 0.01639344262295082], [0, 0.01639344262295082]]

# keep only the elements whose share is at least 95%
ret_lst = [[x, lst.count(x) / count] for x in set(lst) if lst.count(x) / count >= 0.95]
print(ret_lst)
# [['a', 0.9508196721311475]]

lst_final = [item[0] for item in ret_lst]
print(lst_final)
# ['a']
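
Calling `lst.count(x)` inside the comprehension rescans the whole list for every distinct value. Below is a minimal sketch of the same share calculation using `collections.Counter` from the standard library. The helper name `values_with_share_at_least` is made up for illustration and assumes Python 3 division:

```python
from collections import Counter


def values_with_share_at_least(items, threshold):
    """Return the distinct values whose share of `items` is >= threshold."""
    counts = Counter(items)  # single pass over the list
    total = len(items)
    return [value for value, n in counts.items() if n / total >= threshold]


# 'a' is 58 of 61 items (about 95.1%), matching the second example above.
lst = ['a'] * 58 + ['b', 1, 0]
print(values_with_share_at_least(lst, 0.95))
```

As with the original lists, this measures each value's share of all items, which is not the same thing as a percentile of the counts.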

Answer 1: (score: 0)

What I actually wanted was to keep the 95th percentile, not that percentage. So the solution is:

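Get the frequency of each part number and add it to the dataframe:

repair['FREQ'] = \
    repair.groupby('PART_NO', as_index=False)['PART_NO'].transform(lambda s: s.count())
repair.head()

Then keep only the rows of repair whose FREQ is greater than or equal to the 95th percentile, which here is 19600.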
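
The filtering step is described in this answer but not shown as code. Here is a sketch under stated assumptions: the toy data is hypothetical, and the cut-off is computed with `Series.quantile(0.95)` over one count per distinct part number rather than hard-coded to 19600. The answer does not say whether its 19600 was computed per row or per part number, so that choice is an assumption here.

```python
import pandas as pd

# Hypothetical data: part "A" appears 20 times, "B" 5 times, "C" twice.
repair = pd.DataFrame({"PART_NO": ["A"] * 20 + ["B"] * 5 + ["C"] * 2})

# Frequency of each part number, repeated on every row of that part.
repair["FREQ"] = repair.groupby("PART_NO")["PART_NO"].transform("count")

# 95th percentile of the counts, taking one count per distinct part number
# (assumption: percentile over parts, not over rows).
cutoff = repair.drop_duplicates("PART_NO")["FREQ"].quantile(0.95)

# Keep only the rows whose part frequency reaches the cut-off.
kept = repair[repair["FREQ"] >= cutoff]

print(cutoff)
print(kept["PART_NO"].unique())
```

Computing the cut-off from the data avoids having to update a hard-coded number when the frame changes. Dropping `drop_duplicates` would instead weight each part by its row count, which gives a different threshold.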