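I want to drop all records whose value count falls below the 95% mark. At the moment I do this statically for value counts <= 5, but I only want to keep the top 95% of value counts. How can I do that? Also, PART_NO is categorical here.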
# Count how often each part number occurs
vc = repair['PART_NO'].value_counts()
# Keep only the rows whose part number occurs more than 5 times
repair = repair[~repair['PART_NO'].isin(vc[vc <= 5].index)]
repair.describe(include="all")
Answer 0 (score: 0)
I may have misunderstood your question, so let's work through an example. Consider the code below:
# In the list below, the value 1 accounts for more than 95% of the elements
lst = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
count = float(len(lst))
# Compute each distinct value's share of the list
print([[x, lst.count(x) / count] for x in set(lst)])
# [[1, 0.9545454545454546], [2, 0.045454545454545456]]
# Keep only the values whose share is at least 95%
ret_lst = [[x, lst.count(x) / count] for x in set(lst) if lst.count(x) / count >= 0.95]
print(ret_lst)
# [[1, 0.9545454545454546]]
# Extract the values whose share is at least 95%
lst_final = [item[0] for item in ret_lst]
print(lst_final)
# [1]
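Calling lst.count(x) inside the comprehension rescans the whole list for every distinct value, which gets slow on large inputs. Here is a minimal sketch of the same share calculation using collections.Counter, which counts everything in a single pass. The helper name values_with_share_at_least and the sample data are hypothetical and not part of the original answer.

```python
from collections import Counter


def values_with_share_at_least(items, threshold):
    """Return the distinct values whose share of `items` is >= threshold."""
    counts = Counter(items)   # one pass over the data
    total = len(items)
    return [value for value, n in counts.items() if n / total >= threshold]


# Made-up sample: the value 1 makes up 21 of 22 elements (about 95.45%).
sample = [1] * 21 + [2]
print(values_with_share_at_least(sample, 0.95))
# [1]
```

The threshold is a plain fraction, so 0.95 means "at least 95% of all elements".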
The elements of lst above can be str or any other type. For example:
# lst can hold values of any type, such as int and str as shown here
# In the list below, 'a' accounts for more than 95% of the elements
lst = ['a', 'b', 1, 0, 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a',
       'a', 'a', 'a', 'a', 'a', 'a']
count = float(len(lst))
# Compute each distinct element's share of the list (set order may vary)
print([[x, lst.count(x) / count] for x in set(lst)])
# [['a', 0.9508196721311475], [1, 0.01639344262295082], ['b', 0.01639344262295082], [0, 0.01639344262295082]]
# Keep only the elements whose share is at least 95%
ret_lst = [[x, lst.count(x) / count] for x in set(lst) if lst.count(x) / count >= 0.95]
print(ret_lst)
# [['a', 0.9508196721311475]]
# Extract the elements whose share is at least 95%
lst_final = [item[0] for item in ret_lst]
print(lst_final)
# ['a']
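The same idea carries over to the DataFrame in the question. This is a sketch, assuming repair has a PART_NO column. The sample frame is made up for illustration. value_counts(normalize=True) returns each value's share of the rows, and isin keeps the rows whose part number clears the threshold.

```python
import pandas as pd

# Hypothetical stand-in for the question's `repair` frame.
repair = pd.DataFrame({"PART_NO": ["A"] * 96 + ["B"] * 3 + ["C"]})

# Share of all rows taken up by each part number.
share = repair["PART_NO"].value_counts(normalize=True)

# Part numbers whose share is at least 95%.
keep = share[share >= 0.95].index

# Rows belonging to those part numbers.
filtered = repair[repair["PART_NO"].isin(keep)]
print(len(filtered))
```

With this sample data only part "A" (96 of 100 rows) passes the 0.95 threshold.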
Answer 1 (score: 0)
What I actually wanted was to keep the 95th percentile, not the percentage. So the solution is:
# Attach to every row the number of times its part number occurs
repair['FREQ'] = repair.groupby('PART_NO')['PART_NO'].transform('count')
repair.head()
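The answer stops after computing FREQ. One possible way to finish is sketched below, assuming that "keep the 95th percentile" means dropping the part numbers whose count falls below a quantile cutoff of the per-part counts. The cutoff q=0.05 and the sample frame are assumptions for illustration, so adjust them to the intended reading.

```python
import pandas as pd

# Hypothetical stand-in for the question's `repair` frame.
repair = pd.DataFrame({"PART_NO": ["A"] * 50 + ["B"] * 30 + ["C"] * 19 + ["D"]})

# Per-row frequency of the row's part number, as in the answer.
repair["FREQ"] = repair.groupby("PART_NO")["PART_NO"].transform("count")

# Cutoff taken over the distinct part counts (assumed interpretation):
# parts whose count lies below the 5th percentile are dropped.
counts = repair["PART_NO"].value_counts()
cutoff = counts.quantile(0.05)

kept = repair[repair["FREQ"] >= cutoff]
print(len(kept))
```

Series.quantile interpolates linearly by default, so the cutoff need not be an actual count in the data.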