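Finding outliers in an Excel row

Time: 2018-08-07 06:12:57

Tags: python excel openpyxl

For example, column C has 1,000 cells, most of them filled with "1", but with a few "2"s scattered among them. I'm trying to find how many "2"s there are and print that number.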

import openpyxl

wb = openpyxl.load_workbook('TestBook')
ws = wb['Sheet1']  # get_sheet_by_name() is deprecated in newer openpyxl

for cell in ws['C']:
    print(cell.value)

How do I iterate over the column and pull out the number of 2s?

3 Answers:

Answer 0: (score: 1)

As @K.Marker pointed out, you can get the count of a specific value in the column with:

[c.value for c in ws['C']].count(2)
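
If you want to try that without a spreadsheet on disk, here is a minimal sketch that builds a small workbook in memory and counts the 2s in column C. The sample data and the names `sample` and `twos` are made up for illustration; the generator version is just an alternative to `.count(2)` that avoids building the intermediate list.

```python
from openpyxl import Workbook

# Build a small in-memory workbook so the example runs without a file.
wb = Workbook()
ws = wb.active

# Hypothetical data: column C holds mostly 1s with a few 2s mixed in.
sample = [1, 1, 2, 1, 1, 1, 2, 1, 2, 1]
for row_number, value in enumerate(sample, start=1):
    ws.cell(row=row_number, column=3, value=value)

# ws['C'] yields the cells of column C; compare each cell's .value.
twos = sum(1 for cell in ws['C'] if cell.value == 2)
print(twos)
```

Both forms should give the same number; which one you use is mostly a matter of taste for a column of this size.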

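But what if you don't know the values in advance, and/or want to see the distribution of values in a particular row? You can use a Counter, which behaves like a dict: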

In [446]: from collections import Counter

In [449]: counter = Counter([c.value for c in ws[3]])

In [451]: counter
Out[451]: Counter({1: 17, 2: 5})

In [452]: for k, v in counter.items():
     ...:     print('{0} occurs {1} time(s)'.format(k, v))
     ...:
1 occurs 17 time(s)
2 occurs 5 time(s)
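
One possible way to use those counts when you don't know the odd value in advance is to treat the most frequent value as "normal" and report everything else. This is only a sketch on a plain list; in practice `values` would come from something like `[c.value for c in ws['C']]`, and the names `normal_value` and `unusual` are my own.

```python
from collections import Counter

# Hypothetical cell values, e.g. from [c.value for c in ws['C']].
values = [1] * 17 + [2] * 5

counter = Counter(values)

# Treat the most frequent value as "normal" and everything else as unusual.
normal_value, normal_count = counter.most_common(1)[0]
unusual = {value: n for value, n in counter.items() if value != normal_value}

print(normal_value, normal_count)
print(unusual)
```

This assumes there really is one dominant value; if two values are roughly equally common, "most frequent" is not a meaningful definition of normal.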

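Answer 1: (score: 0)

[c.value for c in ws['C']].count(2)

The list comprehension builds a list of the cell values in column C, and count(2) returns how many of them are 2.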

Answer 2: (score: 0)

Is the number you're looking for always 2?

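count = 0
# load one row into a list (wantedRowNumber is a 0-based index)
row = list(worksheet.rows)[wantedRowNumber]

# iterate over the cells and increase the count
for cell in row:
    # compare the cell's value, not the Cell object itself
    if cell.value == 2:
        count += 1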

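Now, this only works for the value 2 and won't find any other outliers. To find outliers in general, you first have to decide on a threshold. In this example I'll use the average, though you will need to work out which test gives the best outlier threshold for your data. Don't worry, statistics is fun!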

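import numpy as np

count = 0
# load one row into a list (wantedRowNumber is a 0-based index)
row = list(worksheet.rows)[wantedRowNumber]
# work with the cell values, not the Cell objects
# (assumes every cell in the row holds a number)
values = [cell.value for cell in row]

# calculate the average
# using numpy
NPavg = np.mean(values)

# without numpy
# float() keeps Python 2 from doing integer division; harmless on Python 3
avg = sum(values) / float(len(values))

# iterate over the values and increase the count
for v in values:
    # of course use your own threshold,
    # determined appropriately, instead of the average
    if v > NPavg:
        count += 1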
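
As one example of "your own threshold", here is a sketch that swaps the plain average for the mean plus k standard deviations, using only the standard library. The function name `count_above_threshold`, the default `k=2.0`, and the sample data are my own assumptions, not something from the question; whether this rule suits your data is something you'd have to check.

```python
import statistics


def count_above_threshold(values, k=2.0):
    """Count values more than k population standard deviations above the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    threshold = mean + k * spread
    return sum(1 for v in values if v > threshold)


# Hypothetical data: 95 ones and 5 twos, e.g. taken from a column's cell values.
values = [1] * 95 + [2] * 5
print(count_above_threshold(values))
```

Note that this only flags the 2s because they are rare here. When the unusual value makes up a large share of the cells, the standard deviation grows and the same rule may flag nothing, so the threshold really does need to be chosen with the data in mind.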