Benford's law: calculating leading digits from a csv file

Time: 2018-05-06 12:24:52

Tags: python python-3.x benfords-law

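I'm new to Python and am writing a program that reads values from a .csv file and then displays a chart comparing the test results with the output expected under Benford's law.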

The .csv file contains loan values, and I need to read the first column, which looks like this:

Values  Leading Digit   Number of occurances
170     1               88                   
900     9               62          
250     2               44          
450     4               51          
125     1               19          
.....

The main file, app.py:

 ...
 filename = filedialog.askopenfilename(filetypes=(
    ("Excel files", "*.csv"), ("All files", "*.*")))
 print(filename)
 try:
    with open(filename, 'rt') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader, None)  # skip the headers
        for row in reader:
            minutePriceCloses.append(row[0])
            # calculate the percentage distribution of leading digits
        benford_test_data_dist = calc.getBenfordDist(minutePriceChanges)
        ....
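
One thing to keep in mind with the reading step: `csv.reader` yields every cell as a string, so the values from `row[0]` need converting to numbers before any numeric work. The excerpt also appends to `minutePriceCloses` but passes `minutePriceChanges` to the calculation; the elided parts may explain that, it just isn't visible here. Below is a minimal sketch of the reading step, assuming a comma-delimited file with a single header row. The helper name `read_first_column` is hypothetical and not part of the original program.

```python
import csv
import io


def read_first_column(csvfile):
    """Return the first column of a CSV stream as floats, skipping the header."""
    reader = csv.reader(csvfile, delimiter=',')
    next(reader, None)  # skip the header row
    values = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines and empty first cells
        values.append(float(row[0]))  # csv gives strings; convert explicitly
    return values


# In-memory stand-in for the real file, using rows from the question
sample = io.StringIO(
    "Values,Leading Digit,Number of occurances\n"
    "170,1,88\n"
    "900,9,62\n"
    "250,2,44\n"
)
print(read_first_column(sample))
```

With a real file, the same helper could be called inside the existing `with open(filename, 'rt') as csvfile:` block.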

In calc.py:

import numpy as np


def getBenfordDist(data):
    # set initial dist to zero
    dist = [0, 0, 0, 0, 0, 0, 0, 0, 0]
    # for each figure, check what the first non-zero digit is, hacky multiply
    # by 1000000 to handle small values
    for d in data:
        # sneaky multiply by 1000000 to ensure that the leading digit is unlikely to be zero
        # since benfords law is assumed to relate somehow to scale invariance, this *SHOULDN'T* make a difference
        # but it might, so this might all be wrong :-)
        s = str(np.abs(d) * 1000000)
        for i in range(0, 8):
            if(s.startswith(str(i + 1))):
                dist[i] = dist[i] + 1
                break
    # return fractions of the total for each digit
    percentDist = []
    # convert to % - todo, start using numpy vectors that allow scalar mult/div
    for count in dist:
        percentDist.append(float(count) / len(data))
        # print(float(count))
    return percentDist
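
Two details in this function may be worth checking. `range(0, 8)` only tests the digits 1 to 8, so values starting with 9 are never counted even though `len(data)` still includes them. Also, if `data` holds strings straight from the csv reader, `np.abs(d)` will not behave like a numeric operation. As a rough alternative, here is a sketch that reads the leading digit from the number's own text form instead of multiplying by 1000000. The function names `leading_digit` and `leading_digit_fractions` are hypothetical, and dividing by the number of counted values (rather than `len(data)`) is an assumption about the intended behaviour.

```python
def leading_digit(value):
    """Return the first non-zero digit of a number, or None if there is none."""
    # repr() of a float puts the significant digits first, e.g. '170.0',
    # '0.00093' or '1e-07', so the first character in 1..9 is the leading digit.
    for ch in repr(abs(float(value))):
        if ch in "123456789":
            return int(ch)
    return None  # zero, inf and nan have no leading digit


def leading_digit_fractions(data):
    """Return the fraction of values starting with each digit 1..9."""
    counts = [0] * 9
    for d in data:
        digit = leading_digit(d)
        if digit is not None:
            counts[digit - 1] += 1
    total = sum(counts)  # only values that actually have a leading digit
    if total == 0:
        return [0.0] * 9
    return [c / total for c in counts]


# The five sample values shown in the question
print(leading_digit_fractions([170, 900, 250, 450, 125]))
```

Because `leading_digit` calls `float()` first, it also accepts numeric strings such as `'170'`.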

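The problem I have now is that the chart does not correctly show the percentage for each leading digit, that is, the count in the occurrences column divided by the total number of rows that contain values. For example, for values with a leading digit of 1, the percentage on the chart should be 0.25, and so on. There are 352 rows.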
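
For reference, the fraction described here and the Benford expectation it gets compared against can be sketched as follows. Benford's law gives the expected share of leading digit d as log10(1 + 1/d). The figures 88 and 352 are taken from the question; the function name `benford_expected` is hypothetical.

```python
import math


def benford_expected():
    """Expected fraction of each leading digit 1..9 under Benford's law."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


expected = benford_expected()

# Observed fraction for one digit: occurrences divided by the total row count
observed_digit_1 = 88 / 352

print("observed share for digit 1:", observed_digit_1)
print("expected share for digit 1:", round(expected[0], 4))
```

The nine expected fractions sum to 1, so an observed distribution normalised the same way can be plotted directly beside them.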

Please help. Thanks.

0 answers:

No answers