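I'm stuck on an assignment that asks me to take a dataset read from a CSV file and plot it as a bar chart against Benford's law, like this:
(image: example Benford bar graph)
Here is my code so far: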
import matplotlib.pyplot as plt
import math
import csv
import locale
with open("immigrants.csv", newline='') as csvfile:
    immidata = csv.reader(csvfile)
    X_labels = []
    Y = []
    for row in immidata:
        X_labels.append(row[0])
        Y.append(locale.atoi(row[1]))
numbers = [float(n) for n in range(1, 10)]
benford = [math.log10(1 + 1 / d) for d in numbers]
plt.plot(numbers, benford, 'ro', label = "Benford's Law")
plt.bar(numbers, range(1, 11), align = 'left', normed = True,
        rwidth = 0.7, label = "Actual data")
plt.bar(benford, range(1, 11), align = 'left', normed = True,
        rwidth = 0.7, label = "Predicted data")
plt.title("Immigrants in countries")
plt.xlabel("Digit")
plt.ylabel("Probability")
plt.grid(True)
plt.xlim(0, 10)
plt.xticks(numbers)
plt.legend()
plt.show()
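Here are some rows from the CSV file, which lists immigrants per country (country, number of immigrants, percentage of the world's total immigrants, and immigrants as a percentage of the national population):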
United States,"45,785,090",19.8,14.3
Russia,"11,048,064",4.8,7.7
Germany,"9,845,244",4.3,11.9
Saudi Arabia,"9,060,433",3.9,31.4
United Arab Emirates,"7,826,981",3.4,83.7
United Kingdom,"7,824,131",3.4,12.4
My output right now:
line 19, in <module>
Y.append(locale.atoi(row[1]))
line 321, in atoi
return int(delocalize(string))
ValueError: invalid literal for int() with base 10: 'Number of immigrants'
Process finished with exit code 1
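I'm still fairly new to this, so any advice that helps me get the output would be much appreciated!
Thanks!
The output needs to look like the sample.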
Answer 0 (score: 1)
Reading the data:
import csv
import locale
with open("immigrants.csv", newline='') as csvfile:
    immidata = csv.reader(csvfile) # defaults are fine!
    next(immidata, None) # skip the header row ('Number of immigrants' is not a number)
    X_labels = []
    Y = []
    for row in immidata:
        X_labels.append(row[0])
        Y.append(locale.atoi(row[1]))
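This gives you X_labels and Y (converted to int).
Note: there is no need to call close(); the with block does this automatically.
Good luck with the rest. By the way: digits is not defined in the code you shared. You should make every effort to turn it into an MCVE.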
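The traceback in the question points at the header row, and the quoted numbers contain thousands separators, so both need handling before the values can be used. The following is only a minimal sketch of one way to do that: it reads from an in-memory string instead of immigrants.csv, the helper name read_immigrants is hypothetical, and only the 'Number of immigrants' header is known from the traceback (the other column names are assumed). It strips the commas with str.replace rather than relying on locale.atoi, which depends on the active locale.

```python
import csv
import io

# Sample rows copied from the question. Only 'Number of immigrants' is known
# from the traceback; the other header names are assumptions.
SAMPLE = '''Country,Number of immigrants,Percentage of world total,Percentage of national population
United States,"45,785,090",19.8,14.3
Russia,"11,048,064",4.8,7.7
Germany,"9,845,244",4.3,11.9
'''


def read_immigrants(csvfile):
    """Return (labels, values) from an open CSV file, skipping the header row."""
    reader = csv.reader(csvfile)
    next(reader, None)  # discard the header row
    labels, values = [], []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        labels.append(row[0])
        # "45,785,090" -> 45785090, independent of the locale settings
        values.append(int(row[1].replace(",", "")))
    return labels, values


labels, values = read_immigrants(io.StringIO(SAMPLE))
print(labels)
print(values)
```

With the real file, the same function could be called as read_immigrants(csvfile) inside the with open("immigrants.csv", newline='') block.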
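Answer 1 (score: 1)
Here is a solution using pandas:
(image: plot of Benford's law against the number of immigrants)
Edit:
Your file probably has a header row, as indicated by the string 'Number of immigrants' showing up in the num_immigrants column, so the header=None option has been removed from the line that reads the data.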
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# set the width of the bars, you're gonna have to massage this
width = 0.35
immi = pd.read_csv('immigrants.csv')
# name columns
immi.columns = ['country', 'num_immigrants', 'perc_world', 'perc_nat_pop']
# convert num_immigrants to float
immi.num_immigrants = immi.num_immigrants.str.replace(',', '').apply(float)
total = immi.num_immigrants.sum()
# scale the immigration to between 0 and 1
immi['immi_scaled'] = immi['num_immigrants'].apply(lambda x: x/total)
indx = np.arange(1, len(immi) + 1)
benford = [np.log10(1 + (1.0 / d)) for d in indx]
plt.bar(indx, benford, width, color='r', label="Benford's Law")
plt.bar(np.arange(1, immi.shape[0] + 1) + width,
        immi.immi_scaled, width, color='b', label="Predicted data")
# center the xtick labels
ax = plt.gca()
ax.set_xticks(indx + width / 2)
ax.set_xticklabels((indx))
# limit the number of bars if you have more data
plt.xlim(1, 9)
plt.title("Immigrants in countries")
plt.ylabel("Probability")
plt.grid(True)
plt.legend()
plt.show()
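Benford's law describes how often each digit from 1 to 9 appears as the leading digit, so a chart like the sample would normally put the digits on the x-axis and compare observed leading-digit frequencies with log10(1 + 1/d). The sketch below is one possible way to do that, not the method used in the code above. The function names are hypothetical, the six values are the rows shown in the question, and matplotlib is imported inside plot_benford so the numeric helpers can be used without it.

```python
import math
from collections import Counter


def benford_expected():
    """Expected probability of each leading digit d under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_frequencies(values):
    """Observed share of each leading digit (1-9) among non-zero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def plot_benford(values, width=0.35):
    """Draw observed and expected frequencies as side-by-side bars."""
    import matplotlib.pyplot as plt  # only needed for plotting

    digits = list(range(1, 10))
    observed = leading_digit_frequencies(values)
    expected = benford_expected()
    plt.bar([d - width / 2 for d in digits],
            [observed[d] for d in digits], width, color='b', label="Actual data")
    plt.bar([d + width / 2 for d in digits],
            [expected[d] for d in digits], width, color='r', label="Benford's Law")
    plt.xticks(digits)
    plt.xlabel("Digit")
    plt.ylabel("Probability")
    plt.title("Immigrants in countries")
    plt.grid(True)
    plt.legend()
    plt.show()


# The six values shown in the question, already converted to int
values = [45785090, 11048064, 9845244, 9060433, 7826981, 7824131]
observed = leading_digit_frequencies(values)
expected = benford_expected()
```

Calling plot_benford(values) would then draw the chart. With only six rows the observed bars will look nothing like the Benford curve; the full file is needed for a meaningful comparison.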