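UPDATE: My question has been fully answered, and I have applied jarmod's answer to my program. The code looks cleaner, but it has made no difference to how long my graph takes to appear (I plot this data with matplotlib). I'm still confused about why my program runs slowly and how to speed it up: it takes about 30 seconds, and I know this part of the code is what slows it down. The real code is in the second code block. The speed also depends heavily on the range I set; with a shorter range it runs quickly.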
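Here is sample code showing the calculation needed to make a forecast comparison and extract the values. I use a for loop to iterate over a specific range of CSV files, which I have labeled 1-100. I return a number for each month (1-12) to get the average forecast accuracy for a given month.
My full code includes 12 functions covering a full year of forecasts, but I feel the code is inefficient: the functions are nearly identical except for one number, and reading the CSV files so many times makes the program slow.
Is there a way to combine these functions, perhaps by adding another parameter? My biggest concern is that it would be hard to return the separate numbers and keep them sorted by month. In other words, ideally I'd like a single function for the accuracy of all 12 months of forecasts. The approach I can see is adding another parameter and another loop, but I don't know how to do that or whether it is possible. Essentially, I want to store all the one-month accuracy values (taking the value from the file just before the current file and comparing it with the forecast value for the date associated with the current file), then all the two-month accuracy values, and so on, so that I can use these variables later for plotting and other purposes.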
import csv
import matplotlib.pyplot as plt  # needed for the plt.scatter / plt.show calls below
import pandas as pd
def onemonthaccuracy(basefilenumber):
    # Compare the value in the file one step earlier with the value in the base file
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    onemonthread = pd.read_csv(str(basefilenumber-1)+'.csv', encoding='latin-1')
    onemonthvalue = onemonthread.loc[onemonthread['Customer'].str.contains('Customer A', na=False),'Jun-16\nQty']
    onetotal = int(onemonthvalue)/int(basefilevalue)
    return onetotal
def twomonthaccuracy(basefilenumber):
    # Same calculation, but against the file two steps earlier
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='Latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    twomonthread = pd.read_csv(str(basefilenumber-2)+'.csv', encoding = 'Latin-1')
    twomonthvalue = twomonthread.loc[twomonthread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    twototal = int(twomonthvalue)/int(basefilevalue)
    return twototal
onetotal = 0
twototal = 0
onetotallist = []
twototallist = []
for basefilenumber in range(24,36):
    onetotal += onemonthaccuracy(basefilenumber)
    twototal += twomonthaccuracy(basefilenumber)
    # Use the loop variable here; the name i was never defined
    onetotallist.append(onemonthaccuracy(basefilenumber))
    twototallist.append(twomonthaccuracy(basefilenumber))
# Average over the 12 base files in the range
onetotalpermonth = onetotal/12
twototalpermonth = twototal/12
x = [1,2]
y = [onetotalpermonth, twototalpermonth]
z = [1,2]
w = [(onetotallist),(twototallist)]
# Individual values as small diamonds, one column per month offset
for ze, we in zip(z, w):
    plt.scatter([ze] * len(we), we, marker='D', s=5)
# Averages on top
plt.scatter(x,y)
plt.show()
Here is the actual code block I use in my program; perhaps something in it is slowing things down for a reason I'm not aware of?
#other parts of code
#StartRange = yearvalue+Value
#EndRange = endValue + endyearvalue
#Range = EndRange - StartRange
# Department
#more code....
def nmonthaccuracy(basefilenumber, n):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='Latin-1')
    baseheader = getfileheader(basefilenumber)
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains(Department, na=False), baseheader]
    nmonthread = pd.read_csv(str(basefilenumber-n)+'.csv', encoding = 'Latin-1')
    nmonthvalue = nmonthread.loc[nmonthread['Customer'].str.contains(Department, na=False), baseheader]
    return (1-(int(basefilevalue)/int(nmonthvalue))+1) if int(nmonthvalue) > int(basefilevalue) else int(nmonthvalue)/int(basefilevalue)
N = 13
total = [0] * N
total_by_month_list = [[] for _ in range(N)]
for basefilenumber in range(int(StartRange),int(EndRange)):
    for n in range(N):
        total[n] += nmonthaccuracy(basefilenumber, n)
        total_by_month_list[n].append(nmonthaccuracy(basefilenumber,n))
onetotal=total[1]/ Range
twototal=total[2]/ Range
threetotal=total[3]/ Range
fourtotal=total[4]/ Range
fivetotal=total[5]/ Range #... all the way to 12
onetotallist=total_by_month_list[1]
twototallist=total_by_month_list[2]
threetotallist=total_by_month_list[3]
fourtotallist=total_by_month_list[4]
fivetotallist=total_by_month_list[5] #... all the way to 12
# a lot more code after this
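One possible contributor to the slowness, judging only from the block above: every call to nmonthaccuracy reads two CSV files, and the loop calls it twice for each (basefilenumber, n) pair, so the same files are parsed again and again as the range grows. The following is a minimal sketch of reading each file only once, not a tested fix for the 30-second run. It uses the standard-library csv module so that it stays self-contained; the same lru_cache decorator could wrap a function that calls pd.read_csv instead. The helper names load_rows and customer_value are hypothetical, the customer and column defaults are taken from the sample code, and the values are assumed to be plain integers.

```python
import csv
from functools import lru_cache


@lru_cache(maxsize=None)
def load_rows(filenumber):
    """Parse '<filenumber>.csv' once; repeat calls return the cached rows."""
    with open(f"{filenumber}.csv", encoding="latin-1", newline="") as handle:
        return tuple(csv.DictReader(handle))


def customer_value(filenumber, customer, column):
    """Return `column` as an int for the first row whose 'Customer' contains `customer`."""
    for row in load_rows(filenumber):
        if customer in (row.get("Customer") or ""):
            return int(row[column])  # assumption: the cell holds a plain integer
    raise LookupError(f"{customer!r} not found in {filenumber}.csv")


def nmonthaccuracy(basefilenumber, n, customer="Customer A", column="Jun-16\nQty"):
    """Value from the file n steps earlier divided by the base file's value."""
    base = customer_value(basefilenumber, customer, column)
    earlier = customer_value(basefilenumber - n, customer, column)
    return earlier / base
```

With this arrangement the number of file reads equals the number of distinct files touched, however many times nmonthaccuracy is called; load_rows.cache_info() reports the hits and misses if you want to check that. Storing the result of each call in a variable and using it for both the running total and the list would also halve the number of calls.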
Answer 0 (score: 2)
Something like this:
def nmonthaccuracy(basefilenumber, n):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='Latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Lam DepT', na=False), 'Jun-16\nQty']
    nmonthread = pd.read_csv(str(basefilenumber-n)+'.csv', encoding = 'Latin-1')
    nmonthvalue = nmonthread.loc[nmonthread['Customer'].str.contains('Lam DepT', na=False), 'Jun-16\nQty']
    return int(nmonthvalue)/int(basefilevalue)
N = 2
total_by_month = [0] * N
total_aggregate = 0
for basefilenumber in range(20,30):
    for n in range(N):
        a = nmonthaccuracy(basefilenumber, n)
        total_by_month[n] += a
        total_aggregate += a
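If you're wondering what the following code does:
N = 2
total_by_month = [0] * N
It sets N to the number of months you want (2 here, but you could set it to 12 or some other value), then creates a total_by_month list that can hold N results, one per month. The list is initialized to all zeros (N zeros), so that each of the N monthly totals starts from zero.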
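A short sketch of that initialization, using N = 3 and made-up values chosen only for illustration. It also shows why a list of per-month lists, like the one in the question, is built with a comprehension rather than with [[]] * N.

```python
N = 3

# One running total per month offset, all starting at zero.
total_by_month = [0] * N
total_by_month[1] += 0.5
print(total_by_month)      # [0, 0.5, 0]

# For per-month lists of values, build each inner list separately...
values_by_month = [[] for _ in range(N)]
values_by_month[1].append(0.5)
print(values_by_month)     # [[], [0.5], []]

# ...because [[]] * N repeats one and the same inner list N times.
shared = [[]] * N
shared[1].append(0.5)
print(shared)              # [[0.5], [0.5], [0.5]]
```

Multiplying a list of numbers is safe because += rebinds the slot to a new number, whereas append mutates the single inner list that every slot of shared refers to.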