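I have this code, which tracks whether a given delivery was late and by how many days. I classify deliveries as early, on time, or late. If I include every material number, I can graph the results. However, when I filter by a specific material number I get the error shown below; I have also included exactly what is printed in the terminal. It looks like the DataFrame has created two rows labelled with different things and counts from there, so because there are two values I can't plot the graph. How do I fix my code so that it simply extracts the "count" and uses that number to draw the bar chart?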
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcdefaults()
Material = 'Material'
DELIVERY_DATE = 'Delivery Date'
DESIRED_DATE = 'source desired delivery date'
DELAYED_DAYS = 'Delayed Days'
StartYear = input("Start Year? ")
StartYear = int(StartYear)
EndYear = input("End Year? ")
EndYear = int(EndYear)
df = pd.read_csv('otdo.csv')
df['Delivery Date'] = pd.to_datetime(df['Delivery Date'], format='%m/%d/%Y')
df['source desired delivery date'] = pd.to_datetime(df['source desired delivery date'], format='%m/%d/%Y')
late_threshold = pd.Timedelta(days=0)
late_threshold2 = pd.Timedelta(days=10)
df[DELAYED_DAYS] = df[DELIVERY_DATE] - df[DESIRED_DATE]
df2 = df[(df['Delivery Date'].dt.year >= int(StartYear)) & (df['Delivery Date'].dt.year <= int(EndYear))]
df3 = df2[ df2[DELAYED_DAYS] > late_threshold]
df3 = df3[late_threshold2 > df3[DELAYED_DAYS]]
df3 = df3[df3['Material'].str.contains('20080810', na=False)]
df4 = df2[ df2[DELAYED_DAYS] > late_threshold2]
df4 = df4[df4['Material'].str.contains('20080810', na=False)]
df5 = df2[df2[DELAYED_DAYS] <= late_threshold]
df5 = df5[df5['Material'].str.contains('20080810', na=False)]
df6 = df2[df2['Material'].str.contains('20080810', na=False)]
df7 = df2[ df2[DELAYED_DAYS] > late_threshold]
df7 = df7[late_threshold2 > df7[DELAYED_DAYS]]
df8 = df2[ df2[DELAYED_DAYS] > late_threshold2]
df9 = df2[df2[DELAYED_DAYS] <= late_threshold]
zero = df2.count()
zero2 = df3.count()
zero3 = df4.count()
zero4 = df5.count()
zero5 = df7.count()
zero7 = df9.count()
hey = zero7.iloc[1:1]
print(hey)
print(zero7)
objects = ('1', '2', '3')
y_pos = np.arange(len(objects))
values = [zero5, zero4, zero7]
plt.bar(y_pos, values, align='center', alpha=0.2)
plt.xticks(y_pos, objects)
plt.show()
This is what is printed in the terminal:
Start Year? 2014
End Year? 2018
Series([], dtype: int64)
Material 4936
Delayed Days 4936
dtype: int64
Traceback (most recent call last):
File "C:\Users\khalha\eclipse-workspace\Test3\Test3\gagada.py", line 118, in <module>
plt.bar(y_pos, values, align='center', alpha=0.2)
File "C:\Users\khalha\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\pyplot.py", line 2770, in bar
ret = ax.bar(*args, **kwargs)
File "C:\Users\khalha\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\__init__.py", line 1855, in inner
return func(ax, *args, **kwargs)
File "C:\Users\khalha\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 2233, in bar
np.atleast_1d(x), height, width, y, linewidth)
File "C:\Users\khalha\AppData\Local\Programs\Python\Python37\lib\site-packages\numpy\lib\stride_tricks.py", line 249, in broadcast_arrays
shape = _broadcast_shape(*args)
File "C:\Users\khalha\AppData\Local\Programs\Python\Python37\lib\site-packages\numpy\lib\stride_tricks.py", line 184, in _broadcast_shape
b = np.broadcast(*args[:32])
ValueError: shape mismatch: objects cannot be broadcast to a single shape
CSV file:
Material Delivery Date source desired delivery date
3334678 12/31/2014 12/31/2014
233433 12/31/2014 12/31/2014
3434343 1/5/2015 1/5/2015
3334567 1/5/2015 1/6/2015
546456 2/11/2015 2/21/2015
221295 4/10/2015 4/10/2015
Answer (score: 2)
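The error message points to
plt.bar(y_pos, values...
matplotlib expects a 1D array of bar heights, but with values you pass a list of Series (DataFrame.count() returns one count per column, not a single number), and that list cannot be broadcast to a simple 1D array.
You should use a list of scalars instead,
for example
values = [zero5.Material, zero4.Material, zero7.Material]
if I understand your data model correctly.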
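A minimal sketch of why the original values list fails and how the scalar fix behaves. It uses three small hypothetical DataFrames in place of the real filtered frames (the column names follow the question; the data is made up) and checks shapes with np.broadcast, which performs the same kind of shape check that fails inside plt.bar, so no plotting is needed:

```python
import numpy as np
import pandas as pd

# Three hypothetical stand-ins for the filtered frames (e.g. df7, df8, df9),
# each with the two columns that appear in the question's count() output.
frames = [
    pd.DataFrame({"Material": ["a", "b"], "Delayed Days": pd.to_timedelta([3, 5], unit="D")}),
    pd.DataFrame({"Material": ["c"], "Delayed Days": pd.to_timedelta([12], unit="D")}),
    pd.DataFrame({"Material": ["d", "e", "f"], "Delayed Days": pd.to_timedelta([0, -1, -10], unit="D")}),
]
x = np.arange(len(frames))  # bar positions, shape (3,)

# DataFrame.count() returns a Series (one count per column), so this list
# turns into a (3, 2) array instead of three bar heights.
per_column = [f.count() for f in frames]
bad_heights = np.asarray(per_column)


def broadcasts(a, b):
    """Return True if a and b can be broadcast to one shape, as plt.bar requires."""
    try:
        np.broadcast(a, b)
        return True
    except ValueError:
        return False


# Fix as in the answer: take one scalar per frame from its count() Series...
good_heights = [int(c["Material"]) for c in per_column]
# ...or count rows directly (len() also counts rows whose Material is missing).
row_counts = [len(f) for f in frames]

print(bad_heights.shape, broadcasts(x, bad_heights))  # (3, 2) False
print(good_heights, broadcasts(x, good_heights))      # [2, 1, 3] True
```

With scalar heights, plt.bar(y_pos, good_heights) receives one number per position.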
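Note that if you want to plot two arrays for comparison, i.e. two bars at each y_pos, you can do that by calling plt.bar(...) twice: first with one array, then with the other, shifting the positions in y_pos by a small offset for the second call. See this for an example.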
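A sketch of the two-call approach, assuming two hypothetical lists of counts (for example all materials versus one material); the numbers and category labels are placeholders, and the Agg backend is selected only so the snippet runs without a display. Bars are centred on their x position by default, so shifting each call by half a bar width places the two bars side by side:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use an interactive one and plt.show() in practice
import matplotlib.pyplot as plt
import numpy as np

labels = ["on time/early", "1-10 days late", "10+ days late"]  # hypothetical names
all_materials = [120, 45, 30]  # placeholder counts for illustration only
one_material = [12, 5, 2]      # placeholder counts for illustration only

x = np.arange(len(labels))
width = 0.4  # each bar uses 0.4 of the unit spacing between groups

fig, ax = plt.subplots()
# First call shifted left, second call shifted right of each group position.
bars_all = ax.bar(x - width / 2, all_materials, width, label="all materials")
bars_one = ax.bar(x + width / 2, one_material, width, label="material 20080810")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
```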
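However, I'd suggest not creating so many DataFrames derived from the CSV import. Instead, keep a single DataFrame with one column per threshold test holding the boolean result, converted to int so it can be summed, for example:
df2['thresh1'] = (df2[DELAYED_DAYS] > late_threshold).astype(int)
df2['thresh2'] = (df2[DELAYED_DAYS] > late_threshold2).astype(int)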
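This lets you compute what you called zeros in a single line (selecting only the threshold columns, since the date columns cannot be summed):
zeros = df2[['thresh1', 'thresh2']].sum()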
A first test could then be
zeros.plot(kind='bar')
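Putting the single-DataFrame suggestion together, here is a sketch that reads the sample rows from the question, applies the year range and an optional material filter, and returns one count per category. Assumptions: the function name delivery_counts and the category labels are hypothetical; deliveries more than 0 and up to 10 days late count as "1-10 days" (the question's own filters leave exactly 10 days out of both late groups); and Material is read as text so that .str.contains works even when the IDs are all digits:

```python
import io
import pandas as pd

DELIVERY_DATE = 'Delivery Date'
DESIRED_DATE = 'source desired delivery date'
DELAYED_DAYS = 'Delayed Days'
CATEGORIES = ['on time/early', 'late (1-10 days)', 'late (>10 days)']  # hypothetical labels

# The six sample rows from the question as comma-separated text, so the sketch
# is self-contained; real code would pass 'otdo.csv' instead.
SAMPLE_CSV = """Material,Delivery Date,source desired delivery date
3334678,12/31/2014,12/31/2014
233433,12/31/2014,12/31/2014
3434343,1/5/2015,1/5/2015
3334567,1/5/2015,1/6/2015
546456,2/11/2015,2/21/2015
221295,4/10/2015,4/10/2015
"""


def delivery_counts(csv_file, start_year, end_year, material=None):
    """Return a Series with one delivery count per category (hypothetical helper)."""
    df = pd.read_csv(csv_file, dtype={'Material': str})
    for col in (DELIVERY_DATE, DESIRED_DATE):
        df[col] = pd.to_datetime(df[col], format='%m/%d/%Y')
    df[DELAYED_DAYS] = df[DELIVERY_DATE] - df[DESIRED_DATE]

    # Build one boolean mask for the year range and the optional material filter.
    mask = df[DELIVERY_DATE].dt.year.between(start_year, end_year)
    if material is not None:
        mask &= df['Material'].str.contains(material, na=False)
    df = df[mask].copy()  # copy so the new columns below are set on a real frame

    late = df[DELAYED_DAYS] > pd.Timedelta(days=0)
    very_late = df[DELAYED_DAYS] > pd.Timedelta(days=10)
    # One 0/1 column per category; summing gives one scalar per bar.
    df[CATEGORIES[0]] = (~late).astype(int)
    df[CATEGORIES[1]] = (late & ~very_late).astype(int)
    df[CATEGORIES[2]] = very_late.astype(int)
    return df[CATEGORIES].sum()
```

The result is a Series indexed by category, so delivery_counts('otdo.csv', 2014, 2018, material='20080810').plot(kind='bar') followed by plt.show() should draw one bar per category.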