错误消息:x和y必须具有相同的第一尺寸,但形状为(23,)和(1,)

时间:2019-05-19 12:02:51

标签: python python-3.x

我有一个excel文件,我想在特定列中绘制特定范围的行 我可以访问所需的列和行,但是每当我运行代码时,此错误消息就会出现

  

x和y必须具有相同的第一尺寸,但形状为(23,)和(1,)   消费表的第一行是:

enter image description here

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import math

start_date = "2017-07-24 00:00:00"
end_date = "2019-03-09 23:00:00"

data_generation2 = pd.read_excel(r"C:\\Users\\user\\Desktop\\Master\\Thesis\\Tarek\\Parent.xlsx", index_col="Timestamp")

data_consumption2 = pd.read_excel(r"C:\\Users\\user\\Desktop\\Master\\Thesis\\Tarek\\Parent.xlsx", index_col="Timestamp", sheetname="Consumption")

data_generation = data_generation2.loc[start_date:end_date] 
data_consumption = data_consumption2.loc[start_date:end_date] 
data_consumption2["Timestamp"] = 
pd.to_datetime(data_consumption2["Timestamp"], format="%d/%m/%y %H%M:%S")
df_to_plot = data_consumption2.loc[(data_consumption2.Timestamp >= start_date) & (data_consumption2.Timestamp < end_date)]
df_to_plot.set_index(df_to_plot.Timestamp)
df_to_plot = df_to_plot[df_to_plot.columns[1:]]  

我希望输出为2053G列中前720行相对于时间列的图。

1 个答案:

答案 0 :(得分:1)

我会尝试给您一些提示。

读取数据后,应将date列(字符串)转换为datetime格式。 to_datetime函数完成(doc)的工作。

然后,您需要按date进行过滤。完成后,可以使用loc来完成。

现在,我们要绘制数据框。熊猫模块提供了plot方法(doc)。让我们尝试使用它。默认情况下,它将绘制所有列。在我们的案例中,我们想要绘制日期消费量。一种方法是将date列转换为索引。 set_index方法可以为您提供(doc)

完成后,就可以进行绘制了!

这里是一个示例,假设其具有以下数据帧:

  • 第一列是日期(dd / MM / YYYY HH:MM:ss)
  • 接下来的20列是每户的用电量
import pandas as pd
import matplotlib.pyplot as plt

start_date = "2019-01-01 00:00:00"
end_date = "2019-01-02 00:00:00"

data_generation2 = pd.read_excel("your_path")

# transform date column (type string) to datetime type
data_generation2["Date"] = pd.to_datetime(data_generation2["Date"], format="%d/%m/%y %H%M:%S")
print(data_generation2)
#                      Date    House1    House2    House3    House4  ...   House17   House18   House19   House20
# 0   2019-01-01 00: 00: 00  5.087228  6.416912  0.997723  4.094832  ...  0.462354  5.105688  7.504992  4.688377
# 1   2019-01-01 00: 01: 00  5.343098  3.205145  8.874018  0.567703  ...  5.593759  3.584948  5.503817  7.027469
# 2   2019-01-01 00: 02: 00  5.894551  6.411896  3.208524  6.582306  ...  9.149066  5.485656  0.820700  6.391030
# 3   2019-01-01 00: 03: 00  5.674840  6.558950  3.043355  6.759310  ...  2.270176  3.513524  1.833294  9.889432
# 4   2019-01-01 00: 04: 00  9.022593  4.088398  6.082084  8.095797  ...  9.270887  4.947194  2.505969  7.396191
# 5   2019-01-01 00: 05: 00  9.217690  1.950666  5.684551  7.782337  ...  6.990156  1.430377  2.269080  7.343031
#                       ...                                                              ...
# 715 2019-01-30 01: 09: 00  6.892563  0.553806  0.744839  3.151984  ...  9.418145  2.575782  0.801205  4.290434
# 716 2019-01-30 02: 00: 00  0.894091  8.872986  8.445927  2.781972  ...  0.832707  4.264321  6.972225  4.155494
# 717 2019-01-30 02: 01: 00  0.079396  5.270916  0.452875  5.248232  ...  5.331906  4.026245  6.598595  1.637325
# 718 2019-01-30 02: 02: 00  5.287946  4.745940  8.479177  7.713943  ...  0.642650  9.735159  2.499371  7.824247
# 719 2019-01-30 02: 03: 00  1.376087  6.262784  5.621571  4.937615  ...  0.801079  5.506654  6.878042  2.430062
# [720 rows x 23 columns]

# Filter according the date interval
df_to_plot = data_generation2.loc[(data_generation2.Date >= start_date) & (data_generation2.Date < end_date)]

# Transform the columns of date as index
df_to_plot.set_index(df_to_plot.Date)
df_to_plot = df_to_plot[df_to_plot.columns[1:]] # Remove date columns
print(df_to_plot)
# [720 rows x 21 columns]
#       House1    House2    House3  ...   House17   House18   House19   House20
# 0   5.087228  6.416912  0.997723  ...  0.462354  5.105688  7.504992  4.688377
# 1   5.343098  3.205145  8.874018  ...  5.593759  3.584948  5.503817  7.027469
# 2   5.894551  6.411896  3.208524  ...  9.149066  5.485656  0.820700  6.391030
# 3   5.674840  6.558950  3.043355  ...  2.270176  3.513524  1.833294  9.889432
# 4   9.022593  4.088398  6.082084  ...  9.270887  4.947194  2.505969  7.396191
# 5   9.217690  1.950666  5.684551  ...  6.990156  1.430377  2.269080  7.343031
# ...
# 20  8.908322  7.452523  9.046442  ...  9.786092  6.370266  5.635464  8.560888
# 21  7.666228  1.294614  2.034965  ...  0.091431  3.210088  9.038868  1.492585
# 22  8.266986  6.209395  9.689687  ...  6.500988  6.133358  2.237199  7.289748
# 23  2.687560  6.852061  5.136214  ...  0.815944  8.267767  0.244232  7.705181
# [24 rows x 20 columns]



# create a figure (2 x 2)
fig, axes = plt.subplots(nrows=2, ncols=2)

# Daily chart
ax = df_to_plot.plot(ax=axes[0, 0], title="Daily consumption", legend=False)
ax.set_xlabel("Days")
ax.set_ylabel("Consumption")

# Cumulative chart
ax = df_to_plot.sort_index().cumsum().plot(ax=axes[0, 1], title="Cumuative consumption")
ax.set_xlabel("Days")
ax.set_ylabel("Cumulative consumption")
ax.legend(ncol=3)

# Histogram chart house 1
ax = df_to_plot.House1.plot(ax=axes[1, 0], kind="bar",
                       title="House 1 daily consumption", rot=1)
ax.set_xlabel("Days")
ax.set_ylabel("Consumption")

# Pie chart
df_to_plot.sum().sort_values().plot(
    ax=axes[1, 1], kind="pie", title="Pie chart consumption")

# Main title
fig.suptitle("From " + start_date +' to ' +end_date + ' summary')
plt.show()

enter image description here

您要做的主要任务是重建数据框。我不能为你。完成后,先前的代码可以为您提供一些想法。