How to parse dates and assign a value based on the date range each date falls in

Asked: 2015-10-03 13:21:48

Tags: python excel pandas

I have an Excel file with two sheets.

The first ('Sheet1') contains this data:

DATE       TMAX TMIN
20110706    317 211
20110707    322 211
20110708    317 211
20110709    322 211
20110710    328 222
20110711    333 244
20110712    356 250
20110713    356 222

The other ('Sheet2') contains:

Start Date  End Date    Rep Month    Cost    kWh     kW 
7/6/2011    8/3/2011    July     5,065.17    76,640      205 
8/3/2011    9/7/2011    August   5,572.38    86,640      195 

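My goal is to write another column to 'Sheet1' that holds, for each date, the kWh value of the 'Sheet2' date range that the date falls into.

For example: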

DATE        TMAX    TMIN    kWh
20110706    317   211   76640
20110707    322   211   76640
20110708    317   211   76640
20110709    322   211   76640
20110710    328   222   76640
20110711    333   244   76640
20110712    356   250   76640
20110713    356   222   76640
20110801    344   228   76640
20110802    356   200   76640
20110803    367   200   86640
20110804    361   228   86640

I don't know why my code leaves df["kWh"] empty ('NaN'), so a blank kWh column gets written to 'Sheet1'.

Here is my code:

import pandas as pd
from pandas import ExcelWriter

df = pd.read_excel("thecddhddtest.xlsx",'Sheet1')
df2 = pd.read_excel("thecddhddtest.xlsx",'Sheet2')
df.head()


df["DATE"] = pd.to_datetime(df["DATE"], format="%Y%m%d")
pd.to_datetime(df2["Start Date"], format="%m/%d/%Y")

df3 = df2.set_index("Start Date")

df["DATE"] = pd.to_datetime(df["DATE"], format="%Y%m%d")
df2["Start Date"] = pd.to_datetime(df2["Start Date"], format="%m/%d/%Y")

df3["kWh"].reindex(df["DATE"], method="ffill")
df["kWh"] = df3["kWh"].reindex(df["DATE"], method="ffill")
print(df["kWh"])


writer = ExcelWriter('thecddhddtestkWh.xlsx')
df.to_excel(writer,'Sheet1',index=False)
df2.to_excel(writer,'Sheet2',index=False)
writer.save()

This results in:

DATE       TMAX TMIN kWh
20110706    317 211
20110707    322 211
20110708    317 211
20110709    322 211
20110710    328 222
20110711    333 244
20110712    356 250
20110713    356 222

1 Answer:

Answer 0: (score: 1)

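Try my solution: set the DATE column of df as the index, then reindex against that index. Once df is indexed by date, the reindexed kWh Series shares that index, so the assignment lines up row for row.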

df["DATE"] = pd.to_datetime(df["DATE"], format="%Y%m%d")
# set the DATE column as the index
df = df.set_index("DATE")
df2["Start Date"] = pd.to_datetime(df2["Start Date"], format="%m/%d/%Y")
df3 = df2.set_index("Start Date")
# reindex kWh against the index of df, forward-filling from each Start Date
df["kWh"] = df3["kWh"].reindex(df.index, method="ffill")
print(df["kWh"])
#DATE
#2011-07-06    76,640
#2011-07-07    76,640
#2011-07-08    76,640
#2011-07-09    76,640
#2011-07-10    76,640
#2011-07-11    76,640
#2011-07-12    76,640
#2011-07-13    76,640
#2011-08-01    76,640
#2011-08-02    76,640
#2011-08-03    86,640
#2011-08-04    86,640
#Name: kWh, dtype: object
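
A likely reason the question's code produced NaN is index alignment: pandas matches an assigned Series to the frame by index labels, and the reindexed kWh Series is labelled by date while df is still labelled 0, 1, 2, and so on. The sketch below reproduces that and shows one way to keep DATE as an ordinary column. It rests on a few assumptions: the two frames are small in-memory stand-ins for the sheets, the names `lookup`, `misaligned` and `fixed` are introduced here for illustration only, and kWh is assumed to have been read as text such as "76,640" (as the `dtype: object` output above suggests).

```python
import pandas as pd

# In-memory stand-ins for Sheet1 and Sheet2 (a subset of the question's data).
df = pd.DataFrame({
    "DATE": ["20110706", "20110713", "20110802", "20110803", "20110804"],
    "TMAX": [317, 356, 356, 367, 361],
    "TMIN": [211, 222, 200, 200, 228],
})
df2 = pd.DataFrame({
    "Start Date": ["7/6/2011", "8/3/2011"],
    "kWh": ["76,640", "86,640"],
})

df["DATE"] = pd.to_datetime(df["DATE"], format="%Y%m%d")
df2["Start Date"] = pd.to_datetime(df2["Start Date"], format="%m/%d/%Y")
# Assumption: kWh arrived as text; turn "76,640" into the integer 76640.
df2["kWh"] = df2["kWh"].str.replace(",", "", regex=False).astype(int)

# For each DATE, take the kWh of the most recent Start Date on or before it.
lookup = df2.set_index("Start Date")["kWh"].reindex(df["DATE"], method="ffill")

# Assigning the Series itself aligns its date labels against df's 0..n-1
# labels; nothing matches, so every row ends up NaN.
misaligned = df.assign(kWh=lookup)

# Assigning the bare values skips alignment and keeps DATE as a column.
fixed = df.assign(kWh=lookup.to_numpy())
print(fixed)
```

Passing the values with `to_numpy()` relies on `lookup` being in the same row order as df, which holds here because it was reindexed by `df["DATE"]` itself. Setting DATE as the index, as in the answer, avoids that reliance at the cost of moving DATE out of the columns.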