I am trying to use pd.concat in pandas to merge two DataFrames (df and df2) into a new DataFrame (df3) with the following code:
df3 = pd.concat([df,df2])
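This works almost the way I want, but it creates one problem.
df contains the data for the current date, and its index is a time series. It looks like this: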
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
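df2 contains only NaN data. Its index is a time series in the same format as the one in df, but it starts at an earlier date and runs for a full year (i.e. 17520 rows, corresponding to 365 * 48 thirty-minute intervals). It looks basically like this: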
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
<continues to 17520 rows, i.e. one year of 30 minute time intervals>
When I apply: df3 = pd.concat([df,df2])
and then run df3.head(), I get the following:
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
2016-10-31 02:30:00 NaN NaN NaN
2016-10-31 03:00:00 NaN NaN NaN
2016-10-31 03:30:00 NaN NaN NaN
<continues to the end of the year>
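In other words, the code seems to drop all the NaN rows for the time intervals that occur before the data in df. Can anyone suggest how to keep all the rows of df2 and only replace the data for the time intervals that are present in df?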
Answer 0 (score: 1)
I think you need to reindex df on the union of the two indexes:
print (df2.index.union(df.index))
DatetimeIndex(['2016-10-01 00:00:00', '2016-10-01 00:30:00',
'2016-10-01 01:00:00', '2016-10-01 01:30:00',
'2016-10-01 02:00:00', '2016-10-01 02:30:00',
'2016-10-31 00:00:00', '2016-10-31 00:30:00',
'2016-10-31 01:00:00', '2016-10-31 01:30:00',
'2016-10-31 02:00:00'],
dtype='datetime64[ns]', freq=None)
df = df.reindex(df2.index.union(df.index))
print (df)
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
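reindex needs the union here because it keeps only the labels it is given; the union supplies every timestamp from both df2 and df.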
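The reindex approach can be tried end to end with a minimal sketch like the one below. The two frames are shortened, hypothetical stand-ins for the question's df and df2 (three rows of data and a six-row NaN placeholder instead of a full year), and the names cols, full_index and df3 are only illustrative.

```python
import numpy as np
import pandas as pd

cols = ["Facility", "Servers", "PUE"]

# Hypothetical stand-in for df: three half-hour readings on 2016-10-31.
df = pd.DataFrame(
    [[6.0, 5.0, 1.2], [7.0, 5.0, 1.4], [6.0, 5.0, 1.2]],
    index=pd.date_range("2016-10-31 00:00", periods=3, freq="30min"),
    columns=cols,
)

# Hypothetical stand-in for df2: an all-NaN placeholder over a longer range
# that starts before df and ends after it.
df2 = pd.DataFrame(
    np.nan,
    index=pd.date_range("2016-10-30 23:00", periods=6, freq="30min"),
    columns=cols,
)

# Union of both indexes: every timestamp that appears in either frame.
full_index = df2.index.union(df.index)

# Reindexing df on the union keeps its values and fills the new rows with NaN.
df3 = df.reindex(full_index)
print(df3)
```

In this sketch df2 only contributes its index, which is enough when df2 holds nothing but NaN, as in the question.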
Answer 1 (score: 1)
Use combine_first:
result = df.combine_first(df2)
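The result takes values from the right-hand DataFrame only where they are missing in the left-hand DataFrame; its index is the union of both indexes.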
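To see the left-hand priority in action, here is a small sketch under assumed data. The frames are shortened, hypothetical versions of the question's df and df2, and the two 9.9 values written into df2 are not part of the question; they are added only to show which side wins.

```python
import numpy as np
import pandas as pd

cols = ["Facility", "Servers", "PUE"]

# Hypothetical stand-in for df: three half-hour readings on 2016-10-31.
df = pd.DataFrame(
    [[6.0, 5.0, 1.2], [7.0, 5.0, 1.4], [6.0, 5.0, 1.2]],
    index=pd.date_range("2016-10-31 00:00", periods=3, freq="30min"),
    columns=cols,
)

# Hypothetical stand-in for df2: a NaN placeholder over a longer range.
df2 = pd.DataFrame(
    np.nan,
    index=pd.date_range("2016-10-30 23:00", periods=6, freq="30min"),
    columns=cols,
)

# Illustration only: give df2 two real values, one where df has no row
# and one where df already has a value.
df2.loc[pd.Timestamp("2016-10-30 23:00"), "PUE"] = 9.9
df2.loc[pd.Timestamp("2016-10-31 00:00"), "PUE"] = 9.9

# df is on the left, so its values win; df2 only fills the gaps.
result = df.combine_first(df2)
print(result)
```

With an all-NaN df2, as in the question, this gives the same rows as reindexing df on the union of the indexes.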