Question

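I am trying to merge two DataFrames (df and df2) into a new DataFrame (df3) in pandas with pd.concat, using the following code:

df3 = pd.concat([df, df2])

This almost works the way I want, but it creates one problem.

df contains data for the current day, and its index is a time series. It looks like this: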

                        Facility    Servers   PUE
2016-10-31  00:00:00    6.0         5.0       1.2
2016-10-31  00:30:00    7.0         5.0       1.4
2016-10-31  01:00:00    6.0         5.0       1.2
2016-10-31  01:30:00    6.0         5.0       1.2
2016-10-31  02:00:00    6.0         5.0       1.2

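df2 contains only NaN data. Its index is a time series in the same format as the one in df, but it starts at an earlier date and runs for a full year (i.e. 17520 rows, corresponding to 365 * 48 thirty-minute intervals). It looks basically like this: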

                        Facility    Servers   PUE
2016-10-01  00:00:00    NaN         NaN       NaN
2016-10-01  00:30:00    NaN         NaN       NaN
2016-10-01  01:00:00    NaN         NaN       NaN
2016-10-01  01:30:00    NaN         NaN       NaN
2016-10-01  02:00:00    NaN         NaN       NaN
2016-10-01  02:30:00    NaN         NaN       NaN
<continues to 17520 rows, i.e. one year of 30 minute time intervals>

When I apply df3 = pd.concat([df, df2])

and then run df3.head(), I get the following:

                        Facility    Servers   PUE
2016-10-31  00:00:00    6.0         5.0       1.2
2016-10-31  00:30:00    7.0         5.0       1.4
2016-10-31  01:00:00    6.0         5.0       1.2
2016-10-31  01:30:00    6.0         5.0       1.2
2016-10-31  02:00:00    6.0         5.0       1.2
2016-10-31  02:30:00    NaN         NaN       NaN
2016-10-31  03:00:00    NaN         NaN       NaN
2016-10-31  03:30:00    NaN         NaN       NaN
<continues to the end of the year>

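In other words, the code seems to drop all of the NaN rows for the time intervals that occur before the data in df. Can anyone suggest how to keep all of the rows from df2 and only replace the data for the time intervals that are present in df?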

Answer 1

I think you need to reindex by the union of the indexes:

print (df2.index.union(df.index))
DatetimeIndex(['2016-10-01 00:00:00', '2016-10-01 00:30:00',
               '2016-10-01 01:00:00', '2016-10-01 01:30:00',
               '2016-10-01 02:00:00', '2016-10-01 02:30:00',
               '2016-10-31 00:00:00', '2016-10-31 00:30:00',
               '2016-10-31 01:00:00', '2016-10-31 01:30:00',
               '2016-10-31 02:00:00'],
              dtype='datetime64[ns]', freq=None)

df = df.reindex(df2.index.union(df.index))
print (df)
                     Facility  Servers  PUE
2016-10-01 00:00:00       NaN      NaN  NaN
2016-10-01 00:30:00       NaN      NaN  NaN
2016-10-01 01:00:00       NaN      NaN  NaN
2016-10-01 01:30:00       NaN      NaN  NaN
2016-10-01 02:00:00       NaN      NaN  NaN
2016-10-01 02:30:00       NaN      NaN  NaN
2016-10-31 00:00:00       6.0      5.0  1.2
2016-10-31 00:30:00       7.0      5.0  1.4
2016-10-31 01:00:00       6.0      5.0  1.2
2016-10-31 01:30:00       6.0      5.0  1.2
2016-10-31 02:00:00       6.0      5.0  1.2
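For anyone who wants to try the reindex approach end to end, here is a minimal sketch. The two frames are small hypothetical stand-ins built from the sample rows in the question (six NaN rows instead of a full year), and the names full_index and df3 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: five half-hour rows on 2016-10-31.
idx = pd.date_range("2016-10-31 00:00", periods=5, freq="30min")
df = pd.DataFrame(
    {
        "Facility": [6.0, 7.0, 6.0, 6.0, 6.0],
        "Servers": [5.0, 5.0, 5.0, 5.0, 5.0],
        "PUE": [1.2, 1.4, 1.2, 1.2, 1.2],
    },
    index=idx,
)

# Hypothetical stand-in for df2: an all-NaN frame starting on 2016-10-01
# (only six rows here; the real frame in the question has 17520).
idx2 = pd.date_range("2016-10-01 00:00", periods=6, freq="30min")
df2 = pd.DataFrame(np.nan, index=idx2, columns=df.columns)

# Build the sorted union of both indexes, then reindex df onto it.
# Timestamps that exist only in df2 become NaN rows.
full_index = df2.index.union(df.index)
df3 = df.reindex(full_index)

print(df3)
```

Because df2 holds nothing but NaN, only its index is needed here; reindex creates the missing rows as NaN on its own.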


Answer 2

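Use combine_first:

result = df.combine_first(df2)

The result keeps the values from the left DataFrame and only takes values from the right DataFrame where they are missing in the left one. The resulting index is the union of both indexes.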
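A small sketch of combine_first on frames shaped like the ones in the question. The data below is hypothetical: df2 is shortened to ten all-NaN rows that start one hour before df and extend past it, so the overlap is easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: five half-hour rows on 2016-10-31.
df = pd.DataFrame(
    {
        "Facility": [6.0, 7.0, 6.0, 6.0, 6.0],
        "Servers": [5.0, 5.0, 5.0, 5.0, 5.0],
        "PUE": [1.2, 1.4, 1.2, 1.2, 1.2],
    },
    index=pd.date_range("2016-10-31 00:00", periods=5, freq="30min"),
)

# Hypothetical stand-in for df2: all NaN, starting earlier and
# covering every timestamp in df (23:00 on 2016-10-30 through 03:30).
df2 = pd.DataFrame(
    np.nan,
    index=pd.date_range("2016-10-30 23:00", periods=10, freq="30min"),
    columns=df.columns,
)

# Values come from df where present; rows that exist only in df2
# are kept and stay NaN.
result = df.combine_first(df2)

print(result)
```

Compared with pd.concat, which stacks the rows of both frames and so repeats any timestamp present in both, combine_first aligns on the index and yields one row per timestamp.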

Python / Pandas: problem merging with NaN data
