Python - Pandas DataFrame merge on multiple columns returns NaN

Asked: 2019-10-09 20:44:13

Tags: python pandas dataframe

I have two dataframes: one holds daily performance metrics per VM, and the other holds cluster-level details, such as the number of VMs per blade.

I am trying to populate the VM performance metrics dataframe with the cluster details for the same day.

df1 - daily performance metrics per VM

+-------------+-----------+---------+------------+------+
| Audit Date  |  Cluster  |   VM    |   Metric   | Peak |
+-------------+-----------+---------+------------+------+
| 2019-10-08  | Cluster A | Server1 | CPU Util % |   88 |
| 2019-10-08  | Cluster A | Server2 | CPU Util % |   34 |
| 2019-10-08  | Cluster B | Server3 | CPU Util % |   89 |
| 2019-10-08  | Cluster B | Server4 | CPU Util % |   92 |
| 2019-10-09  | Cluster A | Server1 | CPU Util % |   88 |
| 2019-10-09  | Cluster A | Server2 | CPU Util % |   34 |
| 2019-10-09  | Cluster B | Server3 | CPU Util % |   89 |
| 2019-10-09  | Cluster B | Server4 | CPU Util % |   92 |
+-------------+-----------+---------+------------+------+

df2 - daily cluster values

+------------+-----------+---------------+
| Audit Date |  Cluster  | VMs Per Blade |
+------------+-----------+---------------+
| 2019-10-08 | Cluster A |            62 |
| 2019-10-08 | Cluster B |            32 |
| 2019-10-09 | Cluster A |            64 |
| 2019-10-09 | Cluster B |            32 |
+------------+-----------+---------------+

What I want to get:

+------------+-----------+---------+------------+------+---------------+
| Audit Date |  Cluster  |   VM    |   Metric   | Peak | VMs Per Blade |
+------------+-----------+---------+------------+------+---------------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % |   88 |            62 |
| 2019-10-08 | Cluster A | Server2 | CPU Util % |   34 |            62 |
| 2019-10-08 | Cluster B | Server3 | CPU Util % |   89 |            32 |
| 2019-10-08 | Cluster B | Server4 | CPU Util % |   92 |            32 |
| 2019-10-09 | Cluster A | Server1 | CPU Util % |   88 |            64 |
| 2019-10-09 | Cluster A | Server2 | CPU Util % |   34 |            64 |
| 2019-10-09 | Cluster B | Server3 | CPU Util % |   89 |            32 |
| 2019-10-09 | Cluster B | Server4 | CPU Util % |   92 |            32 |
+------------+-----------+---------+------------+------+---------------+

What I have tried so far: I have been trying to merge the two dataframes with pandas using:

    df1.merge(df2, how="left", on=["audit date", "cluster"])

However, when I do this, every value in the "VMs Per Blade" column comes back as NaN.

+------------+-----------+---------+------------+------+---------------+
| Audit Date |  Cluster  |   VM    |   Metric   | Peak | VMs Per Blade |
+------------+-----------+---------+------------+------+---------------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % |   88 | NaN           |
| 2019-10-08 | Cluster A | Server2 | CPU Util % |   34 | NaN           |
| 2019-10-08 | Cluster B | Server3 | CPU Util % |   89 | NaN           |
| 2019-10-08 | Cluster B | Server4 | CPU Util % |   92 | NaN           |
| 2019-10-09 | Cluster A | Server1 | CPU Util % |   88 | NaN           |
| 2019-10-09 | Cluster A | Server2 | CPU Util % |   34 | NaN           |
| 2019-10-09 | Cluster B | Server3 | CPU Util % |   89 | NaN           |
| 2019-10-09 | Cluster B | Server4 | CPU Util % |   92 | NaN           |
+------------+-----------+---------+------------+------+---------------+
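An all-NaN column after a left merge usually means that no key pair in df1 found an exact match in df2. The following is a minimal diagnostic sketch, not a confirmed fix: the miniature frames are hypothetical, the column names follow the tables above (adjust them to the real names), and the trailing space in df2's Cluster values is only an assumed cause used for illustration.

```python
import pandas as pd

keys = ["Audit Date", "Cluster"]

# Hypothetical miniature versions of df1 and df2.
df1 = pd.DataFrame({
    "Audit Date": ["2019-10-08", "2019-10-09"],
    "Cluster": ["Cluster A", "Cluster A"],
    "VM": ["Server1", "Server1"],
    "Peak": [88, 88],
})
df2 = pd.DataFrame({
    "Audit Date": ["2019-10-08", "2019-10-09"],
    "Cluster": ["Cluster A ", "Cluster A "],  # trailing space: assumed cause
    "VMs Per Blade": [62, 64],
})

# Compare the dtypes of the key columns on both sides.
dtype_check = pd.DataFrame({"df1": df1[keys].dtypes, "df2": df2[keys].dtypes})
print(dtype_check)

# repr() exposes invisible differences such as trailing whitespace.
print([repr(v) for v in df2["Cluster"].unique()])

# indicator=True adds a _merge column showing which rows found a partner.
checked = df1.merge(df2, how="left", on=keys, indicator=True)
print(checked["_merge"].value_counts())
```

If every row is reported as left_only, the keys differ in value or type between the two frames, and the repr() output is a quick way to see how.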

I tried converting the columns to strings and stripping them to make sure no trailing whitespace was breaking the match.

    df1['audit date'] = df1['audit date'].astype(str).str.strip()
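The line above cleans only one key column in one dataframe, while a merge needs both key columns to agree on both sides. A minimal sketch of applying the same cleanup everywhere is shown here; `normalize_keys` is a hypothetical helper, the sample frames are invented, and the column names are assumed to match the tables above.

```python
import pandas as pd

keys = ["Audit Date", "Cluster"]

def normalize_keys(df):
    """Return a copy with both key columns in a comparable form."""
    out = df.copy()
    # Parse the dates and drop any time-of-day component.
    out["Audit Date"] = pd.to_datetime(out["Audit Date"]).dt.normalize()
    # Cast to string and trim surrounding whitespace.
    out["Cluster"] = out["Cluster"].astype(str).str.strip()
    return out

# Hypothetical miniature versions of df1 and df2.
df1 = pd.DataFrame({
    "Audit Date": ["2019-10-08", "2019-10-09"],
    "Cluster": ["Cluster A", "Cluster A"],
    "VM": ["Server1", "Server1"],
    "Peak": [88, 88],
})
df2 = pd.DataFrame({
    "Audit Date": ["2019-10-08", "2019-10-09"],
    "Cluster": ["Cluster A ", "Cluster A "],  # trailing space: assumed cause
    "VMs Per Blade": [62, 64],
})

# Clean both frames the same way, then merge on the shared keys.
merged = normalize_keys(df1).merge(normalize_keys(df2), how="left", on=keys)
print(merged)
```

Running both frames through the same function keeps the key types and formatting identical, whatever the actual source of the mismatch turns out to be.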

I'm not sure whether this is relevant, but my df2 dataframe contains some duplicate rows, because the analysis model is reused across clusters.
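Duplicate key rows in df2 would normally multiply the matching rows of a left merge rather than produce NaN, so they are probably a separate issue, though still worth handling. A minimal sketch, assuming the duplicated rows carry identical values and that the column names match the tables above (the sample data is hypothetical):

```python
import pandas as pd

keys = ["Audit Date", "Cluster"]

# Hypothetical miniature versions of df1 and df2; df2 repeats one key pair.
df1 = pd.DataFrame({
    "Audit Date": ["2019-10-08", "2019-10-09"],
    "Cluster": ["Cluster A", "Cluster A"],
    "VM": ["Server1", "Server1"],
    "Peak": [88, 88],
})
df2 = pd.DataFrame({
    "Audit Date": ["2019-10-08", "2019-10-08", "2019-10-09"],
    "Cluster": ["Cluster A", "Cluster A", "Cluster A"],
    "VMs Per Blade": [62, 62, 64],
})

# Count how many df2 rows repeat an earlier key pair.
dup_count = df2.duplicated(subset=keys).sum()
print(dup_count)

# Keep one row per key pair (assumes the duplicates are identical).
df2_unique = df2.drop_duplicates(subset=keys)

# validate="many_to_one" raises MergeError if df2 keys are still not unique.
merged = df1.merge(df2_unique, how="left", on=keys, validate="many_to_one")
print(merged)
```

With `validate="many_to_one"`, any remaining duplicate keys in df2 surface as an error instead of silently adding extra rows to the result.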

0 answers:

No answers yet.