Question

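I have two dataframes: one holds daily performance metrics for each VM, and the other holds cluster-level details, such as the number of VMs per blade.

I am trying to populate the VM performance metrics dataframe with the cluster details for the same day.

df1 - daily performance metrics per VM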

+-------------+-----------+---------+------------+------+
| Audit Date  |  Cluster  |   VM    |   Metric   | Peak |
+-------------+-----------+---------+------------+------+
| 2019-10-08  | Cluster A | Server1 | CPU Util % |   88 |
| 2019-10-08  | Cluster A | Server2 | CPU Util % |   34 |
| 2019-10-08  | Cluster B | Server3 | CPU Util % |   89 |
| 2019-10-08  | Cluster B | Server4 | CPU Util % |   92 |
| 2019-10-09  | Cluster A | Server1 | CPU Util % |   88 |
| 2019-10-09  | Cluster A | Server2 | CPU Util % |   34 |
| 2019-10-09  | Cluster B | Server3 | CPU Util % |   89 |
| 2019-10-09  | Cluster B | Server4 | CPU Util % |   92 |
+-------------+-----------+---------+------------+------+

df2 - daily cluster values

+------------+-----------+---------------+
| Audit Date |  Cluster  | VMs Per Blade |
+------------+-----------+---------------+
| 2019-10-08 | Cluster A |            62 |
| 2019-10-08 | Cluster B |            32 |
| 2019-10-09 | Cluster A |            64 |
| 2019-10-09 | Cluster B |            32 |
+------------+-----------+---------------+

What I want to get:

+------------+-----------+---------+------------+------+---------------+
| Audit Date |  Cluster  |   VM    |   Metric   | Peak | VMs Per Blade |
+------------+-----------+---------+------------+------+---------------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % |   88 |            62 |
| 2019-10-08 | Cluster A | Server2 | CPU Util % |   34 |            62 |
| 2019-10-08 | Cluster B | Server3 | CPU Util % |   89 |            32 |
| 2019-10-08 | Cluster B | Server4 | CPU Util % |   92 |            32 |
| 2019-10-09 | Cluster A | Server1 | CPU Util % |   88 |            64 |
| 2019-10-09 | Cluster A | Server2 | CPU Util % |   34 |            64 |
| 2019-10-09 | Cluster B | Server3 | CPU Util % |   89 |            32 |
| 2019-10-09 | Cluster B | Server4 | CPU Util % |   92 |            32 |
+------------+-----------+---------+------------+------+---------------+

What I have tried so far: I have been trying to merge the two with pandas as follows:

    df1.merge(df2, how="left", on=["audit date", "cluster"])

However, when I do this, every value in the "VMs Per Blade" column comes back as NaN.

+------------+-----------+---------+------------+------+---------------+
| Audit Date |  Cluster  |   VM    |   Metric   | Peak | VMs Per Blade |
+------------+-----------+---------+------------+------+---------------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % |   88 | NaN           |
| 2019-10-08 | Cluster A | Server2 | CPU Util % |   34 | NaN           |
| 2019-10-08 | Cluster B | Server3 | CPU Util % |   89 | NaN           |
| 2019-10-08 | Cluster B | Server4 | CPU Util % |   92 | NaN           |
| 2019-10-09 | Cluster A | Server1 | CPU Util % |   88 | NaN           |
| 2019-10-09 | Cluster A | Server2 | CPU Util % |   34 | NaN           |
| 2019-10-09 | Cluster B | Server3 | CPU Util % |   89 | NaN           |
| 2019-10-09 | Cluster B | Server4 | CPU Util % |   92 | NaN           |
+------------+-----------+---------+------------+------+---------------+

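I tried converting the columns to strings and stripping them, to make sure trailing whitespace was not preventing the keys from matching.

    df1['audit date'] = df1['audit date'].astype(str).str.strip()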
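
A minimal sketch of one way to check whether the keys really match, not a confirmed diagnosis. It assumes the key columns are named `audit date` and `cluster` as in the merge call above, that the value column is `VMs Per Blade`, and that both frames hold the keys as strings. The sample data, the trailing spaces in `df2`, and the `normalise_keys` helper are all hypothetical. It applies the same strip to both key columns in both frames, then merges with `indicator=True` to show which rows found a partner.

```python
import pandas as pd

keys = ["audit date", "cluster"]

# Hypothetical sample data: the keys in df2 carry trailing spaces.
df1 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08", "2019-10-09"],
    "cluster": ["Cluster A", "Cluster B", "Cluster A"],
    "VM": ["Server1", "Server3", "Server1"],
    "Peak": [88, 89, 88],
})
df2 = pd.DataFrame({
    "audit date": ["2019-10-08 ", "2019-10-08 ", "2019-10-09 "],
    "cluster": ["Cluster A", "Cluster B ", "Cluster A"],
    "VMs Per Blade": [62, 32, 64],
})

def normalise_keys(df):
    """Return a copy with both key columns as stripped strings (hypothetical helper)."""
    out = df.copy()
    for col in keys:
        out[col] = out[col].astype(str).str.strip()
    return out

# Merging the raw frames: no key matches, so the new column is all NaN.
raw = df1.merge(df2, how="left", on=keys)

# Merging after normalising BOTH frames; indicator=True adds a "_merge"
# column saying whether each row matched ("both") or not ("left_only").
merged = normalise_keys(df1).merge(
    normalise_keys(df2), how="left", on=keys, indicator=True
)
print(merged["_merge"].value_counts())
```

If `_merge` still shows `left_only` after normalising, comparing `df1[keys].dtypes` with `df2[keys].dtypes` and printing `repr()` of a few key values from each frame may reveal the remaining difference.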

I am not sure whether this is relevant, but because the analysis model is reused across clusters, my df2 dataframe contains some duplicate rows.
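
A sketch of how the duplicate rows could be handled, again assuming key columns named `audit date` and `cluster` and using made-up sample data. Duplicate keys in `df2` would normally multiply the rows in the result rather than produce NaN, so this is probably a separate issue. Dropping duplicates on the keys assumes the repeated rows carry the same `VMs Per Blade` value. Passing `validate="many_to_one"` makes pandas raise an error if the right-hand keys are not unique.

```python
import pandas as pd

keys = ["audit date", "cluster"]

# Hypothetical sample data: df2 repeats the row for 2019-10-08 / Cluster A.
df1 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08", "2019-10-09"],
    "cluster": ["Cluster A", "Cluster A", "Cluster A"],
    "VM": ["Server1", "Server2", "Server1"],
    "Peak": [88, 34, 88],
})
df2 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08", "2019-10-09"],
    "cluster": ["Cluster A", "Cluster A", "Cluster A"],
    "VMs Per Blade": [62, 62, 64],
})

# With duplicate keys on the right, each matching left row is repeated.
inflated = df1.merge(df2, how="left", on=keys)

# Keep one row per (audit date, cluster), then merge; validate raises
# pandas.errors.MergeError if the right-hand keys are still not unique.
df2_unique = df2.drop_duplicates(subset=keys)
merged = df1.merge(df2_unique, how="left", on=keys, validate="many_to_one")

print(len(inflated), len(merged))
```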

Python - Pandas dataframe merge: multiple columns return NaN

0 answers: