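I have two dataframes: one holds daily performance metrics for each VM, and the other holds cluster-level details, such as the number of VMs per blade.
I am trying to populate the VM performance metrics dataframe with the cluster details for the same day.
df1 - daily performance metrics per VM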
+-------------+-----------+---------+------------+------+
| Audit Date | Cluster | VM | Metric | Peak |
+-------------+-----------+---------+------------+------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % | 88 |
| 2019-10-08 | Cluster A | Server2 | CPU Util % | 34 |
| 2019-10-08 | Cluster B | Server3 | CPU Util % | 89 |
| 2019-10-08 | Cluster B | Server4 | CPU Util % | 92 |
| 2019-10-09 | Cluster A | Server1 | CPU Util % | 88 |
| 2019-10-09 | Cluster A | Server2 | CPU Util % | 34 |
| 2019-10-09 | Cluster B | Server3 | CPU Util % | 89 |
| 2019-10-09 | Cluster B | Server4 | CPU Util % | 92 |
+-------------+-----------+---------+------------+------+
df2 - daily cluster values
+------------+-----------+---------------+
| Audit Date | Cluster | VMs Per Blade |
+------------+-----------+---------------+
| 2019-10-08 | Cluster A | 62 |
| 2019-10-08 | Cluster B | 32 |
| 2019-10-09 | Cluster A | 64 |
| 2019-10-09 | Cluster B | 32 |
+------------+-----------+---------------+
What I want to get:
+------------+-----------+---------+------------+------+---------------+
| Audit Date | Cluster | VM | Metric | Peak | VMs Per Blade |
+------------+-----------+---------+------------+------+---------------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % | 88 | 62 |
| 2019-10-08 | Cluster A | Server2 | CPU Util % | 34 | 62 |
| 2019-10-08 | Cluster B | Server3 | CPU Util % | 89 | 32 |
| 2019-10-08 | Cluster B | Server4 | CPU Util % | 92 | 32 |
| 2019-10-09 | Cluster A | Server1 | CPU Util % | 88 | 64 |
| 2019-10-09 | Cluster A | Server2 | CPU Util % | 34 | 64 |
| 2019-10-09 | Cluster B | Server3 | CPU Util % | 89 | 32 |
| 2019-10-09 | Cluster B | Server4 | CPU Util % | 92 | 32 |
+------------+-----------+---------+------------+------+---------------+
What I have tried so far: I have been trying to do a pandas merge on these using the following:
df1.merge(df2, how="left", on=["audit date", "cluster"])
However, when I do this, every value in the "VMs Per Blade" column comes back as NaN.
+------------+-----------+---------+------------+------+---------------+
| Audit Date | Cluster | VM | Metric | Peak | VMs Per Blade |
+------------+-----------+---------+------------+------+---------------+
| 2019-10-08 | Cluster A | Server1 | CPU Util % | 88 | NaN |
| 2019-10-08 | Cluster A | Server2 | CPU Util % | 34 | NaN |
| 2019-10-08 | Cluster B | Server3 | CPU Util % | 89 | NaN |
| 2019-10-08 | Cluster B | Server4 | CPU Util % | 92 | NaN |
| 2019-10-09 | Cluster A | Server1 | CPU Util % | 88 | NaN |
| 2019-10-09 | Cluster A | Server2 | CPU Util % | 34 | NaN |
| 2019-10-09 | Cluster B | Server3 | CPU Util % | 89 | NaN |
| 2019-10-09 | Cluster B | Server4 | CPU Util % | 92 | NaN |
+------------+-----------+---------+------------+------+---------------+
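An all-NaN column after a left merge usually means that no key pair in df1 found a match in df2. A minimal sketch of how that could be checked is shown here. The sample frames are hypothetical stand-ins for my data, and the trailing space in df2's cluster names is only an assumed cause used to reproduce the symptom.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (not the real data).
df1 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08"],
    "cluster": ["Cluster A", "Cluster B"],
    "VM": ["Server1", "Server3"],
    "Peak": [88, 89],
})
df2 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08"],
    # Assumed problem: a trailing space that is invisible when printed.
    "cluster": ["Cluster A ", "Cluster B "],
    "VMs Per Blade": [62, 32],
})

keys = ["audit date", "cluster"]

# indicator=True adds a "_merge" column that says where each row came from.
check = df1.merge(df2, how="left", on=keys, indicator=True)
print(check["_merge"].value_counts())

# Compare the key dtypes on both sides and show the raw key values.
print(df1[keys].dtypes)
print(df2[keys].dtypes)
print(df2["cluster"].map(repr).unique())
```

If every row is reported as left_only, the keys never matched, so the next thing to compare is the dtype and the exact repr of the key values in both frames.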
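I have also tried converting the column to a string and stripping it, to make sure that trailing whitespace was not breaking the match.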
df1['audit date'] = df1['audit date'].astype(str).str.strip()
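The line above only touches one key column in one dataframe. A sketch of applying the same kind of cleanup to both key columns in both frames before merging might look like this. The helper name normalize_keys and the sample data are hypothetical, and it assumes the dates in each frame share one consistent format that pd.to_datetime can parse.

```python
import datetime as dt

import pandas as pd


def normalize_keys(df):
    """Return a copy with both merge keys in a comparable form (hypothetical helper)."""
    out = df.copy()
    # Strip whitespace, then parse to datetime and drop any time-of-day part.
    dates = out["audit date"].astype(str).str.strip()
    out["audit date"] = pd.to_datetime(dates).dt.normalize()
    out["cluster"] = out["cluster"].astype(str).str.strip()
    return out


# Hypothetical data: strings with stray spaces on one side, date objects on the other.
df1 = pd.DataFrame({
    "audit date": ["2019-10-08 ", "2019-10-09 "],
    "cluster": ["Cluster A", "Cluster A"],
    "VM": ["Server1", "Server1"],
    "Peak": [88, 88],
})
df2 = pd.DataFrame({
    "audit date": [dt.date(2019, 10, 8), dt.date(2019, 10, 9)],
    "cluster": [" Cluster A", "Cluster A "],
    "VMs Per Blade": [62, 64],
})

merged = normalize_keys(df1).merge(
    normalize_keys(df2), how="left", on=["audit date", "cluster"]
)
print(merged)
```

Parsing both sides to datetime, rather than comparing strings, avoids mismatches such as "2019-10-08" against "2019-10-08 00:00:00".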
I am not sure whether this is relevant, but my df2 dataframe contains some duplicate rows, because analysis models are reused within a cluster.
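As far as I understand, duplicate key rows in df2 would not produce NaN by themselves, but they would multiply the matching rows of df1 in the result. A minimal sketch of counting and removing them before the merge follows. The sample frames are hypothetical, and keeping the first duplicate is an assumption that only holds if the duplicates carry the same "VMs Per Blade" value.

```python
import pandas as pd

# Hypothetical stand-ins; df2 repeats the (2019-10-08, Cluster A) row.
df1 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08"],
    "cluster": ["Cluster A", "Cluster B"],
    "VM": ["Server1", "Server3"],
    "Peak": [88, 89],
})
df2 = pd.DataFrame({
    "audit date": ["2019-10-08", "2019-10-08", "2019-10-08"],
    "cluster": ["Cluster A", "Cluster A", "Cluster B"],
    "VMs Per Blade": [62, 62, 32],
})

keys = ["audit date", "cluster"]

# Count how many rows repeat an earlier (audit date, cluster) pair.
n_dupes = int(df2.duplicated(subset=keys).sum())
print(n_dupes)

# Keep one row per key pair (assumes duplicates agree on the value).
df2_unique = df2.drop_duplicates(subset=keys, keep="first")

# validate="many_to_one" raises MergeError if df2_unique still has repeated keys.
merged = df1.merge(df2_unique, how="left", on=keys, validate="many_to_one")
print(merged)
```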