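Fill pandas DF cell with information from a different DF

Time: 2017-10-21 22:14:42

Tags: python-3.x pandas conditional-statements

Hi, I have this DF. I am only showing the interesting columns here; the real one is larger in both rows and columns: DF.shape = (185, 34)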


As you can see, the outSpeed field is empty for LAG-type ports.

I computed a separate DF holding the outSpeed for each LAG ...

dfLags = df[df['lag_id'] > 0]
dfLags = dfLags.groupby([SITE_IP, 'lag_id'])['outSpeed'].sum().reset_index()
dfLags['lag_id'] = 'Lag ' + dfLags['lag_id'].astype(str).str[:-2]
dfLags.rename(columns={'lag_id': PORT_NAME}, inplace=True)

... which produces the following ...

        Site ID     Port     outSpeed
0     10.2.20.5  Lag 112  10000000000
1     10.2.20.5  Lag 122  10000000000
2     10.2.21.3    Lag 1   2000000000
3     10.2.21.3    Lag 3  20000000000
4     10.2.21.3   Lag 10  20000000000
5   10.2.22.123    Lag 2   3000000000
6   10.2.22.123    Lag 3   2000000000
7   10.2.22.123   Lag 10   6000000000
8    10.2.22.21    Lag 1   3000000000
9    10.2.22.21    Lag 3   2000000000
10   10.2.22.21   Lag 10   6000000000
11   10.2.46.52    Lag 3  20000000000
12   10.2.46.52   Lag 10  20000000000

dfLags.shape = (13, 3)
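
For reference, the aggregation above can be reproduced on a small made-up frame. This is only a sketch under assumptions: the sample rows are invented, SITE_IP and PORT_NAME are assumed to be constants holding the column names "Site ID" and "Port" (as the output headers suggest), and lag_id is assumed to be stored as a float. It also swaps the original `.astype(str).str[:-2]` for an explicit integer cast, which should give the same label without relying on a trailing ".0".

```python
import pandas as pd

SITE_IP = "Site ID"    # assumed value of the constant used in the question
PORT_NAME = "Port"     # assumed value of the constant used in the question

# Invented sample rows: two member ports of LAG 10 and one non-LAG port.
df = pd.DataFrame({
    SITE_IP: ["10.2.46.52", "10.2.46.52", "10.2.46.52"],
    PORT_NAME: ["1/1/1", "1/1/2", "3/2/10"],
    "lag_id": [10.0, 10.0, float("nan")],
    "outSpeed": [10000000000, 10000000000, 10000000000],
})

# Keep only LAG member ports, then sum their speeds per (site, LAG).
dfLags = df[df["lag_id"] > 0]
dfLags = dfLags.groupby([SITE_IP, "lag_id"])["outSpeed"].sum().reset_index()

# Turn the numeric LAG id into the port label used in the original DF.
dfLags["lag_id"] = "Lag " + dfLags["lag_id"].astype(int).astype(str)
dfLags = dfLags.rename(columns={"lag_id": PORT_NAME})
print(dfLags)
```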

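So, for example, to complete the original DF I have to fill in outSpeed = 20000000000 for Site ID = 10.2.46.52, Port = Lag 10.

I have not found a simple way to do this. What I mean is: how do I fill one field (outSpeed) of the original DF conditioned on two fields (Site ID and Port), keeping in mind that the original DF is bigger?

Edit: I have read this [post], which deals with the same problem I am facing, but I have not been able to make it work yet.

They suggest doing this: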

values = (dfTemp[[SITE_IP, PORT_NAME]] == dfLags[[SITE_IP, PORT_NAME]]).all(axis=1)

... but when I run it, I get:

ValueError: Can only compare identically-labeled DataFrame objects

I feel I am getting closer. Any ideas?

[post] - Pandas (Python) - Update column of a dataframe from another one with conditions
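
The ValueError quoted in the question is what pandas raises when `==` is applied to two DataFrames whose row or column labels differ, which is the case for a 185-row frame and a 13-row one. The sketch below reproduces the error on invented data and shows one possible row-wise match that does not need equal shapes, using `merge` with `indicator=True`. The frame names `big` and `small` are hypothetical stand-ins for the original DF and dfLags.

```python
import pandas as pd

# Hypothetical stand-ins for the original DF and dfLags.
big = pd.DataFrame({"Site ID": ["10.2.46.52", "10.2.46.52", "10.2.21.3"],
                    "Port": ["Lag 10", "3/2/10", "1/1/7"]})
small = pd.DataFrame({"Site ID": ["10.2.46.52"], "Port": ["Lag 10"]})

# Element-wise comparison needs identical labels on both frames.
try:
    big == small
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# A left merge keeps every row of `big`; the _merge column says which
# rows found a (Site ID, Port) match in `small`.
flagged = big.merge(small, on=["Site ID", "Port"], how="left", indicator=True)
mask = (flagged["_merge"] == "both").to_numpy()
print(mask)
```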

2 Answers:

Answer 0: (score: 2)

You can use merge and add.

First, some sample data:

import pandas as pd

data1 = {"Site ID":["10.2.22.274", "10.2.46.52", "10.2.46.52", "10.2.21.3"],
         "Port":["1/5/7", "Lag 10", "3/2/10", "1/1/7"],
         "outSpeed":[10000000000, None, 10000000000, 3000000000]}

data2 = {"Site ID":["10.2.20.5", "10.2.46.52", "10.2.22.21"],
         "Port":["Lag 112", "Lag 10", "Lag 1"],
         "outSpeed":[10000000000, 20000000000, 3000000000]}

df1 = pd.DataFrame(data1)
df1
     Port      Site ID      outSpeed
0   1/5/7  10.2.22.274  1.000000e+10
1  Lag 10   10.2.46.52           NaN
2  3/2/10   10.2.46.52  1.000000e+10
3   1/1/7    10.2.21.3  3.000000e+09

df2 = pd.DataFrame(data2)
df2
      Port     Site ID     outSpeed
0  Lag 112   10.2.20.5  10000000000
1   Lag 10  10.2.46.52  20000000000
2    Lag 1  10.2.22.21   3000000000

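In df1, outSpeed is empty for site 10.2.46.52, port Lag 10. To fill it with the corresponding value from df2, merge on Site ID and Port, then add the two outSpeed columns together into a new outSpeed column and drop the columns that are no longer needed: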

merged = df1.merge(df2, on=["Site ID", "Port"], how="left")
merged["outSpeed"] = merged.outSpeed_x.add(merged.outSpeed_y, fill_value=0)
merged.drop(["outSpeed_x", "outSpeed_y"], axis=1)

     Port      Site ID      outSpeed
0   1/5/7  10.2.22.274  1.000000e+10
1  Lag 10   10.2.46.52  2.000000e+10
2  3/2/10   10.2.46.52  1.000000e+10
3   1/1/7    10.2.21.3  3.000000e+09
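
One thing to keep in mind with this approach: `add` with `fill_value=0` sums the two columns, so a row that already has an outSpeed in df1 and also matches a row in df2 would end up with the sum of both. If the goal is only to fill the gaps, a possible variant (a sketch, not part of the original answer) replaces the addition with `fillna`. The `_lag` suffix is an arbitrary name chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Site ID": ["10.2.22.274", "10.2.46.52", "10.2.46.52", "10.2.21.3"],
    "Port": ["1/5/7", "Lag 10", "3/2/10", "1/1/7"],
    "outSpeed": [10000000000, None, 10000000000, 3000000000],
})
df2 = pd.DataFrame({
    "Site ID": ["10.2.20.5", "10.2.46.52", "10.2.22.21"],
    "Port": ["Lag 112", "Lag 10", "Lag 1"],
    "outSpeed": [10000000000, 20000000000, 3000000000],
})

# Keep df1's column name unchanged and tag df2's column with a suffix.
merged = df1.merge(df2, on=["Site ID", "Port"], how="left",
                   suffixes=("", "_lag"))

# Only rows where df1 had no value take the value coming from df2.
merged["outSpeed"] = merged["outSpeed"].fillna(merged["outSpeed_lag"])
merged = merged.drop(columns="outSpeed_lag")
print(merged)
```

This assumes each (Site ID, Port) pair appears at most once in df2; duplicate keys there would duplicate rows in the merge result.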

Answer 1: (score: 0)

df1.loc[(df1["Port"]==df2["Port"]) & (df1["outSpeed"].empty), "outSpeed"] = df2.loc[(df1["Port"]==df2["Port"]) & (df1["outSpeed"].empty), "outSpeed"]

Please adapt it to the names you are using.
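
As written, this one-liner is unlikely to run on the frames in the question: `Series.empty` is a single bool for the whole column rather than a per-row mask, and comparing `df1["Port"]` with `df2["Port"]` raises when the two Series have different labels. A sketch of the same idea (assign through `.loc` only where outSpeed is missing), using `isna()` and a lookup keyed on both Site ID and Port, might look like the following. It reuses the sample data from Answer 0 and assumes each (Site ID, Port) pair is unique in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Site ID": ["10.2.22.274", "10.2.46.52", "10.2.46.52", "10.2.21.3"],
    "Port": ["1/5/7", "Lag 10", "3/2/10", "1/1/7"],
    "outSpeed": [10000000000, None, 10000000000, 3000000000],
})
df2 = pd.DataFrame({
    "Site ID": ["10.2.20.5", "10.2.46.52", "10.2.22.21"],
    "Port": ["Lag 112", "Lag 10", "Lag 1"],
    "outSpeed": [10000000000, 20000000000, 3000000000],
})

keys = ["Site ID", "Port"]

# Series of LAG speeds indexed by (Site ID, Port); keys assumed unique.
lookup = df2.set_index(keys)["outSpeed"]

# Per-row mask of the cells that still need a value.
missing = df1["outSpeed"].isna()

# Look up each missing row's key pair; unmatched pairs stay NaN.
row_keys = pd.MultiIndex.from_frame(df1.loc[missing, keys])
df1.loc[missing, "outSpeed"] = lookup.reindex(row_keys).to_numpy()
print(df1)
```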