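Creating a new column by manipulating existing columns

Time: 2018-05-03 12:30:04

Tags: python pandas numpy

I have many columns in a DataFrame, and I want to fill one column by manipulating two other columns in the same DataFrame.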

col1 | col2 | col3 | col4
nan     1      2      4
2       2      2      3
3       nan    1      2

If there is a NaN in col1, col2 or col3, I want to fill it with a value computed from the other two of those columns.

My code is as follows:

indices_of_nan_cell = [(index,col1,col2,col3) for index,(col1,col2,col3) in enumerate(zip(col1,col2,col3)) if str(col1)=='nan' or str(col2)=='nan' or str(col3)=='nan']


for nan_values in indices_of_nan_cell:
    if np.isnan(nan_values[1]) or nan_values[1] == 'nan':
       read4['col1'][nan_values[0]]=float(nan_values[2])*float(nan_values[3])
    if np.isnan(nan_values[2]) or nan_values[2] == 'nan':
       read4['col2'][nan_values[0]]=float(nan_values[1])/float(nan_values[3])
    if np.isnan(nan_values[3]) or nan_values[3] == 'nan':
       read4['col3'][nan_values[0]]=float(nan_values[1])*float(nan_values[2])

It works fine for me, but since my DataFrame has thousands of rows it takes a lot of time. Is there a more efficient way to do this?

1 Answer:

Answer 0: (score: 2)

I think you need fillna with mul (or div) and the fill_value parameter, which replaces NaN in the multiplication:

df['col1'] = df['col1'].fillna(df['col2'].mul(df['col3'], fill_value=1))
df['col2'] = df['col2'].fillna(df['col1'].div(df['col3'], fill_value=1))
df['col3'] = df['col3'].fillna(df['col1'].mul(df['col2'], fill_value=1))

print (df)
   col1  col2  col3  col4
0   2.0   1.0     2     4
1   2.0   2.0     2     3
2   3.0   3.0     1     2
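For reference, here is a minimal self-contained sketch of the fillna approach on the sample data from the question. The helper name `fill_from_other_columns` is hypothetical (it is not part of pandas or of the original answer), and the formulas are the ones the answer uses: col1 = col2 * col3, col2 = col1 / col3, col3 = col1 * col2.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": [np.nan, 2, 3],
    "col2": [1, 2, np.nan],
    "col3": [2, 2, 1],
    "col4": [4, 3, 2],
})


def fill_from_other_columns(frame):
    """Hypothetical helper: fill NaN in col1..col3 from the other two columns."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    # fillna only writes where the target column is NaN; all other rows keep their value
    out["col1"] = out["col1"].fillna(out["col2"].mul(out["col3"], fill_value=1))
    out["col2"] = out["col2"].fillna(out["col1"].div(out["col3"], fill_value=1))
    out["col3"] = out["col3"].fillna(out["col1"].mul(out["col2"], fill_value=1))
    return out


filled = fill_from_other_columns(df)
print(filled)
```

Note that fill_value=1 only helps when exactly one operand is missing. If both operands are NaN in the same row, the result stays NaN.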

Another approach works only with the NaN rows:

m1 = df['col1'].isna()
m2 = df['col2'].isna()
m3 = df['col3'].isna()

# older versions of pandas
#m1 = df['col1'].isnull()
#m2 = df['col2'].isnull()
#m3 = df['col3'].isnull()

df.loc[m1, 'col1'] = df.loc[m1, 'col2'].mul(df.loc[m1, 'col3'], fill_value=1)
df.loc[m2, 'col2'] = df.loc[m2, 'col1'].div(df.loc[m2, 'col3'], fill_value=1)
df.loc[m3, 'col3'] = df.loc[m3, 'col1'].mul(df.loc[m3, 'col2'], fill_value=1)

Explanation:

  1. Use isna on each column to get 3 separate boolean masks.
  2. For each mask, first select the rows that contain NaN, e.g. df.loc[m1, 'col2'], and then multiply or divide.
  3. Finally assign back. Only the NaN rows are replaced, because the target is filtered again with df.loc[m1, 'col1'].
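The steps above can be sketched end to end on the question's sample data. This is only an illustration under a few assumptions: the columns are created as floats so that assigning computed values never changes a column's dtype, and the formulas are the ones used in the answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, stored as floats (an assumption for this sketch)
df = pd.DataFrame({
    "col1": [np.nan, 2.0, 3.0],
    "col2": [1.0, 2.0, np.nan],
    "col3": [2.0, 2.0, 1.0],
    "col4": [4, 3, 2],
})

# Step 1: one boolean mask per column, True where the value is NaN
m1 = df["col1"].isna()
m2 = df["col2"].isna()
m3 = df["col3"].isna()

# Steps 2 and 3: compute on the masked rows only, then assign back to the same rows
df.loc[m1, "col1"] = df.loc[m1, "col2"].mul(df.loc[m1, "col3"], fill_value=1)
df.loc[m2, "col2"] = df.loc[m2, "col1"].div(df.loc[m2, "col3"], fill_value=1)
df.loc[m3, "col3"] = df.loc[m3, "col1"].mul(df.loc[m3, "col2"], fill_value=1)

print(df)
```

Because each assignment touches only the rows selected by its mask, the arithmetic runs on a small subset of the DataFrame when NaN values are rare.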