Create a new column in a DataFrame composed of values from existing columns

Time: 2019-08-14 11:36:53

Tags: python pandas dataframe

I have a DataFrame that looks like this:

           X      Y  Corr_Value
    0  51182  51389        1.00
    1  51182  50014         NaN
    2  51182  50001        0.85
    3  51182  50014         NaN

I want to create a new column that combines the values of the X, Y and Corr_Value columns. The idea is to loop over the rows and, if Corr_Value is not null, have the new column show:

    Solving (X column value) will solve (Y column value) at (Corr_value column)% probability.

For example, for the first row, the result should be:

    Solving 51182 will solve 51389 with 100% probability.

This is the code I wrote:

    dfs = []
    for i in df1.iterrows():
        if ([df1['Corr_Value']] != np.nan):
            a = df1['X']
            b = df1['Y']
            c = df1['Corr_Value']*100
            df1['Remarks'] = (f'Solving {a} will solve {b} at {c}% probability')
            dfs.append(df1)

Here df1 is the DataFrame that stores the X, Y data shown above.

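But something seems to be wrong, because the result I get looks like this:

[image: actual output, not available]

whereas the result should look like this:

[image: expected output, not available]

It would be great if you could help me get the desired result.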

3 Answers:

Answer 0 (score: 3)

Use DataFrame.dropna to remove the rows with missing values, then DataFrame.apply with an f-string to build the custom output string:

f = lambda x: f'Solving {int(x["X"])} will solve {int(x["Y"])} at {int(x["Corr_Value"] * 100)}% probability.'
df['Remarks'] = df.dropna(subset=['Corr_Value']).apply(f,axis=1)
print (df)
       X      Y  Corr_Value                                            Remarks
0  51182  51389        1.00  Solving 51182 will solve 51389 at 100% probabi...
1  51182  50014         NaN                                                NaN
2  51182  50001        0.85  Solving 51182 will solve 50001 at 85% probabil...
3  51182  50014         NaN                                                NaN
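To try this approach end to end, here is a minimal sketch. The DataFrame construction is an assumption reconstructed from the table in the question, and `make_remark` is a hypothetical helper name. It rounds before converting to an integer, because `int()` on its own truncates and a product such as 0.29 * 100 lands just below the whole number.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built).
df = pd.DataFrame({
    "X": [51182, 51182, 51182, 51182],
    "Y": [51389, 50014, 50001, 50014],
    "Corr_Value": [1.00, np.nan, 0.85, np.nan],
})

def make_remark(row):
    # With axis=1 each row mixes int and float columns, so values arrive as floats;
    # cast X and Y back to int for display.
    pct = int(round(row["Corr_Value"] * 100))  # round first: int() alone truncates
    return f"Solving {int(row['X'])} will solve {int(row['Y'])} at {pct}% probability."

# Rows removed by dropna() are absent from the result, so index alignment
# leaves NaN in "Remarks" for them.
df["Remarks"] = df.dropna(subset=["Corr_Value"]).apply(make_remark, axis=1)
```

The assignment relies on pandas aligning the shorter result back to `df` by index, which is why no explicit handling of the NaN rows is needed.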

Answer 1 (score: 2)

You can also use numpy.where:

import numpy as np

df['Remarks'] = np.where(df.Corr_Value.notnull(), 'Solving ' + df['X'].astype(str) + ' will solve ' + df['Y'].astype(str) + ' with ' + (df['Corr_Value'] * 100).astype(str) + '% probability', df['Corr_Value'])

Output:

       X      Y  Corr_Value                                            Remarks
0  51182  51389        1.00  Solving 51182 will solve 51389 with 100.0% pro...
1  51182  50014         NaN                                                NaN
2  51182  50001        0.85  Solving 51182 will solve 50001 with 85.0% prob...
3  51182  50014         NaN                                                NaN
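The output above shows the percentage as 100.0 and 85.0 because the float column is converted to a string directly. If whole numbers are preferred, one possible variation is sketched below. The sample data is an assumption reconstructed from the question, and the `fillna(0)` is only a placeholder so the integer cast succeeds; those rows are discarded by `np.where` anyway.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built).
df = pd.DataFrame({
    "X": [51182, 51182, 51182, 51182],
    "Y": [51389, 50014, 50001, 50014],
    "Corr_Value": [1.00, np.nan, 0.85, np.nan],
})

mask = df["Corr_Value"].notna()

# Whole-number percentage as text; NaN rows get a throwaway 0 that np.where drops below.
pct = (df["Corr_Value"] * 100).round().fillna(0).astype(int).astype(str)

text = ("Solving " + df["X"].astype(str)
        + " will solve " + df["Y"].astype(str)
        + " with " + pct + "% probability")

# Keep the text where Corr_Value is present, NaN elsewhere.
df["Remarks"] = np.where(mask, text, np.nan)
```

This stays fully vectorized, so it avoids calling a Python function once per row.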

Answer 2 (score: 1)

Just try:

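import pandas as pd

for i, r in df1.iterrows():
    # NaN never compares equal to anything, so "!= np.nan" is always True;
    # pd.notna is the reliable test for a present value.
    if pd.notna(r['Corr_Value']):
        # iterrows() upcasts the mixed int/float row to float, so cast back to int
        a = int(r['X'])
        b = int(r['Y'])
        c = r['Corr_Value']*100
        df1.at[i, 'Remarks'] = "Solving "+  str(a) + " will solve " + str(b) + " at " + str(c) + " % probability"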

I think the problem comes from using df1 (the whole DataFrame) instead of the current row.
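The difference between formatting a whole column and a single cell can be seen in the small sketch below, which also illustrates a second pitfall in the question's loop: comparing against `np.nan` with `!=`. The two-row `df1` here is a cut-down assumption, not the asker's actual data.

```python
import numpy as np
import pandas as pd

value = np.nan

# NaN is not equal to anything, including itself, so this comparison is always True.
always_true = value != np.nan

# pd.notna (or pd.isna) is the reliable way to test for a missing value.
is_present = pd.notna(value)

# Cut-down stand-in for the asker's frame (an assumption for illustration).
df1 = pd.DataFrame({"X": [51182, 51182], "Corr_Value": [1.00, np.nan]})

# Formatting a whole column embeds the Series repr (index, every value, dtype).
whole_column = f"Solving {df1['X']}"

# Formatting a single cell gives just that row's value.
one_cell = f"Solving {df1.at[0, 'X']}"
```

Printing `whole_column` reproduces the kind of multi-line text the asker saw in every row, while `one_cell` contains only the intended value.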