Python - Replace strings with values from another column

Asked: 2018-07-12 06:04:36

Tags: python pandas numpy

I have a data frame with the following data:

average_x,  average_y,  average_z,  Result
1,2,3,x | y
4,5,6,x | y |z
8,7,9,z
11,12,31,x | z
67,56,43,y | z

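The requirement is to replace each keyword in the Result column with text built from the value in the corresponding column: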

Result

Average X is 1 | Average Y is 2 
Average X is 4 | Average Y is 5 | Average Z is 6 
Average Z is 9 
Average X is 11 | Average Z is 31 
Average Y is 56 | Average Z is 43 

I tried the following code:

df_test['Result']=np.where(df_test['Result'].str.contains('x'),df_test['Result'].astype(np.str).replace(to_replace='x',"Average X is " + df_test[average_x]),df_test['Result'])

df_test['Result']=np.where(df_test['Result'].str.contains('y'),df_test['Result'].astype(np.str).replace(to_replace='y',"Average Y is " + df_test[average_y]),df_test['Result'])

df_test['Result']=np.where(df_test['Result'].str.contains('z'),df_test['Result'].astype(np.str).replace(to_replace='z',"Average X is " + df_test[average_z]),df_test['Result'])

But I get the following error message:

df_test['Result']=np.where(df_test['Result'].str.contains('x'),df_test['Result'].astype(np.str).replace(to_replace='x',"Average X is " + df_test[average_x]),df_test['Result'])
  File "<ipython-input-69-50ca75be0ce5>", line 1
    df_test['Result']=np.where(df_test['Result'].str.contains('x'),df_test['Result'].astype(np.str).replace(to_replace='x',"Average X is " + df_test[average_x]),df_test['Result'])
                                                                                                                          ^
SyntaxError: positional argument follows keyword argument

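Please suggest how to resolve this, as I have nearly 14-15 keywords that also need to be replaced with text built from the values in their respective columns.

Thanks.

Best regards, Saurabh

3 Answers:

Answer 0: (score: 0)

The problem is in the following part: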

.replace(to_replace='x',"Average X is " + df_test[average_x])

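Assuming this is the pandas replace method, and assuming you want the second positional argument to be used as value, you can either drop the to_replace= keyword, as the exception message suggests, or add value= to the second argument. Basically, either of: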

.replace('x', "Average X is " + df_test[average_x])

.replace(to_replace='x', value="Average X is " + df_test[average_x])

should get rid of the SyntaxError in your case.
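As a minimal sketch of the rule behind that error, the snippet below uses a small made-up Series (the name s and its values are assumptions for illustration, not the asker's data). It shows the two valid call forms and confirms that mixing a keyword argument with a following positional one is rejected before the code even runs. It also illustrates that replace matches whole cell values by default, so a substring replacement would likely need .str.replace instead.

```python
import pandas as pd

# Hypothetical sample data, only for illustrating the call forms.
s = pd.Series(["x", "y", "x | y"])

# Valid: both arguments positional.
a = s.replace("x", "Average X")
# Valid: both arguments passed by keyword.
b = s.replace(to_replace="x", value="Average X")

# Invalid: a positional argument after a keyword argument fails at compile time.
syntax_error_raised = False
try:
    compile('s.replace(to_replace="x", "Average X")', "<example>", "eval")
except SyntaxError as exc:
    syntax_error_raised = True
    print("SyntaxError:", exc.msg)

# replace() only matches whole values; the third row is left untouched.
print(a.tolist())

# Substring replacement goes through the .str accessor instead.
c = s.str.replace("x", "Average X", regex=False)
print(c.tolist())
```

Note that none of these forms inserts a different value per row; that part is handled by the row-wise approaches in the other answers.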

Answer 1: (score: 0)

Use apply() to split Result on |, then grab the relevant average_? column as you construct the new Result output:

df.apply(
    lambda row: " | ".join(
        ["Average {} is {}".format(x.upper(), row["average_{}".format(x)]) 
         for x in row.Result.split("|")]
    ), axis=1)

Output:

0                     Average X is 1 | Average Y is 2
1    Average X is 4 | Average Y is 5 | Average Z is 6
2                                      Average Z is 9
3                   Average X is 11 | Average Z is 31
4                   Average Y is 56 | Average Z is 43
dtype: object

You can also move the logic into a function, which makes it more readable:

def describe_results(row):
    results = row.Result.split("|")
    updated = ["Average {} is {}".format(x.upper(), row["average_{}".format(x)]) for x in results]
    return " | ".join(updated)

df.apply(describe_results, axis=1)

Data:

df
   average_x  average_y  average_z Result
0          1          2          3    x|y
1          4          5          6  x|y|z
2          8          7          9      z
3         11         12         31    x|z
4         67         56         43    y|z

Note: I used df.Result = df.Result.str.replace(" ","") on the original data provided to remove the spacing in Result.
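For anyone who wants to run this end to end, here is a self-contained sketch of the same split-and-join idea. It rebuilds the sample frame from the question (spacing included) and strips whitespace around each keyword inside the function rather than in a separate step; that variation is an assumption of this sketch, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question, with its original spacing.
df = pd.DataFrame({
    "average_x": [1, 4, 8, 11, 67],
    "average_y": [2, 5, 7, 12, 56],
    "average_z": [3, 6, 9, 31, 43],
    "Result": ["x | y", "x | y |z", "z", "x | z", "y | z"],
})

def describe_results(row):
    # Split on "|" and strip the spaces around each keyword.
    keys = [k.strip() for k in row["Result"].split("|")]
    # Look up the matching average_<key> column for every keyword.
    return " | ".join(
        f"Average {k.upper()} is {row['average_' + k]}" for k in keys
    )

df["Result"] = df.apply(describe_results, axis=1)
print(df["Result"].tolist())
```

Because the column name is derived from the keyword itself, the same function should cover 14-15 keywords without changes, as long as every keyword has a matching average_<keyword> column.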

Answer 2: (score: 0)

Thanks everyone, I managed to solve the problem with the following code:

# Note: .ix and np.str have been removed from recent pandas/NumPy releases,
# so .loc and str() are used here. Assumes the default 0..n-1 integer index.
for i in range(df_test.shape[0]):
    if "x" in df_test.loc[i, "Result"]:
        df_test.loc[i, "Result"] = df_test.loc[i, "Result"].replace("x", "Average X is " + str(df_test.loc[i, "average_x"]))

for i in range(df_test.shape[0]):
    if "y" in df_test.loc[i, "Result"]:
        df_test.loc[i, "Result"] = df_test.loc[i, "Result"].replace("y", "Average Y is " + str(df_test.loc[i, "average_y"]))

for i in range(df_test.shape[0]):
    if "z" in df_test.loc[i, "Result"]:
        df_test.loc[i, "Result"] = df_test.loc[i, "Result"].replace("z", "Average Z is " + str(df_test.loc[i, "average_z"]))

BR // Saurabh
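
One thing to watch with chained replacements like these: each pass rescans text inserted by the previous one, so with 14-15 keywords a later keyword could accidentally match inside an earlier "Average ... is ..." string. A possible alternative, sketched below under the assumption that every keyword has a matching average_<keyword> column, is to replace all keywords in a single regex pass per row. The keywords list and the fill_in helper are hypothetical names introduced for this sketch.

```python
import re
import pandas as pd

# Sample data copied from the question.
df_test = pd.DataFrame({
    "average_x": [1, 4, 8, 11, 67],
    "average_y": [2, 5, 7, 12, 56],
    "average_z": [3, 6, 9, 31, 43],
    "Result": ["x | y", "x | y |z", "z", "x | z", "y | z"],
})

# Hypothetical keyword list; extend it to cover all 14-15 keywords.
keywords = ["x", "y", "z"]
# Match any keyword as a whole word, so text inside other words is ignored.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")\b")

def fill_in(row):
    def repl(match):
        key = match.group(1)
        return f"Average {key.upper()} is {row['average_' + key]}"
    # A single pass: inserted text is never scanned again.
    return pattern.sub(repl, row["Result"])

df_test["Result"] = df_test.apply(fill_in, axis=1)
print(df_test["Result"].tolist())
```

This keeps whatever spacing the original Result string had around the | separators.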