Question

I want to perform a string replacement in a DataFrame: find every instance of "X" in a column and replace it with that column's name.

Example:

Name  FFF1  H0L1
  -    L     -
  -    X     L
  X    -     -
  -    -     X

The resulting df after the replacement:

Name     FFF1      H0L1
  -      FFF1        -
  -      FFF1      H0L1
Name      -          -
  -       -        H0L1

This seems simple enough; I'm just confused about how to "reference" the column name. Any ideas?

Answer 1

The apply method iterates over the columns as Series, and each Series' name attribute holds the column name:

df.apply(lambda col: col.where(~col.str.contains("X"),
                               col.str.replace("X", col.name)))

Better:

df.apply(lambda col: col.str.replace("X",col.name))
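For reference, a minimal runnable sketch of the shorter form, assuming a DataFrame built from the question's example table (the construction of df and the name result are my own). Passing regex=False makes the literal match explicit; note that this form only touches "X", so the "L" cells are left alone.

```python
import pandas as pd

# Sample data reconstructed from the question's example table (an assumption).
df = pd.DataFrame({
    "Name": ["-", "-", "X", "-"],
    "FFF1": ["L", "X", "-", "-"],
    "H0L1": ["-", "L", "-", "X"],
})

# apply() hands each column to the lambda as a Series; col.name is the column label.
result = df.apply(lambda col: col.str.replace("X", col.name, regex=False))
print(result)
```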

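Edit: in answer to the additional question, using a regular expression:

# First attempt; didn't work correctly in all situations, e.g. "aXbXcXd":
# df.apply(lambda col: col.str.replace(r"([^X]|^)(X)([^X]|$)", r"\1" + col.name + r"\3", regex=True))
# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern as a literal by default.
df.apply(lambda col: col.str.replace(r"([^X]|^)(X)(?=[^X]|$)", r"\1" + col.name, regex=True))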


"""  The details:
     The working pattern has two capturing groups (...) followed by a lookahead.
     [^X] matches any char except X (^ inside square brackets negates the set);
     ^ as a separate char means start of string;
     $ means end of string;
     | means 'or';
     \1 in the replacement refers to the first group (the char before X, if any);
     (?=...) is a lookahead: it checks what follows without consuming it.
"""
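To see why the lookahead matters, here is a small sketch that applies both patterns to plain strings with Python's re module. The helper names first_attempt and replace_isolated_x are hypothetical, introduced only for illustration. The first pattern consumes the character after each X, so the next X has no preceding character left to match; the lookahead version only inspects that character.

```python
import re

def first_attempt(text: str, name: str) -> str:
    # Hypothetical helper: the third group consumes the char after X.
    return re.sub(r"([^X]|^)(X)([^X]|$)",
                  lambda m: m.group(1) + name + m.group(3), text)

def replace_isolated_x(text: str, name: str) -> str:
    # Hypothetical helper: the lookahead checks the next char without consuming it.
    # A callable replacement avoids interpreting characters in `name` as escapes.
    return re.sub(r"([^X]|^)(X)(?=[^X]|$)",
                  lambda m: m.group(1) + name, text)

print(first_attempt("aXbXcXd", "FFF1"))       # the middle X is missed
print(replace_isolated_x("aXbXcXd", "FFF1"))  # every isolated X is replaced
print(replace_isolated_x("XX", "FFF1"))       # adjacent X's are left untouched
```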

Edit 2: if a cell always holds just the single character to replace:

df.apply(lambda col: col.replace(["X","L"],col.name))
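A runnable sketch of this variant, again assuming a DataFrame reconstructed from the question's table. Series.replace matches whole cell values rather than substrings, so a cell such as "aXb" would be left unchanged; on this sample data it yields the table the question asks for.

```python
import pandas as pd

# Sample data reconstructed from the question's example table (an assumption).
df = pd.DataFrame({
    "Name": ["-", "-", "X", "-"],
    "FFF1": ["L", "X", "-", "-"],
    "H0L1": ["-", "L", "-", "X"],
})

# Cells equal to "X" or "L" are replaced by the label of their column.
result = df.apply(lambda col: col.replace(["X", "L"], col.name))
print(result)
```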

Answer 2

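You can use df.where, which keeps the values where the condition is True and takes the rest from the second argument. As written, the call keeps the "X" cells and fills every other cell with its column name; invert the condition (df.ne('X')) to replace the "X" cells instead:

import pandas as pd

df = pd.DataFrame({"A": ['-', 'X'],
                   'B': ['X', '-']})
df.where(df.eq('X'), df.columns)

Output: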

   A  B
0  A  X
1  X  B
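A sketch of the inverted form that matches what the question asks for, written column by column with Series.mask so that the replacement is a plain scalar. This is my own variation, not the answer's code; it sidesteps the question of whether df.columns is accepted as the replacement argument in a given pandas version, which I have not verified.

```python
import pandas as pd

df = pd.DataFrame({"A": ["-", "X"],
                   "B": ["X", "-"]})

# mask() replaces values where the condition is True (the opposite of where()).
result = df.apply(lambda col: col.mask(col.eq("X"), col.name))
print(result)
```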

Perform string replacement using the column name as the string value to replace with