Question

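I want to search all columns of a DataFrame (except the first one) for a value and add a new column (e.g. 'Column_Match') that holds the name of the column where the match was found.

I tried something like this: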

df.apply(lambda row: row.astype(str).str.contains('my_keyword').any(), axis=1)

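But this does not exclude the first column, and I don't know how to get the matching column name back and add it as a column.

Any help is much appreciated!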

Answer 1

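If you want the new column to hold the column name of the first matching value in each row (or a placeholder when nothing matches), use DataFrame.assign together with DataFrame.idxmax. assign appends a helper column that is always True, so idxmax returns its name, 'missing', for rows where no other column matched: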

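import pandas as pd

df = pd.DataFrame({
         'B':[4,5,4,5,5,4],
         'A':list('abcdef'),
         'C':list('akabbe'),
         'F':list('eakbbb')
})

# True for every cell in the row whose string form contains 'e'
f = lambda row: row.astype(str).str.contains('e')
# iloc[:, 1:] skips the first column; idxmax returns the first True column per row
df['new'] = df.iloc[:,1:].apply(f, axis=1).assign(missing=True).idxmax(axis=1)
print(df)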
   B  A  C  F      new
0  4  a  a  e        F
1  5  b  k  a  missing
2  4  c  a  k  missing
3  5  d  b  b  missing
4  5  e  b  b        A
5  4  f  e  b        C
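
The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name first_match_column and its fallback parameter are made up for illustration, and it passes regex=False so the keyword is matched literally, which differs from the regex matching that str.contains does by default in the snippet above.

```python
import pandas as pd


def first_match_column(df, keyword, fallback="missing"):
    """Return, per row, the name of the first column (first column excluded)
    whose string value contains `keyword`, or `fallback` if none does.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Skip the first column and compare everything as strings.
    searched = df.iloc[:, 1:].astype(str)
    # Boolean mask: True where the cell contains the keyword (literal match).
    mask = searched.apply(lambda col: col.str.contains(keyword, regex=False))
    # Append an always-True column last, so idxmax falls back to it
    # only when no real column matched.
    mask = mask.assign(**{fallback: True})
    # idxmax returns the label of the first True value in each row.
    return mask.idxmax(axis=1)


df = pd.DataFrame({
    "B": [4, 5, 4, 5, 5, 4],
    "A": list("abcdef"),
    "C": list("akabbe"),
    "F": list("eakbbb"),
})

result = first_match_column(df, "e")
print(result.tolist())
```

Building the mask column by column instead of row by row avoids a Python-level call per row, which tends to matter on larger frames.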

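If you need the names of all matching columns, create a boolean DataFrame and take its dot product with the column names using DataFrame.dot, then remove the trailing separator with Series.str.rstrip: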

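f = lambda row: row.astype(str).str.contains('a')
# Boolean DataFrame for every column except the first; drop 'new' if an earlier snippet already added it
df1 = df.iloc[:,1:].drop(columns='new', errors='ignore').apply(f, axis=1)
# True * 'A, ' gives 'A, ' and False * 'A, ' gives '', so the dot product concatenates the matching names
df['new'] = df1.dot(df1.columns + ', ').str.rstrip(', ').replace('', 'missing')
print(df)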
   B  A  C  F      new
0  4  a  a  e     A, C
1  5  b  k  a        F
2  4  c  a  k        C
3  5  d  b  b  missing
4  5  e  b  b  missing
5  4  f  e  b  missing
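
For readers who find the dot-product trick opaque, here is a more explicit sketch of what it computes: for each row, join the names of the columns whose mask value is True. The helper name all_match_columns and its sep and fallback parameters are hypothetical, and regex=False (literal matching) is an assumption that differs from the default used above.

```python
import pandas as pd


def all_match_columns(df, keyword, sep=", ", fallback="missing"):
    """Return, per row, the joined names of all columns (first column
    excluded) whose string value contains `keyword`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Skip the first column and compare everything as strings.
    searched = df.iloc[:, 1:].astype(str)
    # Boolean mask: True where the cell contains the keyword (literal match).
    mask = searched.apply(lambda col: col.str.contains(keyword, regex=False))
    # Use each row of the mask to pick column names, then join them.
    joined = mask.apply(
        lambda row: sep.join(mask.columns[row.to_numpy()]), axis=1
    )
    # Rows without any match produce an empty string; label them.
    return joined.replace("", fallback)


df = pd.DataFrame({
    "B": [4, 5, 4, 5, 5, 4],
    "A": list("abcdef"),
    "C": list("akabbe"),
    "F": list("eakbbb"),
})

result = all_match_columns(df, "a")
print(result.tolist())
```

This version is slower than DataFrame.dot on large frames because it calls Python code once per row, but it needs no trailing-separator cleanup.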

Search for a value in all DataFrame columns (except the first!) and add a new column with the matching column names
