Question

I am looking for a way to transform this data

 columns0    columns1 columns2   
 row1         bill    bill   
 row2         $0.00   $0.00      
 row3         Free    $1.25
 row4         $1.50   $1.25

into this...

 columns0     columns1        columns2   
 row1         bill( match)    bill(match)   
 row2         $0.00(match)   $0.00(match)  
 row3         Free            $1.25    
 row4         $1.50           $1.25

When I use df.loc[(df['columns1'] == df['columns2']), :] += ' (match)', I get:

 columns0          columns1         columns2   
 row1**(match)**   bill( match)    bill( match)   
 row2**(match)**   $0.00(match)   $0.00(match)  
 row3              Free            $1.25    
 row4              $1.50           $1.25

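I also get a match on columns0. I want "match" to appear only in columns1 and columns2.

I need some way to compare rows, or even columns, for similarity in order to find matches.

If anyone has a better way to solve this, or knows of resources that could help me with this Python + pandas problem, please comment.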

Answer 1

You can use the .loc accessor, assuming your data is stored as strings.

df.loc[df['columns1'] == df['columns2'], :] += ' (match)'

If your data is not stored as strings, you must convert it first:

df = df.astype(str)

Result:

           columns1       columns2
row1   bill (match)   bill (match)
row2  $0.00 (match)  $0.00 (match)
row3           Free          $1.25
row4          $1.50          $1.25
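As a minimal sketch of the string-conversion step, assume a hypothetical DataFrame whose columns hold integers rather than strings (the values below are made up for illustration). Converting with astype(str) first lets the ' (match)' suffix be appended; the update is limited to the two compared columns.

```python
import pandas as pd

# Hypothetical numeric data, not taken from the question.
df = pd.DataFrame(
    {"columns1": [1, 2, 3], "columns2": [1, 5, 3]},
    index=["row1", "row2", "row3"],
)

# Convert every column to strings so a text suffix can be appended.
df = df.astype(str)

# Boolean mask: True where the two columns hold the same value.
mask = df["columns1"] == df["columns2"]

# Append the suffix only on matching rows, only in the compared columns.
df.loc[mask, ["columns1", "columns2"]] += " (match)"

print(df)
```

Reassigning with df = df.astype(str) replaces the columns outright, so the result does not depend on how in-place assignment handles the original numeric dtype.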

If you only want to check whether the row with index row1 matches:

df.loc[(df['columns1'] == df['columns2']) & (df.index == 'row1'), :] += ' (match)'

#           columns1      columns2
# row1  bill (match)  bill (match)
# row2         $0.00         $0.00
# row3          Free         $1.25
# row4         $1.50         $1.25
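The same idea might extend to several index labels at once. The sketch below assumes the row labels are the index and uses a hypothetical list, rows_to_check, together with Index.isin; neither the list nor its contents come from the question.

```python
import pandas as pd

# The question's data, with row labels as the index.
df = pd.DataFrame(
    {
        "columns1": ["bill", "$0.00", "Free", "$1.50"],
        "columns2": ["bill", "$0.00", "$1.25", "$1.25"],
    },
    index=["row1", "row2", "row3", "row4"],
)

# Hypothetical list of index labels to consider.
rows_to_check = ["row1", "row3"]

# A row is tagged only if its columns match AND its label is in the list.
mask = (df["columns1"] == df["columns2"]) & df.index.isin(rows_to_check)
df.loc[mask, :] += " (match)"

print(df)
```

Here row2 also has equal values but is left alone because it is not in rows_to_check, and row3 is in the list but its values differ.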

Update: the solution still works when the index is promoted to a column. You just need to select the columns to update:

df.loc[df['columns1'] == df['columns2'], ['columns1', 'columns2']] += ' (match)'

#   columns0       columns1       columns2
# 0     row1   bill (match)   bill (match)
# 1     row2  $0.00 (match)  $0.00 (match)
# 2     row3           Free          $1.25
# 3     row4          $1.50          $1.25
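For reference, here is a self-contained sketch of the column-restricted update using the question's data. The helper name tag_matches and its parameters are hypothetical, introduced only for illustration; it works on a copy so the original frame stays unchanged.

```python
import pandas as pd

# The question's data, with the row labels stored in a regular column.
df = pd.DataFrame(
    {
        "columns0": ["row1", "row2", "row3", "row4"],
        "columns1": ["bill", "$0.00", "Free", "$1.50"],
        "columns2": ["bill", "$0.00", "$1.25", "$1.25"],
    }
)


def tag_matches(frame, left, right, suffix=" (match)"):
    """Return a copy of frame with suffix appended to both columns
    on every row where the two columns hold equal values."""
    out = frame.copy()
    mask = out[left] == out[right]
    # Listing the columns explicitly keeps columns0 untouched.
    out.loc[mask, [left, right]] += suffix
    return out


tagged = tag_matches(df, "columns1", "columns2")
print(tagged)
```

Passing the column list instead of : is what keeps the suffix out of columns0, which was the problem in the question.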

Appending a string to matching values in a Pandas DataFrame
