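I have two DataFrames (df1, df2), and I want to create a new column in df1 that indicates whether each record matches, likely matches, or does not match across several columns of the two DataFrames. df1: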
id   a    b   c   d   name
a1   94   18  10  20  b1
a2   20   18  1   2   b4,b5
a3   21   18  34  32  b2,b3,b4
a4   216  5   56  76  b5
a5   210  5   10  30  b4,b5
df2:
id   a     b  c   d
b1   94    5  10  20
b2   A150  5  13  45
b3   167   5  4   -1
b4   210   5  40  80
b5   216   5  60  80
The name column in df1 holds ids from df2. I want to match each name in df1 against the id in df2 and create the new column according to the following conditions:
Match: df1[['a','b','c','d']] equals df2[['a','b','c','d']]
Likely match: df1[['a','b']] equals df2[['a','b']], and c or d is within +-10
Missmatch: df1[['a','b']] equals df2[['a','b']], but c and d differ by more than 10
Missing: the df1 record is not in df2
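Read literally, the four rules can be sketched as a small function that labels one df1 record against one candidate df2 record. This is only an illustration: the name `classify` and the `tol` parameter are hypothetical, and the case where a or b differ is not covered by the rules above, so the sketch assumes it falls back to "Missing".

```python
def classify(rec1, rec2, tol=10):
    """Label a df1 record against one candidate df2 record (or None)."""
    if rec2 is None:
        # The referenced record does not exist in df2
        return "Missing"
    same_ab = rec1["a"] == rec2["a"] and rec1["b"] == rec2["b"]
    if not same_ab:
        # Not covered by the stated rules; assumed to count as "Missing"
        return "Missing"
    diff_c = abs(rec1["c"] - rec2["c"])
    diff_d = abs(rec1["d"] - rec2["d"])
    if diff_c == 0 and diff_d == 0:
        return "Match"
    if diff_c <= tol or diff_d <= tol:
        return "Likely match"
    return "Missmatch"


# Records a4 and a5 from df1, b4 and b5 from df2
a4 = {"a": 216, "b": 5, "c": 56, "d": 76}
a5 = {"a": 210, "b": 5, "c": 10, "d": 30}
b4 = {"a": 210, "b": 5, "c": 40, "d": 80}
b5 = {"a": 216, "b": 5, "c": 60, "d": 80}

print(classify(a4, b5))    # a and b agree, c and d are each off by 4
print(classify(a5, b4))    # a and b agree, c is off by 30 and d by 50
print(classify(a4, None))  # no candidate record
```

The function accepts anything indexable by column name, so a row taken from a DataFrame should work as well as a plain dict.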
Answer 0 (score: 0)
Your expected result is wrong. You flipped the column values for df1['id'] == 'a4' and df1['id'] == 'a5', and the column names are different. Still, you can use np.select:
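import numpy as np

# Attach df1's comma-separated names to df2 as lists (rows are aligned by position)
df2['name'] = df1['name'].str.split(',')

# Row i of df1 is compared with row i of df2.
# any(axis=1) is True when at least one of the compared columns satisfies the test.
conditions = [
    # df2's id is listed in the name and some column among a, b, c, d is equal
    (df2.apply(lambda x: x['id'] in x['name'], axis=1)
     & (df1[['a','b','c','d']] == df2[['a','b','c','d']]).any(axis=1)),
    # a or b is equal, and c or d is within 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1)
     & (abs(df2[['c','d']] - df1[['c','d']]) <= 10).any(axis=1)),
    # a or b is equal, and c or d differs by 10 or more
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1)
     & (abs(df2[['c','d']] - df1[['c','d']]) >= 10).any(axis=1)),
]
choices = ['Match', 'Likely', 'Missmatch']

# The first condition that holds wins; rows matching none are 'Missing'
df1['Status'] = np.select(conditions, choices, default='Missing')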
Result:
   id   a    b   c   d   name      Status
0  a1   94   18  10  20  b1        Match
1  a2   20   18  1   2   b4,b5     Missing
2  a3   21   18  34  32  b2,b3,b4  Missing
3  a4   216  5   56  76  b5        Likely
4  a5   210  5   10  30  b4,b5     Match
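The code above compares row i of df1 with row i of df2. If the intent is instead to compare each df1 record with the df2 records its name actually lists, one possible sketch is to explode the names and merge on the id. This is not the answerer's method. The strict reading of the rules (all of a, b, c, d equal for Match; a and b both equal for the other two), the "_2" suffix, the `rank` ordering, and keeping the best status per df1 record are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210],
    "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10],
    "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216],
    "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60],
    "d": [20, 45, -1, 80, 80],
})

# One row per (df1 record, referenced df2 id), with df2's columns suffixed "_2"
pairs = df1.assign(name=df1["name"].str.split(",")).explode("name")
pairs = pairs.merge(df2, left_on="name", right_on="id",
                    how="left", suffixes=("", "_2"))

same_ab = (pairs["a"] == pairs["a_2"]) & (pairs["b"] == pairs["b_2"])
diff_c = (pairs["c"] - pairs["c_2"]).abs()
diff_d = (pairs["d"] - pairs["d_2"]).abs()

# np.select takes the first condition that holds for each pair
pairs["Status"] = np.select(
    [same_ab & (diff_c == 0) & (diff_d == 0),
     same_ab & ((diff_c <= 10) | (diff_d <= 10)),
     same_ab],
    ["Match", "Likely", "Missmatch"],
    default="Missing",
)

# Assumption: keep the best status found among the candidates of each df1 record
rank = {"Match": 0, "Likely": 1, "Missmatch": 2, "Missing": 3}
best = (pairs.assign(rank=pairs["Status"].map(rank))
             .sort_values("rank")
             .drop_duplicates("id")
             .set_index("id")["Status"])
df1["Status"] = df1["id"].map(best)
print(df1)
```

On the sample data this reading labels a4 as Likely and a5 as Missmatch, and leaves a1, a2 and a3 as Missing because their a or b values differ from every df2 record they name, so it does not reproduce the positional result shown above.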