Matching string values across two different Pandas DataFrames

Date: 2018-09-21 11:04:55

Tags: python pandas dataframe matching

I have two DataFrames (df1, df2), and I want to create a new column in df1 that indicates, by comparing several columns across the two DataFrames, whether each record is a match, a likely match, or a mismatch.

df1:

id  a   b   c   d   name
a1  94  18  10  20  b1
a2  20  18  1   2   b4,b5
a3  21  18  34  32  b2,b3,b4
a4  216 5   56  76  b5
a5  210 5   10  30  b4,b5

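Basically, name holds the ids of df2. I want to match the name of df1 against the id of df2 and create a new column based on the conditions given below. The rows with ids b1 to b5 are df2: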

id  a   b   c   d
b1  94  5   10  20
b2  A150    5   13  45
b3  167 5   4   -1
b4  210 5   40  80
b5  216 5   60  80
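
To make the problem reproducible, the two tables can be rebuilt as follows. This is only a sketch: it assumes the table with the name column is df1 and the table with ids b1 to b5 is df2, and it copies the value A150 exactly as posted, so column a of df2 ends up holding mixed types.

```python
import pandas as pd

# df1: each row lists, in `name`, the df2 ids it should be compared against.
df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210],
    "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10],
    "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})

# df2: the reference rows. "A150" is kept as posted, which makes
# column `a` a mixed (object) column rather than an integer one.
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216],
    "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60],
    "d": [20, 45, -1, 80, 80],
})
```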

Result (conditions for the new column):

Match: df1['a','b','c','d'] = df2['a','b','c','d']
Likely match: df1['a','b'] = df2['a','b'], and c or d within +-10 is fine
Missmatch: df1['a','b'] = df2['a','b'], but columns c and d differ by more than +-10
Missing: df1 record not in df2
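
One possible reading of these rules, written out for a single pair of rows, might look like the sketch below. The function name classify and the parameter tol are hypothetical. The conditions do not say what should happen when a or b disagree, so that case is assumed here to count as Missing; "c or d +- 10" is read as "at least one of c, d is within 10".

```python
def classify(row1, row2, tol=10):
    """Label one df1 row against one candidate df2 row (both dict-like).

    Hypothetical helper: `row2` is None when the referenced id does not
    exist in df2.
    """
    if row2 is None:
        return "Missing"
    # Assumption: a pair that disagrees on a or b is treated as Missing.
    if row1["a"] != row2["a"] or row1["b"] != row2["b"]:
        return "Missing"
    c_diff = abs(row1["c"] - row2["c"])
    d_diff = abs(row1["d"] - row2["d"])
    if c_diff == 0 and d_diff == 0:
        return "Match"       # a, b, c and d all agree
    if c_diff <= tol or d_diff <= tol:
        return "Likely"      # a and b agree, c or d is within the tolerance
    return "Missmatch"       # a and b agree, c and d are both too far apart
```

For example, the row a4 (216, 5, 56, 76) compared with b5 (216, 5, 60, 80) agrees on a and b and differs by 4 in both c and d, so under this reading it comes out as Likely.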

1 Answer:

Answer 0 (score: 0)

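Your expected result is wrong. You flipped the column values for df1['id'] == 'a4' and df1['id'] == 'a5', and the column names are different. Nonetheless, you can use np.select: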

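import numpy as np

# Attach df1's comma-separated names to df2 as lists. Assignment aligns on the
# row index, so row i of df2 receives the names from row i of df1.
df2['name'] = df1['name'].str.split(',')

# Every comparison below is positional (row i of df1 against row i of df2),
# and .any(axis=1) is True as soon as one of the listed columns agrees.
conditions = [
    ((df2.apply(lambda x: x['id'] in x['name'], axis=1)) & (df1[['a','b','c','d']] == df2[['a','b','c','d']]).any(axis=1)),
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) <= 10).any(axis=1)),
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) >= 10).any(axis=1)),
]

# Labels, in the same order as the conditions.
choices = [
    'Match',
    'Likely',
    'Missmatch',
]

# np.select takes the first condition that is True for each row and falls
# back to the default when none is.
df1['Status'] = np.select(conditions,choices,default='Missing')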

Result:

    id  a   b   c   d   name      Status
0   a1  94  18  10  20  b1         Match
1   a2  20  18  1   2   b4,b5      Missing
2   a3  21  18  34  32  b2,b3,b4   Missing
3   a4  216 5   56  76  b5         Likely
4   a5  210 5   10  30  b4,b5      Match
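
Note that the code above compares row i of df1 with row i of df2 and accepts a row as soon as any one column agrees. A stricter alternative, sketched below, compares each df1 row with the df2 rows its name column actually refers to, and requires both a and b to agree. It rests on several assumptions that are not in the question: pairs that disagree on a or b are labelled Missing, "c or d +- 10" means at least one of the two is within 10, and when a df1 row has several candidates the best label wins. The helper names (pairs, order, best) are made up for illustration, and DataFrame.explode needs pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210], "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10], "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216], "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60], "d": [20, 45, -1, 80, 80],
})

# One row per (df1 row, candidate df2 id).
pairs = df1.assign(name=df1["name"].str.split(",")).explode("name")
# Left merge: df1 rows whose candidate id is absent from df2 get NaN in *_ref.
pairs = pairs.merge(df2, how="left", left_on="name", right_on="id",
                    suffixes=("", "_ref"))

same_ab = (pairs["a"] == pairs["a_ref"]) & (pairs["b"] == pairs["b_ref"])
c_diff = (pairs["c"] - pairs["c_ref"]).abs()
d_diff = (pairs["d"] - pairs["d_ref"]).abs()

# np.select uses the first True condition per row; rows with no df2 partner
# or with differing a/b fall through to the default (assumed "Missing").
pairs["Status"] = np.select(
    [same_ab & (c_diff == 0) & (d_diff == 0),
     same_ab & ((c_diff <= 10) | (d_diff <= 10)),
     same_ab],
    ["Match", "Likely", "Missmatch"],
    default="Missing",
)

# Keep the best label per df1 id (Match > Likely > Missmatch > Missing).
order = {"Match": 0, "Likely": 1, "Missmatch": 2, "Missing": 3}
best = (pairs.assign(_order=pairs["Status"].map(order))
             .sort_values("_order")
             .drop_duplicates("id")
             .set_index("id")["Status"])
df1["Status"] = df1["id"].map(best)
```

With the data as posted, this labels a4 as Likely and a5 as Missmatch, while a1, a2 and a3 come out as Missing because none of their referenced df2 rows agree on both a and b. That differs from the table above, which follows from the looser positional comparison.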