Question

My dataframe df1 looks like this:

user     data                               dep                    
1        ['dep_78','fg7uy8']                78
2        ['the_dep_45','34_dep','re23u']    45
3        ['fhj56','dep_89','hgjl09']        91

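I want to focus on the values in the "data" column that contain the string "dep" and check whether the number attached to that string matches the number in the "dep" column. For example, dep_78 in the data column for user 1 matches 78 in the dep column. I want to output the rows that do not match, so the result should be: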

user     data                      dep
2        ['the_dep_45','34_dep']   45
3        ['dep_89']                91

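The difficulty is picking out only the values in the data column that contain the string "dep", and then comparing the numbers attached to those strings with the "dep" column.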

Answer 1

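How about this?

import re

# Matches a run of digits; search() returns only the first one.
# This assumes each 'data' cell is a string, not a Python list.
r = re.compile(r'\d+')

idx = df.apply(lambda x: str(x['dep']) in r.search(x['data']).group(0), axis=1)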

0     True
1     True
2    False
dtype: bool


df[idx]

   user                             data  dep
0     1              ['dep_78','fg7uy8']   78
1     2  ['the_dep_45','34_dep','re23u']   45
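Note that the mask above keeps the rows that match, and it only looks at the first number found in each string. A minimal sketch of one way to get the non-matching rows instead, assuming the data column holds strings as above: collect every number that directly follows "dep_" and invert the mask with ~. The pattern dep_(\d+) and the helper name has_matching_dep are illustrative choices, not part of the original answer.

```python
import re
import pandas as pd

# Sample frame from the question; 'data' is assumed to be stored as strings
df = pd.DataFrame({
    "user": [1, 2, 3],
    "data": [
        "['dep_78','fg7uy8']",
        "['the_dep_45','34_dep','re23u']",
        "['fhj56','dep_89','hgjl09']",
    ],
    "dep": [78, 45, 91],
})

# Capture the digits that directly follow "dep_"
dep_number = re.compile(r"dep_(\d+)")

def has_matching_dep(row):
    # True when any "dep_<number>" in the string carries the row's dep value
    numbers = dep_number.findall(row["data"])
    return str(row["dep"]) in numbers

idx = df.apply(has_matching_dep, axis=1)

# ~ inverts the boolean mask, leaving only the rows without a match
mismatched = df[~idx]
print(mismatched)
```

Under this looser reading, user 2 counts as a match because the_dep_45 carries 45, so only user 3 is returned. The question's expected output treats anything other than an exact dep_45 as a mismatch, which is the stricter rule Answer 2 applies.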

Answer 2

You can do it like this:

def select(row):
    # Keep entries that contain 'dep' but are not exactly 'dep_<dep>'
    keystring = 'dep_' + str(row['dep'])
    result = []
    for one in row['data']:
        if one != keystring and 'dep' in one:
            result.append(one)
    return result

# Assumes each 'data' cell is a Python list of strings
df['data'] = df.apply(select, axis=1)
df['datalength'] = df['data'].map(len)
result = df[df['datalength'] > 0][df.columns[:3]]
print(result)
   user                  data  dep
1     2  [the_dep_45, 34_dep]   45
2     3              [dep_89]   91
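If the data column was read from a file, the cells may be strings that only look like lists, in which case the loop above would iterate over characters. A minimal sketch of the same filtering rule under that assumption, parsing each cell with ast.literal_eval first and leaving the original frame unmodified. The helper name unmatched_dep_values is hypothetical.

```python
import ast
import pandas as pd

# Sample frame from the question; 'data' is assumed to be stored as strings
df = pd.DataFrame({
    "user": [1, 2, 3],
    "data": [
        "['dep_78','fg7uy8']",
        "['the_dep_45','34_dep','re23u']",
        "['fhj56','dep_89','hgjl09']",
    ],
    "dep": [78, 45, 91],
})

# Turn each string such as "['a','b']" into a real Python list
parsed = df["data"].apply(ast.literal_eval)

def unmatched_dep_values(values, dep):
    # Keep entries that mention 'dep' but are not exactly 'dep_<dep>'
    expected = f"dep_{dep}"
    return [v for v in values if "dep" in v and v != expected]

leftover = pd.Series(
    [unmatched_dep_values(v, d) for v, d in zip(parsed, df["dep"])],
    index=df.index,
)

# Replace 'data' with the leftover entries and drop rows with nothing left
result = df.assign(data=leftover)[leftover.map(len) > 0]
print(result)
```

Because assign returns a new frame, df keeps its original data column and no helper column such as datalength is needed.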

Finding mismatches between two column values
