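I have a Pandas DataFrame with many columns. I need to check whether a record already exists, but the comparison should use only some of the columns. Is there a better way than this: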
import functools
import os
import re

import pandas as pd

# df has the following columns:
# df = pd.DataFrame(columns=['DIR', 'FILE_NAME', 'A-HASH', 'P-HASH', 'D-HASH', 'W-HASH', 'SIZE', 'TAGSIZE', 'FILE_CREATE_DATE'])
df = pd.read_csv(mainDFfile, index_col = ['INDEX'])
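# dirpath and filenames are assumed to come from an enclosing os.walk() loop
for base in filter(functools.partial(re.match, "(?i).*jpe?g$"), filenames):
    fn = os.path.join(dirpath, base)
    file_size = os.path.getsize(fn)
    # Compare only on DIR, FILE_NAME and SIZE; the other columns are ignored
    if ((df['DIR'] == dirpath) & (df['FILE_NAME'] == base) & (df['SIZE'] == file_size)).any():
        print('FILE EXISTS', fn)
    else:
        # add file to DB
        pass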
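For comparison, here is one possible alternative, offered only as a minimal sketch. The check above scans three whole columns for every file. Building a set of `(DIR, FILE_NAME, SIZE)` tuples once would make each lookup a plain membership test. The helper name `build_key_set` and the sample rows are hypothetical and only illustrate the idea; the real data would come from the CSV file.

```python
import pandas as pd


def build_key_set(df, cols=("DIR", "FILE_NAME", "SIZE")):
    """Return a set of plain tuples built from the chosen columns of df.

    build_key_set is a hypothetical helper, not part of pandas.
    """
    return set(df[list(cols)].itertuples(index=False, name=None))


# Hypothetical sample rows standing in for the contents of the CSV file.
df = pd.DataFrame({
    "DIR": ["/photos", "/photos"],
    "FILE_NAME": ["a.jpg", "b.jpeg"],
    "SIZE": [1024, 2048],
    "TAGSIZE": [10, 20],
})

# Build the lookup set once, before walking the directory tree.
known = build_key_set(df)

# Each existence check is now a set membership test on the three key columns.
exists = ("/photos", "a.jpg", 1024) in known
missing = ("/photos", "a.jpg", 999) in known
print(exists, missing)
```

With this approach, any file added to the DataFrame during the walk would also need its tuple added to `known`, otherwise later checks would not see it.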