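I have many pandas DataFrames stored as pickle files. I want to end up with a single pandas DataFrame that contains only the entries that are similar across all of these pickle files, that is, the rows whose ID appears in every file.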
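How do I correctly use the unpacking operator * to unpack my list of DataFrames so that it can be passed to pd.merge, rather than writing each DataFrame out by hand in the pd.merge call?
This is the code that currently gives me an error: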
#!/usr/bin/python3
import pandas as pd


class unify_database:
    '''
    input list of pkl files
    -----------
    output: single df file containing only similar entries
    '''
    def get_similar(*args):
        l_pandafiles = []
        for pickle_file in args:
            print('reading {0}'.format(pickle_file))
            l_pandafiles.append(pd.read_pickle(pickle_file))
        print(l_pandafiles)
        df_similar = pd.merge(*l_pandafiles, on=['ID'],
                              how='inner', suffixes=('', '_y'))
        df_similar.drop(df_similar.filter(regex='_y').columns.tolist(), axis=1, inplace=True)  # drop columns from df_similar with _y suffix
        return df_similar


if __name__ == '__main__':
    list_of_pickles = ['pickle_file1.pkl', 'pickle_file2.pkl']
    unify_database.get_similar(*list_of_pickles)
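
For reference, pd.merge takes exactly two frames (left and right) as its first two parameters, so unpacking a list of three or more frames pushes the extra ones into the following positional parameters such as how. One possible way around this, sketched here as an assumption rather than a definitive fix, is to fold the list pairwise with functools.reduce. The helper name merge_on_id and the small in-memory frames are hypothetical stand-ins for the frames read with pd.read_pickle.

```python
from functools import reduce

import pandas as pd


def merge_on_id(frames, key="ID"):
    """Inner-merge a list of DataFrames pairwise on `key` (hypothetical helper)."""
    def merge_pair(left, right):
        merged = pd.merge(left, right, on=key, how="inner", suffixes=("", "_y"))
        # Drop the duplicate columns contributed by the right-hand frame.
        # Caveat: this also drops any original column whose name ends in "_y".
        duplicate_cols = [c for c in merged.columns if c.endswith("_y")]
        return merged.drop(columns=duplicate_cols)

    # reduce applies merge_pair to (frames[0], frames[1]), then to that
    # result and frames[2], and so on until the list is exhausted.
    return reduce(merge_pair, frames)


# Small in-memory frames stand in for the results of pd.read_pickle(...).
df1 = pd.DataFrame({"ID": [1, 2, 3], "value": [10, 20, 30]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "value": [20, 30, 40]})
df3 = pd.DataFrame({"ID": [3, 4, 5], "value": [30, 40, 50]})

df_similar = merge_on_id([df1, df2, df3])
print(df_similar)
```

With this shape the number of pickle files no longer matters, as long as the list is not empty (reduce raises a TypeError on an empty sequence when no initial value is given).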