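I am currently trying to compare the list of tokens in one cell of a pd.DataFrame (sports.pubdf) with the list in another cell of the same row, and to write the matches (hits) and the non-matches (non-hits) as lists into separate columns.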
To do this, I use a nested loop like this:
```python
for i in sports.pubdf.index:
    hitlist = []
    falselist = []
    print("Control: {0} of {1}".format(sports.pubdf.index.get_loc(i), len(sports.pubdf.index)))
    for meshterm in sports.pubdf['Supplemented MeSH-Terms'][i]:
        if meshterm in sports.pubdf['auto-MeSH-Terms'][i]:
            hitlist.append(meshterm)
            sports.pubdf.set_value(i, 'Hits', hitlist)
        elif meshterm not in sports.pubdf['auto-MeSH-Terms'][i]:
            falselist.append(meshterm)
            sports.pubdf.set_value(i, 'Non-Hits', falselist)
```
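For comparison, here is a minimal sketch of the same per-row comparison without `set_value`, assuming every cell holds a plain Python list of term strings. It builds each row's hits and non-hits with list comprehensions against a set of that row's auto terms, then assigns each result column once. The toy frame `pubdf` and the helper `split_hits` are hypothetical stand-ins for `sports.pubdf` and are not part of the original code.

```python
import pandas as pd

# Hypothetical toy data standing in for sports.pubdf; every cell holds a list of MeSH terms.
pubdf = pd.DataFrame({
    "Supplemented MeSH-Terms": [["Running", "Knee", "Injury"], ["Swimming"], []],
    "auto-MeSH-Terms": [["Knee", "Injury", "Adult"], ["Cycling"], ["Adult"]],
})


def split_hits(supplemented, auto):
    """Return (hits, non_hits) for one row, keeping the order of `supplemented`."""
    auto_set = set(auto)  # set membership is O(1) on average, unlike scanning a list
    hits = [term for term in supplemented if term in auto_set]
    non_hits = [term for term in supplemented if term not in auto_set]
    return hits, non_hits


# Compute all rows in plain Python first ...
results = [
    split_hits(supp, auto)
    for supp, auto in zip(pubdf["Supplemented MeSH-Terms"], pubdf["auto-MeSH-Terms"])
]
# ... then write each result column once instead of once per matching term.
pubdf["Hits"] = pd.Series([hits for hits, _ in results], index=pubdf.index)
pubdf["Non-Hits"] = pd.Series([non_hits for _, non_hits in results], index=pubdf.index)
print(pubdf[["Hits", "Non-Hits"]])
```

Writing whole columns once also sidesteps `set_value`, which was deprecated in pandas 0.21 and removed in 1.0.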
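The nested loop works, but it is really slow (about 3600 rows take several minutes). I know this should be possible with np.where, but my version doesn't work: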
```python
sports.pubdf['Hits'] = np.where(pubdf['auto-MeSH-Terms'].isin(sports.pubdf['Supplemented MeSH-Terms'])
```
It fails with "object has no attribute 'isin'", and I don't understand why.
Any suggestions?
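`Series.isin` tests whether each cell's value as a whole appears in the collection passed to it, and `np.where` only chooses between two values element by element, so neither looks inside the two lists of the same row. One vectorized alternative, sketched here under the assumption that every cell is a list of strings and that pandas 0.25 or later is available (for `Series.explode`), is to turn both columns into long (row, term) tables and let a left merge with `indicator=True` mark which supplemented terms also occur in that row's auto terms. The toy `pubdf` and the helper `collect` are hypothetical; whether this beats a plain list comprehension on ~3600 rows has not been measured here.

```python
import pandas as pd

# Hypothetical toy data standing in for sports.pubdf (each cell is a list of terms).
pubdf = pd.DataFrame({
    "Supplemented MeSH-Terms": [["Running", "Knee", "Injury"], ["Swimming"], []],
    "auto-MeSH-Terms": [["Knee", "Injury", "Adult"], ["Cycling"], ["Adult"]],
})

# Long format: one row per (original row label, term). Empty lists explode to NaN, so drop them.
supp = (pubdf["Supplemented MeSH-Terms"].explode().dropna()
        .rename_axis("row").reset_index(name="term"))
auto = (pubdf["auto-MeSH-Terms"].explode().dropna()
        .rename_axis("row").reset_index(name="term")
        .drop_duplicates())  # avoid duplicating supp rows in the merge

# Left merge on (row, term): "_merge" is "both" when the term also occurs in that row's auto list.
merged = supp.merge(auto, on=["row", "term"], how="left", indicator=True)
is_hit = merged["_merge"] == "both"


def collect(mask):
    """Gather the masked terms back into one list per original row; rows with none get []."""
    grouped = merged.loc[mask].groupby("row")["term"].agg(list)
    return pd.Series([grouped.get(r, []) for r in pubdf.index], index=pubdf.index)


pubdf["Hits"] = collect(is_hit)
pubdf["Non-Hits"] = collect(~is_hit)
print(pubdf[["Hits", "Non-Hits"]])
```

The left merge keeps the order of the supplemented terms, so each row's hits and non-hits come out in their original order.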