Question

I have a DataFrame, and I want to add an "exists" column based on whether each item is present in another DataFrame.

Using isin against the other DataFrame returns only 1 match. The same happens with a .loc filter when I set the column I want to filter on as the index.

When I pass a list or a reference to a column of the other DataFrame, it does not work as expected:

table.loc[table.index.isin(tableOther['column']), :]

In this case it returns only 1 item.

import pandas as pd
import numpy as np

# Source that I'd like to enrich with an additional column
table = pd.read_csv('keywordsDataSource.csv', encoding='utf-8', delimiter=';', index_col='Keyword') 

# Source to compare keywords against 
tableSubject = pd.read_csv('subjectDataSource.csv', encoding='utf-8', names=["subjects"])

# This column-based check only returns 1 (seemingly random) match
table.loc[table.index.isin(tableSubject['subjects']), : ]


--------------

# Also tried it like this:

# Source that I'd like to enrich with an additional column
table = pd.read_csv('keywordsDataSource.csv', encoding='utf-8', delimiter=';') 

# Source to compare keywords against 
tableSubject = pd.read_csv('subjectDataSource.csv', encoding='utf-8', names=["subjects"])

mask = table['Keyword'].isin(tableSubject.subjects)
table[mask]

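I also tried .query and converting the subjects column to a list, which ended in the same single-match result as above.

Since the output is the same in every attempt, I suspect it has something to do with the data source.

Thanks for your ideas!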

Answer 1

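The answer turned out to be as simple as capitalization. Neither data source was normalized to lowercase: one list contained capitalized words, and the other had inconsistent casing.

Lesson learned: make sure the columns are formatted exactly the same way, because isin only matches values that are identical, including case.

This can be done as follows: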

table['Keyword'] = table['Keyword'].str.lower()
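Lowercasing the Keyword column only helps if the subjects are lowercase as well. A minimal sketch of normalizing both sides before the comparison, assuming small in-memory frames in place of the two CSV files (the sample values, the whitespace stripping, and the exists column name are illustrative assumptions, not taken from the original data):

```python
import pandas as pd

# Hypothetical stand-ins for keywordsDataSource.csv and subjectDataSource.csv
table = pd.DataFrame({"Keyword": ["Apple", "banana", "CHERRY", "date"]})
tableSubject = pd.DataFrame({"subjects": ["apple", "Banana", "cherry "]})

# Normalize both sides: strip stray whitespace (an assumed extra step) and lowercase
keywords = table["Keyword"].str.strip().str.lower()
subjects = tableSubject["subjects"].str.strip().str.lower()

# isin compares exact values, so the match only succeeds after normalization
table["exists"] = keywords.isin(subjects)
print(table)
```

Building the mask from normalized copies keeps the original spelling in the Keyword column while still adding the boolean column.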

If you don't need an exact match, there is also a good answer here:

How to test if a string contains one of the substrings in a list, in pandas?
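For reference, one common way to do that kind of substring check is to join the substrings into a single regular expression and pass it to str.contains. This is only a sketch with made-up sample data, and not necessarily the approach the linked answer takes:

```python
import re
import pandas as pd

# Hypothetical sample data for illustration
table = pd.DataFrame({"Keyword": ["Apple pie", "banana bread", "plain rice"]})
subjects = ["apple", "banana"]

# Escape each substring so regex metacharacters are treated literally,
# then join them with "|" to match any one of them
pattern = "|".join(re.escape(s) for s in subjects)

# case=False makes the substring check case-insensitive
table["exists"] = table["Keyword"].str.contains(pattern, case=False, regex=True)
print(table)
```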
