Question

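I have a string. I need to determine whether any keyword from that string is present in a dataframe.

If one is present, I need to return that keyword.

String: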

question="Joe is Available"
question=question.upper()
str_list=question.split()
str_list

Out[107]:

['JOE', 'IS', 'AVAILABLE']

Dataframe:

import pandas as pd
df=pd.DataFrame({"Person1":("Ash","Joe","Harry"),"Person2":("Abe","Lisa","Katty"),"Person3":("Sam","Max","Stone")})
df=df.apply(lambda x: x.astype(str).str.upper())


Person1 Person2 Person3
ASH     ABE     SAM
JOE     LISA    MAX
HARRY   KATTY   STONE

My attempt:

return_field=""
for x in str_list:
    print(x)
    for i in df.iterrows():
        if(df.str.contains(x)):
            return_field=x

This gives me AttributeError: 'DataFrame' object has no attribute 'str'

Expected output

Since Joe is present in the dataframe, it should return "Joe".

Answer 1

If you are doing this repeatedly, you may wish to hash your values via set. You can also use set with map to convert your dataframe values to uppercase¹:

str_all = set(map(str.upper, df.values.ravel()))
question = "Joe is Available"
str_search = set(question.upper().split())
res = str_search & str_all  # {'JOE'}

¹ You can use pd.DataFrame.apply + pd.Series.str.upper, but this is not recommended. Currently, string operations via the pd.Series.str methods are known to be slow. Adding a lambda loop on top makes it worse.
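If the lookup is repeated for many questions, the set idea above could be wrapped as follows. This is only a sketch: the helper names build_lookup and find_keywords are hypothetical and not part of pandas, and it returns a list in question order rather than a set, which is a design choice of this example and not something the answer prescribes.

```python
import pandas as pd


def build_lookup(df):
    """Collect every cell of df, uppercased, into a set for fast membership tests."""
    return {str(value).upper() for value in df.to_numpy().ravel()}


def find_keywords(question, lookup):
    """Return the uppercased words of question that occur in lookup, in question order."""
    return [word for word in question.upper().split() if word in lookup]


df = pd.DataFrame({
    "Person1": ("Ash", "Joe", "Harry"),
    "Person2": ("Abe", "Lisa", "Katty"),
    "Person3": ("Sam", "Max", "Stone"),
})

lookup = build_lookup(df)  # built once, reused for every question
print(find_keywords("Joe is Available", lookup))     # ['JOE']
print(find_keywords("Is Max or Lisa free", lookup))  # ['MAX', 'LISA']
```

Building the set costs one pass over the dataframe; each later question then needs only one set lookup per word.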

Answer 2

Use

In [741]: [x for x in str_list if x in df.values]
Out[741]: ['JOE']
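For reference, the AttributeError in the question arises because the .str accessor exists on a Series (a single column), not on a whole DataFrame. The sketch below shows a column-by-column variant of the original attempt, and also why substring matching with str.contains may not be what is wanted here: "IS" is a substring of "LISA". The variable names substring_hits and exact_hits are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Person1": ("Ash", "Joe", "Harry"),
    "Person2": ("Abe", "Lisa", "Katty"),
    "Person3": ("Sam", "Max", "Stone"),
}).apply(lambda col: col.str.upper())

str_list = "Joe is Available".upper().split()

# Substring match: .str is called on each column (a Series), never on df itself.
substring_hits = [
    word for word in str_list
    if any(df[col].str.contains(word, regex=False).any() for col in df.columns)
]

# Exact match: compare whole cell values against the word.
exact_hits = [word for word in str_list if df.eq(word).any().any()]

print(substring_hits)  # ['JOE', 'IS']  ('IS' matches inside 'LISA')
print(exact_hits)      # ['JOE']
```

The exact-match variant gives the same result as the list comprehension above; the substring variant is closer to the original loop but picks up partial matches.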

Return the keyword of a string if it is found in a dataframe column
