Question

df (a Pandas DataFrame) has three rows.

some_col_name
"apple is delicious"
"banana is delicious"
"apple and banana both are delicious"

df.col_name.str.contains("apple|banana")

will catch all of the rows:

"apple is delicious",
"banana is delicious",
"apple and banana both are delicious".

How do I apply the AND operator to the str.contains method so that it only grabs strings that contain BOTH apple & banana?

"apple and banana both are delicious"

I'd like to grab strings that contain 10-20 different words (grape, watermelon, berry, orange, etc.)

Answer 1

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ['apple', 'banana']

# At least one word from `targets` is present in the sentence.
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets))
0    True
1    True
2    True
Name: col, dtype: bool

# All words from `targets` are present in sentence.
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets))
0    False
1    False
2     True
Name: col, dtype: bool
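
The boolean Series shown above can be used directly as a row filter. Here is a minimal sketch, assuming the same `df` and `targets`; the names `mask_all` and `matches` are introduced here purely for illustration. Note that `word in sentence` is a plain substring test, so a sentence containing "pineapple" would also count as containing "apple".

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})
targets = ['apple', 'banana']

# mask_all (hypothetical name): True where every target is a substring of the sentence.
mask_all = df['col'].apply(lambda sentence: all(word in sentence for word in targets))

# Boolean indexing keeps only the rows where the mask is True.
matches = df[mask_all]
print(matches)
```

If whole-word matching is needed, the substring test would have to be replaced by something stricter, such as a regex with word boundaries.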

Answer 2

You can do it as follows:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
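
One caveat that may be worth sketching: if the column holds missing values, `str.contains` can return missing values in the mask, which boolean indexing may reject depending on the pandas version. Passing `na=False` treats those rows as non-matches. The extra `None` row below is made up for illustration, as are the names `has_apple`, `has_banana` and `both`.

```python
import pandas as pd

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious",
                                None]})  # hypothetical missing value

# na=False makes rows with missing text evaluate to False instead of NaN.
has_apple = df['col_name'].str.contains('apple', na=False)
has_banana = df['col_name'].str.contains('banana', na=False)

# Element-wise AND of the two masks.
both = df[has_apple & has_banana]
print(both)
```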

Answer 3

You can also do it with a regular expression:

df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]

You can then build your list of words into a regex string like so:

base = r'^{}'
expr = '(?=.*{})'
words = ['apple', 'banana', 'cat']  # example
base.format(''.join(expr.format(w) for w in words))

which renders:

'^(?=.*apple)(?=.*banana)(?=.*cat)'

That way you can build the pattern dynamically.
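
Putting the pieces together, a sketch of the full flow might look like this. It assumes the three-row DataFrame from the question with a column named `col_name`, and it adds `re.escape` (not in the snippet above) so that words containing regex metacharacters are matched literally.

```python
import re

import pandas as pd

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})
words = ['apple', 'banana']

# One lookahead per word; each must succeed from the start of the string.
pattern = '^' + ''.join('(?=.*{})'.format(re.escape(w)) for w in words)

result = df[df['col_name'].str.contains(pattern)]
print(pattern)
print(result)
```

Because the lookaheads are not capturing groups, the pattern can be passed to `str.contains` as is.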

Answer 4

This works:

df.col.str.contains(r'(?=.*apple)(?=.*banana)',regex=True)

Answer 5

If you want to catch at least two of the words in a sentence, maybe this will work (taking a hint from @Alexander):

target=['apple','banana','grapes','orange']
connector_list=['and']
df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (all(connector in sentence for connector in connector_list)))]

Output:

                                   col
2  apple and banana both are delicious

If you have more than two words to catch and they are separated by a comma (','), add the comma to connector_list and change the second condition from all to any:

df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (any(connector in sentence for connector in connector_list)))]

Output:

                                        col
2        apple and banana both are delicious
3  orange,banana and apple all are delicious
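
A connector such as "and" or "," does not by itself guarantee that two target words are present. If the goal is "at least two of the target words", one alternative is to count the hits directly. This is a different technique from the one above, sketched under the assumption that plain substring matching is acceptable; `min_hits` and `hits` are names introduced here.

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "orange,banana and apple all are delicious"]})
target = ['apple', 'banana', 'grapes', 'orange']
min_hits = 2  # hypothetical threshold: how many target words must appear

# Count how many distinct target words occur in each sentence.
hits = df['col'].apply(lambda sentence: sum(word in sentence for word in target))

result = df[hits >= min_hits]
print(result)
```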

Answer 6

Try this regex:

apple.*banana|banana.*apple

The code is:

import pandas as pd

df = pd.DataFrame([[1,"apple is delicious"],[2,"banana is delicious"],[3,"apple and banana both are delicious"]],columns=('ID','String_Col'))

print(df[df['String_Col'].str.contains(r'apple.*banana|banana.*apple')])

Output:

   ID                           String_Col
2   3  apple and banana both are delicious
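
This pattern has to list every ordering of the words, so the number of branches grows factorially with the word count. The sketch below generates the alternation with `itertools.permutations`, mainly to make that growth visible; `words` and `pattern` are illustrative names, and `re.escape` is an addition to keep the words literal.

```python
import re
from itertools import permutations

import pandas as pd

df = pd.DataFrame({'String_Col': ["apple is delicious",
                                  "banana is delicious",
                                  "apple and banana both are delicious"]})
words = ['apple', 'banana']

# One branch per ordering: 2 words give 2 branches, 3 give 6, 4 give 24, ...
pattern = '|'.join('.*'.join(re.escape(w) for w in order)
                   for order in permutations(words))

result = df[df['String_Col'].str.contains(pattern)]
print(pattern)
print(result)
```

For the 10-20 words mentioned in the question this would be impractical, so it is best seen as a two- or three-word convenience.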

Answer 7

Enumerating every possibility becomes cumbersome for large lists. A better way is to use reduce() with the bitwise AND operator (&).

For example, consider the following DataFrame:

df = pd.DataFrame({'col': ["apple is delicious",
                       "banana is delicious",
                       "apple and banana both are delicious",
                       "i love apple, banana, and strawberry"]})

#                                    col
#0                    apple is delicious
#1                   banana is delicious
#2   apple and banana both are delicious
#3  i love apple, banana, and strawberry

Suppose we want to find the rows that contain all of the following:

targets = ['apple', 'banana', 'strawberry']

We can do it like this:

from functools import reduce  # required on Python 3 (reduce is a builtin on Python 2)
print(df[reduce(lambda a, b: a&b, (df['col'].str.contains(s) for s in targets))])

#                                    col
#3  i love apple, banana, and strawberry
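
The same idea can be wrapped in a small helper. This is only a sketch: `contains_all` is a hypothetical name, `operator.and_` stands in for the lambda, and `regex=False` is an added assumption that the targets are literal substrings rather than patterns.

```python
from functools import reduce
import operator

import pandas as pd


def contains_all(series, targets):
    """Return a boolean mask that is True where every target is a literal substring."""
    masks = (series.str.contains(t, regex=False) for t in targets)
    # Fold the per-target masks together with element-wise AND.
    return reduce(operator.and_, masks)


df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "i love apple, banana, and strawberry"]})

mask = contains_all(df['col'], ['apple', 'banana', 'strawberry'])
print(df[mask])
```

With an empty `targets` list, `reduce` has nothing to fold and raises a TypeError, so callers would need to handle that case themselves.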

Answer 8

If you only want to use native methods and avoid writing regexes, here is a vectorized version with no lambdas involved:
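
One way such a lambda-free version could look, sketched with NumPy. This is an assumption about the approach rather than the answerer's exact code, and `masks`, `combined_mask` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "i love apple, banana, and strawberry"]})
targets = ['apple', 'banana', 'strawberry']

# One boolean mask per target, built with the native str.contains accessor.
masks = [df['col'].str.contains(t) for t in targets]

# Stack into an array of shape (len(targets), len(df)); a row of df matches
# only if its column in that array is True for every target.
combined_mask = np.vstack(masks).all(axis=0)

result = df[combined_mask]
print(result)
```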


pandas dataframe str.contains() AND operation

8 answers: