pandas dataframe str.contains() AND operation

Time: 2016-05-03 18:29:15

Tags: python string pandas dataframe

df (a pandas DataFrame) has three rows.

some_col_name
"apple is delicious"
"banana is delicious"
"apple and banana both are delicious"

df.col_name.str.contains("apple|banana")

will catch all of the rows:

"apple is delicious",
"banana is delicious",
"apple and banana both are delicious".

How do I apply an AND operator to the str.contains method, so that it only grabs strings that contain BOTH apple and banana?

"apple and banana both are delicious"

I would like to grab strings that contain 10-20 different words (grape, watermelon, berry, orange, etc.).

8 answers:

Answer 0 (score: 12)

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ['apple', 'banana']

# Any word from `targets` is present in the sentence.
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets))
0    True
1    True
2    True
Name: col, dtype: bool

# All words from `targets` are present in the sentence.
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets))
0    False
1    False
2     True
Name: col, dtype: bool
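
Note that `word in sentence` is a substring test, so 'apple' also matches 'pineapple'. If whole-word matching is what you need, a minimal sketch could look like the following. It assumes words are made of word characters only; the extra 'pineapple' row and the helper name `contains_all_words` are hypothetical additions for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "pineapple and banana are delicious"]})  # extra row, illustration only

targets = {'apple', 'banana'}

def contains_all_words(sentence, words):
    # Hypothetical helper: split the sentence into lowercase word tokens,
    # then check that every target word is one of those tokens.
    tokens = set(re.findall(r'\w+', sentence.lower()))
    return words <= tokens

mask = df['col'].apply(lambda sentence: contains_all_words(sentence, targets))
print(df[mask])
```

Only the third row is selected here, because the 'pineapple' row contains 'banana' as a word but not 'apple'.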

Answer 1 (score: 11)

You can do it as follows:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]

Answer 2 (score: 5)

You can also do it with a regular expression:

df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]

You can then build the regex string from your list of words, like so:

base = r'^{}'
expr = '(?=.*{})'
words = ['apple', 'banana', 'cat']  # example
base.format(''.join(expr.format(w) for w in words))

This renders:

'^(?=.*apple)(?=.*banana)(?=.*cat)'

That lets you build the filter dynamically.
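
Putting the pieces together, a minimal sketch of the dynamic version might look like this. The use of `re.escape` is an addition not present in the answer above; it is there on the assumption that some search words could contain regex metacharacters such as `+` or `.`.

```python
import re
import pandas as pd

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

words = ['apple', 'banana']

# One lookahead per word; re.escape keeps special characters literal.
pattern = '^' + ''.join('(?=.*{})'.format(re.escape(w)) for w in words)

mask = df['col_name'].str.contains(pattern, regex=True)
print(pattern)
print(df[mask])
```

Each lookahead is checked from the start of the string, so the order in which the words appear does not matter.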

Answer 3 (score: 4)

This works:

df.col.str.contains(r'(?=.*apple)(?=.*banana)',regex=True)

Answer 4 (score: 2)

If you want to catch sentences that contain at least two of the words, maybe this will work (taking the tip from @Alexander):

target=['apple','banana','grapes','orange']
connector_list=['and']
df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (all(connector in sentence for connector in connector_list)))]

Output:

                                   col
2  apple and banana both are delicious

If you have more than two words to catch and they are separated by a comma (','), add the comma to connector_list and change the second condition from all to any:

df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (any(connector in sentence for connector in connector_list)))]

Output:

                                        col
2        apple and banana both are delicious
3  orange,banana and apple all are delicious

Answer 5 (score: 1)

Try this regex:

apple.*banana|banana.*apple

The code is:

import pandas as pd

df = pd.DataFrame([[1,"apple is delicious"],[2,"banana is delicious"],[3,"apple and banana both are delicious"]],columns=('ID','String_Col'))

print(df[df['String_Col'].str.contains(r'apple.*banana|banana.*apple')])

Output:

   ID                           String_Col
2   3  apple and banana both are delicious
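
For more than two words, the alternation has to list every ordering. As a rough sketch, it could be generated as below; the helper name `any_order_pattern` is hypothetical. Keep in mind that the number of alternatives grows factorially, so this is only practical for a handful of words, not the 10-20 mentioned in the question.

```python
import re
from itertools import permutations
import pandas as pd

def any_order_pattern(words):
    # Hypothetical helper: one 'a.*b.*c' alternative per ordering of the words.
    return '|'.join('.*'.join(re.escape(w) for w in perm)
                    for perm in permutations(words))

df = pd.DataFrame([[1, "apple is delicious"],
                   [2, "banana is delicious"],
                   [3, "apple and banana both are delicious"]],
                  columns=('ID', 'String_Col'))

pattern = any_order_pattern(['apple', 'banana'])
mask = df['String_Col'].str.contains(pattern)
print(pattern)
print(df[mask])

# Three words already need 3! = 6 alternatives.
n_alternatives = len(any_order_pattern(['apple', 'banana', 'cat']).split('|'))
print(n_alternatives)
```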

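Answer 6 (score: 1)

Enumerating every possibility for a large list is cumbersome. A better way is to use reduce() with the bitwise AND operator (&).

For example, consider the following DataFrame: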

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "i love apple, banana, and strawberry"]})

#                                    col
#0                    apple is delicious
#1                   banana is delicious
#2   apple and banana both are delicious
#3  i love apple, banana, and strawberry

Suppose we want to search for all of the following:

targets = ['apple', 'banana', 'strawberry']

We can do it like this:

from functools import reduce  # reduce is no longer a builtin on Python 3
print(df[reduce(lambda a, b: a & b, (df['col'].str.contains(s) for s in targets))])

#                                    col
#3  i love apple, banana, and strawberry
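
A variation on the same idea, sketched here with NumPy's `np.logical_and.reduce` in place of `functools.reduce`. Passing `regex=False` and `na=False` is an addition not in the answer above, made on the assumption that the targets are plain substrings and that the column might contain missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "i love apple, banana, and strawberry"]})

targets = ['apple', 'banana', 'strawberry']

# One boolean mask per target; na=False treats missing values as "no match".
masks = [df['col'].str.contains(t, regex=False, na=False) for t in targets]

# Element-wise AND across all masks, giving one boolean per row.
combined = np.logical_and.reduce(masks)
print(df[combined])
```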

Answer 7 (score: 1)

If you want to use only native methods and avoid writing regular expressions, here is a vectorized version with no lambdas involved:
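
A minimal sketch of what such a lambda-free version could look like, using only pandas. The column name `col`, the sample rows, and the `targets` list are borrowed from the earlier answers; treat this as one plausible reading of the approach, not necessarily the answerer's exact code.

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "i love apple, banana, and strawberry"]})

targets = ['apple', 'banana', 'strawberry']

# Build one boolean column per target word (plain substring match, no regex).
masks = pd.concat([df['col'].str.contains(t, regex=False) for t in targets],
                  axis=1, keys=targets)

# A row qualifies only if every target column is True.
combined = masks.all(axis=1)
print(df[combined])
```

Swapping `.all(axis=1)` for `.any(axis=1)` gives the OR behaviour of the original `"apple|banana"` pattern.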
