df (a pandas DataFrame) has three rows:
col_name
"apple is delicious"
"banana is delicious"
"apple and banana both are delicious"
df.col_name.str.contains("apple|banana")
will catch all of the rows:
"apple is delicious",
"banana is delicious",
"apple and banana both are delicious".
How do I apply an AND operator to the str.contains method so that it only grabs the strings that contain BOTH apple & banana?
"apple and banana both are delicious"
I want to grab strings that contain 10-20 different words (grapes, watermelon, berries, oranges, etc.).
Answer 0 (score: 12)
import pandas as pd
df = pd.DataFrame({'col': ["apple is delicious",
"banana is delicious",
"apple and banana both are delicious"]})
targets = ['apple', 'banana']
# At least one word from `targets` is present in the sentence.
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets))
0 True
1 True
2 True
Name: col, dtype: bool
# All words from `targets` are present in the sentence.
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets))
0 False
1 False
2 True
Name: col, dtype: bool
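Note that `word in sentence` is a plain substring test, so 'apple' would also match "pineapple". Below is a minimal sketch of a whole-word variant in the same `apply` style. It assumes the regex word boundary `\b` is an acceptable definition of "word", and it adds a hypothetical "pineapple" row purely to show the difference.

```python
import re

import pandas as pd

# The middle row is a hypothetical example added to show the substring pitfall.
df = pd.DataFrame({'col': ["apple is delicious",
                           "pineapple and banana are delicious",
                           "apple and banana both are delicious"]})
targets = ['apple', 'banana']

# Precompile one whole-word pattern per target; re.escape guards against
# regex metacharacters appearing in the words.
patterns = [re.compile(r'\b' + re.escape(word) + r'\b') for word in targets]

# all(...) is True only when every whole-word pattern is found in the sentence.
mask = df['col'].apply(lambda sentence: all(p.search(sentence) for p in patterns))
print(mask.tolist())  # [False, False, True]
```

With the plain `in` test, the "pineapple" row would pass the AND check. The word-boundary version rejects it.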
Answer 1 (score: 11)
You can do it like this:
df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
Answer 2 (score: 5)
You can also do it with a regular expression:
df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]
You can then build the regex string from a list of words, like this:
base = r'^{}'
expr = '(?=.*{})'
words = ['apple', 'banana', 'cat'] # example
base.format(''.join(expr.format(w) for w in words))
which renders:
'^(?=.*apple)(?=.*banana)(?=.*cat)'
Then you can do all of this dynamically.
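Here is a minimal sketch that wraps the lookahead-building steps above into one reusable function. `all_words_pattern` is a hypothetical helper name, not a pandas API. It also applies `re.escape` to each word, on the assumption that the words should be matched literally even if they contain characters such as `.` or `+`.

```python
import re

import pandas as pd


def all_words_pattern(words):
    """Hypothetical helper: build a regex that matches only if every word occurs.

    Each (?=.*word) lookahead is tried at the ^ anchor, so every word is
    searched for from the start of the string, in any order.
    """
    return '^' + ''.join('(?=.*{})'.format(re.escape(w)) for w in words)


df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

pattern = all_words_pattern(['apple', 'banana'])
print(pattern)  # ^(?=.*apple)(?=.*banana)
print(df[df['col_name'].str.contains(pattern, regex=True)])
```

Lookaheads do not consume characters, which is why the order of the words in the list does not matter.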
Answer 3 (score: 4)
This works:
df.col.str.contains(r'(?=.*apple)(?=.*banana)',regex=True)
Answer 4 (score: 2)
If you want to catch at least two of the words in a sentence, maybe this works (taking a hint from @Alexander):
target=['apple','banana','grapes','orange']
connector_list=['and']
df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (all(connector in sentence for connector in connector_list)))]
Output:
col
2 apple and banana both are delicious
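If you have more than two words to catch and they are separated by commas, add ',' to connector_list and change the second condition from all to any:
connector_list = ['and', ',']
df[df.col.apply(lambda sentence: (any(word in sentence for word in target)) & (any(connector in sentence for connector in connector_list)))]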
Output:
col
2 apple and banana both are delicious
3 orange,banana and apple all are delicious
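The connector check above only tests whether a connector appears. It does not count how many of the targets are present. Below is a minimal sketch of a more direct "at least N targets" filter that counts matches per row. `min_hits` is a hypothetical threshold, and matching is plain substring matching (`regex=False`).

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "orange,banana and apple all are delicious"]})
target = ['apple', 'banana', 'grapes', 'orange']
min_hits = 2  # hypothetical threshold: how many targets must appear

# Summing boolean Series counts how many targets occur (as substrings) in each row.
hits = sum(df['col'].str.contains(word, regex=False) for word in target)
print(df[hits >= min_hits])
```

Setting `min_hits = len(target)` turns the same filter into a strict AND over all targets.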
Answer 5 (score: 1)
Try this regular expression:
apple.*banana|banana.*apple
The code is:
import pandas as pd
df = pd.DataFrame([[1,"apple is delicious"],[2,"banana is delicious"],[3,"apple and banana both are delicious"]],columns=('ID','String_Col'))
print(df[df['String_Col'].str.contains(r'apple.*banana|banana.*apple')])
Output:
ID String_Col
2 3 apple and banana both are delicious
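For more than two words, the alternation needs one branch per ordering of the words. Below is a minimal sketch that generates it with `itertools.permutations`. It assumes the words should be matched literally, hence `re.escape`. It also shows why this approach only suits short lists.

```python
from itertools import permutations
import re

import pandas as pd

df = pd.DataFrame([[1, "apple is delicious"],
                   [2, "banana is delicious"],
                   [3, "apple and banana both are delicious"]],
                  columns=['ID', 'String_Col'])
words = ['apple', 'banana']

# One ".*"-joined alternative per ordering of the words; n words give n!
# alternatives, so this only stays manageable for a handful of words.
pattern = '|'.join('.*'.join(map(re.escape, order)) for order in permutations(words))
print(pattern)  # apple.*banana|banana.*apple
print(df[df['String_Col'].str.contains(pattern, regex=True)])
```

With 10-20 words, the number of branches (10! is already 3,628,800) makes this impractical. The lookahead or mask-combining approaches scale linearly instead.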
答案 6 :(得分:1)
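Enumerating all of the possibilities is very cumbersome for a large list. A better way is to use reduce() together with the bitwise AND operator (&).
For example, consider the following DataFrame: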
df = pd.DataFrame({'col': ["apple is delicious",
"banana is delicious",
"apple and banana both are delicious",
"i love apple, banana, and strawberry"]})
# col
#0 apple is delicious
#1 banana is delicious
#2 apple and banana both are delicious
#3 i love apple, banana, and strawberry
Suppose we want to search for all of the following:
targets = ['apple', 'banana', 'strawberry']
We can do this:
from functools import reduce  # needed for Python 3
print(df[reduce(lambda a, b: a&b, (df['col'].str.contains(s) for s in targets))])
# col
#3 i love apple, banana, and strawberry
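Here is a minimal sketch of two lambda-free equivalents of the reduce call above. `operator.and_` is the function form of `&`. `np.logical_and.reduce` performs the same element-wise AND across a list of boolean masks. Both assume the targets should be matched literally (`regex=False`).

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious",
                           "i love apple, banana, and strawberry"]})
targets = ['apple', 'banana', 'strawberry']

# One boolean Series per target.
masks = [df['col'].str.contains(s, regex=False) for s in targets]

# operator.and_ replaces `lambda a, b: a & b` in the reduce call.
mask_reduce = reduce(operator.and_, masks)

# NumPy's ufunc reduce ANDs the stacked masks element-wise (returns an ndarray).
mask_numpy = np.logical_and.reduce(masks)

print(df[mask_reduce])
```

Swapping in `operator.or_` or `np.logical_or.reduce` gives the "any of the targets" version.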
Answer 7 (score: 1)
If you only want to use native methods and avoid writing a regex, here is a vectorized version without a lambda:
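Here is a minimal sketch of such a lambda-free, regex-free vectorized filter. It assumes the same `col` column and `targets` list used in the earlier answers. It builds one boolean column per target with `str.contains(..., regex=False)` and then keeps only the rows where every column is True.

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})
targets = ['apple', 'banana']

# One boolean column per target; regex=False makes each test a plain substring match.
hits = pd.concat({word: df['col'].str.contains(word, regex=False) for word in targets},
                 axis=1)

# Keep rows where every target column is True (use .any(axis=1) for OR).
print(df[hits.all(axis=1)])
```

Keeping the per-target columns in `hits` also makes it easy to inspect which words matched in each row.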