Question

Is there a way to check whether any part of a string matches another string in Python?

For example, my URLs look like this:

url = pd.DataFrame({'urls' : ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA', 'www.ulta.com/beautyservices/benefitbrowbar/']})

My strings look like this:

string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']
string = '|'.join(string_list)

I want to match string against url, so that

Anastasia Beverly Hills matches www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA, and

www.ulta.com/beautyservices/benefitbrowbar/ matches Benefit Cosmetics.

I have been trying url['urls'].str.contains('('+string+')', case = False), but it doesn't match anything.

What is the right way to do this?
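A minimal sketch that reproduces the attempt above helps show why it returns no matches: joining the full names with | asks the regex to find a whole phrase such as "Benefit Cosmetics", but the URLs only contain fragments like "ANASTASIA-Beverly" and "benefitbrowbar". Splitting every name into single words before joining makes each word its own alternative. The variable names phrase_pattern and word_pattern are illustrative only.

```python
import pandas as pd

url = pd.DataFrame({'urls': ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA',
                             'www.ulta.com/beautyservices/benefitbrowbar/']})
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# Pattern as in the question: each full name is one alternative,
# i.e. 'Benefit Cosmetics|Anastasia Beverly Hills'.
phrase_pattern = '|'.join(string_list)
phrase_hits = url['urls'].str.contains(phrase_pattern, case=False)

# Word-level pattern: every single word of every name is an alternative,
# i.e. 'Benefit|Cosmetics|Anastasia|Beverly|Hills'.
word_pattern = '|'.join(word for name in string_list for word in name.split())
word_hits = url['urls'].str.contains(word_pattern, case=False)

print(phrase_hits.tolist())  # [False, False]
print(word_hits.tolist())    # [True, True]
```

The word-level pattern tells you that some name matched, but not which one; the answer below keeps that information by testing one name at a time.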

Answer 1

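I couldn't do it in one line with a regex, but here is my attempt using itertools and any(). It checks each word of a name separately, because the URLs only contain fragments such as ANASTASIA-Beverly and benefit rather than the full names: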

import pandas as pd
from itertools import product

url = pd.DataFrame({'urls' : ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA', 'www.ulta.com/beautyservices/benefitbrowbar/']})
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# Loop over the Cartesian product (every combination) of
# string_list and the URLs.
for name, link in product(string_list, url['urls']):
    # Match if any word of the name appears in the URL, ignoring case.
    if any(word.lower() in link.lower() for word in name.split()):
        # Show the match.
        print("Match String: %s URL: %s" % (name, link))

Output:

Match String: Benefit Cosmetics URL: www.ulta.com/beautyservices/benefitbrowbar/
Match String: Anastasia Beverly Hills URL: www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA
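One caveat with matching on any single word is that short filler words can produce false positives: a hypothetical name like "The Body Shop" would match every URL containing "the". A minimal sketch of one possible tweak, assuming a minimum word length is an acceptable filter, skips words shorter than a cut-off and also reports which words matched. The helper matched_words and its min_len=4 default are hypothetical, not part of the original answer.

```python
from itertools import product

urls = ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA',
        'www.ulta.com/beautyservices/benefitbrowbar/']
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

def matched_words(name, link, min_len=4):
    """Return the words of name found in link, ignoring case.

    Words shorter than min_len are skipped; min_len=4 is an assumed
    cut-off for dropping filler words such as 'the' or 'of'.
    """
    link = link.lower()
    return [w for w in name.lower().split() if len(w) >= min_len and w in link]

matches = []
for name, link in product(string_list, urls):
    hits = matched_words(name, link)
    if hits:  # at least one sufficiently long word was found
        matches.append((name, link, hits))
        print("Match String: %s URL: %s (words: %s)" % (name, link, ", ".join(hits)))
```

For the sample data this finds the same two pairs as above, with "benefit" matching the Ulta URL and "anastasia" and "beverly" matching the Amazon URL.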

Update

Looking at it your way, you can also use:

import pandas as pd

pd.set_option('display.width', 100)

string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']
# Create a pandas DataFrame.
url = pd.DataFrame({'urls' : ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA', 'www.ulta.com/beautyservices/benefitbrowbar/']})

# Handle one string at a time.
for string in string_list:
    # Split the string into its individual words and join them with a
    # pipe to build a regex alternation, e.g. 'Benefit|Cosmetics'.
    s = "|".join(string.split())
    # Add a column that is True where the regex matches the URL and
    # False otherwise. The pattern is passed without wrapping
    # parentheses, so it has no capturing group and pandas does not
    # warn about match groups.
    url[string] = url['urls'].str.contains(s, case=False)

# Show the result.
print(url)

This outputs:

                                                urls Benefit Cosmetics Anastasia Beverly Hills
0  www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00...             False                    True
1        www.ulta.com/beautyservices/benefitbrowbar/              True                   False

I think this may be the better option if you want the results in a DataFrame, but I prefer the first approach.
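If one boolean column per name gets unwieldy, a minimal sketch of an alternative, assuming a single column listing the matching names is what you want, combines both approaches: precompile one case-insensitive pattern per name and apply it to each URL. The names patterns, matching_names and the matches column are hypothetical. Each word is passed through re.escape so that names containing regex metacharacters such as "." or "+" are matched literally.

```python
import re
import pandas as pd

url = pd.DataFrame({'urls': ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA',
                             'www.ulta.com/beautyservices/benefitbrowbar/']})
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# One compiled, case-insensitive alternation per name, e.g. 'Benefit|Cosmetics'.
patterns = {
    name: re.compile('|'.join(re.escape(word) for word in name.split()), re.IGNORECASE)
    for name in string_list
}

def matching_names(link):
    """Return every name that has at least one word inside link."""
    return [name for name, pattern in patterns.items() if pattern.search(link)]

url['matches'] = url['urls'].apply(matching_names)
print(url['matches'].tolist())
# [['Anastasia Beverly Hills'], ['Benefit Cosmetics']]
```

A URL that matches no name gets an empty list, and one that matches several names keeps all of them.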
