How do I check a string for a substring in a specific format?

Time: 2019-04-18 10:11:51

Tags: python regex string compare substring

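Two strings. My item name:

Parfume name EDT 50ml

and the competitor's item name:

Parfume another name EDP 60ml

I have a long list of my own item names in one column and the competitors' names in another. I want to keep only those rows of the dataframe where my name and the competitor's name contain the same ml amount, regardless of what the rest of the strings look like. So how do I find a substring ending in 'ml' inside a larger string? Can I do something like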

"**ml" in competitors_name

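to check whether they contain the same number of millilitres?

Thanks

Update

'ml' is not always at the end of the string. It can also look like this:

Parfume yet another name 60ml EDP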

3 answers:

Answer 0: (score: 3)

Try this:

import re

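def same_measurement(my_item, competitor_item, unit="ml"):
    # Capture the digits directly in front of the unit, e.g. "50" in "50ml".
    # re.escape keeps a unit such as "fl.oz" from being read as a regex.
    matcher = re.compile(r"(\d+){}".format(re.escape(unit)))
    my_match = matcher.search(my_item)
    competitor_match = matcher.search(competitor_item)
    # Both names must contain a measurement and the amounts must be equal.
    return bool(my_match and competitor_match
                and my_match.group(1) == competitor_match.group(1))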

my_item = "Parfume name EDT 50ml"
competitor_item = "Parfume another name EDP 50ml"
assert same_measurement(my_item, competitor_item)

my_item = "Parfume name EDT 50ml"
competitor_item = "Parfume another name EDP 60ml"
assert not same_measurement(my_item, competitor_item)
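The update to the question mentions names such as "60ml EDP", where the amount is not at the end of the string. The function above already copes with that, because the pattern is not anchored to the end. Below is a minimal sketch of a slightly more tolerant variant; the helper names extract_amount and same_amount, and the tolerance for a space before the unit and for upper-case "ML", are assumptions for illustration and not part of the answer.

```python
import re

def extract_amount(name, unit="ml"):
    """Return the number in front of `unit` as an int, or None if absent."""
    # \s* allows "50 ml"; the trailing \b keeps "50mlx" from matching.
    pattern = r"(\d+)\s*{}\b".format(re.escape(unit))
    match = re.search(pattern, name, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None

def same_amount(my_item, competitor_item, unit="ml"):
    mine = extract_amount(my_item, unit)
    theirs = extract_amount(competitor_item, unit)
    # A name without any amount never counts as a match.
    return mine is not None and mine == theirs

print(same_amount("Parfume name EDT 50ml", "Parfume yet another name 50 ML EDP"))
print(same_amount("Parfume name EDT 50ml", "Parfume another name EDP 60ml"))
```

Comparing the amounts as integers rather than strings means "050ml" and "50ml" are treated as equal, which may or may not be what you want.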

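Answer 1: (score: 1)

You can use Python's regex library (re) to pick out the "xxml" value in each data row, then apply some logic to check whether the two values match.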

import re

data_rows = [["Parfume name EDT", "Parfume another name EDP 50ml"]]

for data_pairs in data_rows:
    my_ml = None
    comp_ml = None

    # Check for my ml matches and set value
    my_ml_matches = re.search(r'(\d{1,3}[Mm][Ll])', data_pairs[0])
    if my_ml_matches is not None:
        my_ml = my_ml_matches[0]
    else:
        print("my_ml has no ml")

    # Check for comp ml matches and set value
    comp_ml_matches = re.search(r'(\d{1,3}[Mm][Ll])', data_pairs[1])
    if comp_ml_matches is not None:
        comp_ml = comp_ml_matches[0]
    else:
        print("comp_ml has no ml")

    # Print outputs
    if my_ml is not None and comp_ml is not None:
        if my_ml == comp_ml:
            print("my_ml: {0} == comp_ml: {1}".format(my_ml, comp_ml))
        else:
            print("my_ml: {0} != comp_ml: {1}".format(my_ml, comp_ml))

Here data_rows = every row in your dataset

and data_pairs = [your item name, competitor's item name]
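Since the question asks to keep only the matching rows rather than print them, the same idea can be turned into a filter. This is a rough sketch under a few assumptions: the names ML_PATTERN, ml_value and rows_with_same_ml are made up for illustration, the rows are plain (mine, competitor) pairs, and only the digits are compared so that "50ml" and "50ML" count as the same amount.

```python
import re

# Same shape as the pattern in the answer: 1 to 3 digits followed by ml/ML.
ML_PATTERN = re.compile(r"(\d{1,3})ml", re.IGNORECASE)

def ml_value(name):
    """Return the digits in front of 'ml', or None if the name has none."""
    match = ML_PATTERN.search(name)
    return match.group(1) if match else None

def rows_with_same_ml(rows):
    """Keep only the pairs whose two names carry the same ml amount."""
    kept = []
    for mine, theirs in rows:
        my_ml = ml_value(mine)
        # Pairs where either name lacks an amount are dropped.
        if my_ml is not None and my_ml == ml_value(theirs):
            kept.append((mine, theirs))
    return kept

rows = [
    ("Parfume name EDT 50ml", "Parfume another name EDP 50ml"),
    ("Parfume name EDT 50ml", "Parfume another name EDP 60ml"),
    ("Parfume name EDT", "Parfume another name EDP 50ml"),
]
print(rows_with_same_ml(rows))
```

Only the first pair survives: the second has different amounts and the third has no amount in my item name.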

Answer 2: (score: -1)

You can do this with a lambda function.

import pandas as pd
import re
d = {
    'Us':
        ['Parfume one 50ml', 'Parfume two 100ml'],
    'Competitor':
        ['Parfume uno 50ml', 'Parfume dos 200ml']
}
df = pd.DataFrame(data=d)

df['Eq'] = df.apply(lambda x : 'Yes' if re.search(r'(\d+)ml', x['Us']).group(1) == re.search(r'(\d+)ml', x['Competitor']).group(1) else "No", axis = 1)

Result: the Eq column is 'Yes' for the first row (50 in both names) and 'No' for the second (100 vs. 200).

It doesn't matter whether 'ml' is in the middle or at the end of the string.
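One caveat with the apply call above: re.search returns None for a name without an ml amount, so .group(1) raises AttributeError on such a row. A possible alternative, sketched here with pandas' Series.str.extract, leaves those rows as NaN and filters the dataframe directly, which is what the question asks for. The extra third row and the variable names us_ml, competitor_ml and same_ml are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Us": ["Parfume one 50ml", "Parfume two 100ml", "Parfume three"],
    "Competitor": ["Parfume uno 50ml", "Parfume dos 200ml", "Parfume tres 30ml"],
})

# expand=False returns a Series; names without a match become missing values.
us_ml = df["Us"].str.extract(r"(\d+)ml", expand=False)
competitor_ml = df["Competitor"].str.extract(r"(\d+)ml", expand=False)

# A missing value never compares equal, so rows lacking an amount are dropped.
same_ml = df[us_ml == competitor_ml]
print(same_ml)
```

Only the first row is kept here; the second has different amounts and the third has no amount in the Us column.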