How to replace a column's string value if it is a substring of another column

Date: 2018-10-10 14:16:50

Tags: python pandas data-science

How can I remove part of a string value in one column using the value of another column?

My dataset is:

ID          Product Name                            Size ID    Size Name
1   24 Mantra Ancient Grains Foxtail Millet 500 gm      1       500 gm
2   24 Mantra Ancient Grains Little Millet 500 gm       2       500 gm
3   24 Mantra Naturals Almonds 100 gm                   3       100 gm
4   24 Mantra Naturals Kismis 100 gm                    4       100 gm
5   24 Mantra Organic Ajwain 100 gm                     5       100 gm
6   24 Mantra Organic Apple Blast Drink 250 ml          6       250 ml
7   24 Mantra Organic Apple Juice 1 Ltr Tetra Pack      7       1000 ml
8   24 Mantra Organic Apple Juice 200 ml                8       200 ml
9   24 Mantra Organic Assam Tea 100 gm                  9       100 gm

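The requirement: suppose the Product Name column has the value 24 Mantra Ancient Grains Foxtail Millet 500 gm and the Size Name column has the value 500 Gm. In that case my output should be 24 Mantra Ancient Grains Foxtail Millet. If the Size Name string is contained in the Product Name, remove it from the Product Name, ignoring case; otherwise do nothing.

3 Answers:

Answer 0 (score: 1)

Assuming you want to replace the Size Name value with None when it is a substring of Product Name: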

import numpy as np
import pandas as pd

df = pd.DataFrame({
            'Product Name' : ['24 Mantra Ancient Grains Foxtail Millet 500 gm', '24 Mantra Ancient Grains Little Millet 500 gm ', '24 Mantra Naturals Kismis 100 gm'], 
            'Size ID' : [1, 2, 3],
            'Size Name': ['500 gm', '500 gm', '200 gm']
        })

df['same']= df.apply(lambda x: x['Size Name'] in x['Product Name'], axis = 1)
df['Size Name'] = np.where(df['same'], None, df['Size Name'])
df.drop(columns=['same'], inplace = True)
df

    Product Name                                    Size ID    Size Name
0   24 Mantra Ancient Grains Foxtail Millet 500 gm  1          None
1   24 Mantra Ancient Grains Little Millet 500 gm   2          None
2   24 Mantra Naturals Kismis 100 gm                3          200 gm
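
The same result can probably be reached without the temporary same column. The following is only a sketch under that assumption: it builds the boolean mask with a plain list comprehension and assigns None through df.loc. The name is_contained and the two-row sample frame are made up for illustration, and the check is still case-sensitive.

```python
import pandas as pd

df = pd.DataFrame({
    'Product Name': ['24 Mantra Ancient Grains Foxtail Millet 500 gm',
                     '24 Mantra Naturals Kismis 100 gm'],
    'Size ID': [1, 2],
    'Size Name': ['500 gm', '200 gm'],
})

# True where the size text occurs inside the product name (case-sensitive)
is_contained = [size in name
                for size, name in zip(df['Size Name'], df['Product Name'])]

# Blank out only the matching rows; the other rows keep their size
df.loc[is_contained, 'Size Name'] = None
print(df)
```

Iterating over zip avoids the row-wise apply, which tends to matter only on larger frames.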

Answer 1 (score: 1)

IIUC, you can use apply() with replace():

df['Product Name'] = df.apply(lambda x: x['Product Name'].replace(x['Size Name'], '').strip(), axis=1)

Yields:

   ID                                    Product Name  Size ID Size Name
0   1         24 Mantra Ancient Grains Foxtail Millet        1    500 gm
1   2          24 Mantra Ancient Grains Little Millet        2    500 gm
2   3                      24 Mantra Naturals Almonds        3    100 gm
3   4                       24 Mantra Naturals Kismis        4    100 gm
4   5                        24 Mantra Organic Ajwain        5    100 gm
5   6             24 Mantra Organic Apple Blast Drink        6    250 ml
6   7  24 Mantra Organic Apple Juice 1 Ltr Tetra Pack        7   1000 ml
7   8                   24 Mantra Organic Apple Juice        8    200 ml
8   9                     24 Mantra Organic Assam Tea        9    100 gm
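
Note that str.replace is case-sensitive, so a Size Name of 500 Gm would not match 500 gm in the product name. If the match has to ignore case, as the question asks, one possible variant is to go through the re module. This is a sketch, not part of the answer above: the helper name strip_size and the two-row sample frame are assumptions for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Product Name': [
        '24 Mantra Ancient Grains Foxtail Millet 500 gm',
        '24 Mantra Organic Apple Juice 1 Ltr Tetra Pack',
    ],
    'Size Name': ['500 Gm', '1000 ml'],
})

def strip_size(row):
    # re.escape makes the size literal text; re.IGNORECASE lets '500 Gm' match '500 gm'
    pattern = re.escape(row['Size Name'])
    return re.sub(pattern, '', row['Product Name'], flags=re.IGNORECASE).strip()

df['Product Name'] = df.apply(strip_size, axis=1)
print(df)
```

Rows whose Size Name does not occur in the Product Name, such as the 1 Ltr Tetra Pack row, are left unchanged.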

Answer 2 (score: 0)

Assuming your size name is always the last column, this is what I think you need:

import re

data = '''ID          Product Name                            Size ID    Size Name
1   24 Mantra Ancient Grains Foxtail Millet 500 gm      1       500 gm
2   24 Mantra Ancient Grains Little Millet 500 gm       2       500 gm
3   24 Mantra Naturals Almonds 100 gm                   3       100 gm
4   24 Mantra Naturals Kismis 100 gm                    4       100 gm
5   24 Mantra Organic Ajwain 100 gm                     5       100 gm
6   24 Mantra Organic Apple Blast Drink 250 ml          6       250 ml
7   24 Mantra Organic Apple Juice 1 Ltr Tetra Pack      7       1000 ml
8   24 Mantra Organic Apple Juice 200 ml                8       200 ml
9   24 Mantra Organic Assam Tea 100 gm                  9       100 gm
'''
def cleaner(txt):
    temp = txt.split('\n')
    products = temp[1:-1]        # skip the header and the trailing empty line
    fixed_products = [temp[0]]   # keep the header unchanged

    for p in products:
        # the trailing "<number> <unit>" is the Size Name column
        res = re.search(r'(\d+\s\w*)$', p)
        if res is None:
            fixed_products.append(p)
            continue
        match = res.group(0)
        ignore_from = len(match)
        found_at = p[:-ignore_from].find(match)
        if found_at > -1:  # we found a duplicate
            p = p.replace(match, '', 1)
        fixed_products.append(p)  # rows without a duplicate are kept as they are
    return '\n'.join(fixed_products)

#Example
#cleaner(data)

Here is the result