Question

I have a list of product names with a consistent structure.

product_names = ['brand_name1 product1', 'brand_name2 product2', 'brand_name1 product3', 'brand_name3 product4']

I also scraped a list of brands from the website's filters:

brand_names = ['brand_name1','brand_name2','brand_name3']

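Each element of brand_names can be found in several elements of product_names, because multiple products can belong to the same brand.

Output:

I want to extract the brand names from product_names and get a .csv file with two columns: Brand, Product.

Solution:

Thanks everyone! I tried to use a list comprehension myself, but I coded it completely wrong.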

import pandas as pd

product_names = ['brand_name1 product1', 'brand_name2 product2','brand_name1 product3']
brand_names = ['brand_name1','brand_name2','brand_name3']

brands = [i for j in product_names for i in brand_names if i in j]

result = pd.DataFrame(
    {'Brand': brands,
     'Product': product_names
     })

result.to_csv('result.csv', index=False)
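
One caveat about the comprehension above: it only lines up with product_names when every product contains exactly one brand. If a product matches no brand, or more than one, the two columns end up with different lengths and pd.DataFrame raises a ValueError. A minimal sketch of a more defensive version follows. It assumes the brand always sits at the start of the product name, and split_brand is a hypothetical helper name, not something from the original post.

```python
product_names = ['brand_name1 product1', 'brand_name2 product2', 'brand_name1 product3']
brand_names = ['brand_name1', 'brand_name2', 'brand_name3']


def split_brand(name, brands):
    """Return (brand, product) for the longest brand that prefixes name.

    Hypothetical helper: assumes the brand is at the start of the name.
    Returns (None, name) when no brand matches.
    """
    # Try longer brands first so 'brand_name10' is not mistaken for 'brand_name1'
    for brand in sorted(brands, key=len, reverse=True):
        if name.startswith(brand):
            return brand, name[len(brand):].strip()
    return None, name


# Exactly one (brand, product) pair per product, so the columns always align
rows = [split_brand(name, brand_names) for name in product_names]
print(rows)
```

The rows list can then be handed to pd.DataFrame(rows, columns=['Brand', 'Product']) and saved with to_csv as in the solution above.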

Answer 1

You can try the following:

for brand in brand_names:
    for product in product_names:
        if (brand in product):
            print(brand,product)

Another solution is to use a list comprehension:

matching=[]
for brand in brand_names:
    matching.append([product for product in product_names if brand in product])
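
The loop above produces a list of lists whose positions correspond to brand_names. As a possible variation, a dictionary keyed by brand keeps that association explicit. This is only a sketch using the sample data from the question:

```python
product_names = ['brand_name1 product1', 'brand_name2 product2',
                 'brand_name1 product3', 'brand_name3 product4']
brand_names = ['brand_name1', 'brand_name2', 'brand_name3']

# Map each brand to the products whose name contains it
matching = {
    brand: [product for product in product_names if brand in product]
    for brand in brand_names
}

for brand, products in matching.items():
    print(brand, products)
```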

Answer 2

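Hope this is what you want:

import pandas as pd

data = []
for i in product_names:
    # Split at the first space only, so product names that contain spaces stay whole
    data.append(i.split(maxsplit=1))
df = pd.DataFrame(data, columns=["Brand", "Product"])
df.to_csv(csv_file_name)  # csv_file_name: path of the output file, defined by you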

Output:

    Brand       Product
0   brand_name1 product1
1   brand_name2 product2
2   brand_name1 product3
3   brand_name3 product4

Answer 3

In the general case, where your product names follow no pattern, use this:

product_brand = [i for j in product_names for i in brand_names if i in j]

However, if there is a pattern, you should exploit it to speed up the process.

Output:

product_brand
['brand_name1', 'brand_name2', 'brand_name1', 'brand_name3']
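
As an illustration of exploiting a pattern, suppose the brand is always the first space-delimited token of the product name. That is an assumption that holds for the sample data, not something guaranteed in general. Under that assumption, each product needs only one set lookup instead of a scan over every brand:

```python
product_names = ['brand_name1 product1', 'brand_name2 product2',
                 'brand_name1 product3', 'brand_name3 product4']
brand_names = ['brand_name1', 'brand_name2', 'brand_name3']

known_brands = set(brand_names)  # set membership is O(1) on average

product_brand = []
for name in product_names:
    # Assumed pattern: the brand is everything before the first space
    first_token = name.partition(' ')[0]
    # None marks a product whose first token is not a known brand
    product_brand.append(first_token if first_token in known_brands else None)

print(product_brand)
# ['brand_name1', 'brand_name2', 'brand_name1', 'brand_name3']
```

This sketch would not work for brands that themselves contain spaces; in that case the nested comprehension is the safer choice.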

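To write the result to a csv file as columns, use this:

import csv

rows = zip(product_names, product_brand)
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('file.txt', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)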

Output:

brand_name1 product1,brand_name1
brand_name2 product2,brand_name2
brand_name1 product3,brand_name1
brand_name3 product4,brand_name3

Answer 4

Here is one way:

Assuming brand_names is a list of unique brands, you can try the following:

import pandas as pd

# Brand and Product lists
product_names = ['brand_name1 product1', 'brand_name2 product2', 'brand_name1 product3', 'brand_name3 product4']
brand_names = ['brand_name1','brand_name2','brand_name3']

# Empty list to save the results
res_ls = []

# Iterate over each brand
for b in brand_names:
    # Select products for your current brand
    brand_products = [i for i in product_names if b in i]
    res_ls.append({
        'brand': b,
        'products': ', '.join(brand_products)
    })

# Get result as a pandas dataframe
res_df = pd.DataFrame(res_ls)

# Save your dataframe to csv
res_df.to_csv('/path/to/save', index=False)

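The resulting pandas DataFrame has one row per brand, with that brand's products joined into a single comma-separated string in the products column.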

Answer 5

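Here is my small version. It sorts first, and it assumes that brand and product names may contain spaces.

Sorting makes things easier and cleaner. Using strip() avoids problems caused by leading and trailing whitespace. However, if a name contains spaces and one of them is accidentally doubled, the brand is treated as a different brand name. Solving that would probably require a regular expression.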

product_names = ['brand_name1   product1', 'brand name2 product2', 'brand_name1 product 3', ' brand_name3 product4', 'brand name2 product 2']
prbrand_names = ['brand_name1','brand name2','brand_name3']

product_names = sorted( [ s.strip() for s in product_names ] )
prbrand_names = sorted( [ s.strip() for s in prbrand_names ])

# Text mode ("w"): in Python 3, "wb" would reject the str passed to write()
with open( "out.csv", "w" ) as fpntr:
    cnt = 0
    for bn in prbrand_names:
        # the second condition is not evaluated if the first is already false -> no IndexError
        while cnt < len( product_names ) and product_names[cnt].startswith( bn ):
            pn = product_names[cnt][len( bn ) : ]
            # pn might have unnecessary spaces that can be stripped
            fpntr.write( "{}, {}\n".format( bn, pn.strip() ) )
            cnt += 1

out.csv is:

brand name2, product 2
brand name2, product2
brand_name1, product1
brand_name1, product 3
brand_name3, product4
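
The regular-expression fix mentioned above could look roughly like the following sketch. It collapses every run of whitespace to a single space before matching, so an accidentally doubled space no longer produces a "different" brand. normalize is a hypothetical helper name, and the sample data is shortened for illustration.

```python
import re


def normalize(text):
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r'\s+', ' ', text).strip()


product_names = ['brand_name1   product1', 'brand name2 product2', 'brand  name2 product 2']
brand_names = ['brand_name1', 'brand name2']

clean_products = sorted(normalize(s) for s in product_names)
clean_brands = sorted(normalize(s) for s in brand_names)

rows = []
for name in clean_products:
    for brand in clean_brands:
        if name.startswith(brand):
            # Keep the brand and whatever follows it as the product
            rows.append((brand, name[len(brand):].strip()))
            break

print(rows)
# [('brand name2', 'product 2'), ('brand name2', 'product2'), ('brand_name1', 'product1')]
```

Note that 'brand  name2 product 2', with a doubled space inside the brand, is now matched to 'brand name2'.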

Extract a list of substrings from a list of strings

5 answers: