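Compare a list of data against a CSV file and rank the matches

Time: 2018-04-13 06:36:25

Tags: python pandas analytics data-analysis text-analysis

I have a dataset of product names and a list of brands. I need to find out how many products of each brand are in my list.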

**Brands sample :** ['HM International', 'Sara', 'Wildcraft', 'Nike']
**Product name sample :** [Attache backpack11Green Waterproof Backpack
Simba BTSPOKEMON POKÈMON POKÈ BALLS 18 BP Waterproof S...
HM International HMHTPB 24304MK Waterproof Multipurpos...
Chris & Kate CKB_122SS Waterproof School Bag
Simba BTSPRINCESS FOLLOW YOUR DREAMS 16 BP Waterproof ...
Kuber Industries School Bag, Backpack Waterproof School...
Minnie Trio School Bag Waterproof School Bag
Thomas School Bag Waterproof School Bag
Sara Green 002 Shoulder Bag
Disney Frozen Anna & Elsa Pink Sequins 16' ' Backpack
Disney Princess Pink Flap 18' ' Backpack
My Baby Excel Peppa Side Sling Bag Sling Bag
Ranger Black School Bag with laptop compartment Waterpr...
HM International HMHTPB 73279AV Waterproof Multipurpos...
Peppa Peppa Pig Pink Plush Toy Wallet Round Shape Plush...
Disney Frozen Anna & Elsa Pink Sequins 14' ' Backpack
Disney Frozen Magic Blue 16' ' School Bag
Good Friends stylish Waterproof School Bag
ZEVORA Pink 3D Design Children Travel & School Bag, 1 L...
Gleam A103 School Bag
SARA BAGS TG15 Waterproof Backpack
Despicable Me Favourite Subject School Bag 16 inches Tr...
AARIP LTB037 Waterproof School Bag
Simba BTSSMURFS FOOTBALL 18 BP Waterproof School Bag
Gleam JB0402C Waterproof School Bag
Simba BTSSMURFS SMURFETTE SINGING STAR 18 BP Waterproo... ]

1 answer:

Answer 0: (score: -1)

I suggest using str.findall with a word boundary regex to search for multiple values, then flattening the nested lists and using Counter:

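from collections import Counter

import pandas as pd

Brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']
# Group the alternation so that \b applies to every brand,
# not only to the first and the last one
pat = r'\b(?:{})\b'.format('|'.join(Brands))

# Flatten the per-row lists of matches and count each brand
d = Counter([y for x in df['Product'].str.findall(pat) for y in x])
print(d)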

Counter({'HM International': 2, 'Sara': 1})
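
Note that the count above is case-sensitive, so 'SARA BAGS TG15 Waterproof Backpack' is not counted under 'Sara'. If such rows should count, one possible variation is sketched below using only the standard library. Case-insensitive matching is an assumption, not part of the original answer, and the names `products`, `canonical`, and `ranking` are illustrative. The brands are also passed through re.escape in case a brand name contains regex metacharacters.

```python
import re
from collections import Counter

brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']

# A few product names taken from the question's sample
products = [
    'HM International HMHTPB 24304MK Waterproof Multipurpos...',
    'Sara Green 002 Shoulder Bag',
    'SARA BAGS TG15 Waterproof Backpack',
    'Gleam A103 School Bag',
]

# Escape each brand and wrap the alternation in a non-capturing group
# so that \b applies to every brand. IGNORECASE is an assumption here.
pattern = re.compile(
    r'\b(?:{})\b'.format('|'.join(map(re.escape, brands))),
    flags=re.IGNORECASE,
)

# Map a lowercased match back to the brand's canonical spelling
canonical = {b.lower(): b for b in brands}

counts = Counter(
    canonical[m.lower()]
    for name in products
    for m in pattern.findall(name)
)

# most_common() returns (brand, count) pairs sorted by descending count
ranking = counts.most_common()
print(ranking)
```

Counter.most_common also accepts a number, for example counts.most_common(3), if only the top brands are needed.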

Or, if you want the output as a Series, use Series.value_counts:

import numpy as np

s = pd.Series(np.concatenate(df['Product'].str.findall(pat))).value_counts()
print(s)
HM International    2
Sara                1
dtype: int64
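
As a possible alternative that avoids NumPy, the lists returned by str.findall can be flattened with Series.explode. This is a sketch under the assumption that pandas 0.25 or later is available (explode does not exist in older versions); it is not the original answer's method, and the small DataFrame below only reuses a few rows from the question's sample.

```python
import pandas as pd

brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']
df = pd.DataFrame({'Product': [
    'HM International HMHTPB 24304MK Waterproof Multipurpos...',
    'Sara Green 002 Shoulder Bag',
    'HM International HMHTPB 73279AV Waterproof Multipurpos...',
    'Gleam A103 School Bag',
]})

# Non-capturing group so that \b applies to every brand
pat = r'\b(?:{})\b'.format('|'.join(brands))

# findall gives one list per row; explode puts each match on its own row.
# Rows without any match become NaN and are removed by dropna.
matches = df['Product'].str.findall(pat).explode().dropna()

# value_counts sorts by descending count by default
s = matches.value_counts()
print(s)
```

Because value_counts already sorts in descending order, the resulting Series can be read directly as a ranking of brands.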

**Setup**:

d = {'Product': ['Attache backpack11Green Waterproof Backpack', 'Simba BTSPOKEMON POKÈMON POKÈ BALLS 18 BP Waterproof S...', 'HM International HMHTPB 24304MK Waterproof Multipurpos...', 'Chris & Kate CKB_122SS Waterproof School Bag', 'Simba BTSPRINCESS FOLLOW YOUR DREAMS 16 BP Waterproof ...', 'Kuber Industries School Bag, Backpack Waterproof School...', 'Minnie Trio School Bag Waterproof School Bag', 'Thomas School Bag Waterproof School Bag', 'Sara Green 002 Shoulder Bag', "Disney Frozen Anna & Elsa Pink Sequins 16' ' Backpack", "Disney Princess Pink Flap 18' ' Backpack", 'My Baby Excel Peppa Side Sling Bag Sling Bag', 'Ranger Black School Bag with laptop compartment Waterpr...', 'HM International HMHTPB 73279AV Waterproof Multipurpos...', 'Peppa Peppa Pig Pink Plush Toy Wallet Round Shape Plush...', "Disney Frozen Anna & Elsa Pink Sequins 14' ' Backpack", "Disney Frozen Magic Blue 16' ' School Bag", 'Good Friends stylish Waterproof School Bag', 'ZEVORA Pink 3D Design Children Travel & School Bag, 1 L...', 'Gleam A103 School Bag', 'SARA BAGS TG15 Waterproof Backpack', 'Despicable Me Favourite Subject School Bag 16 inches Tr...', 'AARIP LTB037 Waterproof School Bag', 'Simba BTSSMURFS FOOTBALL 18 BP Waterproof School Bag', 'Gleam JB0402C Waterproof School Bag', 'Simba BTSSMURFS SMURFETTE SINGING STAR 18 BP Waterproo']}
df = pd.DataFrame(d)
print (df.head())
                                             Product
0        Attache backpack11Green Waterproof Backpack
1  Simba BTSPOKEMON POKÈMON POKÈ BALLS 18 BP Wate...
2  HM International HMHTPB 24304MK Waterproof Mul...
3       Chris & Kate CKB_122SS Waterproof School Bag
4  Simba BTSPRINCESS FOLLOW YOUR DREAMS 16 BP Wat...