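I have the following Python code. I tried using pd.merge, but it seems the key columns need to match exactly. I am trying to do something like a SQL join with a "LIKE" operator, matching df.B against categories.Pattern.
Updated with a better data example.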
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
Out[12]:
    A                    B
0   1          Gas Station
1   2          Servicenter
2   5    Bakery good bread
3  58     Fresh market MIA
4  76  Auto Liberty aa1121
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'], ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])
Out[13]:
    Category       Pattern
0   Gasoline   Gas Station
1   Gasoline   Servicenter
2       Food        Bakery
3       Food  Fresh market
4  Insurance  Auto Liberty
The expected result is:
Out[14]:
    A                    B   Category
0   1          Gas Station   Gasoline
1   2          Servicenter   Gasoline
2   5    Bakery good bread       Food
3  58     Fresh market MIA       Food
4  76  Auto Liberty aa1121  Insurance
Thanks for any suggestions or feedback.
Answer 0 (score: 0)
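# Use the first alphanumeric word of each string as the join key.
# This only works while every pattern starts with a distinct word.
df['lower'] = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
categories['lower'] = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)
final = pd.merge(df, categories, on='lower')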
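A variation on the same extract-then-merge idea, offered as a sketch rather than as what this answer's author intended: instead of keying on the first word, build a single regex from every value in categories.Pattern and extract whichever full pattern appears in df.B. The names `patterns` and `regex` are introduced here for illustration only, and the sketch assumes each value of B contains at most one pattern.

```python
import re
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']],
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# Build one alternation regex from the patterns. re.escape protects against
# regex metacharacters; longer patterns come first so they win over shorter ones.
patterns = sorted(categories['Pattern'], key=len, reverse=True)
regex = '(' + '|'.join(re.escape(p) for p in patterns) + ')'

# Extract the matched pattern text (NaN when nothing matches), then left-join
# on it so that unmatched rows of df are kept with a missing Category.
df['Pattern'] = df['B'].str.extract(regex, expand=False)
final = df.merge(categories, on='Pattern', how='left')[['A', 'B', 'Category']]
print(final)
```

The left join keeps the row order of df, and any row without a match would show NaN in Category instead of disappearing.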
Answer 1 (score: 0)
Create a new function, such as:
def lookup_table(value, df):
    """
    :param value: string to search for a known pattern
    :param df: DataFrame that contains the lookup table (needs a 'Pattern' column)
    :return: the first pattern found inside value, or None if nothing matches
    """
    # Default result when no pattern in the list matches
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out
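The function takes the string value and the DataFrame used as a lookup table, and returns the first pattern contained in value.
The complete example below produces the expected DataFrame.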
import pandas as pd
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'], ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])
def lookup_table(value, df):
    """
    :param value: string to search for a known pattern
    :param df: DataFrame that contains the lookup table (needs a 'Pattern' column)
    :return: the first pattern found inside value, or None if nothing matches
    """
    # Default result when no pattern in the list matches
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out
# Tag each row with the pattern it contains, then join on that shared column
df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))
final = pd.merge(df, categories, on='Pattern')
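For comparison, here is a sketch that stays closer to the SQL "LIKE" join the question asks about: pair every row of df with every pattern, then keep the pairs where the pattern is a substring of B. It is not part of the answer above. It assumes pandas 1.2 or later (for how='cross') and that column A uniquely identifies a row, and the names `pairs`, `mask`, and `matched` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']],
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# Pair every row of df with every pattern (Cartesian product).
pairs = df.merge(categories, how='cross')

# Keep only the pairs where the pattern occurs inside B, which plays the
# role of SQL's  B LIKE '%' || Pattern || '%'.
mask = [pattern in text for text, pattern in zip(pairs['B'], pairs['Pattern'])]
matched = pairs.loc[mask]

# Left-join back on A (assumed unique) so rows without a match are kept.
final = df.merge(matched[['A', 'Category']], on='A', how='left')
print(final)
```

The cross join creates len(df) * len(categories) intermediate rows, so it suits small lookup tables. A value of B that contains several patterns would produce one output row per matching pattern.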