How to merge two pandas DataFrames using a column as a pattern and keep the left DataFrame's columns?

Date: 2017-07-18 23:05:29

Tags: python pandas

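I have the following Python code. I tried pd.merge, but it seems the key columns need to be equal. I am trying to do something like a SQL join with the LIKE operator, matching df.B against categories.Pattern.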

Update: using a better data example.

import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])

    Out[12]:
    A   B
0   1   Gas Station
1   2   Servicenter
2   5   Bakery good bread
3   58  Fresh market MIA
4   76  Auto Liberty aa1121

categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],  ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

    Out[13]:
    Category    Pattern
0   Gasoline    Gas Station
1   Gasoline    Servicenter
2   Food    Bakery
3   Food    Fresh market
4   Insurance   Auto Liberty

The expected result is:

    Out[14]:
    A   B                   Category
0   1   Gas Station         Gasoline
1   2   Servicenter         Gasoline
2   5   Bakery good bread   Food
3   58  Fresh market MIA    Food
4   76  Auto Liberty aa1121 Insurance

Thanks for any suggestions or feedback.

2 Answers:

Answer 0 (score: 0)

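# Use the first alphanumeric word of each string as the join key
df['lower'] = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
categories['lower'] = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)
# Join on the derived key; rows whose first words are equal are matched
final = pd.merge(df, categories, on='lower')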
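
The idea behind this answer can be checked on the sample data from the question. The sketch below is only an illustration: the column name `key`, the `expand=False` argument, and the left join are choices made here, not part of the original answer. Note that this is still an exact merge, only on the first word, so it relies on each pattern's first word being unique and equal to the first word of `B`.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']],
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# First alphanumeric token of each string; expand=False returns a Series.
df['key'] = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
categories['key'] = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)

# Exact left join on the derived key, keeping every row of df.
first_word = df.merge(categories, on='key', how='left')[['A', 'B', 'Category']]
print(first_word)
```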

Answer 1 (score: 0)

Create a new function, such as:

def lookup_table(value, df):
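    """
    Find the first pattern from the lookup table that occurs in value.

    :param value: string to search for a known pattern
    :param df: DataFrame which contains the lookup table (column 'Pattern')
    :return:
        The matching pattern as a string, or None if no pattern matches
    """
    # Default result when no pattern is found in value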
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out

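The function takes the string value and the DataFrame used as a lookup table, and returns the pattern found in value (or None if there is no match).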
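
For instance, the function can be called on single strings. The snippet below is a minimal sketch: it restates `lookup_table` in compact form so that it runs on its own, uses a shortened lookup table, and the string 'Pharmacy 24h' is a made-up value that matches no pattern.

```python
import pandas as pd

# Shortened lookup table, for illustration only.
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Food', 'Bakery']],
    columns=['Category', 'Pattern'])

def lookup_table(value, df):
    # Return the first pattern that occurs as a substring of value, else None.
    for item in df['Pattern'].tolist():
        if item in value:
            return item
    return None

matched = lookup_table('Bakery good bread', categories)   # 'Bakery' is a substring
unmatched = lookup_table('Pharmacy 24h', categories)      # hypothetical value, no pattern fits
print(matched, unmatched)
```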

The following complete example produces the expected DataFrame.

import pandas as pd

df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],  ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

def lookup_table(value, df):
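    """
    Find the first pattern from the lookup table that occurs in value.

    :param value: string to search for a known pattern
    :param df: DataFrame which contains the lookup table (column 'Pattern')
    :return:
        The matching pattern as a string, or None if no pattern matches
    """
    # Default result when no pattern is found in value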
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out


df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))
final = pd.merge(df, categories)

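The resulting DataFrame final contains the rows and categories of the expected output from the question, plus the helper column Pattern.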
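
As a possible variation on this answer, the per-row Python loop can be replaced by a single regular expression built from the patterns. This is a sketch under assumptions, not the answerer's method: the extra row `[99, 'Pharmacy 24h']` is hypothetical and added only to show what happens to an unmatched value, and `how='left'` is chosen so that such rows are kept with a missing Category instead of being dropped by the default inner join. Unlike `lookup_table`, which returns the first pattern in table order, the regex returns the leftmost match in the string.

```python
import re
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121'],
     [99, 'Pharmacy 24h']],  # hypothetical row with no matching pattern
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# Longest patterns first, so that among patterns starting at the same
# position the longer one is preferred. re.escape keeps the patterns literal.
patterns = sorted(categories['Pattern'], key=len, reverse=True)
regex = '(' + '|'.join(re.escape(p) for p in patterns) + ')'

# Extract the matching pattern for every row at once (NaN when nothing matches).
df['Pattern'] = df['B'].str.extract(regex, expand=False)

# Left join keeps all rows of df, including the unmatched one.
result = df.merge(categories, on='Pattern', how='left')[['A', 'B', 'Category']]
print(result)
```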