Question

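I have a dataframe of model IDs and their associated values. The columns are date, client_id, model_id, category1, category2, color, and price. I have a simple Flask app where users can select a model ID and add it to their "purchase" history. Based on the model ID, I want to add a row to the dataframe and bring in the associated values for category1, category2, color, and price. What is the best way to do this with pandas? In Excel I would use VLOOKUP, but I'm not sure how to do it in Python. Assume that category1, category2, color, and price are unique for each model ID.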

import pandas as pd

client_id = input("ENTER Client ID:  ")
model_id = input("ENTER Model ID:  ")
def update_history(df, client_id, model_id):
    today = pd.to_datetime('today')
    # putting in tmp but just need to "lookup" these values from the original dataframe somehow
    # one value per column: date, client_id, model_id, category1, category2, color, price
    df.loc[len(df)] = [today, client_id, model_id, 'tmp', 'tmp', 'tmp', 'tmp']
    return df

Answer 1

The code below adds a new row with new values to an existing dataframe. The new values can be passed to the function.

Import libraries

import pandas as pd
import numpy as np
import datetime

Create a sample dataframe

model_id = ['M1', 'M2', 'M3']
today = ['2018-01-01', '2018-01-02', '2018-01-01']
client_id = ['C1', 'C2', 'C3']
category1 = ['orange', 'apple', 'beans']
category2 = ['fruit', 'fruit', 'grains']
df = pd.DataFrame({'today':today, 'model_id': model_id, 'client_id':client_id,
                   'category1': category1, 'category2':category2})
df['today'] = pd.to_datetime(df['today'])
df

Function to append a row with the new values to the existing dataframe

def update_history(df, client_id, model_id, category1, category2):
    today = pd.to_datetime('today')
    # Create a temp dataframe with new values.
    # Column names in this dataframe should match the existing dataframe
    temp = pd.DataFrame({'today':[today], 'model_id': [model_id], 'client_id':[client_id],
                         'category1': [category1], 'category2':[category2]})
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, temp], ignore_index=True)
    return df
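
The function above expects the category values to be passed in, so the lookup still has to happen before the call. Here is a minimal sketch of one way to do that, assuming each model ID maps to exactly one set of category values (as the question states). The `lookup` table and the `new_model` / `new_client` names are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the dataframe from this answer
df = pd.DataFrame({
    'today': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-01']),
    'model_id': ['M1', 'M2', 'M3'],
    'client_id': ['C1', 'C2', 'C3'],
    'category1': ['orange', 'apple', 'beans'],
    'category2': ['fruit', 'fruit', 'grains'],
})

def update_history(df, client_id, model_id, category1, category2):
    today = pd.to_datetime('today')
    temp = pd.DataFrame({'today': [today], 'model_id': [model_id],
                         'client_id': [client_id],
                         'category1': [category1], 'category2': [category2]})
    return pd.concat([df, temp], ignore_index=True)

# Hypothetical lookup table: one row per model_id, indexed by model_id
lookup = (df[['model_id', 'category1', 'category2']]
          .drop_duplicates('model_id')
          .set_index('model_id'))

new_model = 'M2'    # hypothetical user input
new_client = 'C4'   # hypothetical user input

# .loc on the index plays the role of VLOOKUP; raises KeyError for unknown IDs
attrs = lookup.loc[new_model]
df = update_history(df, new_client, new_model,
                    attrs['category1'], attrs['category2'])
```

Indexing the lookup table by model_id keeps the lookup to a single `.loc` call. An unknown model ID raises a KeyError, which a Flask handler could catch and report back to the user.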

Answer 2

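You can try this. If you are adding several rows at once, it is faster to collect the dictionaries in a list and then add them to the dataframe in one go.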

import pandas as pd

modelid = ['MOD1', 'MOD2', 'MOD3']
today = ['2018-07-15', '2018-07-18', '2018-07-20']
clients = ['CLA', 'CLA', 'CLB']
cat_1 = ['CAT1', 'CAT2', 'CAT3']
cat_2 = ['CAT11', 'CAT12', 'CAT13']

mdf = pd.DataFrame({"model_id": modelid, "today": today, "client_id": clients, "cat_1":cat_1, "cat_2":cat_2})

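def update_history(df, client_id, model_id):
    today = pd.to_datetime('today')
    # look up the first existing row for this model ID
    row = df[df.model_id==model_id].iloc[0]
    rows_list = []
    # named new_row rather than dict, which would shadow the built-in
    new_row = {"today":today, "client_id":client_id,
        "model_id":model_id,"cat_1":row["cat_1"],
        "cat_2":row["cat_2"]}
    rows_list.append(new_row)
    df2 = pd.DataFrame(rows_list)
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df = pd.concat([df, df2], ignore_index=True)
    return df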



mdf = update_history(mdf,"CLC","MOD1")
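
The point about adding several rows at once could be sketched roughly as follows: build all the dictionaries first, then concatenate once. The helper name `update_history_batch` and its `purchases` argument (a list of `(client_id, model_id)` pairs) are assumptions made for illustration, not part of the answer.

```python
import pandas as pd

mdf = pd.DataFrame({
    "model_id": ['MOD1', 'MOD2', 'MOD3'],
    "today": ['2018-07-15', '2018-07-18', '2018-07-20'],
    "client_id": ['CLA', 'CLA', 'CLB'],
    "cat_1": ['CAT1', 'CAT2', 'CAT3'],
    "cat_2": ['CAT11', 'CAT12', 'CAT13'],
})

def update_history_batch(df, purchases):
    """Append one row per (client_id, model_id) pair using a single concat.

    Hypothetical helper: the name and signature are illustrative only.
    """
    today = pd.to_datetime('today')
    rows_list = []
    for client_id, model_id in purchases:
        # The first existing row for this model supplies the category values
        row = df[df.model_id == model_id].iloc[0]
        rows_list.append({"model_id": model_id, "today": today,
                          "client_id": client_id,
                          "cat_1": row["cat_1"], "cat_2": row["cat_2"]})
    if not rows_list:
        return df  # nothing to add
    # One concat for all new rows instead of one per row
    return pd.concat([df, pd.DataFrame(rows_list)], ignore_index=True)

mdf = update_history_batch(mdf, [("CLC", "MOD1"), ("CLD", "MOD3")])
```

Each concat copies the whole dataframe, so doing it once per batch rather than once per row avoids repeated copying as the history grows.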

Answer 3

This is what I ended up doing. I still think there is a more elegant solution, so please let me know!

import pandas as pd

#create dataframe
modelid = ['MOD1', 'MOD2', 'MOD3']
today = ['2018-07-15', '2018-07-18', '2018-07-20']
clients = ['CLA', 'CLA', 'CLB']
cat_1 = ['CAT1', 'CAT2', 'CAT3']
cat_2 = ['CAT11', 'CAT12', 'CAT13']

mdf = pd.DataFrame({"model_id": modelid, "today": today, "client_id": clients, "cat_1":cat_1, "cat_2":cat_2})
#reorder columns
mdf = mdf[['cat_1', 'cat_2', 'model_id', 'client_id', 'today']] 

#create lookup table
#chain drop_duplicates instead of inplace=True on a slice, which triggers SettingWithCopyWarning
lookup=mdf[['cat_1','cat_2','model_id']].drop_duplicates()

#get values
client_id = input("ENTER Client ID:  ")      
model_id = input("ENTER Model ID:  ")

#append model id to list
model_id_lst=[]
model_id_lst.append(model_id)

today=pd.to_datetime('today')

#grab associated cat_1, and cat_2 from lookup table
temp=lookup[lookup['model_id'].isin(model_id_lst)]
out=temp.values.tolist()
out[0].extend([client_id, today])

#add this as a row to the df
mdf.loc[len(mdf)]=out[0]
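
One possibly tidier alternative, offered as a sketch rather than a definitive answer, is a left merge, which is the closest pandas equivalent of VLOOKUP. The `new_rows` dataframe and the model ID 'MOD9' below are made up for illustration.

```python
import pandas as pd

mdf = pd.DataFrame({
    "model_id": ['MOD1', 'MOD2', 'MOD3'],
    "today": ['2018-07-15', '2018-07-18', '2018-07-20'],
    "client_id": ['CLA', 'CLA', 'CLB'],
    "cat_1": ['CAT1', 'CAT2', 'CAT3'],
    "cat_2": ['CAT11', 'CAT12', 'CAT13'],
})

# Lookup table: one row per model_id
lookup = mdf[['model_id', 'cat_1', 'cat_2']].drop_duplicates('model_id')

# Hypothetical new purchases; 'MOD9' is deliberately missing from the lookup
new_rows = pd.DataFrame({
    'client_id': ['CLC', 'CLD'],
    'model_id': ['MOD2', 'MOD9'],
})
new_rows['today'] = pd.to_datetime('today')

# A left join keeps every purchase; unknown model IDs get missing categories
new_rows = new_rows.merge(lookup, on='model_id', how='left')

# Columns are aligned by name, so their order in new_rows does not matter
mdf = pd.concat([mdf, new_rows], ignore_index=True)
```

Unlike indexing `out[0]`, which fails with an IndexError when the model ID is not found, the merge leaves the category columns empty for unknown IDs, so they can be checked with `isna()` afterwards.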

Pandas: add a row using lookup values