Apply a function to a column to create another column

Time: 2019-05-14 02:44:40

Tags: python-3.x pandas dataframe dictionary apply

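I am trying to apply a function to a DataFrame column to evaluate and classify the row values. I defined the function with a branch for each case and applied it to the column, but I get two errors.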

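I tried defining the function outside the loop with three parameters instead of one, and defining it inside the loop with a single parameter, but both versions fail with the same errors.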

for i in list(df['segment'].unique()): 
    temp = df.query('segment== "%s"' %i)
    for t in list(temp['area_tipe'].unique()):
        temp2 = temp.query('area_tipe== "%s"' %t)
        a = temp2.quantile(q=0.33)
        b = temp2.quantile(q=0.66)
        def classifierprice(x):
            if float(x) < a:
                rep = 'low'
            elif float(x) > a:
                if float(x) < b:
                    rep = 'medium'
            elif float(x) > b:
                rep = 'high'
            return rep 
        temp2['price_class'] = temp2['price'].map(lambda x: classifierprice(x), axis=1)

TypeError: map() got an unexpected keyword argument 'axis'

I get the same error when I use apply instead of map. If I remove axis, both apply and map give me the following code/error:

for i in list(df['segment'].unique()): 
    temp = df.query('segment== "%s"' %i)
    for t in list(temp['area_tipe'].unique()):
        temp2 = temp.query('area_tipe== "%s"' %t)
        a = temp2.quantile(q=0.33)
        b = temp2.quantile(q=0.66)
        def classifierprice(x):
            if float(x) < a:
                rep = 'low'
            elif float(x) > a:
                if float(x) < b:
                    rep = 'medium'
            elif float(x) > b:
                rep = 'high'
            return rep 
        temp2['price_class'] = temp2['price'].map(lambda x: classifierprice(x))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Does anyone know how to solve this?

I use the same mapping approach for another classification that does not involve splitting the DataFrame, and it works fine, as shown below:

def grow(x):
    if x > 0:
        a = 'growing'
    elif x < 0:
        a = 'declining'
    else:
        a = 'constant'
    return a

insights["text"] = (insights["score"].map(grow))

1 Answer:

Answer 0 (score: 1)

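You need to extract the actual value from the result of .quantile(). Called on a DataFrame, it returns a Series object (here containing one value) rather than a plain number, so pandas treats float(x) < a as a comparison against a Series and cannot reduce the result to a single True/False, hence the error. Use .values[0] to get the number inside.
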
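To make the Series-versus-number distinction concrete, here is a minimal sketch. The tiny DataFrame and the names q_frame, q_from_values, q_from_column and raised are illustrative assumptions, not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the return types.
df = pd.DataFrame({"price": [1, 2, 3, 4, 5], "segment": list("ppqqq")})

# DataFrame.quantile returns a Series with one entry per numeric column.
q_frame = df.quantile(q=0.5, numeric_only=True)
print(type(q_frame).__name__)

# Using a comparison against that Series in an `if` raises ValueError,
# because pandas refuses to collapse a Series to a single True/False.
raised = False
try:
    if 3.0 < q_frame:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Two ways to get a plain number instead.
q_from_values = q_frame.values[0]          # the approach used in this answer
q_from_column = df["price"].quantile(0.5)  # Series.quantile returns a scalar
print(q_from_values, q_from_column)
```

Selecting the price column before calling quantile avoids the Series result altogether, which may be the simpler option when only one column matters.
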
import pandas as pd
import numpy as np

### making some sample data
df = pd.DataFrame({"area_tipe":np.random.choice(["m","n","o"],100)
                    , "price" : np.random.randint(1,10,100)    
                    , "segment":np.random.choice(["p","q","r"],100)})

### keeping the function out of the for loop
def classifierprice(x, a, b):
    x = float(x)
    if x <= a:
        rep = 'low'
    elif a < x < b:
        rep = 'medium'
    elif x >= b:
        rep = 'high'
    return rep 

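for i in list(df['segment'].unique()): 
    temp = df.query('segment== "%s"' %i)
    for t in list(temp['area_tipe'].unique()):
        # .copy() so the assignment below writes to an independent frame
        temp2 = temp.query('area_tipe== "%s"' %t).copy()

        # numeric_only=True restricts the quantile to numeric columns (only
        # 'price' here); .values[0] then pulls the number out of the Series
        a = temp2.quantile(q=0.33, numeric_only=True).values[0]
        b = temp2.quantile(q=0.66, numeric_only=True).values[0]
        temp2['price_class'] = temp2['price'].apply(lambda x: classifierprice(x,a,b))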
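
On the first error in the question: the call above passes no axis argument, because map and apply on a single column (a Series) work one element at a time and there is no axis to choose. The axis argument belongs to DataFrame.apply, where it selects rows or columns. A minimal sketch, with a hypothetical label function and made-up thresholds 3 and 7:

```python
import pandas as pd

# Hypothetical helper and thresholds, for illustration only.
def label(x, low, high):
    if x <= low:
        return "low"
    if x < high:
        return "medium"
    return "high"

prices = pd.Series([1, 5, 9], name="price")

# Series.map: called once per element, no axis keyword.
mapped = prices.map(lambda x: label(x, 3, 7))

# Series.apply: also element-wise; extra positional arguments go in args=.
applied = prices.apply(label, args=(3, 7))

# DataFrame.apply with axis=1: the function receives one row at a time.
frame = prices.to_frame()
rowwise = frame.apply(lambda row: label(row["price"], 3, 7), axis=1)

print(mapped.tolist(), applied.tolist(), rowwise.tolist())
```
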


You can do this without loops and get the whole output df in one go! Try this as a starting point:

def grouped_classifierprice(df_filt):
    # 'price' is the only numeric column, so .values[0] is its quantile
    a = df_filt.quantile(q=0.33, numeric_only=True).values[0]
    b = df_filt.quantile(q=0.66, numeric_only=True).values[0]
    return df_filt.price.apply(lambda x: classifierprice(x,a,b))

outdf = df.groupby(["area_tipe","segment"]).apply(grouped_classifierprice)
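
A variation on the same idea, offered as a sketch rather than as the method above: compute the per-group cut-offs with groupby(...).transform so they line up row by row with df, then label every row at once with numpy.select. The seeded random data and the names low_cut and high_cut are assumptions for illustration. The boundaries follow classifierprice (<= lower cut is low, >= upper cut is high).

```python
import numpy as np
import pandas as pd

# Reproducible sample data shaped like the data in this answer.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "area_tipe": rng.choice(["m", "n", "o"], 100),
    "price": rng.integers(1, 10, 100),
    "segment": rng.choice(["p", "q", "r"], 100),
})

# transform broadcasts each group's quantile back onto that group's rows,
# so low_cut and high_cut share df's index.
grouped = df.groupby(["segment", "area_tipe"])["price"]
low_cut = grouped.transform(lambda s: s.quantile(0.33))
high_cut = grouped.transform(lambda s: s.quantile(0.66))

# Conditions are checked in order; rows matching none get the default.
df["price_class"] = np.select(
    [df["price"] <= low_cut, df["price"] < high_cut],
    ["low", "medium"],
    default="high",
)

print(df.head())
```

This writes the labels straight into the original df, so nothing has to be collected from temporary per-group frames.
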