Output of the apply function on a DataFrame

Time: 2019-12-13 04:09:51

Tags: python pandas dataframe dictionary apply

Consider the following DataFrame:

import pandas as pd

Data=[[0,'ABC SCHOOL BOARDING',['ABC','SCHOOL','BOARDING']],
      [1,'UNIVERSITY BOARDING INSTITUTE',['UNIVERSITY','BOARDING','INSTITUTE']],
      [2,'MARIE INSTITUTE SCHOOL',['MARIE', 'INSTITUTE','SCHOOL']],
      [3,'RALPH ELEMENTARY SCHOOL',['RALPH','ELEMENTARY','SCHOOL']],
      [4,'BOARDING SCHOOL',['BOARDING','SCHOOL']]]

df=pd.DataFrame(Data, columns=['id','name', 'name_list'])

I am using apply with a function that returns a dictionary for each row:

def classify(row, df_start, df_end):
    #df = pd.DataFrame(columns=['word','classification'])
    d={}
    for word in row.name_list:
        flag=False
        if word in df_start.values:
            #df=df.append(pd.DataFrame({'word':[word], 'classification':['start']}))
            d[word]='start'
            flag=True
        if word in df_end.values:
            #df=df.append(pd.DataFrame({'word':[word], 'classification':['end']}))
            d[word]='end'
            flag=True
        if (not flag):
            #df=df.append(pd.DataFrame({'word':[word], 'classification':['none']}))
            d[word]='none'
    return d

I call the function above on every row with apply:

df_start=pd.DataFrame(columns=['name'])
df_end=pd.DataFrame(columns=['name'])
df_start= df.name.str.split().str.get(0).drop_duplicates(keep="last")
df_end= df.name.str.split().str.get(-1).drop_duplicates(keep="last")

d={}

d = df.apply(classify, args=[df_start, df_end],axis=1)

for k, v in d.items():
    print(k)

However, the keys of the returned dictionary print as follows:

0
1
2
3
4

The values of the dictionary are as follows:

{'ABC': 'start', 'SCHOOL': 'end', 'BOARDING': 'end'}
{'UNIVERSITY': 'start', 'BOARDING': 'end', 'INSTITUTE': 'end'}
{'MARIE': 'start', 'INSTITUTE': 'end', 'SCHOOL': 'end'}
{'RALPH': 'start', 'ELEMENTARY': 'none', 'SCHOOL': 'end'}
{'BOARDING': 'end', 'SCHOOL': 'end'}

When printed inside the classify function, the keys and values are as expected. The keys:

ABC
SCHOOL
BOARDING
UNIVERSITY
BOARDING
INSTITUTE
MARIE
INSTITUTE
SCHOOL
RALPH
ELEMENTARY
SCHOOL
BOARDING
SCHOOL

The values:

start
end
end
start
end
end
start
end
end
start
none
end
end
end

Why are numbers added when the result comes back from apply? How can I get the expected dictionary so that I can convert it to a DataFrame?

Thanks for your contributions :)

1 Answer:

Answer 0: (score: 0)

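In your case, apply returns a pandas Series, not a dictionary. Check type(d). The numbers 0 to 4 are the index of that Series (the row labels of df), and each value is the dictionary that classify returned for that row. To merge the values of d into a single dictionary, use the following code. Note that a word appearing in several rows keeps the value from the last row that contains it: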

da = {}
for r in d:
    da.update(r)
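
Since the question also asks how to get a DataFrame out of this, here is a minimal sketch of one way to do it. The Series d below is a hand-written stand-in for the result of df.apply(classify, ...), and the names merged, result and long_form are only illustrative, not part of the original code:

```python
import pandas as pd

# Stand-in (assumption) for the Series returned by df.apply(classify, ..., axis=1):
# one dictionary per row, indexed by the original row labels.
d = pd.Series([
    {'ABC': 'start', 'SCHOOL': 'end', 'BOARDING': 'end'},
    {'RALPH': 'start', 'ELEMENTARY': 'none', 'SCHOOL': 'end'},
])

# Merge the per-row dictionaries; a repeated word keeps the last value seen.
merged = {}
for row_dict in d:
    merged.update(row_dict)

# One row per unique word.
result = pd.DataFrame(list(merged.items()), columns=['word', 'classification'])

# Alternative: keep every (row, word) pair instead of merging duplicates away.
long_form = pd.DataFrame(
    [(idx, word, label)
     for idx, row_dict in d.items()
     for word, label in row_dict.items()],
    columns=['id', 'word', 'classification'],
)

print(result)
print(long_form)
```

The merged form loses information when the same word is classified differently in different rows, so the long form may be the safer choice if that can happen in your data.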