Why does using .apply on a pandas DataFrame give the wrong result? My loop version works

Time: 2019-03-28 06:51:49

Tags: python pandas dataframe

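I have two pandas DataFrames:

  1. df_topics_temp contains a matrix with a column id
  2. df_mapping contains a mapping from id to parentID

I am trying to populate the parent.id column in df_topics_temp with the parentID from df_mapping.

I have written a solution using a loop; it is cumbersome, but it works. My solution using pandas .apply on df_topics_temp does not work.

Solution 1 (works):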


def isnan(value):
  try:
      import math
      return math.isnan(float(value))
  except:
      return False

for x in range(0, df_topics_temp['id'].count()):
    topic_id_loop = df_topics_temp['topic.id'].iloc[x]
    mapping_row = df_mapping[df_mapping['id'] == topic_id_loop]
    parent_id = mapping_row['parentId'].iloc[0]

    if isnan(parent_id):
        df_topics_temp['parent.id'].iloc[x] = mapping_row['id'].iloc[0]
    else:     
        df_topics_temp['parent.id'].iloc[x] = topic_id_loop

Solution 2 (does not work):


def map_function(x):
        df_topics_temp = df_mapping.loc[df_mapping['id'] == x]
        temp = df_topics_temp['parentId'].iloc[0]
        return temp

df_topics_temp['parent.id'] = df_topics_temp['topic.id'].apply(map_function)

df_topics_temp.head() 

The second solution (pandas .apply) does not populate the parent.id column in df_topics_temp.

Thanks for your help.

Update 1

<ipython-input-68-a2e8d9a21c26> in map_function(row)
      1 def map_function(row):
----> 2         row['parent.id'] = df_mapping.loc[df_mapping['id']==row['topic.id']]['parentId'].values[0]
      3         return row

IndexError: ('index 0 is out of bounds for axis 0 with size 0', 'occurred at index 190999')
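
The IndexError in this traceback is what .values[0] raises when the boolean filter selects no rows, which suggests that at least one topic.id value has no matching id in df_mapping. A minimal sketch for listing such values, assuming small made-up frames in place of the real data (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for the real frames.
df_topics_temp = pd.DataFrame({'topic.id': [1, 2, 5]})
df_mapping = pd.DataFrame({'id': [1, 2, 3], 'parentId': [10.0, None, 10.0]})

# True where a topic.id has a matching id in the mapping table.
has_match = df_topics_temp['topic.id'].isin(df_mapping['id'])

# Values whose lookup would select nothing and fail on .values[0] or .iloc[0].
unmatched = df_topics_temp.loc[~has_match, 'topic.id'].tolist()
print(unmatched)
```

If the list is non-empty, the lookup function needs a branch for the missing case, or the unmatched rows need to be handled before calling .apply.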

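1 Answer:

Answer 0: (score: 0)

If I understand correctly, DataFrame.apply with axis=1 passes each row to your function and collects whatever it returns, so you want your function to return a row. Yours returns a single value. For example: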

#setting up the dataframes
import pandas as pd
import numpy as np
df1 = pd.DataFrame.from_dict({'name':['alice','bob'],'id':[1,2]})
mapping = pd.DataFrame.from_dict({'id':[1,2,3,4],'parent_id':[100,200,100,200]})

#mapping function
def f(row):
    if any(mapping['id']==row['id']):
        row['parent_id'] = mapping.loc[mapping['id']==row['id']]['parent_id'].values[0]
    else: # missing value
        row['parent_id'] = np.nan
    return row

df1.apply(f,axis=1)
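
As a possible alternative to the row-wise apply, the same lookup can be sketched with Series.map, which avoids filtering the mapping frame once per row. This is only a sketch under the assumption that the id values in mapping are unique; the extra row with id 5 is made up to show the missing-value case:

```python
import pandas as pd

# Same sample frames as above, plus one hypothetical id (5) with no mapping entry.
df1 = pd.DataFrame({'name': ['alice', 'bob', 'carol'], 'id': [1, 2, 5]})
mapping = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [100, 200, 100, 200]})

# Build a lookup Series indexed by id (assumes ids in mapping are unique).
lookup = mapping.set_index('id')['parent_id']

# map() looks up each id and yields NaN where no entry exists.
df1['parent_id'] = df1['id'].map(lookup)
print(df1)
```

Because unmatched ids become NaN instead of raising, this form does not hit the IndexError from the question, and the missing rows can be inspected afterwards with df1['parent_id'].isna().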