I have two Pandas DataFrames:
df_topics_temp contains a matrix with a column id.
df_mapping contains a mapping from id to parentID.
I am trying to fill the parent.id column in df_topics_temp with the parentID from df_mapping.
I wrote a solution that uses a loop; it is cumbersome, but it works. My solution that uses pandas .apply on df_topics_temp does not work.
Solution 1 (works):
def isnan(value):
    try:
        import math
        return math.isnan(float(value))
    except:
        return False

for x in range(0, df_topics_temp['id'].count()):
    topic_id_loop = df_topics_temp['topic.id'].iloc[x]
    mapping_row = df_mapping[df_mapping['id'] == topic_id_loop]
    parent_id = mapping_row['parentId'].iloc[0]
    if isnan(parent_id):
        df_topics_temp['parent.id'].iloc[x] = mapping_row['id'].iloc[0]
    else:
        df_topics_temp['parent.id'].iloc[x] = topic_id_loop
Solution 2 (does not work):
def map_function(x):
    df_topics_temp = df_mapping.loc[df_mapping['id'] == x]
    temp = df_topics_temp['parentId'].iloc[0]
    return temp

df_topics_temp['parent.id'] = df_topics_temp['topic.id'].apply(map_function)
df_topics_temp.head()
The second solution (pandas .apply) does not fill the parent.id column in df_topics_temp.
Thanks for your help.
<ipython-input-68-a2e8d9a21c26> in map_function(row)
      1 def map_function(row):
----> 2     row['parent.id'] = df_mapping.loc[df_mapping['id']==row['topic.id']]['parentId'].values[0]
      3     return row
IndexError: ('index 0 is out of bounds for axis 0 with size 0', 'occurred at index 190999')
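Answer 0 (score: 0)
If I understand correctly, DataFrame.apply with axis=1 takes a row and returns a row, so your function should return the row. Yours returns a single value. For example: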
# setting up the dataframes
import pandas as pd
import numpy as np

df1 = pd.DataFrame.from_dict({'name': ['alice', 'bob'], 'id': [1, 2]})
mapping = pd.DataFrame.from_dict({'id': [1, 2, 3, 4], 'parent_id': [100, 200, 100, 200]})

# mapping function: receives one row of df1 and returns it with parent_id filled in
def f(row):
    if any(mapping['id'] == row['id']):
        row['parent_id'] = mapping.loc[mapping['id'] == row['id']]['parent_id'].values[0]
    else:  # missing value: no matching id in mapping
        row['parent_id'] = np.nan
    return row

# axis=1 passes each row to f; apply returns a new DataFrame rather than modifying df1
df1.apply(f, axis=1)
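The IndexError in the question ("index 0 is out of bounds for axis 0 with size 0") is what you get when a topic.id has no matching row in df_mapping, so the selection is empty and .values[0] fails. As an alternative to a row-wise apply, here is a minimal sketch using Series.map with a lookup Series. The column names follow the question, but the sample values are made up, and the final fallback to the topic's own id is only my guess at the intended behaviour.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question,
# the values are invented for illustration.
df_mapping = pd.DataFrame({"id": [1, 2, 3], "parentId": [10.0, None, 30.0]})
df_topics_temp = pd.DataFrame({"topic.id": [1, 2, 3, 99]})

# Build an id -> parentId lookup once. drop_duplicates keeps the index
# unique, which Series.map needs when it is given a Series.
lookup = df_mapping.drop_duplicates("id").set_index("id")["parentId"]

# Series.map yields NaN for ids that are absent from the lookup (99 here)
# instead of raising IndexError like .values[0] on an empty selection.
parent = df_topics_temp["topic.id"].map(lookup)

# Assumption about the intent: fall back to the topic's own id
# when no parent is available.
df_topics_temp["parent.id"] = parent.fillna(df_topics_temp["topic.id"])
```

This avoids calling a Python function once per row, which should matter on a frame with roughly 190,000 rows like the one in the traceback. If you would rather keep the missing parents as NaN, skip the fillna step and assign parent directly.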