I need to merge the companies dataset and the locations dataset into the relations dataset, matching on both target_cw_id AND source_cw_id.
companies
   row_id  cw_id     cik     company_name  source_type  source_id
0       1      1    20.0   MOTHER COMPANY       filers      35791
1       2      2  1750.0   FATHER COMPANY       filers      40788
2       3      3  1800.0    LITTLE SISTER       filers      60238
3       4      4  1800.0    MIDDLE SISTER       filers      60238
4       5      5  2132.0     BABY BROTHER       filers       8286
5       6      6   543.0   NAUGHTY COUSIN       filers       8286
6       7      7  4546.0      BIG BROTHER       filers       8286
relations
   relation_id  target_cw_id  source_cw_id  relation_type  relation_origin  origin_id  year
0            1             3             1            NaN    relationships    2507504  2010
1            2             4             1            NaN    relationships     824847  2005
2            3             5             2            NaN    relationships     841281  2006
3            4             6             2            NaN    relationships     864758  2007
4            5             7             2            NaN    relationships    1288382  2008
locations
   cw_id  country_code
0      1            US
1      2            AT
2      3            US
3      4            US
5      5            SU
6      6            US
7      7            US
This works as expected, but I would like to reduce the redundancy:
merged = pd.merge(left=relations, right=companies, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=companies, left_on="target_cw_id", right_on="cw_id", how="left", suffixes=('_source', '_target'))
merged = pd.merge(left=merged, right=locations, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=locations, left_on="target_cw_id", right_on="cw_id", how="left", suffixes=('_source', '_target'))
So I am trying map and lambda:
merged = pd.DataFrame()
dfs = [relations, merged, merged, merged]
dfs2 = [companies, companies, locations, locations]
ids = ["source_cw_id","target_cw_id","source_cw_id","target_cw_id"]
merged = map(lambda x, y, z: pd.merge(left=x, right=y, left_on=z, right_on="cw_id", how="left",suffixes=('_source','_target')), dfs,dfs2,ids)
However, the first iteration returns a list rather than a DataFrame, and then I get
KeyError "target_cw_id"
These are the column names I expect in the final file:
[u'relation_id', u'source_cw_id', u'target_cw_id', u'relation_type',
u'relation_origin', u'origin_id', u'year', u'row_id_source',
u'cw_id_source', u'cik_source', u'company_name_source',
u'source_type_source', u'source_id_source', u'row_id_target',
u'cw_id_target', u'cik_target', u'company_name_target',
u'source_type_target', u'source_id_target', u'cw_id_source',
u'country_code_source', u'cw_id_target', u'country_code_target']
Any ideas are appreciated!
Answer 0 (score: 0)
First, you are using map incorrectly (docs):
map(function, iterable, ...)
The first argument is a function, and you have that right. The second is an iterable, and that is where you went wrong.
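To make the behaviour of map concrete, here is a small sketch. The names lefts, rights, pairs and has_key are made up for illustration. It shows that map calls the function once per position across its iterables and never feeds one result into the next call. In Python 3 it returns a lazy iterator; in Python 2 it returned a list, which matches the list seen in the question. It also shows that a list built from an empty DataFrame keeps pointing at that empty frame, which would explain the KeyError.

```python
import pandas as pd

# map pairs items positionally; each call is independent of the others.
lefts = ["a", "b", "c"]
rights = ["x", "y", "z"]
pairs = map(lambda left, right: left + right, lefts, rights)
result = list(pairs)  # materialise the lazy iterator (Python 3)

# A list built from an empty frame holds that same empty frame,
# no matter what the name `merged` is rebound to later.
merged = pd.DataFrame()
dfs = [merged, merged]
has_key = "target_cw_id" in dfs[0].columns  # the column lookup would fail
```

Because no call sees the output of the previous one, map on its own cannot chain merges; a loop or a fold that carries the running result forward is the closer fit.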
You are doing 4 merges, so I would expect 4 items to iterate over. I did the following.
map
Keep in mind that I could not test this on your data, because you did not provide any sample data or expected output.
MaxU suggested reading https://stackoverflow.com/help/mcve. So do I.
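The "four merges, four items" idea can be sketched with functools.reduce, which passes each merge result into the next step. This is a minimal illustration under stated assumptions, not necessarily the code originally posted in this answer. The steps list and the merge_step helper are hypothetical names, and the tables are cut down to a few columns of the question's sample data. One deliberate difference from the expected column list: each right-hand table is indexed by cw_id before merging, so the redundant cw_id_source and cw_id_target columns are not repeated, which avoids producing duplicate column names.

```python
from functools import reduce

import pandas as pd

# Trimmed copies of the question's sample tables (assumption: a subset of columns).
companies = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5, 6, 7],
    "cik": [20.0, 1750.0, 1800.0, 1800.0, 2132.0, 543.0, 4546.0],
    "company_name": ["MOTHER COMPANY", "FATHER COMPANY", "LITTLE SISTER",
                     "MIDDLE SISTER", "BABY BROTHER", "NAUGHTY COUSIN",
                     "BIG BROTHER"],
})
locations = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5, 6, 7],
    "country_code": ["US", "AT", "US", "US", "SU", "US", "US"],
})
relations = pd.DataFrame({
    "relation_id": [1, 2, 3, 4, 5],
    "target_cw_id": [3, 4, 5, 6, 7],
    "source_cw_id": [1, 1, 2, 2, 2],
})

# One entry per merge: (right-hand table, key column in relations, suffix).
steps = [
    (companies, "source_cw_id", "_source"),
    (companies, "target_cw_id", "_target"),
    (locations, "source_cw_id", "_source"),
    (locations, "target_cw_id", "_target"),
]

def merge_step(left, step):
    """Left-merge one lookup table onto the running result."""
    table, key, suffix = step
    # Index by cw_id and suffix the remaining columns so names never collide.
    right = table.set_index("cw_id").add_suffix(suffix)
    return left.merge(right, left_on=key, right_index=True, how="left")

# reduce threads the running result through every step, starting from relations.
merged = reduce(merge_step, steps, relations)
```

A plain for loop over steps that reassigns merged on each pass would do the same thing and may be easier to read.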