Pandas map returns a list

Time: 2016-05-25 12:56:36

Tags: python dictionary pandas lambda

I need to merge the 'companies' dataset and the 'locations' dataset onto both 'target_cw_id' and 'source_cw_id' in the 'relations' dataset.

companies

   row_id  cw_id     cik    company_name source_type  source_id
0       1      1    20.0  MOTHER COMPANY      filers      35791
1       2      2  1750.0  FATHER COMPANY      filers      40788
2       3      3  1800.0   LITTLE SISTER      filers      60238
3       4      4  1800.0   MIDDLE SISTER      filers      60238
4       5      5  2132.0    BABY BROTHER      filers       8286
5       6      6   543.0  NAUGHTY COUSIN      filers       8286
6       7      7  4546.0     BIG BROTHER      filers       8286

relations

   relation_id  target_cw_id  source_cw_id  relation_type relation_origin  origin_id  year
0            1             3             1            NaN   relationships    2507504  2010
1            2             4             1            NaN   relationships     824847  2005
2            3             5             2            NaN   relationships     841281  2006
3            4             6             2            NaN   relationships     864758  2007
4            5             7             2            NaN   relationships    1288382  2008

locations

   cw_id country_code
0      1           US
1      2           AT
2      3           US
3      4           US
5      5           SU
6      6           US
7      7           US

This works as expected, but I would like to reduce the redundancy:

merged = pd.merge(left=relations, right=companies, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=companies, left_on="target_cw_id", right_on="cw_id", how="left",  suffixes=('_source', '_target'))
merged = pd.merge(left=merged, right=locations, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=locations, left_on="target_cw_id", right_on="cw_id", how="left",  suffixes=('_source', '_target'))

So I am trying map with a lambda:

merged = pd.DataFrame()

dfs = [relations, merged, merged, merged]
dfs2 = [companies, companies, locations, locations]
ids = ["source_cw_id","target_cw_id","source_cw_id","target_cw_id"]

merged = map(lambda x, y, z: pd.merge(left=x, right=y, left_on=z, right_on="cw_id", how="left",suffixes=('_source','_target')), dfs,dfs2,ids)

However, the first iteration returns a list instead of a DataFrame, and then I get a

KeyError "target_cw_id"

These are the column names I expect in the final file:

[u'relation_id', u'source_cw_id', u'target_cw_id', u'relation_type',
       u'relation_origin', u'origin_id', u'year', u'row_id_source',
       u'cw_id_source', u'cik_source', u'company_name_source',
       u'source_type_source', u'source_id_source', u'row_id_target',
       u'cw_id_target', u'cik_target', u'company_name_target',
       u'source_type_target', u'source_id_target', u'cw_id_source',
       u'country_code_source', u'cw_id_target', u'country_code_target']

Any ideas are appreciated!

1 answer:

Answer 0 (score: 0):

First, you are using map incorrectly (see the docs):

map(function, iterable, ...)

The first argument is a function, and you have that right. The second is an iterable, and you have that wrong.
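To make the behaviour concrete: when map is given several iterables, it calls the function once per position, and each call is independent of the others. It never feeds the result of one call into the next, which is what a chain of merges needs. A minimal sketch with plain numbers (the names `add`, `results` and `total` are made up for illustration):

```python
from functools import reduce

def add(x, y):
    return x + y

# map pairs up items by position and makes one independent call per pair.
# list(...) is needed on Python 3, where map returns a lazy iterator;
# on Python 2 map already returns a list.
results = list(map(add, [1, 2, 3], [10, 20, 30]))
print(results)  # [11, 22, 33]

# reduce, by contrast, threads the previous result into the next call.
total = reduce(add, [10, 20, 30], 1)
print(total)  # 61
```

In the question, `dfs = [relations, merged, merged, merged]` stores the still-empty `merged` DataFrame three times, so the later calls would be merging an empty frame on a column it does not have. That would be consistent with the reported `KeyError`.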

You are performing 4 merges, so I would expect 4 items to iterate over. I did the following.

map

Keep in mind that I could not test this on your data, because you did not provide any sample data or expected output.

MaxU suggested reading https://stackoverflow.com/help/mcve. So do I.
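For reference, one way to chain the four merges without repeating the call is a plain loop that feeds each result into the next step. The sketch below is not the code from this answer. It uses a cut-down version of the tables in the question, and the helper name `merge_side`, the `steps` list, and the choice to suffix the right-hand columns up front are assumptions made for illustration.

```python
import pandas as pd

# Cut-down sample data modelled on the tables in the question.
relations = pd.DataFrame({
    "relation_id": [1, 2, 3],
    "target_cw_id": [3, 4, 5],
    "source_cw_id": [1, 1, 2],
})
companies = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5],
    "company_name": ["MOTHER COMPANY", "FATHER COMPANY", "LITTLE SISTER",
                     "MIDDLE SISTER", "BABY BROTHER"],
})
locations = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5],
    "country_code": ["US", "AT", "US", "US", "SU"],
})

def merge_side(left, right, side):
    """Left-join `right` onto `left` for one side ('source' or 'target').

    Hypothetical helper: `right` is indexed by cw_id and its remaining
    columns get a `_<side>` suffix, so repeated joins never clash.
    """
    renamed = right.set_index("cw_id").add_suffix("_" + side)
    # DataFrame.join with `on=` matches a left column against the right index.
    return left.join(renamed, on=side + "_cw_id", how="left")

# One (right table, side) pair per merge, applied in order.
steps = [(companies, "source"), (companies, "target"),
         (locations, "source"), (locations, "target")]

merged = relations
for right, side in steps:
    merged = merge_side(merged, right, side)

print(merged.columns.tolist())
```

Unlike the column list expected in the question, this variant leaves out the redundant cw_id_source and cw_id_target columns, since they would only repeat source_cw_id and target_cw_id. The same loop could also be written with functools.reduce over `steps`.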