Python Pandas conditional logic / using where when merging DataFrames

Time: 2021-01-18 18:01:34

Tags: python pandas merge conditional-statements

I have these DataFrames:

df1

user_id     code     name     code_equivalence             name_equivalence
51          123    bi lovers            542                bi for marketing
51          123    bi lovers            545                i love bi
51          234    datascience          345                data and science
51          234    datascience          555                data lovers
51          255    antiquity history    429                roma
51          255    antiquity history    430                greece
52          123    bi lovers            542                bi for marketing
52          123    bi lovers            545                i love bi
52          256    modern history       500                france
52          256    modern history       501                germany
52          200    arts                 400                arts I
52          200    arts                 401                arts II

df2

user_id     code     name       status
51          123    bi lovers    ongoing
51          430    greece       ongoing
52          501    germany      ongoing
52          050    numbers      ongoing
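
For reproducibility, the two frames above could be built roughly as follows. This is only a sketch: the column dtypes are not stated here, so storing the codes as strings (which keeps the leading zero in "050") is an assumption.

```python
import pandas as pd

# Assumption: codes are stored as strings so that "050" keeps its leading zero.
df1 = pd.DataFrame(
    [
        (51, "123", "bi lovers", "542", "bi for marketing"),
        (51, "123", "bi lovers", "545", "i love bi"),
        (51, "234", "datascience", "345", "data and science"),
        (51, "234", "datascience", "555", "data lovers"),
        (51, "255", "antiquity history", "429", "roma"),
        (51, "255", "antiquity history", "430", "greece"),
        (52, "123", "bi lovers", "542", "bi for marketing"),
        (52, "123", "bi lovers", "545", "i love bi"),
        (52, "256", "modern history", "500", "france"),
        (52, "256", "modern history", "501", "germany"),
        (52, "200", "arts", "400", "arts I"),
        (52, "200", "arts", "401", "arts II"),
    ],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)

df2 = pd.DataFrame(
    [
        (51, "123", "bi lovers", "ongoing"),
        (51, "430", "greece", "ongoing"),
        (52, "501", "germany", "ongoing"),
        (52, "050", "numbers", "ongoing"),
    ],
    columns=["user_id", "code", "name", "status"],
)
```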

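I want to merge them to get the status from df2, by checking whether the df2 code equals either the df1 code or the df1 code_equivalence, and whether the df2 name equals either the df1 name or the df1 name_equivalence. Like this:

Merged df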

user_id     code     name               code_equivalence    name_equivalence        status
51          123    bi lovers            542                 bi for marketing        ongoing
51          123    bi lovers            545                 i love bi               ongoing
51          234    datascience          345                 data and science        (null)
51          234    datascience          555                 data lovers             (null)
51          255    antiquity history    429                 roma                    (null)
51          255    antiquity history    430                 greece                  ongoing
52          123    bi lovers            542                 bi for marketing        (null)
52          123    bi lovers            545                 i love bi               (null)
52          256    modern history       500                 france                  (null)
52          256    modern history       501                 germany                 ongoing
52          200    arts                 400                 arts I                  (null)
52          200    arts                 401                 arts II                 (null)

After that, I want to transform the data to create a new df, like this:

Final df

user_id     code     name               code_equivalence    name_equivalence                    status
51          123    bi lovers            [542, 545]          [bi for marketing, i love bi]       ongoing
51          234    datascience          [345, 555]          [data and science, data lovers]     (null)
51          255    antiquity history    [429, 430]          [roma, greece]                      ongoing
52          123    bi lovers            [542, 545]          [bi for marketing, i love bi]       (null)
52          256    modern history       [500, 501]          [france, germany]                   ongoing
52          200    arts                 [400, 401]          [arts I, arts II]                   (null)

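Can anyone help me?

2 answers:

Answer 0 (score: 2)

I'm not sure I understood the question correctly, but from what I read, you have already done the merge and now want to get the final result? If so, and assuming merged is your merged DataFrame, this should do the job.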

 >>> merged.groupby(['user_id','code','name']).agg(list).reset_index()
   user_id  code               name code_equivalence                 name_equivalence              status
0       51   123          bi lovers       [542, 545]    [bi for marketing, i love bi]  [ongoing, ongoing]
1       51   234        datascience       [345, 555]  [data and science, data lovers]    [(null), (null)]
2       51   255  antiquity history       [429, 430]                   [roma, greece]   [(null), ongoing]
3       52   123          bi lovers       [542, 545]    [bi for marketing, i love bi]    [(null), (null)]
4       52   200               arts       [400, 401]                [arts I, arts II]       [(null), nan]
5       52   256     modern history       [500, 501]                [france, germany]   [(null), ongoing]

Here is the complete solution, if you only have df1 and df2:

 >>> (pd
 ...  .merge(df1, df2, on=['user_id','code','name'], how='left')
 ...  .groupby(['user_id','code','name'])
 ...  .agg(list)
 ...  .reset_index())

   user_id  code               name code_equivalence                 name_equivalence              status
0       51   123          bi lovers       [542, 545]    [bi for marketing, i love bi]  [ongoing, ongoing]
1       51   234        datascience       [345, 555]  [data and science, data lovers]          [nan, nan]
2       51   255  antiquity history       [429, 430]                   [roma, greece]          [nan, nan]
3       52   123          bi lovers       [542, 545]    [bi for marketing, i love bi]          [nan, nan]
4       52   200               arts       [400, 401]                [arts I, arts II]          [nan, nan]
5       52   256     modern history       [500, 501]                [france, germany]          [nan, nan]
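
The status column above comes back as a list per group, whereas the desired final df shows a single value. One possible way to collapse it is named aggregation with "first" for status, which skips missing values. This is a minimal sketch: the merged frame below is a small stand-in built from a few of the question's rows, and it assumes each group has at most one distinct non-null status.

```python
import pandas as pd

# Stand-in for the merged frame (a subset of the question's rows; codes as strings is an assumption).
merged = pd.DataFrame(
    {
        "user_id": [51, 51, 51, 51, 51, 51],
        "code": ["123", "123", "234", "234", "255", "255"],
        "name": ["bi lovers", "bi lovers", "datascience", "datascience",
                 "antiquity history", "antiquity history"],
        "code_equivalence": ["542", "545", "345", "555", "429", "430"],
        "name_equivalence": ["bi for marketing", "i love bi", "data and science",
                             "data lovers", "roma", "greece"],
        "status": ["ongoing", "ongoing", None, None, None, "ongoing"],
    }
)

final = (
    merged
    .groupby(["user_id", "code", "name"], as_index=False)
    .agg(
        # Collect the equivalence columns into lists, one list per group.
        code_equivalence=("code_equivalence", list),
        name_equivalence=("name_equivalence", list),
        # GroupBy "first" returns the first non-null value in each group,
        # so a group with any matched row reports that status.
        status=("status", "first"),
    )
)

print(final)
```

If a group could hold several different statuses, "first" would silently pick one of them, so a different reducer would be needed in that case.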

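Answer 1 (score: 0)

This is how I obtained the merge_df DataFrame in three steps:

  1. Merge on the first condition.

  2. Merge on the second condition.

  3. Fill the matches missing from step 1 with the matches from step 2.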

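    # Step 1: match df2's code against df1's code.
    merge_df = pd.merge(df1, df2[["code", "status"]], on="code", how="left")
    # Step 2: match df2's code against df1's code_equivalence.
    merge_df2 = pd.merge(df1, df2[["code", "status"]], left_on="code_equivalence", right_on="code", how="left")
    # Step 3: fill the gaps left by step 1 (assignment avoids chained inplace fillna).
    merge_df["status"] = merge_df["status"].fillna(merge_df2["status"])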
    

However, I wonder whether there is a one-liner that does this (there probably is).
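
Not quite a one-liner, but the three steps can be written somewhat more compactly. Note that the snippet above matches on code alone, so a row for user 52 with code 123 would pick up user 51's status. The sketch below also matches on user_id and name. It rests on several assumptions that the question does not confirm: code is paired with name and code_equivalence with name_equivalence, each (user_id, code, name) key appears at most once in df2, and codes are strings. The sample frames are a subset of the question's rows.

```python
import pandas as pd

df1 = pd.DataFrame(
    [
        (51, "123", "bi lovers", "542", "bi for marketing"),
        (51, "123", "bi lovers", "545", "i love bi"),
        (51, "255", "antiquity history", "429", "roma"),
        (51, "255", "antiquity history", "430", "greece"),
        (52, "123", "bi lovers", "542", "bi for marketing"),
        (52, "123", "bi lovers", "545", "i love bi"),
        (52, "256", "modern history", "500", "france"),
        (52, "256", "modern history", "501", "germany"),
    ],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)
df2 = pd.DataFrame(
    [
        (51, "123", "bi lovers", "ongoing"),
        (51, "430", "greece", "ongoing"),
        (52, "501", "germany", "ongoing"),
        (52, "050", "numbers", "ongoing"),
    ],
    columns=["user_id", "code", "name", "status"],
)

# Status found by matching df2 on (user_id, code, name).
direct = df1.merge(df2, on=["user_id", "code", "name"], how="left")["status"]

# Status found by matching df2 on (user_id, code_equivalence, name_equivalence).
equiv = df1.merge(
    df2.rename(columns={"code": "code_equivalence", "name": "name_equivalence"}),
    on=["user_id", "code_equivalence", "name_equivalence"],
    how="left",
)["status"]

# A left merge keeps df1's row order, and unique df2 keys keep the row count,
# so the values can be attached positionally.
merged = df1.assign(status=direct.fillna(equiv).to_numpy())

print(merged)
```

With these rows, only the two (51, 123) rows, the (51, 430, greece) row, and the (52, 501, germany) row receive a status, which agrees with the merged table in the question.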