How do I compare cell values across two different DataFrames in Python?

Time: 2018-06-06 09:15:08

Tags: python pandas dataframe

I have two DataFrames:

Person_df

       Name  Emplid  Country
    0  DK    123     India
    1  JS    456     India
    2  RM    789     China
    3  MS    111     China
    4  SR    222     China

Target_df

        Country  Category   Target
    0   India    Marketing  Reduce spend by $xy.
    1   India    R&D        Increase spend by $dd.
    2   India    Infra      Reduce spend by $kn.
    3   China    Marketing  Increase spend by $eg.
    4   China    R&D        Increase spend by $cb.
    5   China    Infra      Reduce spend by $mn.
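To reproduce the two frames shown above, a minimal construction sketch follows. It assumes that Emplid is stored as an integer and that the other columns are plain strings; the tables themselves do not state the dtypes.

```python
import pandas as pd

# People, one row per employee (values copied from the Person_df table).
Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM", "MS", "SR"],
    "Emplid": [123, 456, 789, 111, 222],  # assumed to be integers
    "Country": ["India", "India", "China", "China", "China"],
})

# Targets, one row per (country, category) pair (copied from Target_df).
Target_df = pd.DataFrame({
    "Country": ["India", "India", "India", "China", "China", "China"],
    "Category": ["Marketing", "R&D", "Infra", "Marketing", "R&D", "Infra"],
    "Target": [
        "Reduce spend by $xy.",
        "Increase spend by $dd.",
        "Reduce spend by $kn.",
        "Increase spend by $eg.",
        "Increase spend by $cb.",
        "Reduce spend by $mn.",
    ],
})
```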

My goal is to build a third DataFrame based on each person's country, like this:

Individual_df

    TargetID   Category   Target
    DK12301    Marketing  Reduce spend by $xy.
    DK12302    R&D        Increase spend by $dd.
    DK12303    Infra      Reduce spend by $kn.
    JS45601    Marketing  Reduce spend by $xy.
    JS45602    R&D        Increase spend by $dd.
    JS45603    Infra      Reduce spend by $kn.
    RM78901    Marketing  Increase spend by $eg.
    RM78902    R&D        Increase spend by $cb.
    RM78903    Infra      Reduce spend by $mn.
    MS11101    Marketing  Increase spend by $eg.
    MS11102    R&D        Increase spend by $cb.
    MS11103    Infra      Reduce spend by $mn.
    SR22201    Marketing  Increase spend by $eg.
    SR22202    R&D        Increase spend by $cb.
    SR22203    Infra      Reduce spend by $mn.

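Basically, I need to take each person from Person_df, match his or her country against the Country column of Target_df, and assign every target for that country to the person (storing the result in Individual_df).

The problem is that I'm new to Python and can't figure out how to do this country comparison.

I wrote the following code: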

    for index, row in Person_df.iterrows():
        for index1, row1 in Target_df.iterrows():
            if Person_df['country'] == Person_df['country']:  # I know this is incorrect
                data = []
                # populate data[] with selected values for one person.
                # append data[] to Individual_df

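I need help with a couple of things here:

1) How can I compare each person's country here?

2) Even if I knew how to do the comparison, the code I wrote is inefficient, because it performs a lot of unnecessary iterations. Any pointers on how to improve this?

Thanks!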

1 Answer:

Answer 0: (score: 2)

Try this:

    import pandas as pd

    # Left join: each person picks up every target defined for their country
    Individual_df = pd.merge(Person_df, Target_df, on='Country', how='left')
    # TargetID = Name + Emplid + two-digit running number within each Emplid
    seq = (Individual_df.groupby('Emplid').cumcount() + 1).astype(str).str.zfill(2)
    Individual_df['TargetID'] = Individual_df['Name'] + Individual_df['Emplid'].astype(str) + seq
    Individual_df = Individual_df[['TargetID', 'Category', 'Target']]
    print(Individual_df)

Output:

       TargetID   Category                  Target
    0   DK12301  Marketing    Reduce spend by $xy.
    1   DK12302        R&D  Increase spend by $dd.
    2   DK12303      Infra    Reduce spend by $kn.
    3   JS45601  Marketing    Reduce spend by $xy.
    4   JS45602        R&D  Increase spend by $dd.
    5   JS45603      Infra    Reduce spend by $kn.
    6   RM78901  Marketing  Increase spend by $eg.
    7   RM78902        R&D  Increase spend by $cb.
    8   RM78903      Infra    Reduce spend by $mn.
    9   MS11101  Marketing  Increase spend by $eg.
    10  MS11102        R&D  Increase spend by $cb.
    11  MS11103      Infra    Reduce spend by $mn.
    12  SR22201  Marketing  Increase spend by $eg.
    13  SR22202        R&D  Increase spend by $cb.
    14  SR22203      Infra    Reduce spend by $mn.

Explanation:

  1. Left-join Person_df with Target_df on Country.
  2. Build TargetID from Name, Emplid, and a per-Emplid cumcount (offset by 1 and zero-padded to two digits).
  3. Select the required columns.

When the rows should instead be obtained with a for loop, the steps are:

  1. Find the unique elements (countries) of Person_df.
  2. Iterate over Individual_df with a for loop.
  3. Check whether the country is among the unique elements (countries); if it is, perform the required operation.
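
The code for the loop variant does not appear in this answer, so the following is only one possible reading of the loop steps above, not the answerer's code. It assumes the sample frames from the question, collects the targets for each country once, and then loops over the people. The names `targets_by_country`, `rows`, and `seq` are made up for this sketch.

```python
import pandas as pd

# Sample frames copied from the question (Emplid assumed to be an integer).
Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM", "MS", "SR"],
    "Emplid": [123, 456, 789, 111, 222],
    "Country": ["India", "India", "China", "China", "China"],
})
Target_df = pd.DataFrame({
    "Country": ["India"] * 3 + ["China"] * 3,
    "Category": ["Marketing", "R&D", "Infra"] * 2,
    "Target": [
        "Reduce spend by $xy.", "Increase spend by $dd.", "Reduce spend by $kn.",
        "Increase spend by $eg.", "Increase spend by $cb.", "Reduce spend by $mn.",
    ],
})

# Collect the target rows for each distinct country once, so the inner
# loop never has to rescan the whole of Target_df.
targets_by_country = {
    country: group for country, group in Target_df.groupby("Country")
}

rows = []
for person in Person_df.itertuples(index=False):
    # Skip people whose country has no targets defined.
    if person.Country not in targets_by_country:
        continue
    targets = targets_by_country[person.Country]
    # Number the targets 01, 02, ... separately for every person.
    for seq, target in enumerate(targets.itertuples(index=False), start=1):
        rows.append({
            "TargetID": f"{person.Name}{person.Emplid}{seq:02d}",
            "Category": target.Category,
            "Target": target.Target,
        })

Individual_df = pd.DataFrame(rows, columns=["TargetID", "Category", "Target"])
print(Individual_df)
```

On larger frames the merge-based version is usually the better choice, since iterating row by row in pandas is slow. Note also that this loop drops people whose country has no targets, whereas a left join keeps them with missing values.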