Using Pandas to insert column data into another column based on the value of another column

Date: 2018-04-27 02:12:01

Tags: python excel pandas

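I'm having trouble putting data into a column depending on whether the row's Source value is Primary or Secondary.

Here is my example:

This is source.xlsx, where I get the data from.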

Source      Dummy   Data

Secondary   DUMMY   22134007
Secondary   DUMMY   27543350
Secondary   DUMMY   22128972
Primary     DUMMY   29579399
Secondary   DUMMY   23781175
Primary     DUMMY   1000185771
Primary     DUMMY   22135458
Secondary   DUMMY   022130241
Primary     DUMMY   22137751
Primary     DUMMY   27543359

Here I put the values of the Data column from source.xlsx into output.xlsx:

svc_no      MDF      Primary Data   Secondary Data
1000185771  DUMMY   
22134007    DUMMY       
27543350    DUMMY       
22135458    DUMMY       
22137751    DUMMY       
22128972    DUMMY       
27543359    DUMMY       
29579399    DUMMY       
23781175    DUMMY       

Now I want to fill the Primary Data and Secondary Data columns in output.xlsx by looking up each value's Source in source.xlsx.

Like this:

This should be the output in FinalOutput.xlsx:
svc_no      MDF      Primary Data   Secondary Data
1000185771  DUMMY    1000185771         
22134007    DUMMY                   22134007
27543350    DUMMY                   27543350
22135458    DUMMY    22135458
22137751    DUMMY                   22137751
22128972    DUMMY                   22128972
27543359    DUMMY                   27543359
29579399    DUMMY   29579399    
23781175    DUMMY                   23781175

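The Data values in source.xlsx match the svc_no values in output.xlsx, but the code needs to know whether each value belongs in the Primary or the Secondary column.

This is what I have done so far.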

import pandas as pd

df_1 = pd.read_excel("output.xlsx")
df_2 = pd.read_excel("sourcefile2.xlsx", "v0.02")

df_1 = pd.DataFrame(df_1)
df_2 = pd.DataFrame(df_2)

Primary = df_2['Source'].str.contains('Primary')
Secondary = df_2['Source'].str.contains('Secondary')

df_1 = df_1[df_1['svc_no'].isin(df_2[Primary]['Data'])]
df_1['Primary Data'] = df_1['svc_no']

df_1 = df_1[df_1['svc_no'].isin(df_2[Secondary]['Data'])]
df_1['Secondary Data'] = df_1['svc_no']

df_1.to_excel('FinalOutput.xlsx')

1 Answer:

Answer 0: (score: 1)

Use pivot (df here is assumed to be the DataFrame read from source.xlsx):

df.reset_index().pivot(index='index',columns='Source',values='Data').fillna('')
Out[179]: 
Source      Primary    Secondary
index                           
0                     2.2134e+07
1                    2.75434e+07
2                     2.2129e+07
3       2.95794e+07             
4                    2.37812e+07
5       1.00019e+09             
6       2.21355e+07             
7                    2.21302e+07
8       2.21378e+07             
9       2.75434e+07             
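
The scientific notation in the output above comes from the dtype: pivoting an integer column leaves NaN in the empty cells, which upcasts the values to float. A minimal sketch of the effect, using a few rows copied from the question as an inline stand-in for source.xlsx (the names as_float and as_text are illustrative only):

```python
import pandas as pd

# Inline stand-in for a few rows of source.xlsx (values copied from the question).
df = pd.DataFrame({
    "Source": ["Secondary", "Primary", "Secondary", "Primary"],
    "Dummy": ["DUMMY"] * 4,
    "Data": [22134007, 29579399, 23781175, 1000185771],
})

# Pivoting integers leaves NaN in the empty cells, which upcasts to float.
as_float = df.reset_index().pivot(index="index", columns="Source", values="Data")

# Casting to str before pivoting keeps the digits exactly as they were.
as_text = (
    df.assign(Data=df["Data"].astype(str))
      .reset_index()
      .pivot(index="index", columns="Source", values="Data")
      .fillna("")
)
print(as_text)
```

Casting Data to str before the pivot is one way to avoid the float conversion; it also means the blank cells and the values share a single text type.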

Then concat:

df.Data=df.Data.astype(str)
pd.concat([df,df.reset_index().pivot(index='index',columns='Source',values='Data').fillna('')],axis=1)
Out[182]: 
      Source  Dummy        Data     Primary Secondary
0  Secondary  DUMMY    22134007              22134007
1  Secondary  DUMMY    27543350              27543350
2  Secondary  DUMMY    22128972              22128972
3    Primary  DUMMY    29579399    29579399          
4  Secondary  DUMMY    23781175              23781175
5    Primary  DUMMY  1000185771  1000185771          
6    Primary  DUMMY    22135458    22135458          
7  Secondary  DUMMY    22130241              22130241
8    Primary  DUMMY    22137751    22137751          
9    Primary  DUMMY    27543359    27543359
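
The result above is built on the source table. To fill the two columns of output.xlsx instead, one option is to look up each svc_no in the source and copy it into the matching column. The attempt in the question comes out empty because each df_1 = df_1[...] line discards rows, so the Secondary filter only sees rows that already passed the Primary filter. The sketch below is not part of the original answer; it uses small inline frames in place of the Excel files and assumes each Data value appears only once in the source (the names source, output and label are illustrative).

```python
import pandas as pd

# Inline stand-ins for source.xlsx and output.xlsx (values from the question).
source = pd.DataFrame({
    "Source": ["Secondary", "Primary", "Secondary", "Primary"],
    "Data": ["22134007", "29579399", "23781175", "1000185771"],
})
output = pd.DataFrame({
    "svc_no": ["1000185771", "22134007", "29579399", "23781175"],
    "MDF": ["DUMMY"] * 4,
})

# Look up each svc_no's Source label; values missing from source become NaN.
# Assumption: Data values are unique, otherwise the lookup index is ambiguous.
label = output["svc_no"].map(source.set_index("Data")["Source"])

# Copy svc_no into the column matching its label and leave the other blank.
output["Primary Data"] = output["svc_no"].where(label == "Primary", "")
output["Secondary Data"] = output["svc_no"].where(label == "Secondary", "")
print(output)
```

With the real files, svc_no and Data would need the same dtype (both integers or both strings) for the lookup to match; the frame can then be written out with output.to_excel('FinalOutput.xlsx', index=False).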