Get data from one Excel file and write it to another Excel file in the same format using Pandas

Date: 2018-05-04 10:38:03

Tags: python excel pandas

I am trying to read two Excel files in Python using pandas, take data from one, and write it into the other under certain conditions. Let's call the files sourcefile1.xlsx and sourcefile2.xlsx.

Here are the contents of the Excel files:

  1. sourcefile1.xlsx: Pair and Field Result are merged top-level headers. cable_type, cable_name, cable_pair, caller_id and result are the sub-headers for the individual columns.

    -----------------Pair---------------   -----Field Result-----
    cable_type   cable_name   cable_pair   caller_id   result
    primary      2            103          n/a         not match
    primary      1            33           22222222    match
    primary      5            342          22222222    match 
    secondary    2            12           n/a         not match
    secondary    4            144          44444444    match
    
  2. sourcefile2.xlsx

    -blank-     -----Secondary Pairs----    ------Primary Pairs------
    caller_id   caller_id  adsl  result       caller_id   adsl   result
    11111111               4/144                          2/103  
    22222222               2/12                           4/144 
    44444444               7/55                           4/144
    NULL                   8/123                          1/11
    NULL                   NULL                           2/22
    
  3. The expected output is to be written to sourcefile2 according to this pseudocode:

    if caller_id(sourcefile1) != 'n/a':
        # caller_id is known: match on caller_id
        if cable_type(sourcefile1) == 'primary':
            if caller_id(sourcefile1) == caller_id(sourcefile2) - primary pairs:
                write caller_id(sourcefile1) to caller_id(sourcefile2) - primary pairs
                write result(sourcefile1) to result(sourcefile2) - primary pairs
        elif cable_type(sourcefile1) == 'secondary':
            if caller_id(sourcefile1) == caller_id(sourcefile2) - secondary pairs:
                write caller_id(sourcefile1) to caller_id(sourcefile2) - secondary pairs
                write result(sourcefile1) to result(sourcefile2) - secondary pairs
    else:
        # caller_id is 'n/a': match on cable_name + '/' + cable_pair against adsl
        if cable_type(sourcefile1) == 'primary':
            if cable_name + '/' + cable_pair(sourcefile1) == adsl(sourcefile2) - primary pairs:
                write caller_id(sourcefile1) to caller_id(sourcefile2) - primary pairs
                write result(sourcefile1) to result(sourcefile2) - primary pairs
        elif cable_type(sourcefile1) == 'secondary':
            if cable_name + '/' + cable_pair(sourcefile1) == adsl(sourcefile2) - secondary pairs:
                write caller_id(sourcefile1) to caller_id(sourcefile2) - secondary pairs
                write result(sourcefile1) to result(sourcefile2) - secondary pairs
    

    This is the output I want to achieve:

    -blank-     -----Secondary Pairs----    ------Primary Pairs------
    caller_id   caller_id  adsl   result       caller_id   adsl   result
    11111111               4/144              n/a         2/103  not match
    22222222    n/a        2/12   not match    22222222    4/144  match
    44444444    44444444   7/55   match                    4/144
    NULL                   8/123                          1/11
    NULL                   NULL                           2/22
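
The rules in the pseudocode can be sketched in plain Python on the sample rows shown above. This is only an illustration, not a tested solution for the real workbooks: the dictionary keys (`key_caller_id`, `sec_adsl`, `pri_adsl`, and so on) and the function name `apply_rules` are hypothetical, and it assumes that a known caller_id is matched against the first (unlabelled) caller_id column of sourcefile2, which is what the expected output suggests.

```python
NA = "n/a"

# Sample rows of sourcefile1 (sub-header names taken from the question).
source1 = [
    {"cable_type": "primary", "cable_name": 2, "cable_pair": 103, "caller_id": NA, "result": "not match"},
    {"cable_type": "primary", "cable_name": 1, "cable_pair": 33, "caller_id": "22222222", "result": "match"},
    {"cable_type": "primary", "cable_name": 5, "cable_pair": 342, "caller_id": "22222222", "result": "match"},
    {"cable_type": "secondary", "cable_name": 2, "cable_pair": 12, "caller_id": NA, "result": "not match"},
    {"cable_type": "secondary", "cable_name": 4, "cable_pair": 144, "caller_id": "44444444", "result": "match"},
]

# Sample rows of sourcefile2, flattened; the key names are hypothetical.
source2 = [
    {"key_caller_id": "11111111", "sec_adsl": "4/144", "pri_adsl": "2/103"},
    {"key_caller_id": "22222222", "sec_adsl": "2/12", "pri_adsl": "4/144"},
    {"key_caller_id": "44444444", "sec_adsl": "7/55", "pri_adsl": "4/144"},
    {"key_caller_id": None, "sec_adsl": "8/123", "pri_adsl": "1/11"},
    {"key_caller_id": None, "sec_adsl": None, "pri_adsl": "2/22"},
]

PREFIX = {"primary": "pri", "secondary": "sec"}


def apply_rules(src_rows, target_rows):
    """Return a copy of target_rows with caller_id/result filled in per group."""
    out = [dict(row) for row in target_rows]
    for src in src_rows:
        prefix = PREFIX.get(src["cable_type"])
        if prefix is None:
            continue  # unknown cable_type: nothing to write
        adsl = f'{src["cable_name"]}/{src["cable_pair"]}'
        for row in out:
            if src["caller_id"] != NA:
                # Known caller_id: match on the key caller_id column.
                matched = row["key_caller_id"] == src["caller_id"]
            else:
                # caller_id is n/a: fall back to the adsl column of the group.
                matched = row[f"{prefix}_adsl"] == adsl
            if matched:
                row[f"{prefix}_caller_id"] = src["caller_id"]
                row[f"{prefix}_result"] = src["result"]
    return out


filled = apply_rules(source1, source2)
```

On this sample, `filled` carries the same caller_id and result values as the expected output table; cells that the table leaves blank are simply absent from the dictionaries.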
    

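    I am trying to match the caller_id from sourcefile1 against sourcefile2 and write it under Primary Pairs or Secondary Pairs depending on cable_type. If caller_id is n/a, I need to match on adsl instead. The result column is given data; I only need to use caller_id or adsl to put everything on the matching row.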
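
One way to express the adsl fallback in pandas is to build the `cable_name/cable_pair` key as a string column and look results up with `Series.map`. The following is a minimal sketch on in-memory frames, assuming flat, hypothetical column names (`pri_adsl`, `pri_result`) rather than the real merged headers, and assuming each adsl key occurs at most once among the n/a rows:

```python
import pandas as pd

# A few rows shaped like sourcefile1 (sub-header names from the question).
src = pd.DataFrame({
    "cable_type": ["primary", "primary", "secondary"],
    "cable_name": [2, 1, 2],
    "cable_pair": [103, 33, 12],
    "caller_id": ["n/a", "22222222", "n/a"],
    "result": ["not match", "match", "not match"],
})

# Build the "name/pair" key column-wise instead of with a row-wise apply.
src["adsl"] = src["cable_name"].astype(str) + "/" + src["cable_pair"].astype(str)

# Hypothetical flat view of the Primary Pairs adsl column of sourcefile2.
target = pd.DataFrame({"pri_adsl": ["2/103", "4/144", "1/11"]})

# Only primary rows without a caller_id take part in the adsl lookup.
pri_na = src[(src["cable_type"] == "primary") & (src["caller_id"] == "n/a")]

# map() with a Series needs a unique index, hence drop_duplicates.
lookup = pri_na.drop_duplicates("adsl").set_index("adsl")["result"]

# Rows whose adsl has no counterpart in the lookup get a missing value.
target["pri_result"] = target["pri_adsl"].map(lookup)
```

The same pattern would apply to the secondary group by filtering on `'secondary'` and mapping a secondary adsl column.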

    So far I have been able to match caller_id, but only after recreating sourcefile1 and sourcefile2 with the top-level headers removed. Here is my code:

    import pandas as pd
    
    df1 = pd.read_excel('sourcefile2.xlsx')
    df2 = pd.read_excel('sourcefile1.xlsx', 'v0.02')
    
    forPrimary1 = df1.columns[40]
    forSecondary1 = df1.columns[23]
    ComparisonResult = df2.columns[22]
    
    forAdsl = df1.columns[39]
    CallerID = df2.columns[13]
    forPrimary = df1.columns[37]
    forSecondary = df1.columns[16]
    
    df3 = pd.read_excel('PrimarySecondary.xlsx')
    df4 = pd.read_excel('adslFile.xlsx')
    df5 = pd.read_excel('PrimarySecondary2.xlsx')
    
    # df1['svc_no'] = df1['svc_no']
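    # Build the "cable_name/cable_pair" key used to match the adsl column
    df2['Adsl'] = df2['cable_name'].astype(str) + '/' + df2['cable_pair'].astype(str)
    # Split the source rows by cable_type (column names must be quoted strings)
    newPrim = df2.loc[df2['cable_type'] == 'primary', ['caller_id', 'result', 'Adsl']]
    newSec = df2.loc[df2['cable_type'] == 'secondary', ['caller_id', 'result']]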
    newPrim.to_excel('newPrimary.xlsx')
    newSec.to_excel('newSecondary.xlsx')
    
    frame = pd.read_excel('newPrimary.xlsx')
    frame1 = pd.read_excel('newSecondary.xlsx')
    
    df1['b_line_stat'] = df1['b_line_stat'].fillna('NULL')
    df1['DP_e_pr'] = df1['DP_e_pr'].fillna('NULL')
    df1['DP_e_st'] = df1['DP_e_st'].fillna('NULL')
    df1['DP'] = df1['DP'].fillna('NULL')
    df1['CAB_d_st'] = df1['CAB_d_st'].fillna('NULL')
    df1['CAB_d_pr'] = df1['CAB_d_pr'].fillna('NULL')
    df1['port_status'] = df1['port_status'].fillna('NULL')
    
    name1 = df1.columns[17]
    name2 = df1.columns[18]
    name3 = df1.columns[19]
    name4 = df1.columns[20]
    name5 = df1.columns[21]
    name6 = df1.columns[22]
    name7 = df1.columns[38]
    
    df1[name1] = df1['b_line_stat']
    df1[name2] = df1['CAB_d_st']
    df1[name3] = df1['CAB_d_pr']
    df1[name4] = df1['DP_e_st']
    df1[name5] = df1['DP_e_pr']
    df1[name6] = df1['DP']
    df1[name7] = df1['port_status']
    
    frame = frame[frame['caller_id'].isin(df1['caller_id'])]
    df1[forPrimary1] = frame['result']
    
    frame1 = frame1[frame1['caller_id'].isin(df1['caller_id'])]
    df1[forSecondary1] = frame1['result']
    
    df1[df1['caller_id'].isin(df3['Primary'])]
    df1[forPrimary] = df1['caller_id'].fillna('n/a')
    
    df1[df1['adsl'].isin(df2['Adsl'])]
    df1[forAdsl] = df1['adsl'].fillna('NULL')
    
    df1[df1['caller_id'].isin(df3['Secondary'])]
    df1[forSecondary] = df1['caller_id'].fillna('n/a')
    
    df1['caller_id'] = df1['caller_id'].fillna('NULL')
    df1['adsl'] = df1['adsl'].fillna('NULL')
    
    # Write the result once; the context manager saves and closes the workbook
    with pd.ExcelWriter('dp_util_ANT715-M.xlsx', engine='xlsxwriter') as writer:
        df1.to_excel(writer, sheet_name='Sheet1', index=False)
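
Rather than deleting the merged top-level headers, it may be possible to keep them as two-level columns: `pd.read_excel` accepts a list for `header` (for example `header=[0, 1]`) and returns MultiIndex columns in that case. The sketch below builds such a frame in memory instead of reading the real file, so the group labels and the `Key` label for the unlabelled first column are assumptions:

```python
import pandas as pd

# Two-level column labels: (merged top-level header, sub-header).
columns = pd.MultiIndex.from_tuples([
    ("Key", "caller_id"),  # hypothetical label for the blank top-level header
    ("Secondary Pairs", "caller_id"),
    ("Secondary Pairs", "adsl"),
    ("Secondary Pairs", "result"),
    ("Primary Pairs", "caller_id"),
    ("Primary Pairs", "adsl"),
    ("Primary Pairs", "result"),
])

df = pd.DataFrame(
    [
        ["11111111", None, "4/144", None, None, "2/103", None],
        ["22222222", None, "2/12", None, None, "4/144", None],
    ],
    columns=columns,
)

# Columns are addressed by (group, name) tuples rather than by position.
mask = df[("Key", "caller_id")] == "22222222"
df.loc[mask, ("Primary Pairs", "caller_id")] = "22222222"
df.loc[mask, ("Primary Pairs", "result")] = "match"

# Selecting a top-level label returns the sub-frame for that group.
primary = df["Primary Pairs"]
```

Addressing columns by name in this way avoids positional lookups such as `df1.columns[40]`, which silently point at a different column if the sheet layout changes.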
    

    Edit: I changed the variable names used in my script to match my question.

0 answers:

No answers yet