Using Python

Posted: 2019-02-20 15:15:01

Tags: python excel

I want to reshape the data in an Excel worksheet using Python. This is what my data looks like:

AuditDate      Fields                     ModifiedBy
1/1/2019 7:58  Status: Assigned  (0)                
               Site Group: XXX                      
               Region: xxx                          
               Site: xxxxx                          
               Summary: xxxx                        
               Location Company: xxx                
               Support Organization: XXXX           
               Support Group Name: xxxxx            
               Last Name: xxxx                      
               First Name: xxxx                     
               Categorization Tier 1:               
               Categorization Tier 2:               
               Categorization Tier 3:               
               Company: xxxx                        
               Priority: xxx                        
               Work Order Type: xxx                 
               Company3: xxxxx                      
               Request Manager:                     
               Product Cat Tier 1(2):               
               Product Cat Tier 2 (2):              
               Product Cat Tier 3 (2):              
               ASORG: IT Shoreside                  
               ASCPY: xxxx                          
               ASGRP: xxx                           
               Request Assignee:                    
               Status History: XXXX       XXXX           
1/1/2019 8:31  Request Assignee: XXXX     XXXX      
1/1/2019 15:02 Status: Pending  (1)       XXXX      
1/3/2019 13:00 Status: Completed  (5)     XXXX      
1/9/2019 2:46  Status: Closed  (8)        XXXX      

As you can see above, the first entry spans several lines. I want the text before each colon (:) to become a column.

From FieldsChanged, I only care about Status, Priority, Request Assignee, and ASGRP, which should become columns. The output should look like this:

AuditDate       Status     Priority RequestAssignee ASGRP ModifiedBy
1/1/2019 7:58   Assigned   XX       XXX             XXX   XXXX
1/1/2019 8:31                       XXXX                  XXXX
1/1/2019 15:02  Pending                                   XXXX
1/3/2019 13:00  Completed                                 XXXX
1/9/2019 2:46   Closed                                    XXXX

The same kind of multi-line data can also appear in other rows. This is how the Excel sheet should look after reshaping.

Any help would be greatly appreciated.

2 answers:

Answer 0 (score: 0)

I suggest using the pandas library. It works with an intuitive tabular format (similar to Excel):

import pandas as pd
df = pd.read_excel('tmp.xlsx', index_col=0)  # load the sheet into a DataFrame

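You can then filter and reshape the resulting DataFrame (table) as needed, or drop the rows that contain NaN (for example, based on the AuditDate column).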
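A minimal sketch of that idea, assuming each "Key: value" line sits in its own spreadsheet row and AuditDate is filled in only on the first row of each entry (the question does not say exactly how the sheet is laid out). The sheet is read without index_col so AuditDate stays a regular column; the raw frame below is a hypothetical stand-in for pd.read_excel('tmp.xlsx') so the example runs without the file, and reshape and WANTED are illustrative names.

```python
import pandas as pd

WANTED = ["Status", "Priority", "Request Assignee", "ASGRP"]

def reshape(raw: pd.DataFrame) -> pd.DataFrame:
    """Collapse one-line-per-row 'Key: value' entries into one row per audit entry."""
    df = raw.copy()
    # A new entry starts wherever AuditDate is filled in; number the entries 1, 2, 3, ...
    entry = df["AuditDate"].notna().cumsum()
    # Split "Key: value" on the first colon only, so values may contain colons.
    parts = df["Fields"].str.split(":", n=1, expand=True)
    df["Key"] = parts[0].str.strip()
    df["Value"] = parts[1].str.strip()
    # Treat empty values such as "Request Assignee:" as missing.
    df["Value"] = df["Value"].mask(df["Value"] == "")
    # Drop the trailing status code, e.g. "Assigned  (0)" -> "Assigned".
    is_status = df["Key"] == "Status"
    df.loc[is_status, "Value"] = df.loc[is_status, "Value"].str.replace(
        r"\s*\(\d+\)$", "", regex=True)

    records = []
    for _, grp in df.groupby(entry, sort=False):
        record = {"AuditDate": grp["AuditDate"].iloc[0]}
        for key, value in zip(grp["Key"], grp["Value"]):
            if key in WANTED:
                record[key] = value
        # ModifiedBy may sit on any row of the entry; keep the last non-empty one.
        modified = grp["ModifiedBy"].dropna()
        record["ModifiedBy"] = modified.iloc[-1] if not modified.empty else None
        records.append(record)
    # Missing keys become NaN; columns come out in the requested order.
    return pd.DataFrame(records, columns=["AuditDate", *WANTED, "ModifiedBy"])

# Hypothetical stand-in for the sheet (trimmed to a few Fields lines).
raw = pd.DataFrame({
    "AuditDate": ["1/1/2019 7:58", None, None, None, None,
                  "1/1/2019 8:31", "1/1/2019 15:02"],
    "Fields": ["Status: Assigned  (0)", "Priority: xxx", "ASGRP: xxx",
               "Request Assignee:", "Status History: XXXX",
               "Request Assignee: XXXX", "Status: Pending  (1)"],
    "ModifiedBy": [None, None, None, None, "XXXX", "XXXX", "XXXX"],
})

result = reshape(raw)
print(result.to_string(index=False))
```

Numbering the entries with cumsum(), rather than grouping on the forward-filled date, keeps two audit entries with the same timestamp from being merged.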

Answer 1 (score: 0)

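I will assume the worksheet has been converted to a CSV file. You can then use the csv module to parse the rows first and then the Fields field, and use the same csv module to write the resulting CSV file directly.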

Assume the input CSV file looks like this (note the quotes around the multi-line field):

AuditDate,Fields,ModifiedBy
1/1/2019 7:58,"Status: Assigned (0)
Site Group: XXX
Region: xxx
Site: xxxxx
Summary: xxxx
Location Company: xxx
Support Organization: XXXX
Support Group Name: xxxxx
Last Name: xxxx
First Name: xxxx
Categorization Tier 1:
Categorization Tier 2:
Categorization Tier 3:
Company: xxxx
Priority: xxx
Work Order Type: xxx
Company3: xxxxx
Request Manager:
Product Cat Tier 1(2):
Product Cat Tier 2 (2):
Product Cat Tier 3 (2):
ASORG: IT Shoreside
ASCPY: xxxx
ASGRP: xxx
Request Assignee:
Status History: XXXX",XXXX
1/1/2019 8:31,Request Assignee: XXXX,XXXX
1/1/2019 15:02,Status: Pending (1),XXXX
1/3/2019 13:00,Status: Completed (5),XXXX
1/9/2019 2:46,Status: Closed (8),XXXX

You can then process it easily like this:

import csv
import io

with open('input.csv', newline='') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.DictReader(fd)       # directly use a DictReader for reading
    # declare a DictWriter for the required fields, ignoring any additional field (extrasaction)
    wr = csv.DictWriter(fdout, ['AuditDate', 'Status', 'Priority', 'Request Assignee',
                                'ASGRP', 'ModifiedBy'], extrasaction='ignore')
    wr.writeheader()               # write the headers
    for row in rd:
        with io.StringIO(row['Fields']) as ffd:     # process Fields as a small in-memory file
            # each "Key: value" line is read as a [key, value] row
            frd = csv.reader(ffd, delimiter=':', skipinitialspace=True)
            row.update(dict(frd))  # update the row dictionary with the "sub-fields"
        _ = wr.writerow(row)       # and directly use that

You should get the expected result:

AuditDate,Status,Priority,Request Assignee,ASGRP,ModifiedBy
1/1/2019 7:58,Assigned (0),xxx,,xxx,XXXX
1/1/2019 8:31,,,XXXX,,XXXX
1/1/2019 15:02,Pending (1),,,,XXXX
1/3/2019 13:00,Completed (5),,,,XXXX
1/9/2019 2:46,Closed (8),,,,XXXX
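One caveat with csv.reader(..., delimiter=':'): a value that itself contains a colon (a time such as 7:58, for example) yields more than two items on that line, and dict() then raises ValueError. The output also keeps the status code ("Assigned (0)"), while the question's desired output shows just "Assigned". A minimal stdlib-only variant could split each line on the first colon only with str.partition and strip a trailing "(n)" from Status (an assumption based on the question's sample output). parse_fields and reshape are hypothetical helper names, and the CSV text is trimmed from the sample above.

```python
import csv
import io
import re

COLUMNS = ['AuditDate', 'Status', 'Priority', 'Request Assignee', 'ASGRP', 'ModifiedBy']
STATUS_CODE = re.compile(r'\s*\(\d+\)$')   # trailing " (0)", " (5)", ...

def parse_fields(text):
    """Turn a multi-line 'Key: value' block into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:                            # ignore lines without any colon
            fields[key.strip()] = value.strip()
    if 'Status' in fields:
        fields['Status'] = STATUS_CODE.sub('', fields['Status'])
    return fields

def reshape(fdin, fdout):
    rd = csv.DictReader(fdin)
    # extrasaction='ignore' drops Fields and every sub-field not listed in COLUMNS
    wr = csv.DictWriter(fdout, COLUMNS, extrasaction='ignore')
    wr.writeheader()
    for row in rd:
        row.update(parse_fields(row['Fields']))
        wr.writerow(row)

sample = '''AuditDate,Fields,ModifiedBy
1/1/2019 7:58,"Status: Assigned (0)
Priority: xxx
ASGRP: xxx
Request Assignee:
Status History: XXXX",XXXX
1/1/2019 8:31,Request Assignee: XXXX,XXXX
'''

out = io.StringIO()
reshape(io.StringIO(sample), out)
print(out.getvalue())
```

For real files, pass file objects opened with newline='' instead of the StringIO objects, as in the answer's code.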