I want to reshape the data in an Excel worksheet using Python. This is what my data looks like:
AuditDate Fields ModifiedBy
1/1/2019 7:58 Status: Assigned (0)
Site Group: XXX
Region: xxx
Site: xxxxx
Summary: xxxx
Location Company: xxx
Support Organization: XXXX
Support Group Name: xxxxx
Last Name: xxxx
First Name: xxxx
Categorization Tier 1:
Categorization Tier 2:
Categorization Tier 3:
Company: xxxx
Priority: xxx
Work Order Type: xxx
Company3: xxxxx
Request Manager:
Product Cat Tier 1(2):
Product Cat Tier 2 (2):
Product Cat Tier 3 (2):
ASORG: IT Shoreside
ASCPY: xxxx
ASGRP: xxx
Request Assignee:
Status History: XXXX XXXX
1/1/2019 8:31 Request Assignee: XXXX XXXX
1/1/2019 15:02 Status: Pending (1) XXXX
1/3/2019 13:00 Status: Completed (5) XXXX
1/9/2019 2:46 Status: Closed (8) XXXX
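As you can see above, the Fields cell of the first record spans multiple lines. I want the text before each colon (:) to become its own column.
Of the FieldsChanged entries, the only ones I want turned into columns are Status, Priority, Request Assignee and ASGRP. The output should look like this: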
AuditDate Status Priority RequestAssignee ASGRP ModifiedBy
1/1/2019 7:58 Assigned XX XXX XXX XXXX
1/1/2019 8:31 XXXX XXXX
1/1/2019 15:02 Pending XXXX
1/3/2019 13:00 Completed XXXX
1/9/2019 2:46 Closed XXXX
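The same kind of data can also appear in other rows. This is what the Excel sheet should look like after reshaping.
If anyone can help, I would really appreciate it.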
Answer 0 (score: 0)
I suggest using the pandas library. It works with an intuitive tabular format (similar to Excel).
import pandas as pd

# Read the sheet into a DataFrame, using the first column (AuditDate) as the index
df = pd.read_excel('tmp.xlsx', index_col=0)
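You can then filter and reshape the resulting DataFrame (table) as needed, or drop rows that contain NA values (i.e. based on the AuditDate column).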
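A minimal sketch of that pandas route, assuming the sheet has the three columns shown in the question (AuditDate, Fields, ModifiedBy). To stay self-contained it reads a small hypothetical sample from a string with pd.read_csv instead of the real workbook, and it does not set index_col so AuditDate stays an ordinary column. The names parse_fields and WANTED are illustrative, not part of pandas.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the worksheet: same three columns as the
# question, with the multi-line Fields cell shortened for brevity.
CSV_TEXT = '''AuditDate,Fields,ModifiedBy
1/1/2019 7:58,"Status: Assigned (0)
Priority: xxx
ASGRP: xxx
Request Assignee:",XXXX
1/1/2019 8:31,Request Assignee: XXXX,XXXX
1/1/2019 15:02,Status: Pending (1),XXXX
'''

# The only sub-fields the question wants as columns
WANTED = ["Status", "Priority", "Request Assignee", "ASGRP"]


def parse_fields(cell):
    """Turn a multi-line 'Key: value' cell into a dict, splitting at the first colon."""
    out = {}
    for line in str(cell).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon at all
            out[key.strip()] = value.strip()
    return out


df = pd.read_csv(io.StringIO(CSV_TEXT))
# With the real workbook this would be something like: df = pd.read_excel("tmp.xlsx")

# One column per key found in Fields; keys missing from a row become NaN.
fields = pd.DataFrame(df["Fields"].map(parse_fields).tolist(), index=df.index)
fields = fields.reindex(columns=WANTED)  # keep only the wanted keys, in this order

result = pd.concat([df[["AuditDate"]], fields, df[["ModifiedBy"]]], axis=1)
print(result.to_string(index=False))
```

Building a DataFrame from a list of dicts lets pandas align the sub-fields by key, so rows that only change one field (such as the Request Assignee row) simply get NaN in the other columns.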
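Answer 1 (score: 0)
I will assume the worksheet has been converted to a csv file. You can then use the csv module to parse the rows first and the Fields field afterwards.
You can also use the same csv module to build the resulting csv file directly.
Assume the input csv file looks like this (note the quotes around the multi-line field):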
AuditDate,Fields,ModifiedBy
1/1/2019 7:58,"Status: Assigned (0)
Site Group: XXX
Region: xxx
Site: xxxxx
Summary: xxxx
Location Company: xxx
Support Organization: XXXX
Support Group Name: xxxxx
Last Name: xxxx
First Name: xxxx
Categorization Tier 1:
Categorization Tier 2:
Categorization Tier 3:
Company: xxxx
Priority: xxx
Work Order Type: xxx
Company3: xxxxx
Request Manager:
Product Cat Tier 1(2):
Product Cat Tier 2 (2):
Product Cat Tier 3 (2):
ASORG: IT Shoreside
ASCPY: xxxx
ASGRP: xxx
Request Assignee:
Status History: XXXX",XXXX
1/1/2019 8:31,Request Assignee: XXXX,XXXX
1/1/2019 15:02,Status: Pending (1),XXXX
1/3/2019 13:00,Status: Completed (5),XXXX
1/9/2019 2:46,Status: Closed (8),XXXX
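The quotes matter: without them, each line of the Fields cell would be read as a separate record. As a small illustration with hypothetical rows, Python's own csv writer adds this quoting automatically under its default QUOTE_MINIMAL policy, and csv.reader reads the multi-line cell back as a single field:

```python
import csv
import io

# Hypothetical rows in the question's layout; the Fields cell spans two lines.
rows = [
    ['AuditDate', 'Fields', 'ModifiedBy'],
    ['1/1/2019 7:58', 'Status: Assigned (0)\nPriority: xxx', 'XXXX'],
    ['1/1/2019 8:31', 'Request Assignee: XXXX', 'XXXX'],
]

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields that need it,
# such as the one containing a newline.
csv.writer(buf, lineterminator='\n').writerows(rows)
text = buf.getvalue()
print(text)

# Reading it back yields one list per record, not one per physical line.
parsed = list(csv.reader(io.StringIO(text, newline='')))
print(len(parsed))  # → 3
```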
You can then process it like this:
import csv
import io

with open('input.csv', newline='') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.DictReader(fd)  # directly use a DictReader for reading
    # declare a DictWriter for the required fields, ignoring any additional field (extrasaction)
    wr = csv.DictWriter(fdout, ['AuditDate', 'Status', 'Priority', 'Request Assignee',
                                'ASGRP', 'ModifiedBy'], extrasaction='ignore')
    wr.writeheader()  # write the headers
    for row in rd:
        with io.StringIO(row['Fields']) as ffd:  # process Fields
            # each "Key: value" line is split at ':' into a [key, value] pair
            frd = csv.reader(ffd, delimiter=':', skipinitialspace=True)
            row.update(dict(frd))  # update the row dictionary with the "sub-fields"
        _ = wr.writerow(row)  # and directly use that
You should get the expected result:
AuditDate,Status,Priority,Request Assignee,ASGRP,ModifiedBy
1/1/2019 7:58,Assigned (0),xxx,,xxx,XXXX
1/1/2019 8:31,,,XXXX,,XXXX
1/1/2019 15:02,Pending (1),,,,XXXX
1/3/2019 13:00,Completed (5),,,,XXXX
1/9/2019 2:46,Closed (8),,,,XXXX