
Good morning,

I am trying to convert some data that looks like the sample below.

---------------------Page 1---------------------

Class Sessions Detail Report

Course Number: CRS0001290                       Trainer:                                      Location:
Course Version: 1                               Begin:    1/1/2017 12:59 PM                   Capacity:     250
Document Version:                               End:      1/1/2017 12:59 PM                   Total Enrolled:    225

lastname, 1st name             PSN0001004                                Academy                                  Enrolled

lastname, 1st name                  PSN0001005                                Academy                                  Enrolled


Page        1/83                                                                                              Wednesday, April 26, 2017
---------------------Page 2---------------------

Class Sessions Detail Report

Course Number: CRS0001290                        Trainer:                                       Location:
Course Version: 1                                Begin:     1/1/2017 12:59 PM                   Capacity:     250
Document Version:                                End:       1/1/2017 12:59 PM                   Total Enrolled:    225

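After trainee number 225, another list of trainees follows. This pattern repeats throughout the file.

Ideally, I would like the output broken out into the columns COURSE, NAME, ID, and STATUS. The department is not needed. I have a little Visual Basic experience, so that might be the best language to try this in.

In the end, the result should look like this:

(open link to .csv) https://drive.google.com/file/d/0Bzvy0h4-5229ZFY5Qk5BRm1WX1E/view?usp=sharing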

-Al

for each $line in the file:
    if $line is blank
       or $line starts with "---------------------Page"
       or $line starts with "Class Sessions Detail Report"
       or $line starts with "Page " then:
        # ignore that line
    else if $line starts with "Course Number: " then:
        $course = the string of non-blank characters following "Course Number: "
    else if $line starts with "Course Version:" then:
        $start = the string of characters after "Begin:"
    else if $line starts with "Document Version:" then:
        $end = the string of characters after "End:"
    else:
        # It's a line that has information about a trainee
        Split $line into $fields.
        # e.g., if the fields are tab-delimited, then split on tab characters
        # Extract the fields you're interested in:
        $name = $fields[1]
        $id = $fields[2]
        $status = $fields[4]
        # And then output the fields you want:
        print $course, $name, $id, $start, $end, $status
    end if
end for
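The same line-by-line approach can be sketched in Python (chosen here only for illustration; the logic carries over to Visual Basic). This is a minimal sketch, not a definitive implementation. It assumes that the columns on a trainee line are separated by runs of two or more spaces, as they appear to be in the sample, and that a name never contains two consecutive spaces. The names `parse_report` and `to_csv` are hypothetical, and only the four requested columns (COURSE, NAME, ID, STATUS) are emitted, so the Begin and End values are skipped.

```python
import csv
import io
import re

# Assumption: columns on a trainee line are separated by two or more spaces.
FIELD_SEP = re.compile(r"\s{2,}")

# Lines that carry no data: page separators, the report title, page footers.
SKIP_PREFIXES = (
    "---------------------Page",
    "Class Sessions Detail Report",
    "Page ",
)


def parse_report(lines):
    """Yield (course, name, id, status) for every trainee line."""
    course = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(SKIP_PREFIXES):
            continue
        if line.startswith("Course Number:"):
            # Take the first run of non-blank characters after the label.
            parts = line[len("Course Number:"):].split()
            course = parts[0] if parts else None
        elif line.startswith(("Course Version:", "Document Version:")):
            # Begin/End/Capacity are not among the requested columns.
            continue
        else:
            fields = FIELD_SEP.split(line)
            if len(fields) >= 4:
                # The third column is the department, which is not needed.
                name, person_id, _department, status = fields[:4]
                yield course, name, person_id, status


def to_csv(lines):
    """Return the parsed rows as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["COURSE", "NAME", "ID", "STATUS"])
    writer.writerows(parse_report(lines))
    return out.getvalue()
```

To use it, open the exported report and pass the file object to `to_csv`, for example `to_csv(open("report.txt"))`, where `report.txt` is a placeholder file name. Because the names contain a comma, the `csv` module quotes them automatically, so the NAME column stays intact when the file is opened in a spreadsheet.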

Converting unstructured data to structured data

1 Answer: