I have a text (.txt) file that looks like this:
Person Person Name Person Approval Supervisor Payroll Name Application Supplier Start Date End Date Archived Type Number Status Name Name Agency D'Cunha, Yionue 123456 NOT ENTERED Power, Projects CONTRACT Contractor Mehash SUPPLIER_1 10-DEC-16 16-DEC-16 No Employee Vughila, 132456 WORKING Miro, Company-abcde INPayroll 10-DEC-16 16-DEC-16 No Proshont Profal Monthly 10-DEC-16 16-DEC-16 No Employee Diiri, Maaor 113456 NOT ENTERED Kargannkir,Company-abcde INPayroll Bivnath Monthly 10-DEC-16 16-DEC-16 No Employee Kimit, Gongobhar111111 WORKING Chondorkor,Company-abcde INProjects 10-DEC-16 16-DEC-16 No Avissku Monthly Employee Kalvornu, 110077 WORKING Kindipur, Company-abcde INPayroll 10-DEC-16 16-DEC-16 No Churali Barinakir Monthly Agency Dhilorii, 100009 NOT ENTERED Nook, Projects CONTRACT ContractorBohishik Lurukont SUPPLIER_2
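I get this file from a report generated by a piece of software. I want to parse the file and export the data to CSV. I tried this, but it did not work because my data is structured so differently.
Then I tried this: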
$input = Get-Content "C:\Users\user.name\Desktop\GBS\text_file.txt"
$data = $input[1..($input.Length - 1)]
$maxLength = 0
$objects = foreach ($record in $data) {
    $split = $record -split "\s{2,}|\t+"
    if ($split.Length -gt $maxLength) {
        $maxLength = $split.Length
    }
    $props = @{}
    for ($i=0; $i -lt $split.Length; $i++) {
        $props.Add([String]($i+1), $split[$i])
    }
    New-Object -TypeName PSObject -Property $props
}
$headers = [String[]](1..$maxLength)
$objects |
    Select-Object $headers |
    Export-Csv -NoTypeInformation -Path "C:\Users\user.name\Desktop\GBS\out.csv"
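But this messes up the second line of every row. The problem is that in the original text file, every second line is also part of the first line. In some cases even the third line is part of the first line's data.
If there is any information I can provide to express my question better, please let me know.
After @Assgar's comment, I tried this: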
# read text file into single string and remove header
$rawText = Get-Content 'C:\path\to\input.txt' | Out-String
# split string into individual records
$data = $rawText -replace "`r" -split '\n\n+' | Select-Object -Skip 1
$parsedData = foreach ($record in $data) {
    $prop = @{}
    $record -split '\n' | ForEach-Object {
        $prop['PersonType'] += $_.Substring(0, 10).Trim()
        $prop['PersonName'] += $_.Substring(10, 16).Trim()
        $prop['PersonNumber'] += $_.Substring(26, 9).Trim()
        $prop['ApprovalStatus'] += $_.Substring(35, 13).Trim()
        $prop['Supervisor'] += $_.Substring(48, 11).Trim()
        $prop['PayrollName'] += $_.Substring(59, 16).Trim()
        $prop['ApplicationName'] += $_.Substring(75, 13).Trim()
        $prop['Supplier'] += $_.Substring(88, 9).Trim()
        $prop['StartDate'] += $_.Substring(97, 12).Trim()
        $prop['EndDate'] += $_.Substring(109, 9).Trim()
        $prop['Archived'] += $_.Substring(118, 8).Trim()
    }
    New-Object -Type PSObject -Property $prev
}
$parsedData | Export-Csv 'C:\path\to\output.txt' -NoType
But now I get a blank output CSV file in the target folder. Am I missing something somewhere?
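Answer 0 (score: 0)
I have a solution, but...
It uses two splits. The first uses the words (Person|Agency|Employee)
to split the text into records (this is flawed and needs an if).
The second splits each record on line breaks and then parses the fields by offset + length.
Because the sample data is inconsistent, this is not perfect either.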
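To show why the first split needs the extra if, here is a minimal sketch of how -split behaves with a lookahead like the one in this answer's script. The one-line sample string is a made-up stand-in for the report, not the real data. With a capturing group inside the lookahead, -split also returns the captured word as a short element of its own; a non-capturing group (?:...) avoids that.

```powershell
# Hypothetical one-line sample; the real report spans several lines per record.
$text = 'Agency A 1 Employee B 2'

# Capturing group inside the lookahead: -split also returns the captured word.
$withCapture = $text -split '(?!^)(?=(Agency|Employee))'

# Non-capturing group: only the records themselves are returned.
$withoutCapture = $text -split '(?!^)(?=(?:Agency|Employee))'

$withCapture.Count     # 3 elements: 'Agency A 1 ', 'Employee', 'Employee B 2'
$withoutCapture.Count  # 2 elements: 'Agency A 1 ', 'Employee B 2'
```

Even with the non-capturing form, a length check can still be useful here, because the header line contains the word Person several times and is cut into short fragments.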
$InFile = 'Q:\Test\2016-12\19\41225200.txt'
$OutFile= 'C:\path\to\output.txt'
$Delimiter = '(Person|Agency|Employee)'
#'$Escaped = [regex]::Escape($Delimiter)
$Split = "(?!^)(?=$Delimiter)"
$parsedData = (Get-Content $InFile -Raw) -split $Split |
    ForEach-Object {
        $prop = @{}
        If ($_.Length -ge 30 ) {
            ForEach ($Line in $_.split("`n")) {
                $Line+=" "*130
                $prop['PersonType'] += $Line.Substring( 0, 10).Trim()
                $prop['PersonName'] += $Line.Substring(10, 16).Trim()
                $prop['PersonNumber'] += $Line.Substring(26, 9).Trim()
                $prop['ApprovalStatus'] += $Line.Substring(35, 13).Trim()
                $prop['Supervisor'] += $Line.Substring(48, 11).Trim()
                $prop['PayrollName'] += $Line.Substring(59, 16).Trim()
                $prop['ApplicationName'] += $Line.Substring(75, 12).Trim()
                $prop['Supplier'] += $Line.Substring(87, 10).Trim()
                $prop['StartDate'] += $Line.Substring(97, 9).Trim()
                $prop['EndDate'] += $Line.Substring(108, 9).Trim()
                $prop['Archived'] += $Line.Substring(117, 8).Trim()
            }
        }
        New-Object -TypeName PSObject -Property $prop
    }
$parsedData
Output
Supervisor : ApplicatioName
ApplicationName : t Date End DName
Archived :
PersonType : Person AType
PersonName : pproval Supe
Supplier : ate Archiv
StartDate : ed
ApprovalStatus : yroll NameStatus
PayrollName : n Supplier Star
PersonNumber : rvisor PaNumber
EndDate :
Supervisor : Power,Mehash
ApplicationName : Projects
Archived : No
PersonType : AgencyContractor
PersonName : D'Cunha, Yionue
Supplier : CONTRACTSUPPLIER_1
StartDate : 10-DEC-16
ApprovalStatus : NOT ENTERED
PayrollName :
PersonNumber : 123456
EndDate : 16-DEC-16
Supervisor : Miro,Profal
ApplicationName : Payroll
Archived : NoNo
PersonType : Employee
PersonName : Vughila,Proshont
Supplier :
StartDate : 10-DEC-1610-DEC-16
ApprovalStatus : WORKING
PayrollName : Company-abcde INMonthly
PersonNumber : 132456
EndDate : 16-DEC-1616-DEC-16
I also tried Export-Csv, and the result is empty as well.
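One likely reason for the empty file: Export-Csv takes its columns from the first object it receives, and the script above still emits an object with no properties for every chunk shorter than 30 characters. If the first chunk is one of those, the CSV has no columns. The sketch below assumes that is the cause. The two records are made-up stand-ins, and ConvertTo-Csv is used only so the result can be inspected without writing a file.

```powershell
# Hypothetical stand-ins for parsed records; the first one is empty,
# like the object created for a chunk that fails the length check.
$records = @(
    (New-Object -TypeName PSObject -Property @{}),
    (New-Object -TypeName PSObject -Property @{ PersonType = 'Employee'; PersonNumber = '132456' })
)

# Columns in the order they should appear in the CSV.
$columns = 'PersonType', 'PersonNumber'

$csv = $records |
    Where-Object { @($_.PSObject.Properties).Count -gt 0 } |  # drop objects without properties
    Select-Object -Property $columns |                        # fix the column set and order
    ConvertTo-Csv -NoTypeInformation

$csv
```

Replacing ConvertTo-Csv with Export-Csv -Path $OutFile -NoTypeInformation would write the same rows to disk. The second script in the question shows a similar symptom for a different reason: it passes the undefined $prev instead of $prop to New-Object, so none of its objects carry any properties.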