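I have a data set that I am trying to normalise into a PsCustomObject. I have been trying to use the machine-learning template feature of ConvertFrom-String, with only partial success. One problem is that every example I can find uses a data set where all records have the same structure. Mine do not.
I'm sure a wizard could do this straight from the raw data, but I have already manipulated it slightly to get to where I am now. The raw data: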
IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen
Using the following script:
$lines = $testText.Split("`n") #$testText is the above data wrapped in a here-string
$newLines = @()
foreach ($line in $lines)
{
    [regex]$regex = '-'
    $hyphenCount = $regex.Matches($line).Count
    #$hyphenCount
    switch ($hyphenCount)
    {
        1 {
            $newLines += $line -replace "-", ","
        }
        2 {
            $split = $line.Split("-", 2)
            $newLines += $split -join ","
        }
        3 {
            if ($line.Contains("mode-"))
            {
                #$line
                $split = $line.Split("-", 4)
                $newLines += $split -join ","
            }
            else
            {
                $split = $line.Split("-", 3)
                $newLines += $split -join ","
            }
        }
        4 {
            $split = $line.Split("-", 3) #this assumes the fourth hyphen is part of description
            $newLines += $split -join ","
        }
        5 {
            $split = $line.Split("-", 4)
            $newLines += $split -join ","
        }
    }
}
After that step, my data looks like this:
IDE00001,ENG99061,Production mode,Access control
IDE00001,ENG115730,Production mode,Aussenbeleuchtung
IDE00001,ENG112304,Production mode,Heckwischer
IDE00001,ENG98647,Production mode,Interior lighting
IDE00001,ENG115729,Production mode,Scheinwerferreinigung
IDE00001,ENG115731,Production mode,Virtuel_pedal
IDE00002,Transport mode
IDE00820,Activating and deactivating all development messages
IDE01550,Service position
IDE02152,Characteristics in production mode
IDE02269,MAS04382,Acknowledgement signals-Optical feedback during locking
IDE02332,Deactivate production mode
IDE02488,DWA Interior monitoring
IDE02711,ENG116690,Rear Window Wiper-Automatisches Heckwischen
IDE99999,Test-two hyphens
IDE99999,ENG123456,Test-four-Hyphens
IDE99999,ENG123456,Production mode,test-five-hyphens
Passing the above data through the following template gets me as close as I can to what I need, but it still has some problems:
$template = @'
{object*:{ide:IDE00001},{code?:ENG99061},{mode?:Production mode},{description?:Access control}}
{object*:{ide:IDE00001},{code?:ENG115730},{mode?:Dev mode},{description?:Aussenbeleuchtung}}
{object*:{ide:IDE00001},{code?:ENG115731},{mode?:Production mode},{description?:Virtuel_pedal}}
{object*:{ide:IDE02711},{code?:ENG116690},{description?:Rear Window Wiper-Automatisches Heckwischen}}
{object*:{ide:IDE00820},{description?:{!mode?:{!code?:Activating and deactivating all development messages}}}}
{object*:{ide:IDE01550},{description?:{!mode?:{!code?:Service position}}}}
{object*:{ide:IDE02488},{description?:{!mode?:{!code?:DWA Interior monitoring}}}}
{object*:{ide:IDE00002},{mode?:Transport mode}}
'@
$testText | ConvertFrom-String -TemplateContent $template -OutVariable out | Out-Null
$out.object
The results are as follows:
ide      code      mode            description
---      ----      ----            -----------
IDE00001 ENG99061  Production mode Access control
IDE00001 ENG115730 Production mode Aussenbeleuchtung
IDE00001 ENG112304 Production mode Heckwischer
IDE00001 ENG98647  Production mode Interior lighting
IDE00001 ENG115729 Production mode Scheinwerferreinigung
IDE00001 ENG115731 Production mode Virtuel_pedal
IDE00002           Transport mode  Transport mode
IDE00820                           Activating and deactivating all development messages
IDE01550                           Service position
IDE02152           production mode Characteristics in production mode
IDE02269 MAS04382                  Acknowledgement signals-Optical feedback during locking
IDE02332           production mode Deactivate production mode
IDE02488                           DWA Interior monitoring
IDE02711 ENG116690                 Rear Window Wiper-Automatisches Heckwischen
IDE99999                           Test-two hyphens
IDE99999 ENG123456                 Test-four-Hyphens

IDE00002           Transport mode  Transport mode
IDE02152           production mode Characteristics in production mode
IDE02332           production mode Deactivate production mode
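In those last three rows, Transport mode should not be in the description column, and production mode should not be in the mode column; it is somehow being picked up from the description. I can't work out how to fix this, so if anyone has any ideas...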
Answer 0 (score: 0)
As an alternative, if your input data is systematic enough, you can parse it with a regular expression:
$inputText = @"
IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen
"@ -split '\r?\n' # split on LF or CRLF so no carriage return ends up in the description

# ide is mandatory; code and mode are optional; whatever remains is the description
$pattern = '^((?<ide>IDE[0-9]+)-)((?<code>[A-Z0-9]+)-)?((?<mode>Production mode|Transport mode)-?)?(?<description>.*?)$'

foreach ($line in $inputText)
{
    $isMatch = $line -match $pattern
    if (-not $isMatch)
    {
        Write-Warning "Cannot parse expression: $line"
        continue
    }

    # Named groups that did not take part in the match come back as $null
    New-Object psobject -Property ([ordered]@{
        'Ide'         = $Matches.ide
        'Code'        = $Matches.code
        'Mode'        = $Matches.mode
        'Description' = $Matches.description
    })
}
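You said your data does not all have the same structure, so your regular expression may need to be considerably more complex than the one above. Alternatively, if you can identify all the different structures that may occur, run the parsing several times with a different regular expression for each structure.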
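A rough sketch of that multi-pattern idea is below. It assumes that codes are always three capital letters followed by digits and that the only modes are Production mode and Transport mode. Those are guesses based on the sample data, not something the question confirms. The function name ConvertTo-IdeRecord and the pattern list are hypothetical and exist only for this illustration. Patterns are tried from most specific to least specific, and the first one that matches wins.

```powershell
# Hypothetical pattern list, most specific structure first.
# Assumption: codes look like three capital letters plus digits (ENG99061, MAS04382).
$patterns = @(
    '^(?<ide>IDE\d+)-(?<code>[A-Z]{3}\d+)-(?<mode>(?:Production|Transport) mode)-(?<description>.+)$'
    '^(?<ide>IDE\d+)-(?<code>[A-Z]{3}\d+)-(?<description>.+)$'
    '^(?<ide>IDE\d+)-(?<mode>(?:Production|Transport) mode)$'
    '^(?<ide>IDE\d+)-(?<description>.+)$'
)

function ConvertTo-IdeRecord {
    param(
        [Parameter(Mandatory)][string]$Line,
        [Parameter(Mandatory)][string[]]$Patterns
    )
    foreach ($pattern in $Patterns) {
        # -cmatch is case-sensitive, so a word such as "Test" is not taken for a code
        if ($Line -cmatch $pattern) {
            # Groups missing from the matching pattern are simply $null
            return [pscustomobject]@{
                Ide         = $Matches['ide']
                Code        = $Matches['code']
                Mode        = $Matches['mode']
                Description = $Matches['description']
            }
        }
    }
    Write-Warning "Cannot parse expression: $Line"
}

$sample = @(
    'IDE00001-ENG99061-Production mode-Access control'
    'IDE00002-Transport mode'
    'IDE02152-Characteristics in production mode'
    'IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking'
)
$records = $sample | ForEach-Object { ConvertTo-IdeRecord -Line $_ -Patterns $patterns }
$records | Format-Table -AutoSize
```

Because the mode pattern is anchored directly after a hyphen, a description that merely ends in "production mode" falls through to the last pattern and stays in the description column. If new structures turn up, they can be added to the list without touching the function.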