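I'm neither a developer nor a CSV expert. I managed to put this code together, and it does the job. As a quick overview: I need to process some JSON data nested inside a CSV. So I read the JSON, split it out into extra columns, and then save the CSV.
Now, my problem is that while this works fine, I now need to process a 1.5 GB CSV file, and I'd rather the processing not take 2 days...
So if you could help me tune my script so that it runs in a reasonable time, I'd be very grateful :)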
$file = Get-Content -Path 'input.csv' | Select-Object -Skip 2 | ConvertFrom-Csv
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_StreamingEndpointName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_Id -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_AppServicePlanUri -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ImageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ServiceType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_VMName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_UsageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_DatabaseAccount -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_CollectionRid -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ResourceCategory -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_displayName -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_ACCESSED-VIA-INTERNET' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-NAME' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_APPTYPE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_CHARGECODE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COMMENTS -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COUNTRY -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_EY-REGION' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_IT-ENV' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_OWNER -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_OWNER-EMAIL' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_SERVICELINE -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_SUB-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_TECHCONTACTS -value $null
$count=1
ForEach ($line in $file) {
Write-Output "Processing line: $count"
$count++
try{
if ($line.AdditionalInfo -ne $null -Or $line.Tags -ne $null){
$line.AdditionalInfo_StreamingEndpointName = ($line.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName
$line.AdditionalInfo_Id = ($line.AdditionalInfo | ConvertFrom-JSON).Id
$line.AdditionalInfo_AppServicePlanUri = ($line.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri
$line.AdditionalInfo_ImageType = ($line.AdditionalInfo | ConvertFrom-JSON).ImageType
$line.AdditionalInfo_ServiceType = ($line.AdditionalInfo | ConvertFrom-JSON).ServiceType
$line.AdditionalInfo_VMName = ($line.AdditionalInfo | ConvertFrom-JSON).VMName
$line.AdditionalInfo_UsageType = ($line.AdditionalInfo | ConvertFrom-JSON).UsageType
$line.AdditionalInfo_DatabaseAccount = ($line.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount
$line.AdditionalInfo_CollectionRid = ($line.AdditionalInfo | ConvertFrom-JSON).CollectionRid
$line.AdditionalInfo_ResourceCategory = ($line.AdditionalInfo | ConvertFrom-JSON).ResourceCategory
$line.Tags_displayName = ($line.Tags | ConvertFrom-JSON).displayName
$line.'Tags_ACCESSED-VIA-INTERNET' = ($line.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'
$line.'Tags_APP-NAME' = ($line.Tags | ConvertFrom-JSON).'APP-NAME'
$line.'Tags_APP-TYPE' = ($line.Tags | ConvertFrom-JSON).'APP-TYPE'
$line.Tags_APPTYPE = ($line.Tags | ConvertFrom-JSON).APPTYPE
$line.Tags_CHARGECODE = ($line.Tags | ConvertFrom-JSON).CHARGECODE
$line.Tags_COMMENTS = ($line.Tags | ConvertFrom-JSON).COMMENTS
$line.Tags_COUNTRY = ($line.Tags | ConvertFrom-JSON).COUNTRY
$line.'Tags_EY-REGION' = ($line.Tags | ConvertFrom-JSON).'EY-REGION'
$line.'Tags_IT-ENV' = ($line.Tags | ConvertFrom-JSON).'IT-ENV'
$line.Tags_OWNER = ($line.Tags | ConvertFrom-JSON).OWNER
$line.'Tags_OWNER-EMAIL' = ($line.Tags | ConvertFrom-JSON).'OWNER-EMAIL'
$line.Tags_SERVICELINE = ($line.Tags | ConvertFrom-JSON).SERVICELINE
$line.'Tags_SUB-TYPE' = ($line.Tags | ConvertFrom-JSON).'SUB-TYPE'
$line.Tags_TECHCONTACTS = ($line.Tags | ConvertFrom-JSON).TECHCONTACTS
}
}
catch {}
}
#write-output $info
$file | Export-Csv 'C:\output.csv' -NoTypeInformation
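Answer 0 (score: 0)
Add-Member performs terribly. So does saving everything to a variable over and over. You may have better luck keeping everything in a single pipeline and using Select-Object with calculated properties: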
Get-Content -Path 'input.csv' |
Select-Object -Skip 2 |
ConvertFrom-Csv |
Select-Object -ErrorAction SilentlyContinue -Property *,
@{n = 'AdditionalInfo_StreamingEndpointName' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName}},
@{n = 'AdditionalInfo_Id' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).Id}},
@{n = 'AdditionalInfo_AppServicePlanUri' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri}},
@{n = 'AdditionalInfo_ImageType' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ImageType}},
@{n = 'AdditionalInfo_ServiceType' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ServiceType}},
@{n = 'AdditionalInfo_VMName' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).VMName}},
@{n = 'AdditionalInfo_UsageType' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).UsageType}},
@{n = 'AdditionalInfo_DatabaseAccount' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount}},
@{n = 'AdditionalInfo_CollectionRid' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).CollectionRid}},
@{n = 'AdditionalInfo_ResourceCategory' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ResourceCategory}},
@{n = 'Tags_displayName' ; e = {($_.Tags | ConvertFrom-JSON).displayName}},
@{n = 'Tags_ACCESSED-VIA-INTERNET' ; e = {($_.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'}},
@{n = 'Tags_APP-NAME' ; e = {($_.Tags | ConvertFrom-JSON).'APP-NAME'}},
@{n = 'Tags_APP-TYPE' ; e = {($_.Tags | ConvertFrom-JSON).'APP-TYPE'}},
@{n = 'Tags_APPTYPE' ; e = {($_.Tags | ConvertFrom-JSON).APPTYPE}},
@{n = 'Tags_CHARGECODE' ; e = {($_.Tags | ConvertFrom-JSON).CHARGECODE}},
@{n = 'Tags_COMMENTS' ; e = {($_.Tags | ConvertFrom-JSON).COMMENTS}},
@{n = 'Tags_COUNTRY' ; e = {($_.Tags | ConvertFrom-JSON).COUNTRY}},
@{n = 'Tags_EY-REGION' ; e = {($_.Tags | ConvertFrom-JSON).'EY-REGION'}},
@{n = 'Tags_IT-ENV' ; e = {($_.Tags | ConvertFrom-JSON).'IT-ENV'}},
@{n = 'Tags_OWNER' ; e = {($_.Tags | ConvertFrom-JSON).OWNER}},
@{n = 'Tags_OWNER-EMAIL' ; e = {($_.Tags | ConvertFrom-JSON).'OWNER-EMAIL'}},
@{n = 'Tags_SERVICELINE' ; e = {($_.Tags | ConvertFrom-JSON).SERVICELINE}},
@{n = 'Tags_SUB-TYPE' ; e = {($_.Tags | ConvertFrom-JSON).'SUB-TYPE'}},
@{n = 'Tags_TECHCONTACTS' ; e = {($_.Tags | ConvertFrom-JSON).TECHCONTACTS}} |
Export-Csv 'C:\output.csv' -NoTypeInformation
Come to think of it, this might be faster still:
Get-Content -Path 'input.csv' |
Select-Object -Skip 2 |
ConvertFrom-Csv |
ForEach-Object {
$AdditionalInfo = $_.AdditionalInfo | ConvertFrom-Json;
$Tags = $_.Tags | ConvertFrom-Json;
$_ | Select-Object -Property *,
@{n = 'AdditionalInfo_StreamingEndpointName' ; e = {$AdditionalInfo.StreamingEndpointName}},
@{n = 'AdditionalInfo_Id' ; e = {$AdditionalInfo.Id}},
...
} | Export-Csv ...
That way you only convert the JSON once per row, instead of once per extracted column.
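The parse-once idea can also be written without repeating a calculated property for every column. The following is only a sketch: the function names `ConvertFrom-JsonSafe` and `Expand-JsonColumn` are hypothetical, the key lists are shortened for illustration, and it assumes PowerShell 3.0 or later running without strict mode.

```powershell
# Keys to lift out of each JSON column (shortened here; list every key you need).
$infoKeys = 'StreamingEndpointName', 'Id', 'VMName'
$tagKeys  = 'displayName', 'APP-NAME', 'OWNER'

# Hypothetical helper: returns $null for empty or malformed JSON instead of failing.
function ConvertFrom-JsonSafe {
    param([string] $Text)
    if (-not $Text) { return $null }
    try { $Text | ConvertFrom-Json -ErrorAction Stop } catch { $null }
}

# Hypothetical helper: parses each JSON column once per row and appends the new columns.
function Expand-JsonColumn {
    param([Parameter(ValueFromPipeline = $true)] $Row)
    process {
        # Copy the row's existing columns, keeping their original order.
        $out = [ordered]@{}
        foreach ($p in $Row.PSObject.Properties) { $out[$p.Name] = $p.Value }

        # One ConvertFrom-Json call per JSON column, not one per extracted key.
        $info = ConvertFrom-JsonSafe $Row.AdditionalInfo
        $tags = ConvertFrom-JsonSafe $Row.Tags

        # Keys that are missing from the JSON simply yield $null.
        foreach ($k in $infoKeys) { $out["AdditionalInfo_$k"] = $info.$k }
        foreach ($k in $tagKeys)  { $out["Tags_$k"]           = $tags.$k }

        [pscustomobject]$out
    }
}
```

It would slot into the same pipeline as before, between ConvertFrom-Csv and Export-Csv. Building one ordered hashtable per row and casting it to an object avoids both Add-Member and the long list of calculated properties; whether it is actually faster on your data is something to measure on a sample first.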
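However, I suspect that to get decent performance you'll have to write something using .NET methods. I'd suggest using Microsoft.VisualBasic.FileIO.TextFieldParser to parse the CSV line by line, and possibly JSON.Net, using JsonConvert.DeserializeObject() to deserialize the JSON. Even then, this isn't going to be super fast. 1.5 GB has to be several million rows. You may be best off importing the whole CSV into SQL Server 2016+ and using a query with its built-in JSON parsing.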
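As a rough illustration of the TextFieldParser suggestion, here is a minimal sketch of streaming the file one row at a time. The function name `Read-CsvField` and its `SkipLines` parameter are hypothetical, and it assumes Windows PowerShell, where the Microsoft.VisualBasic assembly ships with the .NET Framework. It only covers reading the CSV; the JSON step and writing the output are left out.

```powershell
# Assumption: the Microsoft.VisualBasic assembly is available (it ships with the .NET Framework).
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical helper: streams a CSV file row by row and emits each row as a string array.
function Read-CsvField {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,  # use a full path; .NET ignores the PowerShell location
        [int] $SkipLines = 0                            # leading lines to discard before the header row
    )
    $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($Path)
    try {
        $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
        $parser.SetDelimiters(',')
        # Lets a quoted cell (such as the JSON columns) contain the delimiter.
        $parser.HasFieldsEnclosedInQuotes = $true

        # Discard the leading lines without parsing them.
        for ($i = 0; $i -lt $SkipLines -and -not $parser.EndOfData; $i++) {
            [void]$parser.ReadLine()
        }

        while (-not $parser.EndOfData) {
            # The leading comma stops PowerShell from unrolling the array into single strings.
            , $parser.ReadFields()
        }
    }
    finally {
        $parser.Close()
    }
}
```

Because only one row is held at a time, memory use stays flat no matter how large the file is. The first array emitted is the header row, so a caller would use it to map field positions to column names before processing the rest.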