The fastest way to organize categorized data in a text file and convert it to CSV

Time: 2018-12-29 06:56:08

Tags: powershell csv

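I have a text file with several hundred lines. Field names and values are separated by a colon, and a blank line separates each data set. It looks like this...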

icon:rain
temperatureHigh:55.37
temperatureLow:42.55
humidity:0.97
windSpeed:6.7
precipType:rain
precipProbability:0.97

icon:partly-cloudy-day
temperatureHigh:34.75
temperatureLow:27.1
humidity:0.8
windSpeed:15.32
precipType:snow
precipProbability:0.29

icon:clear-day
temperatureHigh:47
temperatureLow:31.72
humidity:0.64
windSpeed:9.27
precipType:rain
precipProbability:0.01

I am struggling to format this as CSV with the following desired output...

"icon","temperatureHigh","temperatureLow","humidity","windSpeed","precipType","precipProbability"
"rain","55.37","42.55","0.97","6.7","rain","0.97"
"partly-cloudy-day","34.75","27.1","0.8","15.32","snow","0.29"
"clear-day","47","31.72","0.64","9.27","rain","0.01"
...and so on, and so forth. 

I have been experimenting with Get-Content and -replace, but could Import-Csv or ConvertTo-Csv be used for this?

5 answers:

Answer 0: (score: 2)

Try this:

$CurrentElement=[pscustomobject]@{}

# Read all rows; emit the current element whenever an empty row is found
Get-Content "c:\temp\test.txt" | %{

    if ($_ -eq "")
    {
        $CurrentElement
        $CurrentElement=[pscustomobject]@{}
    }
    else
    {
       # Split at the first colon only, so a value may itself contain colons
       $Row=$_.split(':', 2)
       Add-Member -InputObject $CurrentElement -MemberType NoteProperty -Name $Row[0] -Value $Row[1]
    }

}  | export-csv "c:\temp\result.csv" -notype

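# Export the last data set, which has no empty row after it
# (skipped when the file ended with a blank line and the element is still empty)
if (@($CurrentElement.psobject.Properties).Count) { $CurrentElement | export-csv "c:\temp\result.csv" -notype -Append }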
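
The same loop could also be wrapped in a small pipeline function. The following is only a sketch under assumptions: the function name ConvertFrom-KeyValueBlock and the inline sample data are made up for illustration, and it assumes PowerShell v3 or newer. It collects each set in an ordered hashtable, so the columns keep their input order, and it ignores repeated or trailing blank lines.

```powershell
# Hypothetical helper (not a built-in cmdlet): one object per blank-line-separated block.
function ConvertFrom-KeyValueBlock {
    param([Parameter(ValueFromPipeline = $true)][AllowEmptyString()][string]$Line)
    begin { $props = [ordered]@{} }
    process {
        if ([string]::IsNullOrWhiteSpace($Line)) {
            # Blank line: emit the collected set (if any) and start a new one.
            if ($props.Count) { [PSCustomObject]$props; $props = [ordered]@{} }
        }
        else {
            # Split at the first colon only.
            $name, $value = $Line -split ':', 2
            $props[$name] = $value
        }
    }
    end {
        # The last set is usually not followed by a blank line.
        if ($props.Count) { [PSCustomObject]$props }
    }
}

# Made-up sample lines standing in for Get-Content output.
$sample = 'icon:rain', 'humidity:0.97', '', 'icon:clear-day', 'humidity:0.64', '', ''
$objects = $sample | ConvertFrom-KeyValueBlock
$objects
```

With a real file this would be used as Get-Content "c:\temp\test.txt" | ConvertFrom-KeyValueBlock | Export-Csv "c:\temp\result.csv" -NoType, which needs only a single Export-Csv call.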

Answer 1: (score: 2)

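The simplest approach is to split the data at two consecutive line breaks and convert each block to a hashtable with ConvertFrom-StringData (you must also replace : with = for that to work). Each hashtable can then be converted to a custom object and exported to CSV.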

$data = Get-Content 'C:\path\to\input.txt' -Raw

$data -replace ':', '=' -split '\r?\n\r?\n' | ForEach-Object {
    [PSCustomObject]($_ | ConvertFrom-StringData)
} | Export-Csv 'C:\path\to\output.csv' -NoType

Note that the above requires PowerShell v3 or newer. For older versions of PowerShell you need to adjust the code as follows:

$data = Get-Content 'C:\path\to\input.txt' | Out-String

$data -replace ':', '=' -split '\r?\n\r?\n' | ForEach-Object {
    $prop = $_ | ConvertFrom-StringData
    New-Object -Type PSObject -Property $prop
} | Export-Csv 'C:\path\to\output.csv' -NoType

If you want the CSV fields in a particular order, you can put a Select-Object between ForEach-Object and Export-Csv:

... | ForEach-Object {
    ...
} | Select-Object icon, temperatureHigh, ... | Export-Csv ...
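
As a sketch of how that ordering could be automated rather than typed out, the column names can be read from the first block and passed to Select-Object. The inline sample data and the variable names $blocks, $columns and $rows are assumptions made for this illustration; it assumes PowerShell v3 or newer and that every block has the same keys as the first one.

```powershell
# Made-up sample standing in for: Get-Content 'C:\path\to\input.txt' -Raw
$data = "icon:rain`ntemperatureHigh:55.37`ntemperatureLow:42.55`n`nicon:clear-day`ntemperatureHigh:47`ntemperatureLow:31.72"

# Split into blocks at one or more blank lines.
$blocks = $data.Trim() -split '(?:\r?\n){2,}'

# Column names in input order, taken from the first block (text before the first colon).
$columns = $blocks[0] -split '\r?\n' | ForEach-Object { ($_ -split ':', 2)[0] }

$rows = $blocks | ForEach-Object {
    # Hashtables are unordered, so the property order here is not guaranteed...
    [PSCustomObject]($_ -replace ':', '=' | ConvertFrom-StringData)
} | Select-Object $columns   # ...which is why Select-Object restores the input order.

$rows
```

Piping $rows to Export-Csv then yields the columns in the order they appear in the file.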

Import-Csv expects the input data to be organized as one data set per line. It cannot be used for blocks of key:value pairs like your input data.

ConvertTo-Csv requires the same preparation as Export-Csv in the sample code above. The only difference is that the output is not written to a file.
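
To illustrate that difference, here is a minimal sketch with made-up inline sample data (the variable name $csvLines is an assumption, and PowerShell v3 or newer is assumed). The pipeline ends in ConvertTo-Csv, so the result is an array of strings rather than a file.

```powershell
# Made-up sample standing in for the real input file.
$data = "icon:rain`ntemperatureHigh:55.37`n`nicon:clear-day`ntemperatureHigh:47"

$csvLines = $data.Trim() -split '(?:\r?\n){2,}' | ForEach-Object {
    [PSCustomObject]($_ -replace ':', '=' | ConvertFrom-StringData)
} | Select-Object icon, temperatureHigh | ConvertTo-Csv -NoTypeInformation

# One string per CSV line (header first); nothing has been written to disk.
$csvLines
```

The strings can then be processed further or written out later, for example with Set-Content.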

Answer 2: (score: 0)

Regex is the way to go:

$data = @'
icon:rain
temperatureHigh:55.37
temperatureLow:42.55
humidity:0.97
windSpeed:6.7
precipType:rain
precipProbability:0.97

icon:partly-cloudy-day
temperatureHigh:34.75
temperatureLow:27.1
humidity:0.8
windSpeed:15.32
precipType:snow
precipProbability:0.29

icon:clear-day
temperatureHigh:47
temperatureLow:31.72
humidity:0.64
windSpeed:9.27
precipType:rain
precipProbability:0.01

'@

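# Header line. Note: the '\n' patterns assume LF line endings; with CRLF, '\n\n' will not match.
$head = $data
$head = $head -replace '([^\s]+):([^\s]+)', '"$1",'   # key:value -> "key",
$head = $head -replace '\n\n', '::'                   # mark data set boundaries
$head = $head -replace '\n', ''                       # join the remaining lines
$head = $head -replace '(.*?)::.*', '$1'              # keep only the first data set
$head = $head -replace ',\s*$', ''                    # drop the trailing comma
$head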

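# Data rows, built the same way but keeping the values instead of the keys.
$rows = $data
$rows = $rows -replace '([^\s]+):([^\s]+)', '"$2",'   # key:value -> "value",
$rows = $rows -replace '\n\n', '::'                   # mark data set boundaries
$rows = $rows -replace '\n', ''                       # join the remaining lines
$rows = $rows + "::"                                  # terminate the last data set
$rows = $rows -replace '::', "`n"                     # one data set per line
$rows = $rows -replace ',\s*\n', "`n"                 # drop the trailing commas
$rows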

Output:

"icon","temperatureHigh","temperatureLow","humidity","windSpeed","precipType","precipProbability"
"rain","55.37","42.55","0.97","6.7","rain","0.97"
"partly-cloudy-day","34.75","27.1","0.8","15.32","snow","0.29"
"clear-day","47","31.72","0.64","9.27","rain","0.01"

Answer 3: (score: 0)

Here is another way to do the job, using a combination of simple regex patterns and string operators.

$InStuff = @'
column1:value1
column2:value2
column3:value3
column4:value4
column5:value5

column1:value6
column2:value7
column3:value8 
column4:value9
column5:value10

column1:value11 
column2:value12
column3:value13 
column4:value14
column5:value15
'@


$SplitInStuff = $InStuff -split ([environment]::NewLine * 2)

$HeaderLine = ($SplitInStuff[0] -replace '(?m):.+$').Split([environment]::NewLine) -join ', '

$CSV_Text = [System.Collections.Generic.List[string]]::new()
$CSV_Text.Add($HeaderLine)

foreach ($SIS_Item in $SplitInStuff)
    {
    $CSV_Text.Add(($SIS_Item  -replace '(?m)^.+:').Split([environment]::NewLine).Where({$_}) -join ', ')
    }

$Results = $CSV_Text |
    ConvertFrom-Csv

# on screen
$Results |
    Format-Table

# to CSV
$Results |
    Export-Csv -LiteralPath "$env:TEMP\JohnnyCarino_ReformatedData.csv" -NoTypeInformation

Output...

column1  column2 column3  column4 column5
-------  ------- -------  ------- -------
value1   value2  value3   value4  value5 
value6   value7  value8   value9  value10
value11  value12 value13  value14 value15

CSV file content...

"column1","column2","column3","column4","column5"
"value1","value2","value3","value4","value5"
"value6","value7","value8 ","value9","value10"
"value11 ","value12","value13 ","value14","value15"

Answer 4: (score: 0)

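Here is one way to do what you need with simple (and hopefully clear) code. I did not use complex PS objects, methods, or functions, so it stays clear and simple. The input should be in a text file named in1.txt. I assume each data set has at most 7 lines (before a blank line or the end of the file is reached). I did not make it generic, nor did I include error checking and so on. Needless to say, there are many other ways to do this. If you have any comments, please let me know.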

#======================
# Function used by code
#======================

Function func-PrintSet
{

 $s1=''
 $del= ','
 $q='"'
 foreach ($element in $arr1) {
     $s1=$s1+$q+$element+$q + $del 
 }
 $s1

 $s1=""
 foreach ($element in $arr2) {
     $s1=$s1+$q+$element+$q +  $del 
 }
 $s1

}

#=====================
# Main code
#=====================

# simple initialization of arrays.

$arr1=0,0,0,0,0,0,0
$arr2=0,0,0,0,0,0,0
$i=-1
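# Note: .NET resolves the relative path against the process working directory,
# which may differ from the current PowerShell location.
$reader = [System.IO.File]::OpenText("in1.txt")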
while ($null -ne ($line = $reader.ReadLine())) 
{
    IF ($line)
    {

         $items = $line.split(':')
         $i=$i+1
         $arr1[$i]= $items[0]
         $arr2[$i]= $items[1]
    }
    ELSE
    {

        func-PrintSet   
        $i=-1
    }
}
func-PrintSet
# Release the file handle
$reader.Close()

"Done :)"

# Code end