From file

Posted: 2016-11-06 19:55:09

Tags: windows powershell csv parsing

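I have a comma-delimited data file, but there is no newline separating the header fields from the data fields, and that cannot be changed. In fact, there are no line breaks (CR/LF) anywhere, not even after the header section; the only consistent thing I can see is the delimiter character. The data is basically one big string on a single line, with only commas separating the fields.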

Sample header data

"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,

Sample data with header and data

"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0,"date":"2016-11-05T21:15:50Z","value":0.0,"date":"2016-11-05T21:15:53Z","value":0.0,"date":"2016-11-05T21:15:57Z","value":0.0,"date":"2016-11-05T21:16:00Z","value":1.0,"date":"2016-11-05T21:16:02Z","value":1.0,"date":"2016-11-05T21:16:04Z","value":1.0,"date":"2016-11-05T21:16:07Z","value":1.0

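Somehow I have to take this data and strip out the entire header section (for example, delete everything before the 11th comma), and then take the rest and parse it, keeping only the values of the "date" and "value" fields, with a carriage return and line feed after each "value" field's data value.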

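It seems that the field/column names and the actual values of the data in each field are separated by colons, which is what throws me off.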

I'm on Windows and would prefer a PowerShell solution, even if it requires .NET calls or anything else, but I'm open to any Windows solution anyone can come up with.

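I would be forever grateful and indebted to anyone who can help me solve this; I've been stuck on it for a long time, tried many things, and can't figure out how to do it. The data comes from a source where it cannot be changed, but maybe there is a way I haven't found.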

Desired result after reformatting/parsing

"2016-11-05T21:15:47Z",0.0
"2016-11-05T21:15:48Z",1.0
"2016-11-05T21:15:50Z",0.0
"2016-11-05T21:15:53Z",0.0
"2016-11-05T21:15:57Z",0.0
"2016-11-05T21:16:00Z",1.0
"2016-11-05T21:16:02Z",1.0
"2016-11-05T21:16:04Z",1.0
"2016-11-05T21:16:07Z",1.0

1 answer:

Answer 0 (score: 2)

Even though your data contains comma-separated fields, it is not CSV data.

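There is no header row followed by data rows; instead, there is just a series of name-value pairs on a single line, and the names are not unique.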

The following regex-based solution works with your sample input:

# Replace the literal with `Get-Content YourFile` to load data from a file.
$s='"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0,"date":"2016-11-05T21:15:50Z","value":0.0,"date":"2016-11-05T21:15:53Z","value":0.0,"date":"2016-11-05T21:15:57Z","value":0.0,"date":"2016-11-05T21:16:00Z","value":1.0,"date":"2016-11-05T21:16:02Z","value":1.0,"date":"2016-11-05T21:16:04Z","value":1.0,"date":"2016-11-05T21:16:07Z","value":1.0'

# - Remove the part of the line before the first "date" entry.
# - Then extract the values from adjacent "date"-"value" pairs and output 
#   each value pair on a separate line.
$s -replace '^.+?("date":.+)', '$1' -replace '.+?:([^,]+),.+?:([^,]+)', ('$1,$2' + "`r`n")
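As a variation on the approach above, here is a minimal sketch that matches each "date"/"value" pair directly with the .NET `[regex]::Matches` method instead of chaining two `-replace` operations. It assumes, as in the sample, that every "date" entry is immediately followed by `,"value":<number>`. The file paths are hypothetical placeholders, and the input string is a shortened copy of the sample data.

```powershell
# Sketch: pull out each "date"/"value" pair in one pass with [regex]::Matches.
# Assumption: every "date" entry is directly followed by ,"value":<number>.

# To read from a file instead (hypothetical path; -Raw needs PowerShell 3.0+
# and returns the whole file as a single string):
# $s = Get-Content -Raw 'C:\path\to\data.txt'
$s = '"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0,"date":"2016-11-05T21:15:50Z","value":0.0'

# Group 1 = date text without its quotes; group 2 = value up to the next comma.
# The header's "value":0.0 is not preceded by a "date" entry, so it never matches.
$pattern = '"date":"([^"]*)","value":([^,]+)'

# Emit one '"date",value' line per match; foreach collects them into $lines.
$lines = foreach ($m in [regex]::Matches($s, $pattern)) {
    '"{0}",{1}' -f $m.Groups[1].Value, $m.Groups[2].Value
}

$lines
# To save the result instead (hypothetical path):
# $lines | Set-Content 'C:\path\to\output.csv'
```

Because the pattern keys on the `"date":` name itself, the header is skipped without counting commas, and each output line already has the `"date",value` shape shown in the question.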