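I have a temperature logger that reads several sensors (daily) and saves the data to a single .csv file, with a whole pile of header information before each sensor's set of date/time and temperature readings. The file looks like this: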
"readerinfo","onlylistedonce"
"downloadinfo",YYYY/MM/DD 00:00:00
"timezone",-8
"headerstuff","headersuff"
"sensor1","sensorstuff"
"serial#","0000001"
"about15lines","ofthisstuff"
"header1","header2"
datetime,temp
datetime,temp
datetime,temp
"sensor2","sensorstuff"
"serial#","0000002"
"about15lines","ofthisstuff"
"header1","header2"
datetime,temp
datetime,temp
datetime,temp
"downloadcomplete"
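My goal is to extract the date/time and temperature data for each sensor and save it as a new file, so that I can run some basic statistics on it (hi/lo/avg temp). (It would be wonderful if I could somehow identify which sensor the data came from based on the serial number listed in the header information, but that is less important than splitting the data into sets.) The length of the date/time lists varies from sensor to sensor depending on how long each one has been recording, and the number of sensors also changes from day to day. Even if I could only split the file, header information and all, into as many files as there are sensors, that would be a good start.
Answer 0: (score: 1)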
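This is not a CSV file in the traditional sense. Given your description of the file contents, I imagine you already know that.
If the datetime,temp lines really contain no double quotes, then the following script should work, based on your sample data. The script is self-contained because it declares the sample data inline.
Important: you need to modify the line that declares the $SensorList variable. You must populate this variable with the names of your sensors, or you could parameterize the script to accept an array of sensor names.
Update: I changed the script to be parameterized.
When run, the script prints each sensor name as it reaches that sensor's section and appends the sensor's datetime/temp lines to a file named after the sensor (sensor1.csv, sensor2.csv) in the script's folder.
The script should look like the following. Save it to a folder (for example c:\test\test.ps1), then execute it.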
# Declare text as a PowerShell here-string
$Text = @"
"readerinfo","onlylistedonce"
"downloadinfo",YYYY/MM/DD 00:00:00
"timezone",-8
"headerstuff","headersuff"
"sensor1","sensorstuff"
"serial#","0000001"
"about15lines","ofthisstuff"
"header1","header2"
datetime,tempfromsensor1
datetime,tempfromsensor1
datetime,tempfromsensor1
"sensor2","sensorstuff"
"serial#","0000002"
"about15lines","ofthisstuff"
"header1","header2"
datetime,tempfromsensor2
datetime,tempfromsensor2
datetime,tempfromsensor2
"downloadcomplete"
"@.Split("`n");
# Declare the list of sensor names
$SensorList = @('sensor1', 'sensor2');
$CurrentSensor = $null;
# WARNING: Clean up all CSV files in the same directory as the script
Remove-Item -Path $PSScriptRoot\*.csv;
# Iterate over each line in the text file
foreach ($Line in $Text) {
    # Drop the carriage return that remains when the script file uses CRLF line endings
    $Line = $Line.TrimEnd("`r");
    #region Line matches double quote
    if ($Line -match '"') {
        # Parse the property/value pairs (where double quotes are present)
        if ($Line -match '"(.*?)",("(?<value>.*)"|(?<value>.*))') {
            $Entry = [PSCustomObject]@{
                Property = $matches[1];
                Value = $matches['value'];
            };
            # A property that names a known sensor starts that sensor's section
            if ($matches[1] -in $SensorList) {
                $CurrentSensor = $matches[1];
                Write-Host -ForegroundColor Green -Object ('Current sensor is: {0}' -f $CurrentSensor);
            }
        }
    }
    #endregion Line matches double quote
    #region Line does not match double quote
    else {
        # Parse the datetime/temp pairs, skipping any that appear before the first sensor section
        if ($CurrentSensor -and $Line -match '(.*?),(.*)') {
            $Entry = [PSCustomObject]@{
                DateTime = $matches[1];
                Temp = $matches[2];
            };
            # Write the sensor's datetime/temp to its file
            Add-Content -Path ('{0}\{1}.csv' -f $PSScriptRoot, $CurrentSensor) -Value $Line;
        }
    }
    #endregion Line does not match double quote
}
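The question also asks whether the data can be identified by the serial number in the header. As a rough sketch only, and not part of the script above, the variant below keys each section on its "serial#" line, so no sensor name list is needed. It assumes every sensor section contains exactly one "serial#" line before its readings, as in the sample. The function name Split-BySerial and the shortened sample are hypothetical.

```powershell
# Hypothetical, shortened sample in the same layout as the logger file
$SampleLines = @(
    '"readerinfo","onlylistedonce"',
    '"sensor1","sensorstuff"',
    '"serial#","0000001"',
    '"header1","header2"',
    'datetime,tempfromsensor1',
    'datetime,tempfromsensor1',
    '"sensor2","sensorstuff"',
    '"serial#","0000002"',
    '"header1","header2"',
    'datetime,tempfromsensor2',
    '"downloadcomplete"'
)

# Hypothetical helper: returns a hashtable mapping serial number -> list of data lines
function Split-BySerial {
    param([string[]]$Lines)

    $groups = @{}
    $serial = $null
    foreach ($line in $Lines) {
        # Trim whitespace, including a stray carriage return
        $line = $line.Trim()
        if ($line -match '^"serial#","(?<serial>[^"]*)"') {
            # A serial# line starts a new sensor section
            $serial = $Matches['serial']
            $groups[$serial] = [System.Collections.Generic.List[string]]::new()
        }
        elseif (($null -ne $serial) -and $line -and ($line -notmatch '"')) {
            # Unquoted, non-empty lines are treated as datetime,temp readings
            $groups[$serial].Add($line)
        }
    }
    $groups
}

$BySerial = Split-BySerial -Lines $SampleLines
foreach ($key in ($BySerial.Keys | Sort-Object)) {
    '{0}: {1} reading(s)' -f $key, $BySerial[$key].Count
}
```

Each list could then be written to a file named after its serial number with Set-Content. Readings that appear before the first "serial#" line are ignored in this sketch.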
Answer 1: (score: 0)
Using the data sample you provided, the output of this script is as follows:
C:\sensoroutput_20140204.csv
sensor1,datetime,temp
sensor1,datetime,temp
sensor1,datetime,temp
sensor2,datetime,temp
sensor2,datetime,temp
sensor2,datetime,temp
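I believe this is what you are looking for. The assumption here concerns the newline characters. The get-content line reads the data and breaks it into "sets" by using two newline characters as the delimiter to split on. I chose to use the environment's (Windows) newline. Your source file may use different newline characters. You can use Notepad++ to see which characters they are, e.g. \r\n, \n, etc.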
$newline = [Environment]::NewLine
$srcfile = "C:\sensordata.log"
$dstpath = 'C:\sensoroutput_{0}.csv' -f (get-date -f 'yyyyMMdd')
# Reads the file in chunks, using two consecutive newlines as the delimiter,
# so that each chunk is one data set
$datasets = get-content $srcfile -delimiter ($newline * 2)
foreach ($ds in $datasets) {
    $lines = ($ds -split $newline) # Split dataset into lines
    $setname = $lines[0] -replace '\"(\w+).*', '$1' # Get the set or sensor name
    $lines | % {
        if ($_ -and $_ -notmatch '"') { # No empty lines and no lines with quotes
            $data = ($setname, ',', $_ -join '') # Concats set name, datetime, and temp
            Out-File -filepath $dstpath -inputObject $data -encoding 'ascii' -append
        }
    }
}
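Since the stated goal is hi/lo/avg temperature per sensor, here is a minimal sketch of how the combined sensor,datetime,temp layout shown above might be summarized. It assumes the temperature column holds plain numbers. The sample rows, their timestamps and values, and the function name Get-SensorStats are all made up for illustration.

```powershell
# Hypothetical rows in the combined "sensor,datetime,temp" layout (values are invented)
$SampleRows = @(
    'sensor1,2014/02/04 00:00:00,10.5',
    'sensor1,2014/02/04 01:00:00,12.0',
    'sensor1,2014/02/04 02:00:00,9.0',
    'sensor2,2014/02/04 00:00:00,20.0',
    'sensor2,2014/02/04 01:00:00,22.0'
)

# Hypothetical helper: returns one object per sensor with Count, Lo, Hi and Avg
function Get-SensorStats {
    param([string[]]$Rows)

    # The rows have no header line, so supply column names explicitly
    $groups = $Rows |
        ConvertFrom-Csv -Header 'Sensor', 'DateTime', 'Temp' |
        Group-Object -Property Sensor

    foreach ($g in $groups) {
        # Cast to [double] so min/max compare numerically rather than as text
        $temps = foreach ($row in $g.Group) { [double]($row.Temp) }
        $m = $temps | Measure-Object -Minimum -Maximum -Average
        [PSCustomObject]@{
            Sensor = $g.Name
            Count  = $m.Count
            Lo     = $m.Minimum
            Hi     = $m.Maximum
            Avg    = $m.Average
        }
    }
}

$Stats = Get-SensorStats -Rows $SampleRows
$Stats
```

To run it against the file written by the script above, something like Get-SensorStats -Rows (Get-Content $dstpath) should work, provided every temp value converts cleanly to a number. A non-numeric value makes the [double] cast fail with an error rather than being skipped silently.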