I have some predictable .toml files whose contents are structured like this:
key1 = "someID"
key2 = "someVersionNumber"
key3 = "someTag"
key4 = "someOtherTag"
key5 = [] #empty array, sometimes contains strings
key6 = "long text"
key7 = "more text"
key8 = """
- text
- more text
- so much text
"""
I want to convert this to CSV like so:
"key1","key2","key3","key4","key5","key6","key7","key8"
"someID","someVersionNumber","someTag","someOtherTag","","long text","more text", "- text- more text- so much text"
Can I do this with a few lines of bash?
And what if I want to combine the CSV rows from all files into a single CSV, e.g.
"key1","key2","key3","key4","key5","key6","key7","key8"
"someID","someVersionNumber","someTag","someOtherTag","","long text","more text", "- text- more text- so much text"
"someID","someVersionNumber","someTag","someOtherTag","","long text","more text", "- text- more text- so much text"
"someID","someVersionNumber","someTag","someOtherTag","","long text","more text", "- text- more text- so much text"
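...i.e. the output would be one CSV row per .toml file plus the header at the top (since the .toml files are predictable, the CSV header and the number of columns are always the same).
Am I looking at sed, awk, or something simpler? I looked at some related questions, but I feel I must be missing something, because the answers I found offer far more functionality than I need.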
Answer 0 (score: 1)
If there were only one input file, I would use a Perl one-liner. Unfortunately, it gets rather complicated:
perl -pe 'if(/"""/&&s/"""/"/.../"""/&&s/"""/"\n/){s/[\n\r]//;};if(/ = \[([^]]*)]/){$r=$1eq""?"\"\"":$1=~s/"\s*,\s*"/ /gr;s/ = \[([^]]*)]/ = $r/};s/"\s*#[^"\n]*$/"/' one.toml | perl -ne 'if(/^([^"]+) = "(.*)"/){push@k,$1;push@v,"\"$2\""}END{print((join",",@k),"\n",join",",@v)}'
It only gets worse when we need to process multiple (*) files at once:
perl -ne 'if(/"""/&&s/"""/"/.../"""/&&s/"""/"\n/){s/[\n\r]//;};if(/ = \[([^]]*)]/){$r=$1eq""?"\"\"":$1=~s/"\s*,\s*"/ /gr;s/ = \[([^]]*)]/ = $r/};s/"\s*#[^"\n]*$/"/;print;print"-\n"if eof' *.toml | perl -ne 'if(/^-$/){push@o,join",",@k if scalar@o==0;push@o,join",",@v;@k=@v=()};if(/^([^"]+) = "(.*)"/){push@k,$1;push@v,"\"$2\""}END{print join"\n",@o}'
Both of these call for a structured script instead. This one is in Perl, but it could be done in Python or any language you like:
#!/usr/bin/env perl
use strict;
use warnings;

my @output;
foreach my $filename (@ARGV) {
    my ($content, @keys, @values);
    open my $fh, "<:encoding(utf8)", $filename or die "Could not open $filename: $!";
    { local $/; $content = <$fh>; }    # slurp the whole file
    close $fh;
    # Collapse each """...""" block into a single-line "..." string.
    $content =~ s/"""([^"]*)"""/'"' . $1=~s#[\r\n]##rg . '"'/ge;
    foreach my $line (split /[\r\n]+/, $content) {
        # Arrays: [] becomes "", ["a", "b"] becomes "a b".
        if ($line =~ m/ = \[([^]]*)]/) {
            my $replace = $1 eq "" ? '""' : $1 =~ s/"\s*,\s*"/ /gr;
            $line =~ s/ = \[([^]]*)]/ = $replace/;
        }
        $line =~ s/"\s*#[^"]*$/"/;    # strip a trailing comment
        # Skip lines that are not key = "value"; a failed match would
        # otherwise push the stale $1/$2 of the previous line again.
        next unless $line =~ m/^([^"]+) = "(.*)"/;
        push @keys, $1;
        push @values, '"' . $2 . '"';
    }
    push @output, join ",", @keys if scalar @output == 0;
    push @output, join ",", @values;
}
print join "\n", @output;
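Notes:
Much of the complexity comes from having to handle arrays (!), comments, and multi-line strings. Every solution needs some preprocessing for these, and that preprocessing makes up most of the solution. More information would also be needed about possible corner cases and how to handle them (for example, how to fit an array of strings into CSV). All of this only underlines the importance of input data quality and consistency. The proposed solution is by no means complete or robust, since it makes several assumptions about the input data and the desired output format. Here is how I dealt with the issue above:
Arrays are either empty, [], or arrays of strings, ["my", "array"]. In the absence of a clear statement from the OP, they are converted into a single string: all element strings joined together, separated by spaces. No newlines are allowed inside an array, and arrays cannot contain other arrays.
Test run: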
$ perl toml-to-csv.pl *.toml
"someID1","someVersionNumber1","someTag1","someOtherTag1","","long text1","more text1","- text- more text- so much text"
"someID2","someVersionNumber2","someTag2","someOtherTag2","Array","long text2","more text2","- text- more text- so much text"
"someID3","someVersionNumber3","someTag3","someOtherTag3","My array","long text3","more text3","- text- more text- so much text"
Answer 1 (score: 0)
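$ cat tst.awk
BEGIN { OFS="," }
{
    # strip a trailing comment (the comment must not contain a double quote)
    sub(/[[:space:]]*#[^"]*$/,"")
    key = val = $0
}
# "key = value" line: record the key and its unquoted value
sub(/^[[:alnum:]]+[[:space:]]+=[[:space:]]+/,"",val) {
    sub(/[[:space:]]+.*/,"",key)
    keys[++numKeys] = key
    gsub(/^("""|\[])$|^"|"$/,"",val)
    vals[numKeys] = val
}
# "- text" line inside the multi-line string: append it to the current value
/^-[[:space:]]+/ {
    vals[numKeys] = vals[numKeys] val
}
# the closing """ ends a record: print the header once, then the row
/^"""$/ {
    if ( !doneHdr++ ) {
        for (keyNr=1; keyNr<=numKeys; keyNr++) {
            printf "\"%s\"%s", keys[keyNr], (keyNr<numKeys ? OFS : ORS)
        }
    }
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        printf "\"%s\"%s", vals[keyNr], (keyNr<numKeys ? OFS : ORS)
    }
    numKeys = 0    # start over for the next input file
}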
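$ awk -f tst.awk file
"key1","key2","key3","key4","key5","key6","key7","key8"
"someID","someVersionNumber","someTag","someOtherTag","","long text","more text","- text- more text- so much text"
Replace file with your list of input files.
The regexp I use in sub(/[[:space:]]*#[^"]*$/,"") removes comments starting with #, and it only matches when the comment contains no double quotes, so your comments cannot contain any. I did this to avoid altering a # that appears inside a string in your data. Feel free to work out a better regexp or some other approach to handle your comments.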
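One possible "other approach" is to scan the line character by character and only treat # as a comment marker when it sits outside a double-quoted string. The sketch below shows that idea in Python rather than awk. It is an illustration under simplifying assumptions, not a TOML parser: it handles only single-line basic strings ("..."), not literal strings ('...') or multi-line strings, and the name strip_comment is made up.

```python
def strip_comment(line):
    """Remove a trailing '#' comment, ignoring '#' inside double-quoted strings."""
    in_string = False
    escaped = False
    for index, char in enumerate(line):
        if escaped:
            escaped = False            # this character was escaped by a backslash
        elif char == "\\" and in_string:
            escaped = True             # the next character is escaped
        elif char == '"':
            in_string = not in_string  # entering or leaving a string
        elif char == "#" and not in_string:
            # First '#' outside a string: cut here and drop trailing whitespace.
            return line[:index].rstrip()
    return line


print(strip_comment('key5 = [] #empty array, "sometimes" contains strings'))
print(strip_comment('key6 = "issue #42" # see "ticket"'))
```

With this approach a comment may contain double quotes, and a # inside a data string is left alone.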