Question

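I am comparing the text output of different pieces of software. Each one has its own output format. It can be JSON, XML, CSV, or a custom format.

I am looking for a tool/language in which I can define my own regular expressions, so that I can automate the file parsing and generate my own output.

I guess the answer 10 years ago would have been: just use Perl.

Today I am doing this with Python scripts, but I wonder if there is something more specific.

Other requirements are that it should be lightweight, standalone, portable, easy to learn, and easy to maintain.

Any suggestions? Thanks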

Edit

As requested: here are some examples of the file parsing I am talking about.

Source I

<Phase = "phase2 - Name of the phase"  duration = "0.080" />
<Phase = "phase3 - Name of the phase"  duration = "3.670" />
<Phase = "phase4 - Name of the phase"  duration = "0.010" />
<Phase = "phase5 - Name of the phase"  duration = "0.030" />
<Phase = "phase6 - Name of the phase"  duration = "0.000" />

Source II

Round=50 Res one=-119.053794 Res two=0.007623 Value 1=0.011147 Best Res one=-119.053794 Perc accuracy=0.000000 eta =0.100000  time=0.042774
Round=74 Res one=-121.077763 Res two=0.004456 Value 1=0.000000 Best Res one=-121.077763 Perc accuracy=0.112613 eta =0.100000  time=0.049079
Round=75 Res one=-121.077763 Res two=0.000000 Value 1=0.000000 Best Res one=-121.077763 Perc accuracy=0.369369 eta =0.100000  time=0.049541

I want to generate CSV, for example

Out 1

"phase2";"Name of the phase";0.080
"phase3";"Name of the phase";3.670
"phase4";"Name of the phase";0.010
"phase5";"Name of the phase";0.030
"phase6";"Name of the phase";0.000

Out 2

50;-119.053794;0.007623;0.011147;-119.053794;0.000000;0.100000;0.042774
74;-121.077763;0.004456;0.000000;-121.077763;0.112613;0.100000;0.049079
75;-121.077763;0.000000;0.000000;-121.077763;0.369369;0.100000;0.049541

Answer 1

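I am posting an attempt in Awk ONLY because the files you have shown are not actual XML; for real XML there are dedicated parsing tools, e.g. xmlstarlet.

Awk is a powerful tool and can handle your input examples. For the first one: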

awk 'BEGIN{FS="\""}{n=split($2,x," - "); for(i=1; i<=n; i++){ printf "\"%s\";",x[i]} printf "%s\n",$4}' file
"phase2";"Name of the phase";0.080
"phase3";"Name of the phase";3.670
"phase4";"Name of the phase";0.010
"phase5";"Name of the phase";0.030
"phase6";"Name of the phase";0.000

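Awk processes the input one line at a time using the syntax BEGIN{} {} END{}; the BEGIN and END clauses are executed before and after the actual file is processed, respectively. There are some built-in special variables, the important ones being the input and output field separators, FS and OFS. Each input line is split on FS, and the individual fields can be accessed as $1, $2, and so on.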
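For readers coming from Python, as in the question, the same flow can be sketched roughly as follows. This is only an analogy, not how Awk is implemented; the function name run_awk_like, the comma separator, and the sample data are made up for illustration. Note that Python lists are 0-indexed, so Awk's $1 corresponds to fields[0].

```python
def run_awk_like(lines, fs=",", ofs=";"):
    """Mimic awk's BEGIN{} {} END{} flow on an iterable of text lines."""
    # BEGIN: runs once, before any input is read
    out = []
    total = 0.0

    # Main block: runs once per input line
    for line in lines:
        fields = line.rstrip("\n").split(fs)  # split on FS; awk's $1 is fields[0]
        total += float(fields[1])             # accumulate the second field
        out.append(ofs.join(fields))          # join on OFS when "printing"

    # END: runs once, after the last line has been processed
    out.append(f"total{ofs}{total:.3f}")
    return out


sample = ["phase2,0.080", "phase3,3.670"]
print("\n".join(run_awk_like(sample)))
```

The three commented sections correspond to the BEGIN, main, and END clauses of an Awk program.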

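For the solution in the first example,

The input field separator is set to the double quote ", so that the strings inside double quotes are parsed for your case. Here $2 contains the whole string phase2 - Name of the phase, so to separate it we use the GNU Awk split function, which splits the field on the delimiter ( - ) into the array x and returns the number of elements (n).
The values found are then printed with printf, using format specifiers that include the double quotes.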
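Since the question mentions Python scripts, here is a rough Python counterpart of the same idea, for comparison only. It is a sketch that assumes every line looks exactly like the Source I sample; the names PHASE_RE and phase_line_to_csv are hypothetical.

```python
import re

# Matches lines like: <Phase = "phase2 - Name of the phase"  duration = "0.080" />
PHASE_RE = re.compile(r'<Phase\s*=\s*"([^"]*)"\s+duration\s*=\s*"([^"]*)"')


def phase_line_to_csv(line):
    """Return the CSV row for one Phase line, or None if the line does not match."""
    m = PHASE_RE.search(line)
    if m is None:
        return None
    label, duration = m.groups()
    # Same idea as awk's split($2, x, " - "): break the label on " - "
    parts = label.split(" - ")
    quoted = [f'"{p}"' for p in parts]
    return ";".join(quoted + [duration])


print(phase_line_to_csv('<Phase = "phase2 - Name of the phase"  duration = "0.080" />'))
```

Unlike the Awk one-liner, this version returns None for lines that do not match the pattern, so unrelated lines in the file can be skipped explicitly.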

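And for the second example,

awk -F'[^0-9.-]*' '{for(i=1;i<=NF;i++){ if (length($i)){printf "%s;",$i} } printf "\n" }' file
50;-119.053794;0.007623;1;0.011147;-119.053794;0.000000;0.100000;0.042774;
74;-121.077763;0.004456;1;0.000000;-121.077763;0.112613;0.100000;0.049079;
75;-121.077763;0.000000;1;0.000000;-121.077763;0.369369;0.100000;0.049541;

This is a much more straightforward solution. The key is to set the input field separator to [^0-9.-]*, which means splitting on any run of characters that are not in the bracketed set, i.e. not a digit 0-9, a . or a -. With this, the line is easy to parse by looping over the individual fields up to NF, which is the number of fields in the current line. The length function is there to make sure empty fields are not printed.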
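Compared with the requested Out 2, the Awk output above carries an extra 1 (picked up from the label Value 1) and a trailing semicolon. One possible way around that, sketched here in Python because the question mentions Python scripts, is to capture only the numbers that follow an = sign. The names VALUE_RE and metrics_line_to_csv are hypothetical, and the pattern assumes plain decimal values as in the Source II sample.

```python
import re

# A number that directly follows "=", e.g. "Res one=-119.053794" or "eta =0.100000"
VALUE_RE = re.compile(r'=\s*(-?\d+(?:\.\d+)?)')


def metrics_line_to_csv(line):
    """Join every value that follows an '=' sign with ';'."""
    return ";".join(VALUE_RE.findall(line))


line = ("Round=50 Res one=-119.053794 Res two=0.007623 Value 1=0.011147 "
        "Best Res one=-119.053794 Perc accuracy=0.000000 eta =0.100000  time=0.042774")
print(metrics_line_to_csv(line))
```

Because the digit in Value 1 comes before the = sign, it is never captured, and joining with ";" avoids the trailing separator.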

If you find the solution too complex, I recommend reading GAWK: Effective AWK Programming by Arnold D. Robbins to get started with the language.

Swiss Army knife for file parsing
