Question

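I want to extract a substring from each line of a text file. The information I need is in a specific field. For example, I have the following text: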

{name:x, version:1.0, info:"test", ...}
{name:y, version:0.1, info:"test again", ...}
{name:z, version:1.1, info:"test over", ...}

I tried to extract all the versions with the following command:

cut -d',' -f 2 <file name> | cut -d':' -f 2 > <output>

This is not perfect. It works for the example above, but suppose I have an entry like this:

{name:x, info: "test", ..., version:1.2, ...}

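The command above then reports the wrong version. Is there a way to extract the information by field name rather than by column position?

Expected result: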

1.0
0.1
1.1
1.2
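
To make the failure concrete, here is a minimal sketch that runs the column-based pipeline from the question against one line of each shape. The function name by_column is only an illustrative assumption.

```shell
#!/bin/sh
# Column-based extraction: take the 2nd comma-separated field, then the
# 2nd colon-separated field within it. (by_column is a hypothetical name.)
by_column() {
    cut -d',' -f 2 | cut -d':' -f 2
}

# version is the 2nd comma-separated field here, so the pipeline finds it.
printf '%s\n' '{name:x, version:1.0, info:"test", ...}' | by_column

# version is the 4th field here, so the pipeline returns the info value.
printf '%s\n' '{name:x, info: "test", ..., version:1.2, ...}' | by_column
```

The second call prints the info value (with its leading space and quotes) instead of 1.2, which is why the answers below key on the field name.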

Answer 1

Use this awk:

awk -v f='version' -F ' *[{}:=,] *| +' '{for (i=2; i<=NF; i++) if ($(i-1)==f) 
   {print $i; break}}' file
1.0
0.1
1.1
1.2
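
The one-liner splits each line on braces, colons, equals signs, commas, and runs of spaces, then prints the field that follows the one named in f. As a rough sketch, the same command can be wrapped in a shell function so the field name becomes an argument. The name extract_field is an assumption made for illustration, not part of the answer.

```shell
#!/bin/sh
# extract_field NAME [FILE...]: print the value that follows field NAME on
# each input line. (extract_field is a hypothetical helper name.)
extract_field() {
    f=$1
    shift
    # Separators: one of { } : = , with optional spaces around it, or a run
    # of spaces. The loop prints the field right after the one equal to f.
    awk -v f="$f" -F ' *[{}:=,] *| +' '
        {
            for (i = 2; i <= NF; i++)
                if ($(i - 1) == f) { print $i; break }
        }' "$@"
}

printf '%s\n' \
    '{name:x, version:1.0, info:"test", ...}' \
    '{name:x, info: "test", ..., version:1.2, ...}' |
    extract_field version
```

Because spaces also act as separators, a quoted value such as "test again" is split into several fields, so this approach suits simple values like version better than free text.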

Answer 2

Using GNU grep for its -P (PCRE regular expressions) and --only-matching options, you can do this:

$ cat file
{name:x, version:1.0, info:"test", ...}
{name:y, version:0.1, info:"test again", ...}
{name:z, version:1.1, info:"test over", ...}
{name:x, info: "test", ..., version=1.2, ...}
$ grep -oP '(?<=version.)[^,}]+' file
1.0
0.1
1.1
1.2

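We match version followed by . (which matches any single character, so it covers both the : and the = separators in the sample) inside a positive lookbehind assertion (?<=), and print everything up to the next , or }.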
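
One consequence of the lookbehind above is that nothing anchors the start of the field name, so a field such as oldversion would match too. The following is a sketch of a tighter variant, assuming GNU grep built with PCRE support. The oldversion field in the sample line is made up for illustration.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE support (-P). The sample line is hypothetical.
line='{name:x, oldversion:0.9, version=1.2, ...}'

# Original pattern: "." accepts any separator and nothing anchors the start
# of the name, so the value of oldversion is reported as well.
printf '%s\n' "$line" | grep -oP '(?<=version.)[^,}]+'

# Tightened variant: \b requires a word boundary before "version", and
# [:=] allows only the two separators seen in the sample data.
printf '%s\n' "$line" | grep -oP '(?<=\bversion[:=])[^,}]+'
```

The first command reports both 0.9 and 1.2, while the second reports only 1.2.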

Answer 3

Extract field data with Grep and PCRE

If you have pcregrep installed, or your grep was compiled with PCRE support, you can select just the field you want. For example:

# grep with PCRE support
$ grep -Po 'version:\K[^,}]+' /tmp/corpus
1.0
0.1
1.1
1.2

# pcregrep doesn't need the -P flag
$ pcregrep -o 'version:\K[^,}]+' /tmp/corpus
1.0
0.1
1.1
1.2

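Either way, you start the match by finding the version field, use \K to discard all the characters consumed so far so that the match captures only the field data, and then match everything up to (but not including) a comma or closing brace. The -o flag tells grep to print only the matched text rather than the whole line.

Your Grep doesn't have PCRE? Just use Perl

If your grep was not compiled with Perl-Compatible Regular Expressions (PCRE), you should still have Perl itself, since it is part of the Linux Standard Base. Using Perl: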
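
Because \K simply resets the start of the reported match, the part before it can have variable length, for example optional whitespace around the colon. Here is a minimal sketch along those lines, assuming GNU grep built with PCRE support. The function name field_value and the sample lines are assumptions for illustration.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE support (-P).
# field_value NAME [FILE...]: print the value of field NAME from each line.
# (field_value is a hypothetical helper name.)
field_value() {
    name=$1
    shift
    # \b        word boundary, so "oldversion" is not taken for "version"
    # \s*:\s*   colon with optional whitespace on either side
    # \K        drop everything matched so far from the reported match
    grep -Po "\\b${name}\\s*:\\s*\\K[^,}]+" "$@"
}

printf '%s\n' \
    '{name: a, oldversion: 0.1, version: 1.1}' \
    '{name:b, version:1.2, info:"test"}' |
    field_value version
```

The field name is interpolated into the pattern as-is, so this sketch assumes it contains no regular-expression metacharacters.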

# NB: Avoid speed penalty for $& when perl > 5.10.0 && perl < 5.20.0.
# Use $& and remove the /p flag if you don't have (or need) the
# ${^MATCH} variable.
$ perl -ne 'print "${^MATCH}\n" if /version:\K[^,}]+/p' /tmp/corpus
1.0
0.1
1.1
1.2

# Use the $& special variable when ${^MATCH} isn't available, or when
# using a version without the speed penalty.
$ perl -ne 'print "$&\n" if /version:\K[^,}]+/' /tmp/corpus 
1.0
0.1
1.1
1.2

Answer 4

With sed:

$ sed 's/.*version:\([^,}]*\).*/\1/' file
1.0
0.1
1.1
1.2

Answer 5

With sed again:

sed 's/^.*version://; s/[,}].*//' < file

1.0
0.1
1.1
1.2
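
The first substitution deletes everything up to and including the last version: on the line, and the second deletes everything from the next comma or closing brace onward. A line with no version field would pass through the first substitution unchanged and then be truncated by the second. The sketch below restricts both substitutions to lines that contain the field. The function name version_of and the middle sample line are assumptions.

```shell
#!/bin/sh
# version_of [FILE...]: apply the same two substitutions, but only to lines
# containing "version:"; -n suppresses every other line.
# (version_of is a hypothetical helper name.)
version_of() {
    sed -n '/version:/{
        s/^.*version://
        s/[,}].*//
        p
    }' "$@"
}

printf '%s\n' \
    '{name:x, version:1.0, info:"test", ...}' \
    '{name:w, info:"untagged"}' \
    '{name:x, info: "test", ..., version:1.2, ...}' |
    version_of
```

Since .* is greedy, the last version: on a line wins, including one inside a quoted value or at the end of a longer name such as oldversion.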

Answer 6

This perl:

perl -nE 'say $3 if m/^\s*\{ (([^"]|"[^"]*")*)* \bversion\s*:\s* ([\d.]*)/x'

will:

not match version:2.2 inside quotes
not match oldversion:1.2

So for the following input:

{name: a, version: 1.1, info: "the version: 9.1 is better", oldversion: 0.1}
{name: b, version: 1.2, oldversion: 0.2, info: "the version: 9.2 is better"}
{name: c, info: "the version: 9.3 is better", version: 1.3, oldversion: 0.3}
{name: d, info: "the version: 9.4 is better", oldversion: 0.4, version: 1.4}

it will print

1.1
1.2
1.3
1.4
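
Where Perl is not an option, a similar effect can be approximated with sed by deleting quoted strings before looking for the field. This is a different, simpler technique than the regex above. It assumes quoted values contain no escaped quotes and that version values consist only of digits and dots. The function name version_outside_quotes is made up for illustration.

```shell
#!/bin/sh
# version_outside_quotes [FILE...]: approximate the quote-aware match by
#   1. deleting every "..." string, so text inside quotes cannot match;
#   2. requiring {, a comma, or a space right before "version", so that
#      "oldversion" is skipped, and printing only the digits and dots.
# (version_outside_quotes is a hypothetical helper name.)
version_outside_quotes() {
    sed -n '
        s/"[^"]*"//g
        s/.*[{, ]version *: *\([0-9.]*\).*/\1/p
    ' "$@"
}

printf '%s\n' \
    '{name: a, version: 1.1, info: "the version: 9.1 is better", oldversion: 0.1}' \
    '{name: b, version: 1.2, oldversion: 0.2, info: "the version: 9.2 is better"}' \
    '{name: c, info: "the version: 9.3 is better", version: 1.3, oldversion: 0.3}' \
    '{name: d, info: "the version: 9.4 is better", oldversion: 0.4, version: 1.4}' |
    version_outside_quotes
```

On the four sample lines this prints 1.1, 1.2, 1.3, and 1.4, one per line.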

Answer 7

sed 's/.* version://;s/[^0-9.].*//' YourFile

This assumes that the version number consists only of digits and dots, and that no value contains the text version: internally.

Answer 8

This works for me:

[root@giam20 ~]# cut -f2 -d "," sample.txt | cut -f2 -d ":"
1.0
0.1
1.1
