Question

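I am processing some results obtained from a curl command in AWK, but despite reading up on match and regexps I am still having trouble. I have the whole thing written, but in a really hacky way that uses a lot of substr and very basic match calls, without capturing anything with a regex.

My real data is a bit more complicated, but here is a simplified version. Assume the following is stored in the string str: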

[{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]},{"DataA":"190","DataBee":"63100","DataC":[55,22,64,838,2]},{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}][{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]},{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}]

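Some notes about this data:

Note that there are 3 "sets" of data, delimited by {}, in the first pair of brackets [], and 2 sets in the second pair. The string will always contain at least one set of data within each pair of brackets, and at least one pair of brackets (i.e. it will never be an empty string and will always contain some valid data).

Brackets are also used for the DataC data, so those need to be accounted for somehow.

No punctuation will appear in the string other than the delimiters; all actual data is alphanumeric.

The fields DataA, DataBee, and DataC will always have these names.

The data for DataC will always be exactly 5 numbers, separated by commas.

What I would like to do is write a loop that goes through the string and pulls out the values: a = whatever DataA is (200 in the first case), b = whatever DataBee is (63500 in the first case), and c[1] through c[5] holding the values from DataC.

I feel that if I can get some ideas on how to do this for the data above, I can adapt them to my needs. Right now my loop using substr is about 30 lines long :(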

Answer 1

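For fun with awk:

I use "complex" FS and RS variables to split the JSON. That way each column holds at most one value, and each line holds one piece of data (DataA, DataBee, or DataC).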

To understand how FS and RS are used, see how this command works:

awk -F",|\":\"|:\\\[" ' {$1=$1}1 ' OFS="\t" RS="\",\"|},{|\\\]" file

(you can replace file with <(curl <your_url>) or <(echo <your_json_str>))

Returns:

[{"DataA        200
DataBee 63500
DataC"  3       22      64      838     2

"DataA  190
DataBee 63100
DataC"  55      22      64      838     2

"DataA  200
DataBee 63500
DataC"  3       22      64      838     2
}
[{"DataA        200
DataBee 63500
DataC"  3       22      64      838     2

"DataA  200
DataBee 63500
DataC"  3       22      64      838     2
}

Now it looks like something I can work with in awk:

awk -F",|\":\"|:\\\[" '
    /DataA/   {a=$2}
    /DataBee/ {b=$2}
    /DataC/   {for(i=2;i<=NF;i++){c[i-1]=$i}}
    # once a, b and c are all set, print them and reset for the next object
    a!=""&&b!=""&&c[1]!=""{
        print "a: ", a
        print "b: ", b
        printf "c: "
        # DataC always holds exactly 5 numbers; index explicitly so the order is stable
        for(i=1;i<=5;i++){
            printf "%s, ", c[i]
        }
        print ""
        a=""; b=""; c[1]=""
    }
' RS="\",\"|},{|\\\]" file

This command stores the values in variables and prints them once a, b, and c are all set.

Returns:

a:  200
b:  63500
c: 3, 22, 64, 838, 2,
a:  190
b:  63100
c: 55, 22, 64, 838, 2,
a:  200
b:  63500
c: 3, 22, 64, 838, 2,
a:  200
b:  63500
c: 3, 22, 64, 838, 2,
a:  200
b:  63500
c: 3, 22, 64, 838, 2,

For fun with awk's match, building on this excellent answer (note that the three-argument form of match, which fills an array with the captured groups, is a GNU awk extension):

awk '
function find_all(str, patt) {
    # repeatedly match, print every captured group, then continue after the match
    while (match(str, patt, a) > 0) {
        for (i=1; i in a; i++) print a[i]
        str = substr(str, RSTART+RLENGTH)
    }
}
{
    print "Catching DataA"
    find_all($0, "DataA\":\"([0-9]*)")
    print "Catching DataBee"
    find_all($0, "DataBee\":\"([0-9]*)")
    print "Catching DataC"
    find_all($0, "DataC\":.([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*)")
}
' file

Returns:

Catching DataA
200
190
200
200
200
Catching DataBee
63500
63100
63500
63500
63500
Catching DataC
3
22
64
838
2
55
22
64
838
2
3
22
64
838
2
3
22
64
838
2
3
22
64
838
2

Now that you have seen how ugly it is, look how easy it is with python:

import json

data_str = '[{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]},{"DataA":"190","DataBee":"63100","DataC":[55,22,64,838,2]},{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}][{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]},{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}]'

# raw_decode parses one JSON value from the front of the string and returns
# it together with the index where parsing stopped, so the loop consumes
# the back-to-back arrays one at a time.
while data_str:
    data, index = json.JSONDecoder().raw_decode(data_str)
    for element in data:
        print("DataA: ", element["DataA"])
        print("DataBee: ", element["DataBee"])
        print("DataC: ", element["DataC"])
    data_str = data_str[index:]

Returns:

DataA:  200
DataBee:  63500
DataC:  [3, 22, 64, 838, 2]
DataA:  190
DataBee:  63100
DataC:  [55, 22, 64, 838, 2]
DataA:  200
DataBee:  63500
DataC:  [3, 22, 64, 838, 2]
DataA:  200
DataBee:  63500
DataC:  [3, 22, 64, 838, 2]
DataA:  200
DataBee:  63500
DataC:  [3, 22, 64, 838, 2]

This solution is not only cleaner, it is also more robust if you run into unexpected results or unexpected formatting.
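If the goal is specifically the a, b, and c[1] to c[5] variables from the question, the Python approach can be wrapped in a small generator. This is only a sketch under the assumptions stated in the question (every object has DataA, DataBee, and DataC); the names iter_records and sample are made up for illustration.

```python
import json


def iter_records(data_str):
    """Yield (a, b, c) for every object in back-to-back JSON arrays.

    Hypothetical helper: c is a dict keyed 1..n so that c[1]..c[5]
    mirror the variable names used in the question.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(data_str)
    while True:
        # raw_decode does not accept leading whitespace, so skip it here.
        while pos < end and data_str[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Parse one JSON value starting at pos; pos becomes the index
        # just past that value.
        array, pos = decoder.raw_decode(data_str, pos)
        for obj in array:
            c = {i: value for i, value in enumerate(obj["DataC"], start=1)}
            yield obj["DataA"], obj["DataBee"], c


# Shortened sample in the same shape as the string from the question.
sample = (
    '[{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]},'
    '{"DataA":"190","DataBee":"63100","DataC":[55,22,64,838,2]}]'
    '[{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}]'
)

records = list(iter_records(sample))
for a, b, c in records:
    print(a, b, [c[i] for i in range(1, 6)])
```

Passing the position to raw_decode instead of slicing the string avoids copying the remainder on every iteration, which may matter if the real curl output is large.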

Answer 2

I suggest using jq, for example:

jq -c '.[]' <<<"$str"

{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}
{"DataA":"190","DataBee":"63100","DataC":[55,22,64,838,2]}
{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}
{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}
{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}

To extract DataC:

jq -c '.[] | .DataC' <<<"$str"

Output:

[3,22,64,838,2]
[55,22,64,838,2]
[3,22,64,838,2]
[3,22,64,838,2]
[3,22,64,838,2]
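If the rest of the processing has to stay in awk, one option is to flatten each object into a single tab-separated line first, so that awk can read the values with plain field access instead of regexes. The sketch below does this in Python; it assumes the arrays follow each other with nothing in between, as in the question, and the function name to_tsv_lines is made up for illustration.

```python
import json


def to_tsv_lines(data_str):
    """Return one 'DataA<TAB>DataBee<TAB>c1..c5' line per object.

    Hypothetical helper; assumes the JSON arrays are directly adjacent.
    """
    decoder = json.JSONDecoder()
    text = data_str.strip()  # drop e.g. a trailing newline from curl
    lines = []
    pos = 0
    while pos < len(text):
        # Parse the next array and move pos just past it.
        array, pos = decoder.raw_decode(text, pos)
        for obj in array:
            fields = [str(obj["DataA"]), str(obj["DataBee"])]
            fields.extend(str(n) for n in obj["DataC"])
            lines.append("\t".join(fields))
    return lines


sample = (
    '[{"DataA":"200","DataBee":"63500","DataC":[3,22,64,838,2]}]'
    '[{"DataA":"190","DataBee":"63100","DataC":[55,22,64,838,2]}]\n'
)

lines = to_tsv_lines(sample)
for line in lines:
    print(line)
```

On the awk side, each line could then be read with something like a=$1; b=$2; for(i=1;i<=5;i++) c[i]=$(i+2), using -F'\t' as the field separator.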

How to use match in AWK on a given data set
