Dynamically populating a multidimensional awk array

Asked: 2017-05-12 21:41:41

Tags: arrays multidimensional-array awk gawk

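I'm working on an Awk/Gawk script that parses a file and populates a multidimensional array from each line. The first column is a period-delimited string in which each value is the array key for the next level down. The second column is the value.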

Here's an example of the content being parsed:

$ echo -e "personal.name.first\t= John\npersonal.name.last\t= Doe\npersonal.other.dob\t= 05/07/87\npersonal.contact.phone\t= 602123456\npersonal.contact.email\t= john.doe@idk\nemployment.jobs.1\t= Company One\nemployment.jobs.2\t= Company Two\nemployment.jobs.3\t= Company Three"
personal.name.first     = John
personal.name.last      = Doe
personal.other.dob      = 05/07/87
personal.contact.phone  = 602123456
personal.contact.email  = john.doe@idk
employment.jobs.1       = Company One
employment.jobs.2       = Company Two
employment.jobs.3       = Company Three

After parsing, I expect it to have the same structure as:

data["personal"]["name"]["first"]     = "John"
data["personal"]["name"]["last"]      = "Doe"
data["personal"]["other"]["dob"]      = "05/07/87"
data["personal"]["contact"]["phone"]  = "602123456"
data["personal"]["contact"]["email"]  = "john.doe@idk"
data["employment"]["jobs"]["1"]       = "Company One"
data["employment"]["jobs"]["2"]       = "Company Two"
data["employment"]["jobs"]["3"]       = "Company Three"

The part I'm stuck on is how to populate the keys dynamically while building the multidimensional array.

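I found this SO thread, which covers a similar issue that was resolved by using the SUBSEP variable. At first that looked like it would do what I need, but after some testing it appears that arr["foo", "bar"] = "baz" is not treated like a real array the way arr["foo"]["bar"] = "baz" would be. For example, there is no way to count the values at any level of the array: arr["foo", "bar"] = "baz"; print length(arr["foo"]) simply prints 0 (zero).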
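
To make that difference concrete, here is a minimal sketch (the wrapper name `subsep_demo` is only for illustration). In any POSIX awk, `arr["foo", "bar"]` is a single element whose key is the string `"foo" SUBSEP "bar"`, so `arr["foo"]` is a different, unset element:

```shell
# Hypothetical wrapper name; the awk program uses only POSIX awk features.
subsep_demo() {
    awk 'BEGIN {
        arr["foo", "bar"] = "baz"
        # The array holds one flat key: "foo" SUBSEP "bar"
        for (key in arr) {
            n = split(key, parts, SUBSEP)
            print n, parts[1], parts[2], arr[key]
        }
        # arr["foo"] is an unset scalar element, so its string length is 0
        print length(arr["foo"])
    }'
}

subsep_demo
```

The first output line shows that the flat key splits back into two parts; the second is the 0 described above.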

I also found this SO thread, which helps a little and possibly points me in the right direction.

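The snippet from that thread:

BEGIN {
    x = SUBSEP
    a = "Red" x "Green" x "Blue"
    b = "Yellow" x "Cyan" x "Purple"
    Colors[1][0] = ""
    Colors[2][0] = ""
    split(a, Colors[1], x)
    split(b, Colors[2], x)
    print Colors[2][3]
}

is pretty close, but the problem I have now is that the keys (e.g. Red, Green, etc.) need to be specified dynamically, and there may be one or more keys.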

Basically, how can I take the a_keys and b_keys strings, split them on ., and populate the a and b variables as multidimensional arrays?

Any help would be appreciated, thanks!

2 Answers:

Answer 0 (score: 2)

You just need:

BEGIN { FS="\t= " }
{
    split($1,d,/\./)
    data[d[1]][d[2]][d[3]] = $2
}

Look:

$ cat tst.awk
BEGIN { FS="\t= " }
{
    split($1,d,/\./)
    data[d[1]][d[2]][d[3]] = $2
}
END {
    for (x in data)
        for (y in data[x])
            for (z in data[x][y])
                print x, y, z, "->", data[x][y][z]
}

$ awk -f tst.awk file
personal other dob -> 05/07/87
personal name first -> John
personal name last -> Doe
personal contact email -> john.doe@idk
personal contact phone -> 602123456
employment jobs 1 -> Company One
employment jobs 2 -> Company Two
employment jobs 3 -> Company Three

The above is gawk-specific, since no other awk supports true multidimensional arrays.
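
With true arrays of arrays, the per-level counting that the question could not get from SUBSEP keys becomes possible. A minimal sketch, assuming gawk 4.0 or later is installed as `gawk` (the wrapper name `level_counts` and the shortened input are only for illustration):

```shell
# Requires gawk 4.0 or later (arrays of arrays); function name is illustrative.
level_counts() {
    printf 'personal.name.first\t= John\npersonal.name.last\t= Doe\npersonal.other.dob\t= 05/07/87\nemployment.jobs.1\t= Company One\n' |
    gawk '
    BEGIN { FS = "\t= " }
    {
        split($1, d, /\./)
        data[d[1]][d[2]][d[3]] = $2
    }
    END {
        print length(data["personal"])          # subarrays under "personal"
        print length(data["personal"]["name"])  # leaves under "personal" -> "name"
        print data["employment"]["jobs"]["1"]
    }'
}

level_counts || echo "this sketch needs gawk" >&2
```

For this input, `personal` has two subarrays (`name` and `other`) and `name` has two leaves, so the sketch prints 2, 2 and Company One.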

Populating a multidimensional array when the indices don't always have the same depth (e.g. 3 above) is a bit more complicated:

##########
$ cat tst.awk
function rec_populate(a,idxs,curDepth,maxDepth,tmpIdxSet) {
    if ( tmpIdxSet ) {
        delete a[SUBSEP]                # delete scalar a[]
        tmpIdxSet = 0
    }
    if (curDepth < maxDepth) {
        # We need to ensure a[][] exists before calling populate() otherwise
        # inside populate() a[] would be a scalar, but then we need to delete
        # a[][] inside populate() before trying to create a[][][] because
        # creating a[][] below creates IT as scalar. SUBSEP used arbitrarily.

        if ( !( (idxs[curDepth] in a) && (SUBSEP in a[idxs[curDepth]]) ) ) {
            a[idxs[curDepth]][SUBSEP]   # create array a[] + scalar a[][]
            tmpIdxSet = 1
        }
        rec_populate(a[idxs[curDepth]],idxs,curDepth+1,maxDepth,tmpIdxSet)
    }
    else {
        a[idxs[curDepth]] = $2
    }
}

function populate(arr,str,sep,  idxs) {
    split(str,idxs,sep)
    rec_populate(arr,idxs,1,length(idxs),0)
}

{ populate(arr,$1,",") }

END { walk_array(arr, "arr") }

function walk_array(arr, name,      i)
{
    # Mostly copied from the following URL, just added setting of "sorted_in":
    #   https://www.gnu.org/software/gawk/manual/html_node/Walking-Arrays.html
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (i in arr) {
        if (isarray(arr[i]))
            walk_array(arr[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n", name, i, arr[i])
    }
}

##########
$ cat file
a uno
b,c dos
d,e,f tres_wan
d,e,g tres_twa
d,e,h,i,j cinco

##########
$ awk -f tst.awk file
arr[a] = uno
arr[b][c] = dos
arr[d][e][f] = tres_wan
arr[d][e][g] = tres_twa
arr[d][e][h][i][j] = cinco
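
The same idea can be applied to the dot-separated input from the question. The following is a condensed sketch, not the code above: `set_path` and `dump` are hypothetical helper names, the `id` line is made-up input added to show a shallower key, and gawk 4.0 or later is assumed. Like the code above, it does not handle a key that is used both as a leaf and as a parent.

```shell
# Requires gawk 4.0+; set_path and dump are hypothetical helper names.
nested_from_dots() {
    printf 'personal.name.first\t= John\nemployment.jobs.1\t= Company One\nid\t= 42\n' |
    gawk '
    # Store val at a[keys[depth]]...[keys[n]], creating subarrays on the way down.
    function set_path(a, keys, n, depth, val,    k) {
        k = keys[depth]
        if (depth == n) {
            a[k] = val
            return
        }
        if (!(k in a)) {
            a[k][SUBSEP] = ""       # force a[k] to be a subarray
            delete a[k][SUBSEP]     # drop the placeholder, a[k] stays an array
        }
        set_path(a[k], keys, n, depth + 1, val)
    }
    # Print every leaf with its full index path.
    function dump(a, name,    i) {
        for (i in a) {
            if (isarray(a[i]))
                dump(a[i], (name "[" i "]"))
            else
                printf "%s[%s] = %s\n", name, i, a[i]
        }
    }
    BEGIN { FS = "\t= "; PROCINFO["sorted_in"] = "@ind_str_asc" }
    { n = split($1, keys, /\./); set_path(data, keys, n, 1, $2) }
    END { dump(data, "data") }'
}

nested_from_dots || echo "this sketch needs gawk" >&2
```

The placeholder element is created and deleted right away so that `a[k]` can be passed to the recursive call as an array without a dummy index showing up in the output.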

Answer 1 (score: 0)

Without true multidimensional arrays, you can do a bit more bookkeeping yourself.
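
One way such bookkeeping could look is sketched below. This is an illustration under assumptions, not the answerer's code: values are stored under flat SUBSEP keys, and a second array (hypothetically named `nchildren`) counts the distinct direct children of every key prefix. It uses only POSIX awk features.

```shell
# Hypothetical names throughout; works with any POSIX awk.
count_children() {
    printf 'personal.name.first\t= John\npersonal.name.last\t= Doe\npersonal.other.dob\t= 05/07/87\npersonal.contact.phone\t= 602123456\npersonal.contact.email\t= john.doe@idk\nemployment.jobs.1\t= Company One\nemployment.jobs.2\t= Company Two\nemployment.jobs.3\t= Company Three\n' |
    awk '
    BEGIN { FS = "\t= " }
    {
        n = split($1, k, /\./)
        path = ""
        for (i = 1; i <= n; i++) {
            child = (i == 1) ? k[i] : (path SUBSEP k[i])
            if (!(child in seen)) {   # first time this node appears
                seen[child] = 1
                nchildren[path]++     # one more direct child under its parent
            }
            path = child
        }
        data[path] = $2               # the full path is the flat key
    }
    END {
        print data["personal", "name", "first"]
        print nchildren["personal"]
        print nchildren["employment", "jobs"]
    }'
}

count_children
```

For the sample input this prints John, then 3 (name, other and contact under personal), then 3 (the three jobs), which covers the per-level counts the question asked about without relying on gawk.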
