Question

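I recently posted a question asking for help fixing a text file. My problem was that I had a text file in which the lines were incorrectly placed.

Example:

The purpose of the script is to concatenate the probabilities of each sentence in the correct order.

So in this case the end result would be

One of the many solutions was: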

awk 'NF == 2{ match($1,/^[0-9]+(_[0-9]+){7}/); k = substr($1,RSTART,RLENGTH); next }
     { $NF=""; a[k]=a[k]"\n "$0 }
     END { for(i in a) printf "%s [%s ]\n\n",i,a[i] }' input

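I am currently struggling to understand why it actually works. How exactly does it concatenate the correct probabilities?

Sorry for the repost, but I could not find the original post, which is why I had to do it this way.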

Answer 1

What exactly does this awk command do? Here it is with comments:

awk '
NF == 2 {                            # for those records with 2 fields
    match($1,/^[0-9]+(_[0-9]+){7}/)  # match the first 8 numbers of e.g. 1_1_1_1_0_0_1_0_2279
    k = substr($1,RSTART,RLENGTH)    # k=1_1_1_1_0_0_1_0 (the trailing _2279 is dropped)
    next                             # skip to next record
}
{                                    # for all the other kinds of records
    $NF=""                           # delete the ] from the end
    a[k]=a[k]"\n "$0                 # hash into a using k as key, "grouping"
}
END {                                # after all data is grouped into a
    for(i in a)                      # for each key
        printf "%s [%s ]\n\n",i,a[i] # print the key and the data
}' input
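To make the key extraction concrete, here is a minimal sketch that runs only the match/substr step on the sample key from the comments. The variable names info, re and j are illustrative and not part of the original script. The pattern is built in a loop instead of being written with {7}, on the assumption that some awk builds (older mawk, for example) lack interval-expression support; it matches the same strings.

```shell
# Sketch: show what match() and substr() extract from a sample first field.
info=$(echo '1_1_1_1_0_0_1_0_2279 [' | awk '
BEGIN {
    # One number, then 7 more "_number" parts: same as /^[0-9]+(_[0-9]+){7}/.
    re = "^[0-9]+"
    for (j = 0; j < 7; j++) re = re "_[0-9]+"
}
{
    match($1, re)                                       # sets RSTART and RLENGTH
    print RSTART, RLENGTH, substr($1, RSTART, RLENGTH)  # position, length, key
}')
echo "$info"
```

This prints 1 15 1_1_1_1_0_0_1_0: the match starts at character 1, is 15 characters long, and stops after the eighth number, so the trailing _2279 is not part of the key.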

Basically, it takes input records like these:

1_1_1_1_1_0_1_0_666  [
  1 0 0 ]
1_1_1_1_1_0_1_0_666  [
  0 1 0 ]
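As a rough end-to-end check, the sketch below feeds exactly these four lines to the same logic. The temporary file and the names input, result, re and j are assumptions made for the demo, and the pattern is again built in BEGIN rather than written with {7} so that it also runs on awk builds without interval expressions.

```shell
# Sample input: two records whose keys share the same 8-number prefix.
input=$(mktemp)
cat > "$input" <<'EOF'
1_1_1_1_1_0_1_0_666  [
  1 0 0 ]
1_1_1_1_1_0_1_0_666  [
  0 1 0 ]
EOF

result=$(awk '
BEGIN { re = "^[0-9]+"; for (j = 0; j < 7; j++) re = re "_[0-9]+" }  # 8 numbers
NF == 2 { match($1, re); k = substr($1, RSTART, RLENGTH); next }    # remember the key
{ $NF = ""; a[k] = a[k] "\n " $0 }                                  # append row under the key
END { for (i in a) printf "%s [%s ]\n\n", i, a[i] }                 # one block per key
' "$input")
rm -f "$input"
printf '%s\n' "$result"
```

Both header lines reduce to the same key, 1_1_1_1_1_0_1_0, so the output is a single block headed by that key, containing the rows 1 0 0 and 0 1 0 in the order they were read.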

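It concatenates the probabilities of each sentence in the order in which they appear in the input file, and prints the "sentences" in an arbitrary order, because awk does not define the iteration order of for(i in a).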
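If the order of the printed "sentences" matters, one possible variation (not part of the original answer) is to record each key the first time it is seen and loop over that list in END. The arrays seen and order, the counter n and the sample keys below are all hypothetical; the pattern is built in BEGIN on the assumption that the awk in use may lack {7} support.

```shell
# Sketch: same grouping, but keys are printed in first-seen order.
result=$(printf '%s\n' \
    '2_0_0_0_0_0_0_0_1 [' ' 0 1 ]' \
    '1_0_0_0_0_0_0_0_1 [' ' 1 0 ]' \
    '2_0_0_0_0_0_0_0_2 [' ' 1 1 ]' | awk '
BEGIN { re = "^[0-9]+"; for (j = 0; j < 7; j++) re = re "_[0-9]+" }
NF == 2 {
    match($1, re); k = substr($1, RSTART, RLENGTH)
    if (!seen[k]++) order[++n] = k        # remember each key once, in input order
    next
}
{ $NF = ""; a[k] = a[k] "\n " $0 }        # append row under the current key
END {
    for (j = 1; j <= n; j++)              # numeric loop gives a defined order
        printf "%s [%s ]\n\n", order[j], a[order[j]]
}')
printf '%s\n' "$result"
```

Here the block for 2_0_0_0_0_0_0_0 is printed before the one for 1_0_0_0_0_0_0_0, matching the order in which the keys first appear in the input.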
