Question

I have data with 4 columns, like this:

 a    1    g    1,2,3,4,5,6,7
 b    2    g    3,5,3,2,6,4,3,2
 c    3    g    5,2,6,3,4
 d    4    g    1,5,3,6,4,7

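I'm trying to remove the first two numbers and the last two numbers from the fourth column on every line, so that the output looks like this: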

 a    1    g    3,4,5
 b    2    g    3,2,6,4
 c    3    g    6
 d    4    g    3,6

Could someone help me out? I'd appreciate it.

Answer 1

You can use:

$ awk '{n=split($4, a, ","); for (i=3; i<=n-2; i++) t=t""a[i](i==n-2?"":","); print $1, $2, $3, t; t=""}' file
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6

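Explanation

n=split($4, a, ",") splits the 4th field into the array a, using the comma as the separator. Since split() returns the number of pieces it produced, we store that in n for later use.
for (i=3; i<=n-2; i++) t=t""a[i](i==n-2?"":",") loops over the pieces from the 3rd to the (n-2)th and appends each one to t, adding a comma after every piece except the last one kept.
print $1, $2, $3, t; t="" prints the first three fields followed by the new 4th field, then empties t for the next line.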
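
For anyone who finds the one-liner dense, here is a sketch of the same logic spread over several lines. The function name trim_ends is made up for illustration. The awk program follows the command above, except that t is reset at the top of each record rather than after printing.

```shell
# trim_ends is a hypothetical helper name, not part of the original answer.
# It reads from the files given as arguments, or from standard input.
trim_ends() {
    awk '{
        n = split($4, a, ",")            # n = number of comma-separated values
        t = ""                           # start each record with an empty result
        for (i = 3; i <= n - 2; i++)     # keep values 3 .. n-2
            t = t a[i] (i == n - 2 ? "" : ",")
        print $1, $2, $3, t
    }' "$@"
}

# Usage: prints "a 1 g 3,4,5" and then "c 3 g 6"
printf '%s\n' 'a 1 g 1,2,3,4,5,6,7' 'c 3 g 5,2,6,3,4' | trim_ends
```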

Answer 2

This works for the sample input you posted:

$ awk '{gsub(/^([^,]+,){2}|(,[^,]+){2}$/,"",$NF)}1' file
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6

If your 4th field can contain fewer than 4 commas, please update your question to show how those cases should be handled.
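
To illustrate that caveat, here is one possible way to guard the substitution, as a sketch only. The question does not say what should happen to short lists, so the policy here (empty the field when it holds 4 or fewer values, which is also what the loop in Answer 1 would leave) is an assumption, and trim_ends_guarded is a made-up name. The regex is the one above with the {2} repetitions written out, so that it does not rely on interval support, which some awk implementations lack.

```shell
# trim_ends_guarded is a hypothetical helper name.
# Assumed policy: a field with 4 or fewer values becomes empty.
trim_ends_guarded() {
    awk '{
        n = split($NF, parts, ",")       # count the values in the last field
        if (n > 4)
            # same idea as ^([^,]+,){2}|(,[^,]+){2}$ with the repeats spelled out
            gsub(/^[^,]+,[^,]+,|,[^,]+,[^,]+$/, "", $NF)
        else
            $NF = ""                     # nothing survives dropping 2 + 2 values
        print
    }' "$@"
}

printf '%s\n' 'c 3 g 5,2,6,3,4' 'x 9 g 1,2,3' | trim_ends_guarded
```

The first line comes out as "c 3 g 6". The second comes out as "x 9 g" followed by a space and an empty 4th field, rather than the half-trimmed "3" that the unguarded regex would leave behind.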

Answer 3

This uses bash array manipulation. It may be a little ... gnarly:

while read -a fields; do                      # read the fields for each line
    IFS=, read -a values <<< "${fields[3]}"   # split the last field on comma
    new=("${values[@]:2:${#values[@]}-4}")    # drop the first 2 and last 2 fields
    fields[3]=$(IFS=,; echo "${new[*]}")      # join the new list on comma
    printf "%s\t" "${fields[@]}"; echo        # print the new line
done <<END
 a    1    g    1,2,3,4,5,6,7
 b    2    g    3,5,3,2,6,4,3,2
 c    3    g    5,2,6,3,4
 d    4    g    1,5,3,6,4,7
END

a   1   g   3,4,5   
b   2   g   3,2,6,4 
c   3   g   6   
d   4   g   3,6
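
A variation on the loop above, wrapped in a function, in case the list can hold fewer than 5 values. Bash treats a negative length in an array slice as an expansion error, so this sketch computes the length first and only slices when it is positive. The name trim_ends_bash, the empty result for short lists, and the single-space output separator are all assumptions for illustration. It needs bash, not plain sh.

```shell
# trim_ends_bash is a hypothetical name; it reads lines from standard input.
trim_ends_bash() {
    local -a fields values new
    local keep
    while read -r -a fields; do
        IFS=, read -r -a values <<< "${fields[3]}"      # split the 4th field on commas
        keep=$(( ${#values[@]} - 4 ))                   # how many values survive
        new=()
        if (( keep > 0 )); then
            new=("${values[@]:2:keep}")                 # drop the first 2 and last 2
        fi
        fields[3]=$(IFS=,; printf '%s' "${new[*]}")     # join what is left on commas
        (IFS=' '; printf '%s\n' "${fields[*]}")         # print the line, space-separated
    done
}

# Usage: prints "a 1 g 3,4,5"
trim_ends_bash <<< ' a    1    g    1,2,3,4,5,6,7'
```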

awk: remove the first and last entries of a comma-separated field
