I have a space-delimited input text file. I would like to use sed or awk to delete every column whose header is size.
Input file:
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 8 pink circle 3
2 12 yellow pentagon 3 orange rectangle 9 purple oval 6
Desired output:
id quantity colour shape colour shape colour shape
1 10 blue square red triangle pink circle
2 12 yellow pentagon orange rectangle purple oval
Answer 0 (score: 6)
Using awk:
awk '
NR==1{
for(i=1;i<=NF;i++)
if($i!="size")
cols[i]
}
{
for(i=1;i<=NF;i++)
if(i in cols)
printf "%s ",$i
printf "\n"
}' input > output
column -t -s ' ' output
id quantity colour shape colour shape colour shape
1 10 blue square red triangle pink circle
2 12 yellow pentagon orange rectangle purple oval
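A possible variation on the command above, as a sketch only: it stores the positions to keep in order, so the fields can be joined with single spaces and no trailing space is printed. The variable name drop and the temporary file are my own assumptions, not part of the original answer.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
infile=$(mktemp)
cat > "$infile" <<'EOF'
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 8 pink circle 3
2 12 yellow pentagon 3 orange rectangle 9 purple oval 6
EOF

# drop is a hypothetical variable holding the header name to remove.
result=$(awk -v drop="size" '
NR == 1 {
    # Remember, in order, the positions of the columns to keep.
    for (i = 1; i <= NF; i++)
        if ($i != drop)
            keep[++n] = i
}
{
    # Print the kept fields separated by OFS, ending each line with ORS.
    for (k = 1; k <= n; k++)
        printf "%s%s", $(keep[k]), (k < n ? OFS : ORS)
}' "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

Passing the header name with -v means the same script can drop a different column without editing the program text.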
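Answer 1 (score: 3)
A general solution using awk. It relies on GNU awk, because it uses FIELDWIDTHS and the four-argument form of split. There is a hard-coded variable (columns_to_delete) in the BEGIN block that gives the positions of the fields to delete. The script then computes the width of each field and deletes the fields whose positions match that variable.
Assuming infile holds the content from the question and script.awk has the following content: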
BEGIN {
## Hard-coded positions of fields to delete. Separate them with spaces.
columns_to_delete = "5 8 11"
## Save positions in an array to handle it better.
split( columns_to_delete, arr_columns )
}
## Process header.
FNR == 1 {
## Split header with a space followed by any non-space character.
split( $0, h, /([[:space:]])([^[:space:]])/, seps )
## Use FIELDWIDTHS to handle fixed format of data. Set that variable with
## length of each field, taking into account spaces.
for ( i = 1; i <= length( h ); i++ ) {
len = length( h[i] seps[i] )
FIELDWIDTHS = FIELDWIDTHS " " (i == 1 ? --len : i == length( h ) ? ++len : len)
}
## Re-calculate fields with new FIELDWIDTHS variable.
$0 = $0
}
## Process header too, and every line with data.
{
## Flag to know if 'p'rint to output a field.
p = 1
## Go through all fields, if found in the array of columns to delete, reset
## the 'print' flag.
for ( i = 1; i <= NF; i++ ) {
for ( j = 1; j <= length( arr_columns ); j++ ) {
if ( i == arr_columns[j] ) {
p = 0
break
}
}
## Check 'print' flag and print if set.
if ( p ) {
printf "%s", $i
}
else {
printf " "
}
p = 1
}
printf "\n"
}
Run it like:
awk -f script.awk infile
With the following output:
id quantity colour shape colour shape colour shape
1 10 blue square red triangle pink circle
2 12 yellow pentagon orange rectangle purple oval
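EDIT: Oh, I just realised the output is not right, because adjacent fields get joined together. Fixing that would be too much work, since the maximum column size of every line would have to be checked before starting to process anything. Still, I hope the script gets the idea across. I don't have time right now; maybe I can try to fix it later, but I'm not sure.
EDIT 2: Fixed by adding an extra space for each deleted field. It was easier than expected :-)
EDIT 3: See the comments.
I modified the BEGIN block to check whether an extra variable has been provided as an argument.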
BEGIN {
## Check if a variable 'delete_col' has been provided as argument.
if ( ! delete_col ) {
printf "%s\n", "Usage: awk -v delete_col=\"column_name\" -f script.awk " ARGV[1]
exit 0
}
}
And I added to the FNR == 1 pattern the step that works out the positions of the columns to delete:
## Process header.
FNR == 1 {
## Find column position to delete given the name provided as argument.
for ( i = 1; i <= NF; i++ ) {
if ( $i == delete_col ) {
columns_to_delete = columns_to_delete " " i
}
}
## Save positions in an array to handle it better.
split( columns_to_delete, arr_columns )
## ...
## No modifications from here until the end. Same code as in the original script.
## ...
}
Now run it like:
awk -v delete_col="size" -f script.awk infile
The result will be the same.
Answer 2 (score: 1)
Using cut:
$ cut -d' ' -f1-4,6,7,9,10 < in.txt
id quantity colour shape colour shape colour shape
1 10 blue square red triangle pink circle
2 12 yellow pentagon orange rectangle purple oval
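Answer 3 (score: 0)
Given the fixed file format (cut splits on tabs by default, so the space delimiter has to be set explicitly):
cut -d ' ' -f 1-4,6-7,9-10 infile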
Answer 4 (score: 0)
If you have GNU cut available, you can do it like this:
columns=$(head -n1 INPUT_FILE \
| tr ' ' '\n' \
| cat -n \
| grep size \
| tr -s ' ' \
| cut -f1 \
| tr -d ' ' \
| paste -sd ",")
cut --complement -d' ' -f$columns INPUT_FILE
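This builds a comma-separated list of column positions from the header, then prints the complement of that list from INPUT_FILE, that is, every column except those.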
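Where --complement is not available, one alternative is to build the list of columns to keep instead of the list to drop. The sketch below does that with a short awk program and an exact comparison against size, so a header that merely contains "size" would not be matched the way it is with grep. The file and variable names are illustrative assumptions.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
input=$(mktemp)
cat > "$input" <<'EOF'
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 8 pink circle 3
2 12 yellow pentagon 3 orange rectangle 9 purple oval 6
EOF

# Collect the positions of all header fields that are not exactly "size".
keep=$(head -n 1 "$input" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i != "size") {
            list = list sep i
            sep = ","
        }
    print list
}')

# Plain cut with an explicit space delimiter; no --complement needed.
trimmed=$(cut -d ' ' -f "$keep" "$input")
printf '%s\n' "$trimmed"
rm -f "$input"
```

For the sample header, keep ends up as 1,2,3,4,6,7,9,10, which matches the field list used in the cut answers above.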