How to randomly select columns from a text file

Time: 2017-07-06 15:29:16

Tags: bash shell

I have a tab-delimited text file:

Name Attribute1 Attribute2 ..........Attribute1000    
aaaa   1           100                  0    
bbbb   0            20                  50    
cccc   10           20                  30

I want to randomly select 25 of the attribute columns, but always keep the first column (Name). How can I do this?

2 answers:

Answer 0 (score: 1)

OK, given all the information in the comments, we get the list of columns like this:

$ columns=1,$( for (( ii=2; ii<=1000; ii++ )); do echo $ii; done | sort -R | head -25 | sort -n | tr '\n' ',' | sed 's/,$//' )
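If GNU coreutils is available, `shuf -i LO-HI -n COUNT` draws COUNT distinct numbers from a range directly, which replaces the loop plus `sort -R | head -25`. A minimal sketch, assuming GNU `shuf` and the same column range (2 to 1000) as the command above:

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils: shuf -i 2-1000 -n 25 prints 25 distinct numbers from 2..1000.
# Column 1 (Name) is always kept; paste -sd, - joins the numbers with commas.
columns=1,$(shuf -i 2-1000 -n 25 | sort -n | paste -sd, -)
echo "$columns"
```

The result has the same shape as before (`1,` followed by 25 sorted column numbers), so it can be passed to `cut --fields` unchanged.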

Then we print out those columns like this:

$ cut --fields=${columns} columns.txt 
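Two properties of `cut` are worth knowing here: it always prints the selected fields in the order they appear in the input (not the order listed), and it silently skips field numbers past the end of the line. So the `sort -n` step only makes `$columns` easier to read. A small check of both behaviours on a made-up three-field line:

```shell
#!/usr/bin/env bash
# A made-up, tab-delimited line with three fields.
line=$'a\tb\tc'
# Fields come out in input order, so 3,1 behaves like 1,3.
printf '%s\n' "$line" | cut -f 3,1    # prints: a<TAB>c
# Field 5 does not exist on this line; cut simply omits it.
printf '%s\n' "$line" | cut -f 1,5    # prints: a
```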

To test it, I just put a header line into a one-line file:

$ for (( ii=1; ii<=1000; ii++ )); do
> echo -n "col${ii}<tab>" >> columns.txt
> done

(Note that <tab> is just a placeholder I used so you can see where it goes. Don't actually type <tab>; insert a literal tab character there.)
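As an alternative to typing a literal tab, `printf` expands the `\t` escape itself. A sketch that writes the same one-line header (col1 through col1000, each followed by a tab); unlike the loop above it overwrites `columns.txt` instead of appending to it:

```shell
#!/usr/bin/env bash
# printf turns \t into a real tab character, so nothing has to be typed literally.
for (( ii=1; ii<=1000; ii++ )); do
    printf 'col%d\t' "$ii"
done > columns.txt
```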

Now, when I run these commands:

$ columns=1,$( for (( ii=2; ii<=1000; ii++ )); do echo $ii; done | sort -R | head -25 | sort -n | tr '\n' ',' | sed 's/,$//' ) 

$ echo $columns
1,2,99,122,129,158,187,268,323,351,353,385,404,408,441,464,538,548,575,617,670,705,716,718,721,810

$ cut --fields=${columns} columns.txt 
col1    col2    col99   col122  col129  col158  col187  col268  col323  col351  col353  col385  col404  col408  col441  col464  col538  col548  col575      col617  col670  col705  col716  col718  col721  col810

Answer 1 (score: 1)

First, build the list of columns to display: the first column followed by 25 unique random columns.

TOTAL_COLUMN_AMOUNT=1001
TARGET_COLUMN_AMOUNT=25

columns=(1)               # initialize an array with the columns that must always be picked
while [ ${#columns[@]} -lt $(( TARGET_COLUMN_AMOUNT + 1 )) ]; do #until we have enough data
    current_column=$(( (RANDOM % TOTAL_COLUMN_AMOUNT) + 1 ))     #pick a random index
    if [[ ! " ${columns[@]} " =~ " $current_column " ]]; then    #if it's not already there
        columns[${#columns[@]}]=$current_column                  #append it to the array
    fi
done
#echo ${columns[@]} 
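The loop above retries whenever it draws a duplicate. If GNU `shuf` and bash 4+ (for `mapfile`) are available, the same array can be filled without retries. A sketch assuming those tools and the same sizes (columns 1 to 1001, 25 picks besides column 1):

```shell
#!/usr/bin/env bash
# Assumes GNU shuf and bash >= 4 (mapfile).
TOTAL_COLUMN_AMOUNT=1001
TARGET_COLUMN_AMOUNT=25
# Draw 25 distinct column numbers from 2..TOTAL_COLUMN_AMOUNT, one per array element.
mapfile -t picked < <(shuf -i "2-$TOTAL_COLUMN_AMOUNT" -n "$TARGET_COLUMN_AMOUNT")
# Column 1 (Name) always comes first.
columns=(1 "${picked[@]}")
echo "${columns[@]}"
```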

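Then you can use awk to print only the selected columns (pipe the tab-delimited file into it, or append the file name at the end):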

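# -F'\t' splits on tabs; cols[i] holds a column number, so $(cols[i]) is that field, printed in the listed order
awk -F'\t' -v OFS='\t' -v var="${columns[*]}" 'BEGIN{n=split(var,cols," ")} {line=$(cols[1]); for(i=2;i<=n;i++) line=line OFS $(cols[i]); print line}'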
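A small end-to-end run of the awk command on a made-up file (Name plus three attributes; `sample.txt` and the `A1`..`A3` names are hypothetical). The column list is fixed here instead of random so the result is predictable:

```shell
#!/usr/bin/env bash
# Hypothetical tab-delimited sample: Name plus three attribute columns.
printf 'Name\tA1\tA2\tA3\naaaa\t1\t100\t0\nbbbb\t0\t20\t50\n' > sample.txt
# Fixed selection for illustration: column 1, then column 4, then column 2.
columns=(1 4 2)
selected=$(awk -F'\t' -v OFS='\t' -v var="${columns[*]}" 'BEGIN{n=split(var,cols," ")} {line=$(cols[1]); for(i=2;i<=n;i++) line=line OFS $(cols[i]); print line}' sample.txt)
printf '%s\n' "$selected"
```

Unlike `cut`, the awk loop emits fields in the order given in `columns` (A3 before A1 here), so an unsorted random list is printed in the order it was drawn.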