I have a tab-delimited text file:
Name Attribute1 Attribute2 ..........Attribute1000
aaaa 1 100 0
bbbb 0 20 50
cccc 10 20 30
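I want to randomly select 25 of the attribute columns while always keeping the first column (Name). How can I do that?
Answer 0: (score: 1)
OK, given all the information in the comments, we build the column list like this: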
$ columns=1,$( for (( ii=2; ii<=1000; ii++ )); do echo $ii; done | sort -R | head -25 | sort -n | tr '\n' ',' | sed 's/,$//' )
Then we print those columns like this:
$ cut --fields=${columns} columns.txt
To test it, I just put the headers into a one-line file:
$ for (( ii=1; ii<=1000; ii++ )); do
> echo -n "col${ii}<tab>" >> columns.txt
> done
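(Note that <tab> is just a placeholder I put there so you can see it. Don't actually type <tab>; type a literal tab character.)
Now, when I run these commands: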
$ columns=1,$( for (( ii=2; ii<=1000; ii++ )); do echo $ii; done | sort -R | head -25 | sort -n | tr '\n' ',' | sed 's/,$//' )
$ echo $columns
1,2,99,122,129,158,187,268,323,351,353,385,404,408,441,464,538,548,575,617,670,705,716,718,721,810
$ cut --fields=${columns} columns.txt
col1 col2 col99 col122 col129 col158 col187 col268 col323 col351 col353 col385 col404 col408 col441 col464 col538 col548 col575 col617 col670 col705 col716 col718 col721 col810
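The two steps above can be wrapped into one function that reads the column count from the header line instead of hardcoding 1000. This is only a minimal sketch: it assumes GNU coreutils (for shuf), a tab-delimited file, and at least N+1 columns. The name pick_random_columns is hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash
# pick_random_columns FILE N   (hypothetical helper, for illustration only)
# Prints column 1 plus N randomly chosen other columns of a tab-delimited FILE.
# Assumes GNU coreutils (shuf) and that FILE has at least N+1 columns.
pick_random_columns() {
    local file=$1 n=$2 total picks
    # Count the tab-separated fields on the header line.
    total=$(head -n 1 "$file" | awk -F'\t' '{ print NF }')
    # Draw N distinct column numbers from 2..total, sort them, join with commas.
    picks=$(shuf -i "2-${total}" -n "$n" | sort -n | paste -sd, -)
    # cut uses tab as its default delimiter.
    cut -f "1,${picks}" "$file"
}
```

Usage would be pick_random_columns columns.txt 25. Because shuf draws without repetition, no separate uniqueness check is needed.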
Answer 1: (score: 1)
First, build the sequence of columns to display: the first column followed by 25 unique random columns.
TOTAL_COLUMN_AMOUNT=1001   # Name + 1000 attribute columns
TARGET_COLUMN_AMOUNT=25
columns=(1)   # initialize an array with the columns that must always be picked
while [ ${#columns[@]} -lt $(( TARGET_COLUMN_AMOUNT + 1 )) ]; do   # until we have enough columns
    current_column=$(( (RANDOM % TOTAL_COLUMN_AMOUNT) + 1 ))   # pick a random index in 1..TOTAL_COLUMN_AMOUNT
    if [[ ! " ${columns[@]} " =~ " $current_column " ]]; then   # if it's not already there
        columns+=("$current_column")   # append it to the array
    fi
done
#echo ${columns[@]}
Then you can use awk to display only the selected columns:
awk -v cols="${columns[*]}" 'BEGIN { FS = OFS = "\t"; n = split(cols, c, " ") } { for (i = 1; i <= n; i++) printf "%s%s", $(c[i]), (i < n ? OFS : ORS) }' columns.txt
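If you would rather not build the column list in bash, the selection and the printing can both be done in a single awk pass. The following is only a sketch under some assumptions: the file is tab-delimited, the first line has the full set of columns, and the function name awk_random_columns and its optional seed argument are hypothetical. Unlike the loop above, it prints the chosen columns in file order.

```shell
#!/usr/bin/env bash
# awk_random_columns FILE N [SEED]   (hypothetical helper, for illustration only)
# Keeps column 1 plus N random columns, chosen once from the first line.
awk_random_columns() {
    awk -F'\t' -v OFS='\t' -v n="$2" -v seed="${3:-$RANDOM}" '
    NR == 1 {
        srand(seed)
        if (n > NF - 1) n = NF - 1             # cannot pick more columns than exist
        picked = 0
        while (picked < n) {
            c = 2 + int(rand() * (NF - 1))     # random column number in 2..NF
            if (!(c in keep)) { keep[c] = 1; picked++ }
        }
    }
    {
        line = $1                              # column 1 is always kept
        for (i = 2; i <= NF; i++)
            if (i in keep) line = line OFS $i
        print line
    }' "$1"
}
```

Usage would be awk_random_columns columns.txt 25. Passing a third argument fixes the seed, which makes the selection repeatable.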