How to replace a list of numbers in one column with random numbers from another column in BASH

Time: 2019-05-10 10:58:27

Tags: bash unix awk split

I have a tab-delimited file with two columns

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100             6 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406   205 284 307 406
2 10 13 40 47 58                                        2 13 40 87

and the expected output should be

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100             14 27
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406   6 209 299 305
2 10 13 23 40 47 58 87                                  10 23 40 58

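I want to replace the numbers in the second column with random numbers drawn from the first column of the same row, keeping the same count of numbers in the second column. That is, if the second column of row x contains four numbers, the output for that row must contain four random numbers taken from that row's first column, and so on.

I tried using AWK to create two arrays, then splitting the second column and replacing each of its numbers with a number from the first column, but the result was not random. I have looked at the rand() function, but I do not know how to combine the two in a script. Can this be done in BASH, or is there a better way to do it in a BASH environment? Thanks in advance.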

3 Answers:

Answer 0 (score: 1)

This answer assumes that a tab character separates the two columns and that each column is a space-separated list.
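
A minimal sketch under that assumption, not necessarily the answerer's own approach: it reads the two columns with IFS set to a tab, counts the items in the second column, and draws that many distinct items from the first column. It assumes GNU coreutils `shuf` is installed; `resample` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: assumes TAB-separated columns and GNU coreutils `shuf`.
# `resample` is a hypothetical helper name, not taken from the answer.

resample() {
    local col1 col2 sample
    local -a want items
    while IFS=$'\t' read -r col1 col2; do
        read -r -a items <<< "$col1"   # numbers available in column 1
        read -r -a want  <<< "$col2"   # only the count of column 2 matters
        # Draw that many distinct items from column 1 and sort them numerically.
        # If column 2 is longer than column 1, shuf returns all of column 1.
        sample=$(printf '%s\n' "${items[@]}" | shuf -n "${#want[@]}" | sort -n | paste -sd' ' -)
        printf '%s\t%s\n' "$col1" "$sample"
    done
}

# Usage: resample < file
printf '2 10 13 40 47 58\t2 13 40 87\n' | resample
```

Because `shuf -n` samples without replacement, the new second column never repeats a number, and the first column is printed unchanged.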


Answer 1 (score: 1)

awk to the rescue!

$ awk -F'\t' 'function shuf(a,n,   i,j,t)
                 {for(i=1;i<n;i++)
                    {j=i+int(rand()*(n+1-i));
                     t=a[i]; a[i]=a[j]; a[j]=t}}
             # the extra parameters i,j,t / i,x,s act as local variables
             function join(a,n,   i,x,s)
                  {for(i=1;i<=n;i++) {x=x s a[i]; s=" "}
                   return x}
             BEGIN{srand()}
                  {an=split($1,a," ");
                   shuf(a,an);
                   bn=split($2,b," ");
                   delete m; delete c; j=0;
                   for(i=1;i<=bn;i++) m[b[i]];
                   # pull elements from a up to the required sample size,
                   # not intersecting with the previous sample set
                   for(i=1;i<=an && j<bn;i++) if(!(a[i] in m)) c[++j]=a[i];
                   # asort() is a gawk extension; it sorts c and returns its length
                   cn=asort(c);
                   print $1 FS join(c,cn)}' file


5 6 14 22 23 25 27 84 85 88 89 94 95 98 100     85 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406   20 205 294 295
2 10 13 23 40 47 58 87  10 13 47 87

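Shuffle the input array (the standard algorithm), then sample the required number of elements, with the additional requirement that the sample must not intersect the existing sample set. The helper map m holds the existing sample set and is used for the in tests. The rest should be easy to read.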
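
The standard shuffle mentioned here is commonly known as the Fisher-Yates shuffle. For readers who prefer plain bash, the same idea can be sketched as follows. This is an illustration rather than the answerer's code: `shuffle_array` is a hypothetical helper name, and the nameref (`local -n`) assumes bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Fisher-Yates shuffle of a bash array, in place.
# Usage: shuffle_array NAME   (NAME is the array variable's name)
# Assumes bash 4.3+ for namerefs; shuffle_array is a hypothetical helper.
shuffle_array() {
    local -n _arr=$1
    local i j tmp
    # Walk from the last index down, swapping each slot with a random
    # slot at or below it.
    for (( i = ${#_arr[@]} - 1; i > 0; i-- )); do
        j=$(( RANDOM % (i + 1) ))   # RANDOM is 0..32767, so slight modulo bias
        tmp=${_arr[i]}
        _arr[i]=${_arr[j]}
        _arr[j]=$tmp
    done
}

nums=(5 6 14 22 23 25)
shuffle_array nums
echo "${nums[@]}"
```

After shuffling, taking the first k elements gives k distinct random picks, which is how the sampling step works.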

Answer 2 (score: 0)

Try this:

# This can be an external file of course
# Note COL1 and COL2 are separated by a hard TAB (written as \t here)

printf '%s\t%s\n' \
   "5 6 14 22 23 25 27 84 85 88 89 94 95 98 100" "6 94" \
   "6 8 17 20 193 205 209 284 294 295 299 304 305 307 406" "205 284 307 406" \
   "2 10 13 40 47 58" "2 13 40 87" > d1.txt

# Loop to read each line; note: convert the TAB to ':', though could have used IFS

sed $'s/\t/:/' d1.txt | while read -r LINE
do
   # Get the 1st column data

   COL1=$( echo ${LINE} | cut -d':' -f1 )

   # Get col1 number of items

   NUM_COL1=$( echo ${COL1} | wc -w )

   # Get col2 number of items

   NUM_COL2=$( echo ${LINE} | cut -d':' -f2 | wc -w )

   # Now split col1 items into an array

   read -r -a COL1_NUMS <<< "${COL1}"


   COL2=" "

   # This loop runs once for each COL2 item

   COUNT=0
   while [ ${COUNT} -lt ${NUM_COL2} ]
   do

      # Generate a random number to use as the random index for COL1

      COL1_IDX=${RANDOM}
      let "COL1_IDX %= ${NUM_COL1}"

      NEW_NUM=${COL1_NUMS[${COL1_IDX}]}

      # Check for duplicate (-w matches whole words only, so 5 does not match 25)

      DUP_FOUND=$( echo "${COL2}" | grep -w -- "${NEW_NUM}" )

      if [ -z "${DUP_FOUND}" ]
      then
         # Not a duplicate, increment loop counter and do next one

         let "COUNT = COUNT + 1 "

         # Add the random COL1 item to COL2

         COL2="${COL2} ${COL1_NUMS[${COL1_IDX}]}"
      fi
   done

   # Sort COL2

   COL2=$( echo ${COL2} | tr ' ' '\012' | sort -n | tr '\012' ' ' )

   # Print

   echo ${COL1} :: ${COL2}
done

Output:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 :: 88 95
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 :: 20 299 304 305
2 10 13 40 47 58 :: 2 10 40 58
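
One detail worth checking when duplicates are detected with grep: a plain pattern matches substrings, so 5 is found inside 25 and a number can be wrongly rejected as already picked. The sketch below shows the difference using the `-w` (whole word) option, which GNU and BSD grep support. `is_picked` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# is_picked NUMBER LIST: succeed only if NUMBER appears in LIST as a whole word.
# Hypothetical helper; relies on grep -w (available in GNU and BSD grep).
is_picked() {
    grep -qw -- "$1" <<< "$2"
}

picked=" 25 40"

# Plain grep treats 5 as present because it is a substring of 25.
if grep -q -- 5 <<< "$picked"; then
    echo "plain grep: 5 looks like a duplicate"
fi

# Whole-word matching reports it correctly as not yet picked.
if ! is_picked 5 "$picked"; then
    echo "grep -w: 5 has not been picked yet"
fi
```

A pure bash alternative is the pattern test `[[ " $picked " == *" 5 "* ]]`, which avoids starting a grep process for every candidate.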