Question

I have a file (filename.txt) with the following structure:

>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ

I want to shuffle the characters in the strings that do not start with >. The output would look like this (for example):

>line1
DGC
FEI
HBA
>line2
JRP
OKN
QML
>line3
SZV
YXT
UW

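This is what I tried in order to shuffle the content under each >line[number]:

ruby -lpe '$_ = $_.chars.shuffle * "" if !/^>/' filename.txt

The command works (see my post BASH - Shuffle characters in strings from file), but it shuffles line by line. I would like to know how to modify the command so that characters are shuffled across all the strings under each >line[number]. Using ruby is not a requirement.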

Answer 1

First, we need to solve the sub-problem of shuffling all the characters across multiple lines:

echo -e 'ABC\nDEF\nGHI' |grep -o . |shuf |tr -d '\n'
GDAFHEIBC

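We also need an array that records the length of each line in the original input, so the shuffled string can be cut back into lines of the same lengths.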

s=GDAFHEIBC
lens=(3 3 3)
start=0
for len in "${lens[@]}"; do
    echo "${s:start:len}"    # quoted so the slice is not word-split or globbed
    ((start+=len))
done
GDA
FHE
IBC

So the original lines:

ABC
DEF
GHI

have been shuffled into:

GDA
FHE
IBC

Now we can put the pieces together:

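lens=()      # length of each line in the current block
string=""    # all characters of the current block, concatenated

function shuffle_lines {
    local start=0 len shuffled_string
    # One character per line, shuffle, then join back into a single string.
    shuffled_string=$(grep -o . <<< "${string}" | shuf | tr -d '\n')
    # Cut the shuffled string back into pieces of the original line lengths.
    for len in "${lens[@]}"; do
        printf '%s\n' "${shuffled_string:start:len}"
        ((start+=len))
    done
    lens=()
    string=""
}

while IFS= read -r line; do
    if [[ "${line}" =~ ^\> ]]; then
        shuffle_lines    # flush the previous block, if any
        echo "${line}"
    else
        string+="${line}"
        lens+=("${#line}")
    fi
done <filename.txt

shuffle_lines    # flush the last block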

Example:

$ cat filename.txt
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
>line4
0123
456
78
9
$ ./solution.sh
>line1
HFG
BED
AIC
>line2
JOP
KMQ
RLN
>line3
UVW
TYZ
XS
>line4
1963
245
08
7

Answer 2

#!/bin/bash

# echo > output.txt         # uncomment to write in a file output.txt

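mix(){
    [[ -n $title ]] || return 0    # nothing to print before the first header
    {
        echo "$title"
        line="$( fold -w1 <<< "$line"  | shuf  )"
        # fold -w3 assumes lines are 3 characters wide (only the last may be
        # shorter), as in the sample
        echo "${line//$'\n'}" | fold -w3
    }  # >> output.txt         # uncomment to write in a file output.txt
    unset line
}

while read -r; do
    if [[ $REPLY =~ ^\> ]]; then
        mix
        title="$REPLY"
    else
        line+="$REPLY"
    fi
done < filename.txt
mix       # final mix after the loop exits, otherwise line3 would not be mixed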

exit

Edited following gniourf-gniourf's comments.

Answer 3

First, create a test file.

str =<<FINI
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
FINI

File.write('test', str)
  #=> 56

Now read the file and perform the desired operations.

result = File.read('test').split(/(>line\d+)/).map do |s|
  if s.match?(/\A(?:|>line\d+)\z/)
    s
  else
    a = s.scan(/\p{Lu}/).shuffle
    s.gsub(/\p{Lu}/) { a.shift }
  end
end.join
  # ">line1\nECF\nHIA\nGBD\n>line2\nJNP\nKLR\nOQM\n>line3\nTXY\nUZV\nSW\n"

puts result
>line1
ECF
HIA
GBD
>line2
JNP
KLR
OQM
>line3
TXY
UZV
SW

To do this from the command line, convert the code to a string with the statements separated by semicolons.

ruby -e "puts (File.read('test').split(/(>line\d+)/).map do |s|; if s.match?(/\A(?:|>line\d+)\z/); s; else; a = s.scan(/\p{Lu}/).shuffle; s.gsub(/\p{Lu}/) { a.shift }; end; end).join"

The steps are as follows.

a = File.read('test')
  #=> ">line1\nABC\nDEF\nGHI\n>line2\nJKL\nMNO\nPQR\n>line3\nSTU\nVWX\nYZ\n"
b = a.split(/(>line\d+)/)
  #=> ["", ">line1", "\nABC\nDEF\nGHI\n", ">line2", "\nJKL\nMNO\nPQR\n",
  #    ">line3", "\nSTU\nVWX\nYZ\n"]

Note that the regular expression passed to split places >line\d+ in a capture group. Without the capture group, ">line1", ">line2" and ">line3" would not be included in b.
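
As a small illustration of the role of the capture group, the sketch below runs String#split with and without it on a shortened sample. The method names split_keeping_headers and split_dropping_headers are hypothetical, introduced only for this example.

```ruby
# Hypothetical helpers; only String#split and the regex come from the answer.

# With a capture group, the matched separators are kept in the result.
def split_keeping_headers(text)
  text.split(/(>line\d+)/)
end

# Without a capture group, the separators are discarded.
def split_dropping_headers(text)
  text.split(/>line\d+/)
end

sample = ">line1\nABC\n>line2\nDEF\n"

p split_keeping_headers(sample)
  #=> ["", ">line1", "\nABC\n", ">line2", "\nDEF\n"]
p split_dropping_headers(sample)
  #=> ["", "\nABC\n", "\nDEF\n"]
```

In both cases the leading "" appears because the string begins with a separator.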

c = b.map do |s|
  if s.match?(/\A(?:|>line\d+)\z/)
    s
  else
    a = s.scan(/\p{Lu}/).shuffle
    s.gsub(/\p{Lu}/) { a.shift }
  end
end
  #=> ["", ">line1", "\nEAC\nIHB\nDGF\n", ">line2", "\nKQJ\nROL\nMPN\n",
  #    ">line3", "\nSUY\nXTV\nZW\n"]
c.join
  #=> ">line1\nEAC\nIHB\nDGF\n>line2\nKQJ\nROL\nMPN\n>line3\nSUY\nXTV\nZW\n"

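Now consider the computation of c more closely. The first element of b is passed to the block and the block variable s is set to its value: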

s = ""

We then compute

s.match?(/\A(?:|>line\d+)\z/)
  #=> true

so "" is returned from the block. The regular expression can be written as follows.

/
\A          # match the beginning of the string
(?:         # begin a non-capture group
            # match an empty string
  |         # or
  >line\d+  # match '>line' followed by one or more digits
)           # end non-capture group
\z          # match the end of the string
/x          # free-spacing regex definition mode.

Within the non-capture group, the first alternative matched the empty string.
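
To see which elements this pattern passes through untouched, here is a minimal sketch that applies it to a few strings. The constant HEADER_OR_EMPTY and the method passthrough? are hypothetical names used only for this illustration.

```ruby
# Hypothetical names; the regex itself is the one used in the answer.
HEADER_OR_EMPTY = /\A(?:|>line\d+)\z/

# True when s is either empty or exactly a ">lineN" header.
def passthrough?(s)
  s.match?(HEADER_OR_EMPTY)
end

puts passthrough?("")           #=> true  (empty alternative)
puts passthrough?(">line1")     #=> true
puts passthrough?(">line12")    #=> true  (\d+ allows several digits)
puts passthrough?(">line")      #=> false (at least one digit is required)
puts passthrough?(">line1\n")   #=> false (\z does not ignore a trailing newline)
puts passthrough?("\nABC\n")    #=> false (this element gets shuffled)
```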

The next element of b is then passed to the block.

s = ">line1"

Again,

s.match?(/\A(?:|>line\d+)\z/)
  #=> true

so s is returned from the block.

Now the third element of b is passed to the block. (Finally, something interesting.)

s = "\nABC\nDEF\nGHI\n"
d = s.scan(/\p{Lu}/)
  #=> ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
a = d.shuffle
  #=> ["D", "C", "G", "H", "B", "F", "I", "E", "A"]
s.gsub(/\p{Lu}/) { a.shift }
  #=> "\nDCG\nHBF\nIEA\n"

The remaining computations are similar.
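
The scan/shuffle/gsub step can also be wrapped in a method and checked for the two properties the question asks for: line lengths stay the same and no characters are gained or lost. This is only a sketch; shuffle_block, line_lengths and the optional rng parameter are hypothetical names that do not appear in the answer.

```ruby
# Hypothetical wrapper around the scan/shuffle/gsub step from the answer.
# rng is an assumed extra parameter so a seeded Random can be passed in.
def shuffle_block(s, rng: Random.new)
  letters = s.scan(/\p{Lu}/).shuffle(random: rng)
  # Replace each uppercase letter, in order, with the next shuffled one;
  # newlines and any other characters stay where they are.
  s.gsub(/\p{Lu}/) { letters.shift }
end

# Length of each line, ignoring the line terminator.
def line_lengths(s)
  s.lines.map { |l| l.chomp.length }
end

original = "\nABC\nDEF\nGHI\n"
shuffled = shuffle_block(original)

puts line_lengths(shuffled) == line_lengths(original)  #=> true
puts shuffled.chars.sort == original.chars.sort        #=> true

# \p{Lu} matches uppercase letters only, so digits are left in place.
p shuffle_block("\n0123\n")                            #=> "\n0123\n"
```

Because the pattern is \p{Lu}, input containing digits or lowercase letters would need a broader pattern (for example /[^\n]/) to be shuffled as well.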

BASH - Shuffle characters in strings across multiple lines
