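I'd like to be able to split a string such as
"AUGCUGAUGCCUAGUCUGC"
into an array
[ "AUG", "CUG", "AUG", "CCU", "AGU", "CUG" ]
so that each array element contains exactly three characters.
Is there an efficient, readable way to do this?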
Answer 0 (score: 8)
Using String#scan is one way:
"AUGCUGAUGCCUAGUCUG".scan(/.../)
#=> ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG"]
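The regular expression /.../ matches exactly three characters (any character except a newline), so scan returns the string in consecutive three-character chunks.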
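One detail worth seeing side by side: the question's input is 19 characters long, so a pattern that requires exactly three characters leaves the last one unmatched. The sketch below is only an illustration using the question's string; the variable names are made up for the example.

```ruby
# Illustration only; the 19-character input comes from the question.
str = "AUGCUGAUGCCUAGUCUGC"

# /.../ needs three characters per match, so the leftover "C" is skipped.
exact = str.scan(/.../)

# /.{1,3}/ matches up to three characters, so the leftover becomes a final chunk.
with_rest = str.scan(/.{1,3}/)

p exact      # ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG"]
p with_rest  # ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG", "C"]
```

Which pattern to use depends on whether a partial trailing group should be kept or discarded.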
Answer 1 (score: 3)
Typically, the fastest way is to use unpack, the core method for tearing apart binary strings.
Something like:
str = "AUGCUGAUGCCUAGUCUGC"
str.unpack('A3' * (str.size / 3))
# => ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG"]
Note that unpack drops the trailing "C", because the string length is not an even multiple of three characters:
str.size # => 19
That can be fixed using:
str.unpack('A3' * (str.size / 3) + 'A*')
# => ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG", "C"]
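What does 'A3' * (str.size / 3) do, you ask? It builds the format string by repeating 'A3' once for every full group of three characters: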
'A3' * (str.size / 3) # => "A3A3A3A3A3A3"
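The format-string approach can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the method name split_in_threes is hypothetical, and it assumes single-byte (ASCII) input, since unpack counts bytes while String#size counts characters. It appends 'A*' only when there is a remainder, because 'A*' on an exhausted string would add an empty string to the result.

```ruby
# Hypothetical helper (the name is invented for this example).
# Assumes ASCII input: unpack works on bytes, String#size counts characters.
def split_in_threes(str)
  full_groups, remainder = str.size.divmod(3)
  format = 'A3' * full_groups        # one 'A3' per complete group of three
  format += 'A*' if remainder > 0    # pick up leftover characters, if any
  str.unpack(format)
end

p split_in_threes("AUGCUGAUGCCUAGUCUGC")
# ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG", "C"]
p split_in_threes("AUGCUG")
# ["AUG", "CUG"]
```

Note that the 'A' directive also strips trailing spaces and NUL bytes from each field; for sequence data like the question's this makes no difference, but 'a3' would be the choice if such characters had to be preserved.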
Regarding speed: unpack is fast. Its format string is a bit cryptic, but it is worth taking the time to learn it:
require 'fruity'
STR = "AUGCUGAUGCCUAGUCUGC"
UNPACK_FORMAT_STR = 'A3' * (STR.size / 3) + 'A*'
compare do
  unpack1 { STR.unpack('A3'*(STR.size/3) + 'A*') }
  unpack2 { STR.unpack(UNPACK_FORMAT_STR) }
  scan_it { STR.scan(/.{1,3}/) }
end
# >> Running each test 4096 times. Test will take about 1 second.
# >> unpack2 is faster than unpack1 by 60.00000000000001% ± 10.0%
# >> unpack1 is faster than scan_it by 2.6x ± 0.1
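You can see the effect of precomputing UNPACK_FORMAT_STR: unpack2 is 60% faster than unpack1 simply because the format string was calculated ahead of time.
Increasing the size of STR: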
STR = "AUGCUGAUGCCUAGUCUGC" * 1000
UNPACK_FORMAT_STR = 'A3' * (STR.size / 3) + 'A*'
compare do
  unpack1 { STR.unpack('A3'*(STR.size/3) + 'A*') }
  unpack2 { STR.unpack(UNPACK_FORMAT_STR) }
  scan_it { STR.scan(/.{1,3}/) }
end
# >> Running each test 8 times. Test will take about 1 second.
# >> unpack2 is similar to unpack1
# >> unpack1 is faster than scan_it by 3x ± 0.1
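Notice that Fruity reduced the number of test runs. With more loops, the difference between the two unpack* tests would be larger, and Fruity would report a difference similar to the one in the first test. In either case, scan still lags behind.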
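For readers without the fruity gem, a rough comparison can be sketched with core Ruby only. This is an assumption-laden illustration, not a reproduction of the numbers above: the names STR_LONG, UNPACK_FORMAT, and time_it are invented here, the iteration count is arbitrary, and timings will vary by machine and Ruby version.

```ruby
# Rough timing sketch using only core Ruby; results vary by machine.
STR_LONG = "AUGCUGAUGCCUAGUCUGC" * 1000            # hypothetical name
UNPACK_FORMAT = 'A3' * (STR_LONG.size / 3) + 'A*'  # precomputed once

# Run the block a fixed number of times and return the elapsed seconds.
def time_it(iterations = 100)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

results = {
  unpack: time_it { STR_LONG.unpack(UNPACK_FORMAT) },
  scan:   time_it { STR_LONG.scan(/.{1,3}/) }
}

results.each { |name, seconds| puts format('%-8s %.4fs', name, seconds) }
```

A dedicated benchmarking tool handles warm-up and statistical noise better than this; the sketch is only meant for a quick sanity check.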
Answer 2 (score: 1)
I think String#scan (mentioned by Yu) is the best way. Another "Ruby-ish" way of doing it, which is obviously more verbose:
[5] pry(main)> str
=> "AUGCUGAUGCCUAGUCUG"
[6] pry(main)> str.chars.each_slice(3).map(&:join)
=> ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG"]
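One property of each_slice worth noting is that it yields a final, shorter slice when the input does not divide evenly, so a trailing remainder is kept without extra work. A small sketch using the question's 19-character string follows; it uses each_char instead of chars, which avoids building the intermediate array of characters.

```ruby
# The 19-character input from the question; the last group has one character.
str = "AUGCUGAUGCCUAGUCUGC"

# each_char returns an enumerator; each_slice(3) groups it into arrays of
# up to three characters, and join turns each group back into a string.
chunks = str.each_char.each_slice(3).map(&:join)

p chunks  # ["AUG", "CUG", "AUG", "CCU", "AGU", "CUG", "C"]
```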
Answer 3 (score: -1)
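// Java: split str into three-character chunks.
// Any trailing characters that do not fill a whole chunk are dropped.
int len = str.length();
int arrLen = len / 3;
String[] arr = new String[arrLen];
int k = 0;
for (int i = 3; i <= len; i += 3) {
    arr[k] = str.substring(i - 3, i);
    k++;
}
// Print each chunk on its own line.
for (int i = 0; i < arr.length; i++) {
    System.out.println(arr[i]);
}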