
Question

How do I split the string "Thequickbrownfoxjumps" into substrings of equal size in Java? For example, "Thequickbrownfoxjumps" with a chunk size of 4 should give the output:

["Theq","uick","brow","nfox","jump","s"]

Similar question:

Split string into equal-length substrings in Scala

Answer 1

Here's the regex one-liner version:

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

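\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that is four characters along from the end of the last match.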
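To see what \G contributes, here is a small sketch (the class `GDemo` and its method names are illustrative) that splits the same string with and without it. Without \G, the lookbehind `(?<=.{4})` succeeds at every position that has at least four characters before it, so everything after the first chunk falls apart into single characters.

```java
import java.util.Arrays;

class GDemo {
    // \G pins the lookbehind to the end of the previous match,
    // so a split point only appears every 4 characters.
    static String[] splitWithG(String s) {
        return s.split("(?<=\\G.{4})");
    }

    // Same lookbehind without \G: it matches at every index >= 4.
    static String[] splitWithoutG(String s) {
        return s.split("(?<=.{4})");
    }

    public static void main(String[] args) {
        String s = "Thequickbrownfoxjumps";
        System.out.println(Arrays.toString(splitWithG(s)));
        // [Theq, uick, brow, nfox, jump, s]
        System.out.println(Arrays.toString(splitWithoutG(s)));
        // [Theq, u, i, c, k, b, r, o, w, n, f, o, x, j, u, m, p, s]
    }
}
```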

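Lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.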

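I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they are also self-documenting; this one is just about the opposite of that. ;)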

Also, this doesn't work on Android, which doesn't support the use of \G in lookbehinds.

Answer 2

Well, it's fairly easy to do this by brute force:

public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);

    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}

I don't think it's really worth using a regex for this.

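EDIT: My reasons for not using a regex:

This doesn't use any of the real pattern matching of regexes. It's just counting.
I suspect the above will be more efficient, although in most cases it won't matter.
If you need to use variable sizes in different places, you've either got repetition or a helper function to build the regex itself based on a parameter - ick.
The regex provided in another answer first didn't compile (invalid escaping), and then didn't work. My code worked first time. That's more a testament to the usability of regexes vs plain code, IMO.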
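For the variable-size case in the third point, a regex-building helper might look roughly like this. It is a sketch: `RegexChunker` and `chunks` are illustrative names, and compiling with `Pattern.DOTALL` is an addition so that line terminators are counted like any other character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RegexChunker {
    // Builds the \G-lookbehind pattern for an arbitrary chunk size.
    static String[] chunks(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        // DOTALL lets '.' match line terminators as well.
        Pattern p = Pattern.compile("(?<=\\G.{" + size + "})", Pattern.DOTALL);
        return p.split(text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(chunks("Thequickbrownfoxjumps", 4)));
        // [Theq, uick, brow, nfox, jump, s]
        System.out.println(Arrays.toString(chunks("Thequickbrownfoxjumps", 5)));
        // [Thequ, ickbr, ownfo, xjump, s]
    }
}
```

Even wrapped in a helper, the reader still has to decode the pattern, which is the point being made above.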

Answer 3

Using Google Guava, it's very easy:

for(final String token :
    Splitter
        .fixedLength(4)
        .split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Output:

Theq
uick
brow
nfox
jump
s

Or if you need the result as an array, you can use this code:

String[] tokens =
    Iterables.toArray(
        Splitter
            .fixedLength(4)
            .split("Thequickbrownfoxjumps"),
        String.class
    );


Note: the Splitter is constructed inline above, but since Splitters are immutable and reusable, it's good practice to store them in constants:

private static final Splitter FOUR_LETTERS = Splitter.fixedLength(4);

// more code

for(final String token : FOUR_LETTERS.split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Answer 4

If you're using Google's Guava general-purpose libraries (and quite honestly, any new Java project probably should be), this is trivial with the Splitter class:

for (String substring : Splitter.fixedLength(4).split(inputString)) {
    doSomethingWith(substring);
}

That's it. Easy!

Answer 5

public static String[] split(String src, int len) {
    String[] result = new String[(int)Math.ceil((double)src.length()/(double)len)];
    for (int i=0; i<result.length; i++)
        result[i] = src.substring(i*len, Math.min(src.length(), (i+1)*len));
    return result;
}

Answer 6

public String[] splitInParts(String s, int partLength)
{
    int len = s.length();

    // Number of parts
    int nparts = (len + partLength - 1) / partLength;
    String parts[] = new String[nparts];

    // Break into parts
    int offset= 0;
    int i = 0;
    while (i < nparts)
    {
        parts[i] = s.substring(offset, Math.min(offset + partLength, len));
        offset += partLength;
        i++;
    }

    return parts;
}

Answer 7

You can use substring from String.class (handling the exceptions yourself) or from Apache Commons Lang (which handles the exceptions for you):

static String   substring(String str, int start, int end)

Put it inside a loop and you're good to go.
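A JDK-only sketch of that loop: the hypothetical helper `clampedSubstring` below stands in for the Commons Lang call by clamping the indices to the string bounds, so the last, shorter chunk needs no special case. It only mimics that clamping; the real `StringUtils.substring` additionally accepts `null` and negative (count-from-the-end) indices. `LoopSplit` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

class LoopSplit {
    // Hypothetical helper: like String.substring, but clamps both
    // indices to the string bounds instead of throwing.
    static String clampedSubstring(String str, int start, int end) {
        int len = str.length();
        int from = Math.min(Math.max(start, 0), len);
        int to = Math.min(Math.max(end, from), len);
        return str.substring(from, to);
    }

    static List<String> split(String str, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < str.length(); start += size) {
            // The last chunk may be shorter; the helper clamps 'end'.
            parts.add(clampedSubstring(str, start, start + size));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Thequickbrownfoxjumps", 4));
        // [Theq, uick, brow, nfox, jump, s]
    }
}
```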

Answer 8

Here's a one-liner version that uses a Java 8 IntStream to determine the indexes where the slices begin:

String x = "Thequickbrownfoxjumps";

String[] result = IntStream
                    .iterate(0, i -> i + 4)
                    .limit((int) Math.ceil(x.length() / 4.0))
                    .mapToObj(i ->
                        x.substring(i, Math.min(i + 4, x.length()))
                    )
                    .toArray(String[]::new);
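On Java 9 and later, the three-argument overload `IntStream.iterate(seed, hasNext, next)` can replace the `limit(...)` and `Math.ceil` arithmetic, because the stream stops by itself once the start index reaches the end of the string. A sketch, assuming Java 9+ and a positive chunk size (`IterateSplit` is an illustrative name):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class IterateSplit {
    // Assumes size > 0; otherwise the index never advances.
    static String[] split(String x, int size) {
        return IntStream
                .iterate(0, i -> i < x.length(), i -> i + size) // chunk start indexes
                .mapToObj(i -> x.substring(i, Math.min(i + size, x.length())))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Thequickbrownfoxjumps", 4)));
        // [Theq, uick, brow, nfox, jump, s]
    }
}
```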

Answer 9

Here's a one-liner implementation using Java 8 streams:

String input = "Thequickbrownfoxjumps";
final AtomicInteger atomicInteger = new AtomicInteger(0);
Collection<String> result = input.chars()
                                    .mapToObj(c -> String.valueOf((char)c) )
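                                    // LinkedHashMap keeps the groups in encounter order; a plain HashMap guarantees no order
                                    .collect(Collectors.groupingBy(c -> atomicInteger.getAndIncrement() / 4,
                                                                   LinkedHashMap::new,
                                                                   Collectors.joining()))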
                                    .values();

It gives the following output:

[Theq, uick, brow, nfox, jump, s]
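The `AtomicInteger` counter makes the classifier lambda stateful, so the grouping is only right on a sequential stream. A variant that derives the group number from each character's index needs no shared counter, and a `TreeMap` keeps the groups sorted by key. This is a sketch with an illustrative class name; like the original, it works on `char`s rather than code points.

```java
import java.util.Collection;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class GroupingSplit {
    static Collection<String> split(String input, int size) {
        return IntStream.range(0, input.length())
                .boxed()
                // key = chunk number computed from the index, not from a counter
                .collect(Collectors.groupingBy(i -> i / size,
                        TreeMap::new,
                        Collectors.mapping(i -> String.valueOf(input.charAt(i)),
                                Collectors.joining())))
                .values();
    }

    public static void main(String[] args) {
        System.out.println(split("Thequickbrownfoxjumps", 4));
        // [Theq, uick, brow, nfox, jump, s]
    }
}
```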

Answer 10

I'd rather use this simple solution:

String content = "Thequickbrownfoxjumps";
while(content.length() > 4) {
    System.out.println(content.substring(0, 4));
    content = content.substring(4);
}
System.out.println(content);

Answer 11

If you want to split a string equally backwards, i.e. from right to left (for example, splitting 1010001111 into [10, 1000, 1111]), here's the code:

/**
 * @param s         the string to be split
 * @param subLen    length of the equal-length substrings.
 * @param backwards true if the splitting is from right to left, false otherwise
 * @return an array of equal-length substrings
 * @throws ArithmeticException: / by zero when subLen == 0
 */
public static String[] split(String s, int subLen, boolean backwards) {
    assert s != null;
    int groups = s.length() % subLen == 0 ? s.length() / subLen : s.length() / subLen + 1;
    String[] strs = new String[groups];
    if (backwards) {
        for (int i = 0; i < groups; i++) {
            int beginIndex = s.length() - subLen * (i + 1);
            int endIndex = beginIndex + subLen;
            if (beginIndex < 0)
                beginIndex = 0;
            strs[groups - i - 1] = s.substring(beginIndex, endIndex);
        }
    } else {
        for (int i = 0; i < groups; i++) {
            int beginIndex = subLen * i;
            int endIndex = beginIndex + subLen;
            if (endIndex > s.length())
                endIndex = s.length();
            strs[i] = s.substring(beginIndex, endIndex);
        }
    }
    return strs;
}
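A shorter route to the right-to-left result follows from the fact that only the leftmost chunk can be short: its length is `s.length() % subLen`, or a full `subLen` when that remainder is 0. After taking it, the rest splits left to right as usual. A sketch of that idea (`BackwardsSplit` and `splitFromRight` are illustrative names; it returns a `List` instead of an array):

```java
import java.util.ArrayList;
import java.util.List;

class BackwardsSplit {
    static List<String> splitFromRight(String s, int subLen) {
        if (subLen <= 0) {
            throw new IllegalArgumentException("subLen must be positive: " + subLen);
        }
        List<String> parts = new ArrayList<>();
        // Only the leftmost chunk can be shorter than subLen.
        int head = s.length() % subLen;
        if (head == 0) {
            head = Math.min(subLen, s.length());
        }
        for (int start = 0, end = head; start < s.length(); start = end, end += subLen) {
            parts.add(s.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitFromRight("1010001111", 4));
        // [10, 1000, 1111]
    }
}
```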

Answer 12

I use the following Java 8 solution:

public static List<String> splitString(final String string, final int chunkSize) {
  final int numberOfChunks = (string.length() + chunkSize - 1) / chunkSize;
  return IntStream.range(0, numberOfChunks)
                  .mapToObj(index -> string.substring(index * chunkSize, Math.min((index + 1) * chunkSize, string.length())))
                  .collect(toList());
}

Answer 13

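Another brute-force solution could be:

    String input = "thequickbrownfoxjumps";
    int n = (input.length() + 3) / 4;  // round up so the last, shorter chunk is kept
    String[] num = new String[n];

    for (int i = 0, x = 0, y = 4; i < n; i++) {
        num[i] = input.substring(x, Math.min(y, input.length()));
        x += 4;
        y += 4;
        System.out.println(num[i]);
    }

The code simply walks through the string with substring, four characters at a time.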

Answer 14

@Test
public void regexSplit() {
    String source = "Thequickbrownfoxjumps";
    // define matcher, any char, min length 1, max length 4
    Matcher matcher = Pattern.compile(".{1,4}").matcher(source);
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(source.substring(matcher.start(), matcher.end()));
    }
    String[] expected = {"Theq", "uick", "brow", "nfox", "jump", "s"};
    assertArrayEquals(expected, result.toArray());
}

Answer 15

Here's my version, based on regex and Java 8 streams. It's worth mentioning that the Matcher.results() method used here is only available since Java 9.

public static List<String> splitString(String input, int splitSize) {
    Matcher matcher = Pattern.compile("(?:(.{" + splitSize + "}))+?").matcher(input);
    return matcher.results().map(MatchResult::group).collect(Collectors.toList());
}

@Test
public void shouldSplitStringToEqualLengthParts() {
    String anyValidString = "Split me equally!";
    String[] expectedTokens2 = {"Sp", "li", "t ", "me", " e", "qu", "al", "ly"};
    String[] expectedTokens3 = {"Spl", "it ", "me ", "equ", "all"};
    Assert.assertArrayEquals(expectedTokens2, splitString(anyValidString, 2).toArray());
    Assert.assertArrayEquals(expectedTokens3, splitString(anyValidString, 3).toArray());
}

Test included. Note that this version drops a trailing partial chunk: as the test shows, the final "!" (size 2) and "y!" (size 3) are not returned.
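The pattern above matches exactly splitSize characters per match, so a shorter trailing chunk is never returned. If it should be kept, as in the question's expected output, one option is to let each match be anywhere from 1 to n characters long with `.{1,n}`; greedy matching makes every match except possibly the last one full length. A sketch, assuming Java 9+ for `Matcher.results()`, with `DOTALL` added so newlines are counted too (`KeepRemainderSplit` is an illustrative name):

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeepRemainderSplit {
    // Assumes splitSize >= 1.
    static List<String> splitString(String input, int splitSize) {
        // Greedy {1,n}: each match takes n chars, except possibly the last one.
        Pattern p = Pattern.compile(".{1," + splitSize + "}", Pattern.DOTALL);
        return p.matcher(input).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitString("Split me equally!", 3));
        // [Spl, it , me , equ, all, y!]
    }
}
```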


Answer 16

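I asked @Alan Moore in a comment on the accepted solution how strings containing newlines could be handled. He suggested using DOTALL.

Following his suggestion, I created a small sample of how that works: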

public void regexDotAllExample() throws UnsupportedEncodingException {
    final String input = "The\nquick\nbrown\r\nfox\rjumps";
    final String regex = "(?<=\\G.{4})";

    Pattern splitByLengthPattern;
    String[] split;

    splitByLengthPattern = Pattern.compile(regex);
    split = splitByLengthPattern.split(input);
    System.out.println("---- Without DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is a single entry longer than the desired split size:
    ---- Without DOTALL ----
    [Idx: 0, length: 26] - [B@17cdc4a5
     */


    //DOTALL suggested in Alan Moore's comment on SO: https://stackoverflow.com/a/3761521/1237974
    splitByLengthPattern = Pattern.compile(regex, Pattern.DOTALL);
    split = splitByLengthPattern.split(input);
    System.out.println("---- With DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is as desired 7 entries with each entry having a max length of 4:
    ---- With DOTALL ----
    [Idx: 0, length: 4] - [B@77b22abc
    [Idx: 1, length: 4] - [B@5213da08
    [Idx: 2, length: 4] - [B@154f6d51
    [Idx: 3, length: 4] - [B@1191ebc5
    [Idx: 4, length: 4] - [B@30ddb86
    [Idx: 5, length: 4] - [B@2c73bfb
    [Idx: 6, length: 2] - [B@6632dd29
     */

}

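But I also like @Jon Skeet's solution at https://stackoverflow.com/a/3760193/1237974. For maintainability in larger projects, where not everyone is equally experienced with regular expressions, I would probably use Jon's solution.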
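The same DOTALL behavior can also be switched on inside the regex itself with the embedded flag `(?s)`, which keeps the original `String.split` one-liner. A sketch using the sample input from the code above (`InlineDotAll` is an illustrative name):

```java
class InlineDotAll {
    static String[] split(String input) {
        // (?s) enables DOTALL for the rest of the pattern, so '.' also matches line terminators
        return input.split("(?s)(?<=\\G.{4})");
    }

    public static void main(String[] args) {
        String[] parts = split("The\nquick\nbrown\r\nfox\rjumps");
        System.out.println(parts.length); // 7
    }
}
```

The 26-character input yields six 4-character chunks plus a final "ps", matching the "With DOTALL" output above.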

Answer 17

public static String[] split(String input, int length) throws IllegalArgumentException {

    if(length == 0 || input == null)
        return new String[0];

    int lengthD = length * 2;

    int size = input.length();
    if(size == 0)
        return new String[0];

    int rep = (int) Math.ceil(size * 1d / length);

    ByteArrayInputStream stream = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_16LE));

    String[] out = new String[rep];
    byte[]  buf = new byte[lengthD];

    int d = 0;
    for (int i = 0; i < rep; i++) {

        try {
            d = stream.read(buf);
        } catch (IOException e) {
            e.printStackTrace();
        }

        if(d != lengthD)
        {
            out[i] = new String(buf,0,d, StandardCharsets.UTF_16LE);
            continue;
        }

        out[i] = new String(buf, StandardCharsets.UTF_16LE);
    }
    return out;
}

Answer 18

Java 8 solution (similar to this, but a bit simpler):

public static List<String> partition(String string, int partSize) {
  List<String> parts = IntStream.range(0, string.length() / partSize)
    .mapToObj(i -> string.substring(i * partSize, (i + 1) * partSize))
    .collect(toList());
  if ((string.length() % partSize) != 0)
    parts.add(string.substring(string.length() / partSize * partSize));
  return parts;
}

Answer 19

import java.util.Arrays;
import java.util.Scanner;

public class string123 {

    public static void main(String[] args) {

        Scanner sc = new Scanner(System.in);
        System.out.println("Enter String");
        String r = sc.nextLine();
        int len = r.length();
        System.out.println("Enter length Of Sub-string");
        int l = sc.nextInt();
        // one slot per chunk, rounding up for a shorter last chunk
        String[] s = new String[(len + l - 1) / l];
        int last;
        int f = 0;
        for (int i = 0; f < len; i++) {
            last = Math.min(f + l, len);
            s[i] = r.substring(f, last);
            f = last;
        }
        System.out.print(Arrays.toString(s));
    }
}

Output:

 Enter String
 Thequickbrownfoxjumps
 Enter length Of Sub-string
 4
 [Theq, uick, brow, nfox, jump, s]

Answer 20

StringBuilder version:

public static List<String> getChunks(String s, int chunkSize)
{
    List<String> chunks = new ArrayList<>();
    StringBuilder sb = new StringBuilder(s);

    while (sb.length() > 0)
    {
        // the last chunk may be shorter than chunkSize
        int end = Math.min(chunkSize, sb.length());
        chunks.add(sb.substring(0, end));
        sb.delete(0, end);
    }
    return chunks;
}

Answer 21

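Use code points to handle all characters

Here is a solution that:

Works with all 143,859 Unicode characters
Allows you to examine or manipulate each resulting string, if you have further logic to process

To work with all Unicode characters, avoid the legacy char type, and avoid char-based utilities. Instead, use code point integers.

Call String#codePoints to get an IntStream, a stream of int values. In the code below, we collect those int values into an array. Then we loop over the array, and for each integer we append the character assigned to that number to our StringBuilder. Every nth code point, we add a string to our master list and empty the StringBuilder.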

String input = "Thequickbrownfoxjumps";

int chunkSize = 4 ;
int[] codePoints = input.codePoints().toArray();  // `String#codePoints` returns an `IntStream`. Collect the elements of that stream into an array.
int initialCapacity = ( ( codePoints.length / chunkSize ) + 1 );
List < String > strings = new ArrayList <>( initialCapacity );

StringBuilder sb = new StringBuilder();
for ( int i = 0 ; i < codePoints.length ; i++ )
{
    sb.appendCodePoint( codePoints[ i ] );
    if ( 0 == ( ( i + 1 ) % chunkSize ) ) // Every nth code point.
    {
        strings.add( sb.toString() ); // Remember this iteration's value.
        sb.setLength( 0 ); // Clear the contents of the `StringBuilder` object.
    }
}
if ( sb.length() > 0 ) // If partial string leftover, save it too. Or not… just delete this `if` block.
{
    strings.add( sb.toString() ); // Remember last iteration's value.
}

System.out.println( "strings = " + strings );


strings = [Theq, uick, brow, nfox, jump, s]

This works with non-Latin characters too. Here we replace q with FACE WITH MEDICAL MASK (😷).

String input = "The😷uickbrownfoxjumps";


strings = [The😷, uick, brow, nfox, jump, s]
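To see why code points matter here: 😷 (U+1F637) lies outside the Basic Multilingual Plane, so a Java `String` stores it as two `char`s (a surrogate pair), and a `char`-based `substring(0, 4)` cuts it in half. The sketch below shows that, plus an alternative that walks the string with `String#offsetByCodePoints` instead of first collecting an `int[]`. `CodePointChunks` is an illustrative name; this is a variant, not the code above.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointChunks {
    static List<String> split(String input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        List<String> chunks = new ArrayList<>();
        int total = input.codePointCount(0, input.length());
        int begin = 0; // char index where the current chunk starts
        for (int done = 0; done < total; done += chunkSize) {
            // Advance by up to chunkSize code points, never past the end.
            int step = Math.min(chunkSize, total - done);
            int end = input.offsetByCodePoints(begin, step);
            chunks.add(input.substring(begin, end));
            begin = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // "The" + U+1F637 FACE WITH MEDICAL MASK + "uickbrownfoxjumps"
        String input = "The\uD83D\uDE37uickbrownfoxjumps";
        // char-based: the first 4 chars end with a lone high surrogate
        System.out.println(Character.isHighSurrogate(input.substring(0, 4).charAt(3))); // true
        System.out.println(split(input, 4).size()); // 6
    }
}
```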

Answer 22

public static List<String> getSplittedString(String stringtoSplit,
            int length) {

        List<String> returnStringList = new ArrayList<String>(
                (stringtoSplit.length() + length - 1) / length);

        for (int start = 0; start < stringtoSplit.length(); start += length) {
            returnStringList.add(stringtoSplit.substring(start,
                    Math.min(stringtoSplit.length(), start + length)));
        }

        return returnStringList;
    }

Split a string into equal-length substrings in Java
