Question

Given a string such as 0100101, I want to return a single random index of one of the positions that hold a 1 (1, 4, 6).

So far, I'm using:

// Randomizer is assumed to be a java.util.Random instance defined elsewhere.
protected int getRandomBirthIndex(String s) {
    ArrayList<Integer> birthIndicies = new ArrayList<Integer>();
    for (int i = 0; i < s.length(); i++) {
        if ((s.charAt(i) == '1')) {
            birthIndicies.add(i);
        }
    }
    return birthIndicies.get(Randomizer.nextInt(birthIndicies.size()));
}

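However, it is causing a bottleneck in my code (this method takes 45% of the CPU time), because the strings are more than 4000 characters long. Can anyone think of a more efficient way to do this?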

Answer 1

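You could use String.indexOf(int) to find each 1 (instead of iterating over every character). I would also prefer to program to the List interface and use the diamond operator <>. Something like,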

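private static Random rand = new Random();
protected int getRandomBirthIndex(String s) {
    List<Integer> birthIndicies = new ArrayList<>();
    // indexOf finds the next '1' via the library instead of a hand-written char loop.
    int index = s.indexOf('1');
    while (index > -1) {
        birthIndicies.add(index);
        index = s.indexOf('1', index + 1);
    }
    // Note: throws IllegalArgumentException if s contains no '1' (nextInt(0)).
    return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}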
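A possible variant, sketched here under the assumption that allocating and boxing a List<Integer> on every call is part of the cost (not measured): count the 1s first, pick a random rank, then walk to that rank with the same indexOf loop. It trades a second pass for zero allocation and returns -1 instead of throwing when there is no 1. The class name TwoPassBirthIndex is hypothetical.

```java
import java.util.Random;

// Hypothetical class name; the method mirrors getRandomBirthIndex above.
final class TwoPassBirthIndex {
    private static final Random RAND = new Random();

    // Returns a uniformly random index of a '1' in s, or -1 if there is none.
    static int getRandomBirthIndex(String s) {
        // Pass 1: count the '1' characters, jumping between them with indexOf.
        int count = 0;
        for (int i = s.indexOf('1'); i > -1; i = s.indexOf('1', i + 1)) {
            count++;
        }
        if (count == 0) {
            return -1; // avoids RAND.nextInt(0), which would throw
        }
        // Choose which '1' to return by its rank (0-based) among all '1's.
        int target = RAND.nextInt(count);
        // Pass 2: walk forward to the target-th '1'.
        int index = s.indexOf('1');
        for (int k = 0; k < target; k++) {
            index = s.indexOf('1', index + 1);
        }
        return index;
    }
}
```

Each 1 is equally likely because the rank is drawn uniformly from 0..count-1. Whether two indexOf passes beat one pass plus a list depends on the JVM and the data, so it is worth timing both.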

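Finally, if you need to do this many times, save the List as a field and reuse it (instead of recalculating the indices every time). For example, with memoization,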

private static Random rand = new Random();
private static Map<String, List<Integer>> memo = new HashMap<>();

protected int getRandomBirthIndex(String s) {
    List<Integer> birthIndicies;
    if (!memo.containsKey(s)) {
        birthIndicies = new ArrayList<>();
        int index = s.indexOf('1');
        while (index > -1) {
            birthIndicies.add(index);
            index = s.indexOf('1', index + 1);
        }
        memo.put(s, birthIndicies);
    } else {
        birthIndicies = memo.get(s);
    }
    return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}
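The same memoization can be written more compactly with Map.computeIfAbsent, which runs the scan only the first time a given string is seen. This is a sketch with a hypothetical class name; like the version above, the cache is unbounded and not thread-safe, and a hit on a new String instance still hashes and compares the key, which is itself proportional to the string length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical class name; same idea as the memo field above.
final class MemoBirthIndex {
    private static final Random RAND = new Random();
    // Unbounded: grows by one entry per distinct string ever seen.
    private static final Map<String, List<Integer>> MEMO = new HashMap<>();

    // Returns a random index of a '1' in s, or -1 if there is none.
    static int getRandomBirthIndex(String s) {
        // The scan runs only when s is not already a key in MEMO.
        List<Integer> positions = MEMO.computeIfAbsent(s, MemoBirthIndex::scan);
        if (positions.isEmpty()) {
            return -1;
        }
        return positions.get(RAND.nextInt(positions.size()));
    }

    // Collects every position of '1' using indexOf.
    private static List<Integer> scan(String s) {
        List<Integer> positions = new ArrayList<>();
        for (int i = s.indexOf('1'); i > -1; i = s.indexOf('1', i + 1)) {
            positions.add(i);
        }
        return positions;
    }
}
```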

Answer 2

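Well, one way is to avoid creating the list on every call by caching it based on the string itself, assuming the strings are used more often than they change. If they aren't, caching won't help.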

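The caching approach uses an object containing the following, rather than just a string:

the current string;
the cached string; and
a list based on the cached string.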

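You can provide clients with a function that creates such an object from a given string, setting both the current string and the cached string to what was passed in, and then calculating the list. Another function would be used to change the current string to something else.

The getRandomBirthIndex() function then receives this structure (instead of a string) and follows this set of rules:

If the current and cached strings differ, set the cached string equal to the current string, then recalculate the list based on it.
In either case, return a random element from the list.

That way, if the string rarely changes, you avoid the expensive recalculation when it isn't needed.

In pseudocode, something like this would suffice: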

# Constructs fastie from string.
# Sets cached string to something other than
# that passed in (lazy list creation).

def fastie.constructor(string s):
    me.current = s
    me.cached = s + "!"

# Changes current string in fastie. No list update in
# case you change it again before needing an element.

def fastie.changeString(string s):
    me.current = s

# Get a random index, will recalculate list first but
# only if necessary. Empty list returns index of -1.

def fastie.getRandomBirthIndex():
    me.recalcListFromCached()
    if me.list.size() == 0:
        return -1
    return me.list[random(me.list.size())]

# Recalculates the list from the current string.
# Done on an as-needed basis.

def fastie.recalcListFromCached():
    if me.current != me.cached:
        me.cached = me.current
        me.list = empty
        for idx = 0 to me.cached.length() - 1 inclusive:
            if me.cached[idx] == '1':
                me.list.append(idx)

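You could also speed up the actual search for the 1 characters, for example by using indexOf() to have the underlying Java library locate them, rather than checking each character individually. The code (again, pseudocode):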

def fastie.recalcListFromCached():
    if me.current != me.cached:
        me.cached = me.current
        me.list = empty
        idx = me.cached.indexOf('1')
        while idx != -1:
            me.list.append(idx)
            idx = me.cached.indexOf('1', idx + 1)

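You can use this approach even without caching the values. Java's probably optimised string-search code is likely to be faster than doing it yourself.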
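A Java rendering of the fastie pseudocode, combining the lazy recalculation with the indexOf search, might look like the sketch below. One design change is an assumption of this sketch: instead of seeding the cached string with s + "!" to force the first rebuild, it leaves cached as null, since current.equals(null) is false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the "fastie" pseudocode: current string, cached string, cached list.
final class Fastie {
    private static final Random RAND = new Random();

    private String current;
    private String cached;                 // null means "list not built yet"
    private final List<Integer> positions = new ArrayList<>();

    Fastie(String s) {
        current = s;                       // list is built lazily on first use
    }

    // Changes the current string; no rebuild until an index is requested.
    void changeString(String s) {
        current = s;
    }

    // Returns a random index of a '1', or -1 for a string with no '1'.
    int getRandomBirthIndex() {
        recalcListFromCached();
        if (positions.isEmpty()) {
            return -1;
        }
        return positions.get(RAND.nextInt(positions.size()));
    }

    // Rebuilds the list only if the current string differs from the cached one.
    private void recalcListFromCached() {
        if (current.equals(cached)) {
            return;                        // list is still valid
        }
        cached = current;
        positions.clear();
        for (int i = cached.indexOf('1'); i > -1; i = cached.indexOf('1', i + 1)) {
            positions.add(i);
        }
    }
}
```

Note that the equals check on a changed string is itself a character comparison, but String.equals returns immediately when both references point to the same object.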

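However, you should keep in mind that spending 45% of your time in that code may not actually be a problem at all. What matters is not so much the proportion of time spent there as the absolute amount of time.

By that I mean that if the function completes in 0.001 seconds (and you don't need to process thousands of strings per second), the percentage of time spent in it probably makes little difference. You should only really worry if the effect somehow becomes noticeable to the users of your software. Otherwise, optimising is pretty much wasted effort.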
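To check the absolute cost rather than the percentage, a rough timing loop such as this sketch can help. It reuses the question's method (with a local Random standing in for Randomizer) on a hypothetical pseudo-random 4000-character input. System.nanoTime loops are only indicative; a harness such as JMH gives more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.Random;

// Hypothetical helper class for a rough per-call timing.
final class BirthIndexTiming {
    private static final Random RAND = new Random();
    // Written after timing to help keep the timed calls from being optimised away.
    static volatile int sink;

    // The question's method, reproduced so the sketch compiles on its own.
    static int getRandomBirthIndex(String s) {
        ArrayList<Integer> birthIndicies = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                birthIndicies.add(i);
            }
        }
        return birthIndicies.get(RAND.nextInt(birthIndicies.size()));
    }

    // Builds a pseudo-random 0/1 string of the given length (test input only).
    static String randomBits(int length, long seed) {
        Random r = new Random(seed);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(r.nextBoolean() ? '1' : '0');
        }
        return sb.toString();
    }

    // Average wall-clock nanoseconds per call, measured after a warm-up loop.
    static double averageNanos(String s, int calls) {
        int acc = 0;
        for (int i = 0; i < calls; i++) {
            acc += getRandomBirthIndex(s);   // warm-up so the JIT can compile it
        }
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += getRandomBirthIndex(s);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }
}
```

For example, BirthIndexTiming.averageNanos(BirthIndexTiming.randomBits(4000, 1L), 100_000) gives nanoseconds per call; multiplying by the number of calls per second you actually make shows whether the cost is noticeable.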

Answer 3

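If you are interested in a single index of one of the positions of a 1, and assuming your input contains at least one 1, you can do the following: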

    String input = "0100101"; 
    final int n=input.length();
    Random generator = new Random();
    char c=0;
    int i=0;
    do{
        i = generator.nextInt(n);           
        c=input.charAt(i);
    }while(c!='1');
    System.out.println(i);

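This solution is fast and doesn't consume much memory, for example when 1s and 0s are evenly distributed. As @paxdiablo pointed out, it can perform poorly in some cases, for example when 1s are scarce.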
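One way to keep the fast path while bounding the bad cases is to probe randomly a limited number of times and then fall back to a scan. This is a sketch; maxProbes is a hypothetical tuning parameter. The result stays uniform over the 1s: a successful probe lands on each 1 with equal probability, and the fallback also picks uniformly. It also returns -1 instead of looping forever when there is no 1.

```java
import java.util.Random;

// Hypothetical class: bounded random probing with a scanning fallback.
final class ProbeThenScan {
    private static final Random RAND = new Random();

    // Returns a uniformly random index of a '1' in s, or -1 if there is none.
    static int getRandomBirthIndex(String s, int maxProbes) {
        int n = s.length();
        if (n == 0) {
            return -1;
        }
        // Rejection sampling: each probe is uniform over all positions.
        for (int p = 0; p < maxProbes; p++) {
            int i = RAND.nextInt(n);
            if (s.charAt(i) == '1') {
                return i;
            }
        }
        // Fallback: count the '1's, then walk to a uniformly chosen one.
        int count = 0;
        for (int i = s.indexOf('1'); i > -1; i = s.indexOf('1', i + 1)) {
            count++;
        }
        if (count == 0) {
            return -1;
        }
        int target = RAND.nextInt(count);
        int index = s.indexOf('1');
        for (int k = 0; k < target; k++) {
            index = s.indexOf('1', index + 1);
        }
        return index;
    }
}
```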

Answer 4

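If your string is very long and you are sure it contains lots of 1s (or whatever you are searching for), it may be faster to randomly "poke around" in the string until you find what you're looking for. That way you save the time of iterating over the string: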

String s = "0100101";
int index = ThreadLocalRandom.current().nextInt(s.length());

while(s.charAt(index) != '1') {
    System.out.println("got not a 1, trying again");
    index = ThreadLocalRandom.current().nextInt(s.length());
}
System.out.println("found: " + index + " - " + s.charAt(index));

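I'm not sure about the statistics, but in rare cases this solution might take longer than the iterating solution. That case is a long string with only very few occurrences of the search character.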
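The statistics can be estimated, assuming a string of length n containing k ≥ 1 ones and independent uniform probes: the number of probes T until a 1 is hit is geometrically distributed with success probability k/n.

```latex
\Pr[T = t] = \left(1 - \frac{k}{n}\right)^{t-1} \frac{k}{n}, \qquad
\mathbb{E}[T] = \frac{n}{k}, \qquad
\Pr[T > t] = \left(1 - \frac{k}{n}\right)^{t}
```

For example, with n = 4000 and k = 2000 the expected number of probes is 2. With k = 4 it is 1000, and the chance of needing more than 4000 probes (more than the character checks of a full scan) is 0.999^4000 ≈ e^{-4} ≈ 0.018, i.e. about 2%.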

If the source string doesn't contain the search character at all, this code will run forever!

Answer 5

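You could even try this, with a best-case complexity of O(1). In the worst case it may go to O(n), and in the absolute worst case it could be unbounded, since it depends entirely on the Randomizer you are using.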

private static Random rand = new Random(); 
protected int getRandomBirthIndex(String s) {
    List<Integer> birthIndicies = new ArrayList<>();
    int index = s.indexOf('1');
    while (index > -1) {
        birthIndicies.add(index);
        index = s.indexOf('1', index + 1);
    }
    return birthIndicies.get(rand.nextInt(birthIndicies.size())); 
}

Answer 6

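One possibility is to use a short-circuited Fisher-Yates-style shuffle. Create an array of indices and start shuffling it. As soon as the next shuffled element points to a 1, return that index. If you find you have iterated through all the indices without finding a 1, the string contains only zeros, so return -1.

If the strings are always the same length, the indices array can be static, as shown below, and doesn't need to be reinitialized on each new call. If not, you have to move the declaration of indices into the method and initialize it with the correct set of indices each time. The code below is written for strings of length 7, such as 0100101.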


// randomizer is assumed to be a java.util.Random instance defined elsewhere.
// delete this and uncomment below if string lengths vary
private static int[] indices = { 0, 1, 2, 3, 4, 5, 6 };

protected int getRandomBirthIndex(String s) {
   int tmp;
   /*
    * int[] indices = new int[s.length()];
    * for (int i = 0; i < s.length(); ++i) indices[i] = i;
    */
   for (int i = 0; i < s.length(); i++) {
      int j = randomizer.nextInt(indices.length - i) + i;
      if (j != i) {   // swap to shuffle
         tmp = indices[i];
         indices[i] = indices[j];
         indices[j] = tmp;
      }
      if ((s.charAt(indices[i]) == '1')) {
         return indices[i];
      }
   }
   return -1;
}

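If the 1s are dense, this approach terminates quickly. Even if there are no 1s at all, it is guaranteed to terminate after s.length() iterations, and the returned position is uniform over the whole set of 1s.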
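For strings whose length varies, the "uncomment below" variant can be written as a self-contained sketch (hypothetical class name; a local Random stands in for randomizer). Allocating and filling indices on every call is itself proportional to the string length, so this variant mainly pays off when the lengths are fixed and the static array can be reused as above.

```java
import java.util.Random;

// Hypothetical class: short-circuited Fisher-Yates search, per-call index array.
final class ShuffleSearch {
    private static final Random RANDOMIZER = new Random();

    // Returns a uniformly random index of a '1' in s, or -1 if there is none.
    static int getRandomBirthIndex(String s) {
        int n = s.length();
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) {
            indices[i] = i;
        }
        for (int i = 0; i < n; i++) {
            // Pick uniformly among the not-yet-visited slots i..n-1 and swap it in.
            int j = i + RANDOMIZER.nextInt(n - i);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
            if (s.charAt(indices[i]) == '1') {
                return indices[i];        // stop at the first '1' drawn
            }
        }
        return -1;                        // every position was a '0'
    }
}
```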

Java - Return a random index of a specific character in a string
