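I'm trying to sort a string by the number of occurrences of each character, most frequent first and least frequent last. After sorting, I need to remove all duplicate characters. Since an example is always clearer, the program should do the following: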
String str = "aebbaaahhhhhhaabbbccdfffeegh";
String output = sortByCharacterOccurrencesAndTrim(str);
In this case, the sortByCharacterOccurrencesAndTrim method should return:
String output = "habefcdg"
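If two characters have the same number of occurrences, their relative order in the returned string does not matter. So "habefcdg" could equally be "habfecgd", because 'f' and 'e' both occur 3 times and 'd' and 'g' both occur once.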
"habefcdg" would effectively be the same as "habfecgd"
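Note: performance matters in this case, so I would prefer the most efficient approach. I say this because the string length can range from 1 up to the maximum length (which I believe is Integer.MAX_VALUE, though I'm not sure), so I want to minimize any potential bottlenecks.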
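Because ties may come out in any order, two correct implementations can return different strings. As a rough sketch (the class FrequencyOrderCheck and the method isValidResult are hypothetical names, not part of the question), a result could be validated like this instead of being compared to one fixed string:

```java
import java.util.HashMap;
import java.util.Map;

final class FrequencyOrderCheck {

    /**
     * Returns true if candidate contains every distinct char of input exactly once,
     * ordered so that occurrence counts never increase from left to right.
     */
    static boolean isValidResult(String input, String candidate) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < input.length(); i++) {
            counts.merge(input.charAt(i), 1, Integer::sum);
        }
        if (candidate.length() != counts.size()) {
            return false; // some character is missing or repeated
        }
        int previous = Integer.MAX_VALUE;
        for (int i = 0; i < candidate.length(); i++) {
            // remove() so that a repeated character is seen as "unknown" the second time
            Integer count = counts.remove(candidate.charAt(i));
            if (count == null || count > previous) {
                return false;
            }
            previous = count;
        }
        return true;
    }

    public static void main(String[] args) {
        String str = "aebbaaahhhhhhaabbbccdfffeegh";
        System.out.println(isValidResult(str, "habefcdg")); // true
        System.out.println(isValidResult(str, "habfecgd")); // true, ties swapped
        System.out.println(isValidResult(str, "ahbefcdg")); // false, 'h' (7) must precede 'a' (6)
    }
}
```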
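Answer 0 (score: 5)
"A map and a couple of while loops" is certainly the simplest approach, and it will probably be quite fast. The idea is:
for each character
    increment its count in the map
Sort the map entries by count in descending order
Output the map keys in that order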
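The steps above could be sketched in Java roughly as follows. This is only one possible reading of the pseudocode: the class name MapFrequencySort is made up for illustration, and the choice of HashMap plus a sorted entry list is an assumption rather than something the answer prescribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MapFrequencySort {

    static String sortByCharacterOccurrencesAndTrim(String str) {
        // Step 1: count each character in a map.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < str.length(); i++) {
            counts.merge(str.charAt(i), 1, Integer::sum);
        }

        // Step 2: sort the entries by count, highest first.
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<Character, Integer>comparingByValue().reversed());

        // Step 3: output the keys in that order (each key appears once by construction).
        StringBuilder out = new StringBuilder(entries.size());
        for (Map.Entry<Character, Integer> entry : entries) {
            out.append(entry.getKey().charValue());
        }
        return out.toString();
    }
}
```

The order of characters with equal counts depends on the map's iteration order, which the question explicitly allows.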
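However, 100,000,000 map lookups could be quite expensive. You could speed this up by creating an array of 65,536 integer counts (or just 128 if the input is ASCII). Then:
for each character
    array[(int)ch] += 1
Then you go through that array and create a map of the characters with non-zero counts:
for i = 0 to 65535
    if array[i] > 0
        map.add((char)i, array[i])
Then sort the map in descending order by count and output the characters in that order.
This will probably perform quite a bit faster, simply because indexing into an array 100,000,000 times is likely much faster than doing 100,000,000 map lookups.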
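A minimal Java sketch of the array idea might look like the following. It deviates from the pseudocode in one labeled way: instead of building a map of the non-zero counts, it packs each (count, char) pair into a long and sorts a primitive array, which is a swapped-in variation and not something the answer itself proposes. The class name ArrayFrequencySort is hypothetical.

```java
import java.util.Arrays;

final class ArrayFrequencySort {

    static String sortByCharacterOccurrencesAndTrim(String str) {
        // One counter per possible char value (65,536 entries).
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < str.length(); i++) {
            counts[str.charAt(i)]++;
        }

        // Count how many distinct characters actually occur.
        int distinct = 0;
        for (int count : counts) {
            if (count > 0) {
                distinct++;
            }
        }

        // Assumption/variation: pack count (high bits) and char (low 16 bits) into one long.
        long[] packed = new long[distinct];
        int n = 0;
        for (int ch = 0; ch < counts.length; ch++) {
            if (counts[ch] > 0) {
                packed[n++] = ((long) counts[ch] << 16) | ch;
            }
        }
        Arrays.sort(packed); // ascending by count, ties broken by char value

        // Read the sorted array backwards to get descending counts.
        char[] out = new char[distinct];
        for (int i = 0; i < distinct; i++) {
            out[i] = (char) (packed[distinct - 1 - i] & 0xFFFF);
        }
        return new String(out);
    }
}
```

For the question's sample string this sketch returns "habfecgd": among characters with equal counts, the higher char value comes first, which is one of the orderings the question accepts.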
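Answer 1 (score: 4)
Note: This is not an answer, just test code showing the performance of the answers by Jim Mischel and Óscar López (with parallel streams, in response to a comment by OP).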
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Test {

    public static void main(String[] args) {
        // Note: single-run wall-clock timings without JIT warm-up; treat the numbers as rough.
        long start = System.currentTimeMillis();
        String s = buildString();
        System.out.println("buildString: " + (System.currentTimeMillis() - start) + "ms");

        start = System.currentTimeMillis();
        String result1 = testUsingArray(s);
        System.out.println("testUsingArray: " + (System.currentTimeMillis() - start) + "ms");

        start = System.currentTimeMillis();
        String result2 = testUsingMap(s);
        System.out.println("testUsingMap: " + (System.currentTimeMillis() - start) + "ms");

        start = System.currentTimeMillis();
        String result3 = testUsingStream(s);
        System.out.println("testUsingStream: " + (System.currentTimeMillis() - start) + "ms");

        start = System.currentTimeMillis();
        String result4 = testUsingParallelStream(s);
        System.out.println("testUsingParallelStream: " + (System.currentTimeMillis() - start) + "ms");

        System.out.println(result1);
        System.out.println(result2);
        System.out.println(result3);
        System.out.println(result4);
    }

    // Builds a 100,000,000-char string of random printable ASCII ('!' to '~').
    private static String buildString() {
        Random rnd = new Random();
        char[] buf = new char[100_000_000];
        for (int i = 0; i < buf.length; i++)
            buf[i] = (char)(rnd.nextInt(127 - 33) + 33);
        return new String(buf);
    }

    // Jim Mischel's approach: count into an array indexed by char value.
    private static String testUsingArray(String s) {
        int[] count = new int[65536];
        for (int i = 0; i < s.length(); i++)
            count[s.charAt(i)]++;
        List<CharCount> list = new ArrayList<>();
        for (int i = 0; i < 65536; i++)
            if (count[i] != 0)
                list.add(new CharCount((char)i, count[i]));
        Collections.sort(list);
        char[] buf = new char[list.size()];
        for (int i = 0; i < buf.length; i++)
            buf[i] = list.get(i).ch;
        return new String(buf);
    }

    // Same idea, but counting in a HashMap with a mutable counter per character.
    private static String testUsingMap(String s) {
        Map<Character, CharCount> map = new HashMap<>();
        for (int i = 0; i < s.length(); i++)
            map.computeIfAbsent(s.charAt(i), CharCount::new).count++;
        List<CharCount> list = new ArrayList<>(map.values());
        Collections.sort(list);
        char[] buf = new char[list.size()];
        for (int i = 0; i < buf.length; i++)
            buf[i] = list.get(i).ch;
        return new String(buf);
    }

    // Óscar López's approach, sequential variant.
    private static String testUsingStream(String s) {
        int[] output = s.codePoints()
                        .boxed()
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet()
                        .stream()
                        .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                        .mapToInt(Map.Entry::getKey)
                        .toArray();
        return new String(output, 0, output.length);
    }

    // Óscar López's approach, parallel variant.
    private static String testUsingParallelStream(String s) {
        int[] output = s.codePoints()
                        .parallel()
                        .boxed()
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                        .entrySet()
                        .parallelStream()
                        .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                        .mapToInt(Map.Entry::getKey)
                        .toArray();
        return new String(output, 0, output.length);
    }
}

// A character together with its occurrence count; sorts by count, highest first.
class CharCount implements Comparable<CharCount> {
    final char ch;
    int count;

    CharCount(char ch) {
        this.ch = ch;
    }

    CharCount(char ch, int count) {
        this.ch = ch;
        this.count = count;
    }

    @Override
    public int compareTo(CharCount that) {
        return Integer.compare(that.count, this.count); // descending
    }
}
Sample output
buildString: 974ms
testUsingArray: 48ms
testUsingMap: 216ms
testUsingStream: 1279ms
testUsingParallelStream: 442ms
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
Answer 2 (score: 3)
Just for fun (and I'm not claiming this is the most efficient solution): how about some Java 8 lambdas + parallel streams?
public String sortByCharacterOccurrencesAndTrim(String str) {

    // build a frequency map, for each code point store its count
    Map<Integer, Long> frequencies =
        str.codePoints()
           .parallel()
           .boxed()
           .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    // sort by descending frequency and collect code points into array
    int[] output =
        frequencies.entrySet()
                   .parallelStream()
                   .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                   .mapToInt(Map.Entry::getKey)
                   .toArray();

    // create output string from code point array
    return new String(output, 0, output.length);
}
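If you want a super-efficient solution, you could rewrite the algorithm above using explicit loops, but that's a lot of code and it's already late for me :). The idea would be the same, though: build a character frequency map, sort it by frequency in descending order, and build a string from the characters.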
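One way such an explicit-loop rewrite might look is sketched below. It keeps the code-point handling of the stream version, but the details are assumptions made for illustration: the class name CodePointFrequencySort is hypothetical, and a plain int array indexed by code point (about 4.4 MB) stands in for the frequency map.

```java
import java.util.Arrays;

final class CodePointFrequencySort {

    static String sortByCharacterOccurrencesAndTrim(String str) {
        // Frequency table with one slot per Unicode code point (assumption: trading memory for speed).
        int[] counts = new int[Character.MAX_CODE_POINT + 1];
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            counts[cp]++;
            i += Character.charCount(cp); // advance by 2 chars for supplementary code points
        }

        // Collect the code points that actually occur, in ascending code point order.
        int distinct = 0;
        for (int count : counts) {
            if (count != 0) {
                distinct++;
            }
        }
        Integer[] codePoints = new Integer[distinct];
        int n = 0;
        for (int cp = 0; cp < counts.length; cp++) {
            if (counts[cp] != 0) {
                codePoints[n++] = cp;
            }
        }

        // Sort by descending frequency; the object sort is stable, so ties keep code point order.
        Arrays.sort(codePoints, (x, y) -> Integer.compare(counts[y], counts[x]));

        // Build the output string from the sorted code points.
        StringBuilder out = new StringBuilder(distinct);
        for (int cp : codePoints) {
            out.appendCodePoint(cp);
        }
        return out.toString();
    }
}
```

For the question's sample string this sketch returns "habefcdg". Whether it actually beats the stream version on a given input would need to be measured.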
Answer 3 (score: -1)
I don't know anything about streams and lambdas, but I would do it this way: count the occurrences of each character, then simply sort from highest to lowest afterwards.