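I have to sort a large dataset of long strings that contain only digits (length up to 1 million characters). All strings are to be treated as large positive numbers.
I modified a merge sort implementation, and it works well for large datasets (array size 200000) as long as each string is at most 18 characters long (so that I can convert it to a long for comparison).
I also implemented logic that should, in theory, work for digit strings of any length. However, there is a small problem in my code that prevents it from correctly sorting arrays containing long (length > 18) strings. I have marked the spot with an upper-case comment in the code block below.
Note: for datasets of all lengths the code runs to completion within a few seconds, but it produces incorrect output, as shown at the end.
Here is my code: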
package algorithms;
import java.math.BigInteger;
import java.util.Scanner;
public class BigSort {
static void merge(String arr[], int l, int m, int r)
{
int n1 = m - l + 1;
int n2 = r - m;
String L[] = new String [n1];
String R[] = new String [n2];
for (int i=0; i<n1; ++i)
L[i] = arr[l + i];
for (int j=0; j<n2; ++j)
R[j] = arr[m + 1+ j];
int i = 0, j = 0;
int k = l;
while (i < n1 && j < n2){
if (L[i].length() <= R[j].length()){
if(L[i].length()<=18 && R[j].length() <=18) {
if(BigInteger.valueOf(Long.parseLong(L[i])).compareTo(BigInteger.valueOf(Long.parseLong(R[j]))) <=0){
//this will convert strings to numbers and compare them.
//I have used it just to possibly decrease load of-
//comparing each characters for sorting smaller strings.
arr[k] = L[i];
i++;
}else{
arr[k] = R[j];
j++;
}
}else{//THIS ELSE PART IS HAVING SOME PROBLEM.
//if length of string is greater than 18digits
//it will compare two string character by character to find
//the larger string or if they are equal.
char[] c1 = L[i].toCharArray();
char[] c2 = R[j].toCharArray();
int c1leng= c1.length;
int c2leng= c2.length;
//int shorter= c1leng < c2leng ? c1leng : c2leng ;
if(c1leng==c2leng){
for(int p=0; p<c1leng; p++){
if(c1[p]==c2[p]){
if(p == c1leng-1) {
arr[k] = L[i];
i++;
break;
}
continue;
}else if(c1[p]<c2[p]){
arr[k] = L[i];
i++;
break;
}else if(c1[p]>c2[p]){
arr[k] = R[j];
j++;
break;
}
}
}else{
arr[k] = R[j];
j++;
}
}
}else{
arr[k] = R[j];
j++;
}
k++;
}
while (i < n1){
arr[k] = L[i];
i++;
k++;
}
while (j < n2){
arr[k] = R[j];
j++;
k++;
}
}
static void sort(String arr[], int l, int r)
{
if (l < r){
int m = (l+r)/2;
sort(arr, l, m);
sort(arr , m+1, r);
merge(arr, l, m, r);
}
}
static String[] bigSorting(String[] arr) {
sort(arr, 0, arr.length-1);
return arr;
}
public static void main(String[] args){
Scanner in = new Scanner(System.in);
int n = in.nextInt();
String[] arr = new String[n];
for(int arr_i = 0; arr_i < n; arr_i++){
arr[arr_i] = in.next().trim();
}
System.out.println("result is:");
String[] result = bigSorting(arr);
for (int i = 0; i < result.length; i++) {
System.out.print(result[i] + (i != result.length - 1 ? "\n" : ""));
}
in.close();
}
}
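These are the inputs I am using (the first line gives the number of strings, followed by all the strings to be sorted; the output is the sorted digit strings, one per line):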
input(1)
10
5454545454
212101225515
51212
5141215
52
521
52145
5
5
5
Output(1)//correct
5
5
5
52
521
51212
52145
5141215
5454545454
212101225515
Input(2)
10
5454545454
212101225515
51212
5141215
52
5465156165164215612616546954512202496421
2121564
216451564561564651564561256065
11
55
Output(2)//incorrect
11
52
55
216451564561564651564561256065
51212
2121564
5465156165164215612616546954512202496421
5141215
5454545454
212101225515
Answer 0 (score: 2)
You can use new BigInteger(String) instead.
static void merge(String arr[], int l, int m, int r) {
int n1 = m - l + 1;
int n2 = r - m;
String L[] = new String[n1];
String R[] = new String[n2];
for (int i = 0; i < n1; ++i)
L[i] = arr[l + i];
for (int j = 0; j < n2; ++j)
R[j] = arr[m + 1 + j];
int i = 0, j = 0;
int k = l;
while (i < n1 && j < n2) {
if (L[i].length() <= R[j].length()) {
if (new BigInteger(L[i]).compareTo(new BigInteger(R[j])) <= 0) {
//this will convert strings to numbers and compare them.
arr[k] = L[i];
i++;
} else {
arr[k] = R[j];
j++;
}
} else {
arr[k] = R[j];
j++;
}
k++;
}
while (i < n1) {
arr[k] = L[i];
i++;
k++;
}
while (j < n2) {
arr[k] = R[j];
j++;
k++;
}
}
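To illustrate why the 18-digit cutoff in the question exists: a Java long holds at most 19 decimal digits, so Long.parseLong throws NumberFormatException on longer strings, whereas new BigInteger(String) accepts digit strings of any length. A minimal sketch follows; the class name BigIntegerParseDemo and its two helper methods are hypothetical and not part of the original code.

```java
import java.math.BigInteger;

class BigIntegerParseDemo {
    // Returns true if s fits in a long, i.e. Long.parseLong succeeds.
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Compares two digit strings of any length as numbers.
    static int compareAsNumbers(String a, String b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        String small = "5454545454";
        // Far more than 19 digits, taken from Input(2) in the question.
        String big = "5465156165164215612616546954512202496421";
        System.out.println(fitsInLong(small));
        System.out.println(fitsInLong(big));
        System.out.println(compareAsNumbers(small, big) < 0);
    }
}
```

Note that each comparison here parses both strings again, which may be costly for very long inputs.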
Answer 1 (score: 2)
Maybe I am missing something, but this should do the job:
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;

class MyComparator implements Comparator<String>
{
@Override
public int compare(String s1,
String s2)
{
BigInteger i1;
BigInteger i2;
i1 = new BigInteger(s1);
i2 = new BigInteger(s2);
return (i1.compareTo(i2));
}
} // class MyComparator
String[] my_array;
...
Arrays.sort(my_array, new MyComparator());
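If constructing a BigInteger for every comparison turns out to be too slow on very long strings, the same ordering can probably be obtained without parsing: a shorter digit string is the smaller number, and strings of equal length order the same way numerically and lexicographically. The sketch below assumes non-negative integers without leading zeros; NumericStringComparator is a hypothetical name, not something from the answer above.

```java
import java.util.Arrays;
import java.util.Comparator;

class NumericStringComparator implements Comparator<String> {
    @Override
    public int compare(String s1, String s2) {
        // Assumption: no leading zeros, so fewer digits means a smaller number.
        if (s1.length() != s2.length()) {
            return Integer.compare(s1.length(), s2.length());
        }
        // Same length: lexicographic order of digits equals numeric order.
        return s1.compareTo(s2);
    }

    public static void main(String[] args) {
        // Input(2) from the question.
        String[] data = {
            "5454545454", "212101225515", "51212", "5141215", "52",
            "5465156165164215612616546954512202496421", "2121564",
            "216451564561564651564561256065", "11", "55"
        };
        Arrays.sort(data, new NumericStringComparator());
        System.out.println(String.join("\n", data));
    }
}
```

It is used exactly like MyComparator: Arrays.sort(my_array, new NumericStringComparator()).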
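Answer 2 (score: 0)
Many thanks to everyone for clearing up my misunderstanding of the compareTo method.
However, for large datasets the compareTo method is not feasible (it takes too much CPU time), which is one of the reasons I implemented this manual sorting code. Since the main problem was in this code (which I wanted to solve and have now solved), I am accepting my own answer. Many thanks to @MrSmith42 and @OldCurmudgeon.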
else{//THIS ELSE PART IS HAVING SOME PROBLEM.
//if length of string is greater than 18digits
//it will compare two string character by character to find
//the larger string or if they are equal.
char[] c1 = L[i].toCharArray();
char[] c2 = R[j].toCharArray();
int c1leng= c1.length;
int c2leng= c2.length;
//int shorter= c1leng < c2leng ? c1leng : c2leng ;
if(c1leng==c2leng){
for(int p=0; p<c1leng; p++){
if(c1[p]==c2[p]){
if(p == c1leng-1) {
arr[k] = L[i];
i++;
break;
}
continue;
}else if(c1[p]<c2[p]){
arr[k] = L[i];
i++;
break;
}else if(c1[p]>c2[p]){
arr[k] = R[j];
j++;
break;
}
}
}else{
arr[k] = L[i]; //here was the problem. I was assigning R[j] instead of L[i] which was pushing larger elements alternatively.
i++;
}
}
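The fix above may be easier to keep correct if the comparison is isolated in one helper, so that the merge loop has a single branch instead of nested ones. The following is a sketch under the same assumption as the original code (digit strings without leading zeros); BigSortSketch and lessOrEqual are hypothetical names, and this is not the asker's final code.

```java
import java.util.Arrays;

class BigSortSketch {
    // True if digit string a represents a number <= digit string b.
    // Assumption: no leading zeros, so a shorter string is a smaller number.
    static boolean lessOrEqual(String a, String b) {
        if (a.length() != b.length()) {
            return a.length() < b.length();
        }
        // Same length: the first differing digit decides.
        for (int p = 0; p < a.length(); p++) {
            char ca = a.charAt(p);
            char cb = b.charAt(p);
            if (ca != cb) {
                return ca < cb;
            }
        }
        return true; // all digits equal
    }

    static void merge(String[] arr, int l, int m, int r) {
        String[] left = Arrays.copyOfRange(arr, l, m + 1);      // arr[l..m]
        String[] right = Arrays.copyOfRange(arr, m + 1, r + 1); // arr[m+1..r]
        int i = 0, j = 0, k = l;
        while (i < left.length && j < right.length) {
            arr[k++] = lessOrEqual(left[i], right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) arr[k++] = left[i++];
        while (j < right.length) arr[k++] = right[j++];
    }

    static void sort(String[] arr, int l, int r) {
        if (l < r) {
            int m = l + (r - l) / 2;
            sort(arr, l, m);
            sort(arr, m + 1, r);
            merge(arr, l, m, r);
        }
    }

    public static void main(String[] args) {
        // Input(2) from the question.
        String[] data = {
            "5454545454", "212101225515", "51212", "5141215", "52",
            "5465156165164215612616546954512202496421", "2121564",
            "216451564561564651564561256065", "11", "55"
        };
        sort(data, 0, data.length - 1);
        System.out.println(String.join("\n", data));
    }
}
```

Reading digits with charAt avoids the two toCharArray copies per comparison, which could matter when individual strings approach a million characters.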