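I am trying to compare the performance of hash tables in Perl and Java. Both programs use a hash to count word occurrences in a file of about 100K words.
In Perl: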
    %words = ();
    open FILE, "<", "bigfile" or die "Cannot open file: $!\n";
    while(my $line = <FILE>){
        chomp( $line );
        $line =~ s/[[:punct:]]//g;
        my @words = split /\n|\s+/, $line;
        foreach my $w (@words){
            $words{$w}++;
        }
    }
    close FILE ;
    for my $key ( sort( keys %words ) ) {
        print "$key : $words{ $key } \n";
    }
In Java:
    Map<String, Integer> wordsMap = new HashMap<String, Integer>();
    try{
        Scanner sc = new Scanner( new File( "bigfile") );
        while( sc.hasNextLine() ){
            String input = sc.nextLine();
            input = input.replaceAll( System.lineSeparator() , " " );
            String[] inputArray = input.split("\\s+");
            for(int i=0; i< inputArray.length ; i++ ){
                String r = inputArray[i].replaceAll("\\p{Punct}|[^\\p{ASCII}]+", "");
                if ( wordsMap.containsKey( r )){
                    int count = wordsMap.get( r );
                    wordsMap.put( r , count + 1 );
                }else {
                    wordsMap.put( r, 1);
                }
            }
        }
    }catch(FileNotFoundException fnf ){
        fnf.printStackTrace();
    }
    Set <String> keys = wordsMap.keySet();
    TreeSet<String> sortedKeys = new TreeSet<String>(keys);
    for( String key: sortedKeys){
        System.out.printf("%-10s%10s\n" , key, wordsMap.get(key) );
    }
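When I run the two versions above, the Perl one appears to run faster. I read somewhere that Java's hash table works differently from Perl's. Is there a way to optimize the Java version?
This is how I timed both, using the Linux time command: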
    #> time perl count.pl
    real 0m0.316s
    user 0m0.236s
    sys 0m0.018s
    #> time java count
    real 0m1.434s
    user 0m1.856s
    sys 0m0.181s
Answer 0 (score: 1)
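Perl chomps the line separator, and in the Java version Scanner.nextLine() already returns each line without it, so replacing System.lineSeparator() does nothing useful. String.replaceAll and String.split also compile their regular expression on every call; compile each pattern once, outside the loop, with java.util.regex.Pattern.compile(...)! Of course, that is what Perl does.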
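As a minimal sketch of the Pattern.compile suggestion, the counting loop could look roughly like the following. The class name WordCount and the helper countLine are made up for illustration and are not from the question. The sketch also swaps Scanner for a BufferedReader and uses Map.merge (Java 8+) in place of the containsKey/get/put sequence. Unlike the question's code, it strips punctuation before splitting (the same order as the Perl version) and skips empty tokens, so the output can differ slightly. No timing is claimed for it; measure it against your own file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical class name, chosen for this sketch only.
public class WordCount {
    // Compiled once and reused for every line, instead of being recompiled
    // inside String.replaceAll and String.split on each call.
    private static final Pattern STRIP = Pattern.compile("\\p{Punct}|[^\\p{ASCII}]+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: adds the words of one line to counts.
    public static void countLine(String line, Map<String, Integer> counts) {
        String cleaned = STRIP.matcher(line).replaceAll("");
        for (String word : WHITESPACE.split(cleaned)) {
            if (!word.isEmpty()) {               // assumption: empty tokens are not words
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "bigfile" is the file name used in the question.
        String path = args.length > 0 ? args[0] : "bigfile";
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {   // readLine drops the line separator
                countLine(line, counts);
            }
        }
        // TreeMap gives the same sorted-by-key output order as the TreeSet in the question.
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            System.out.printf("%-10s%10d%n", e.getKey(), e.getValue());
        }
    }
}
```

Compile with javac WordCount.java and time it the same way as before, with time java WordCount.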