I am trying to count the lines in a not-so-small text file (several MB). The answers I found here suggest this:
(Get-Content foo.txt | Measure-Object -Line).Lines
That works, but the performance is terrible. I suspect the whole file is being loaded into memory rather than streamed line by line.
I wrote a Java test program to compare the performance:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;
import java.util.stream.Stream;

public class LineCounterPerformanceTest {

    public static void main(final String... args) {
        if (args.length > 0) {
            final String path = args[0];
            measure(LineCounterPerformanceTest::java, path);
            measure(LineCounterPerformanceTest::powershell, path);
        } else {
            System.err.println("Missing path.");
            System.exit(-1);
        }
    }

    // Counts lines by streaming the file through the NIO API.
    private static long java(final String path) throws IOException {
        System.out.println("Java");
        try (final Stream<String> lines = Files.lines(Paths.get(path))) {
            return lines.count();
        }
    }

    // Counts lines by shelling out to PowerShell and parsing its output.
    private static long powershell(final String path) throws IOException, InterruptedException {
        System.out.println("Powershell");
        final Process ps = new ProcessBuilder("powershell", String.format("(Get-Content '%s' | Measure-Object -Line).Lines", path)).start();
        if (ps.waitFor(1, TimeUnit.MINUTES) && ps.exitValue() == 0) {
            try (final Scanner scanner = new Scanner(ps.getInputStream())) {
                return scanner.nextLong();
            }
        }
        throw new IOException("Timeout or error.");
    }

    private static <T, U extends T> void measure(final ExceptionalToLongFunction<T> function, final U value) {
        final long start = System.nanoTime();
        final long result = function.unchecked().applyAsLong(value);
        final long end = System.nanoTime();
        System.out.printf("Result: %d%n", result);
        System.out.printf("Elapsed time (ms): %,.6f%n%n", (end - start) / 1_000_000.);
    }

    @FunctionalInterface
    private interface ExceptionalToLongFunction<T> {

        long applyAsLong(T value) throws Exception;

        default ToLongFunction<T> unchecked() {
            return (value) -> {
                try {
                    return applyAsLong(value);
                } catch (final Exception ex) {
                    throw new RuntimeException(ex);
                }
            };
        }
    }
}
The plain Java solution is roughly 80 times faster.
Is there a built-in way to do this with comparable performance? I'm on PowerShell 4.0, in case that matters.
Answer 0 (score: 4)
See whether this is faster than your current approach:
$count = 0
Get-Content foo.txt -ReadCount 2000 |
    foreach { $count += $_.Count }
$count
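The reason this helps: -ReadCount 2000 makes Get-Content emit arrays of up to 2000 lines instead of one string object per line, so far fewer objects travel down the pipeline. If you want to compare it against your original one-liner, a rough timing sketch (assuming foo.txt sits in the current directory) is:
# Rough comparison of the two approaches, assuming foo.txt is in the current directory.
# Measure-Command reports only elapsed time; the counts themselves are discarded here.
Measure-Command { (Get-Content foo.txt | Measure-Object -Line).Lines }
Measure-Command {
    $count = 0
    Get-Content foo.txt -ReadCount 2000 | foreach { $count += $_.Count }
}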
Answer 1 (score: 1)
You can use a StreamReader for this sort of thing. I'm not sure how its speed compares to the Java code, but my understanding is that ReadLine loads only one line at a time.
$StreamReader = New-Object System.IO.StreamReader($File)
$LineCount = 0
while ($StreamReader.ReadLine() -ne $null)
{
$LineCount++
}
$StreamReader.Close()
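Two things worth keeping in mind with this approach: .NET resolves relative paths against the process working directory (which may differ from your PowerShell location), and the reader should be released even if something throws partway through. A variant handling both, assuming foo.txt is the file you want to count:
# Sketch assuming foo.txt; Resolve-Path avoids the .NET working-directory mismatch.
$File = (Resolve-Path .\foo.txt).Path
$StreamReader = New-Object System.IO.StreamReader($File)
try {
    $LineCount = 0
    while ($null -ne $StreamReader.ReadLine()) {
        $LineCount++
    }
    $LineCount
}
finally {
    # Dispose closes the underlying file handle.
    $StreamReader.Dispose()
}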
Answer 2 (score: 0)
For a GB+ file with lines over 900 characters long, switch is faster.
$count = 0; switch -File $filepath {default { ++$count }}
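Spelled out (assuming $filepath points at your file), the one-liner above reads the file line by line and hits the default branch once per line:
# Expanded form of the one-liner; the default branch matches every line.
$filepath = '.\foo.txt'   # hypothetical path for illustration
$count = 0
switch -File $filepath {
    default { ++$count }
}
$count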