Java 8 provides a way to create a Stream from the lines of a file; in that case, forEach steps through the file line by line. I have a text file in the following format:
bunch of lines with text
$$$$
bunch of lines with text
$$$$
I need everything up to each $$$$ to become a single element of the Stream. In other words, I need a Stream of Strings, where each String contains the content that precedes a $$$$.
What is the best way to do this (with the least overhead)?
Answer 0 (score: 2)
I couldn't come up with a solution that processes the lines lazily, and I'm not sure that is even possible. My solution produces an ArrayList. If you need a Stream, just call stream() on it.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class DelimitedFile {

    public static void main(String[] args) throws IOException {
        List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
        for (int i = 0; i < lines.size(); i++) {
            System.out.printf("%d:%n%s%n", i, lines.get(i));
        }
    }

    public static List<String> lines(Path path, String delimiter) throws IOException {
        return Files.lines(path)
                .collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {
                    boolean add = true; // true when the next line should start a new element

                    @Override
                    public void accept(ArrayList<String> lines, String line) {
                        if (delimiter.equals(line)) {
                            add = true; // the next non-delimiter line starts a new element
                        } else if (add) {
                            lines.add(line);
                            add = false;
                        } else {
                            // append the line to the current element
                            int i = lines.size() - 1;
                            lines.set(i, lines.get(i) + '\n' + line);
                        }
                    }
                }, ArrayList::addAll);
    }
}
File contents:

bunch of lines with text
bunch of lines with text2
bunch of lines with text3
$$$$
2bunch of lines with text
2bunch of lines with text2
$$$$
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
$$$$
Output:

0:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
1:
2bunch of lines with text
2bunch of lines with text2
2:
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
Edit:

I finally came up with a solution that generates the Stream lazily:
public static Stream<String> lines(Path path, String delimiter) throws IOException {
    Stream<String> lines = Files.lines(path);
    Iterator<String> iterator = lines.iterator();
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
        String nextLine;

        @Override
        public boolean hasNext() {
            if (nextLine != null) {
                return true;
            }
            // skip delimiter lines and buffer the first line of the next chunk
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (!delimiter.equals(line)) {
                    nextLine = line;
                    return true;
                }
            }
            lines.close(); // no more lines: close the underlying stream
            return false;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            StringBuilder sb = new StringBuilder(nextLine);
            nextLine = null;
            // accumulate lines until the next delimiter or the end of the file
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (delimiter.equals(line)) {
                    break;
                }
                sb.append('\n').append(line);
            }
            return sb.toString();
        }
    }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false);
}
This actually happens to be very similar to the implementation of BufferedReader.lines() (which Files.lines(Path) uses internally). It might reduce the overhead to skip both of those methods and use Files.newBufferedReader(Path) and BufferedReader.readLine() directly.
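For illustration, a minimal sketch of that last suggestion, reading the chunks eagerly with Files.newBufferedReader(Path) and BufferedReader.readLine() (the method name readChunks and the eager List result are my assumptions, not part of the answer above):

// Sketch only; needs java.io.BufferedReader, java.nio.file.Files/Path and java.util.ArrayList/List.
public static List<String> readChunks(Path path, String delimiter) throws IOException {
    List<String> chunks = new ArrayList<>();
    try (BufferedReader reader = Files.newBufferedReader(path)) {
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (delimiter.equals(line)) {
                if (current.length() > 0) {
                    chunks.add(current.toString()); // finish the current chunk
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) { // trailing chunk without a closing delimiter
            chunks.add(current.toString());
        }
    }
    return chunks;
}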
Answer 1 (score: 0)

You can try
List<String> list = new ArrayList<>();
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    list = stream
            .filter(line -> !line.equals("$$$$"))
            .collect(Collectors.toList());
} catch (IOException e) {
    e.printStackTrace();
}
Answer 2 (score: 0)

A similar but shorter answer already exists; however, the following is type-safe and keeps no extra state:
Path path = Paths.get("... .txt");
try {
    List<StringBuilder> glist = Files.lines(path, StandardCharsets.UTF_8)
            .collect(() -> new ArrayList<StringBuilder>(),
                    (list, line) -> {
                        if (list.isEmpty() || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                            list.add(new StringBuilder());
                        }
                        list.get(list.size() - 1).append(line).append('\n');
                    },
                    (list1, list2) -> {
                        if (!list1.isEmpty() && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                                && !list2.isEmpty()) {
                            // Merge last of list1 and first of list2:
                            list1.get(list1.size() - 1).append(list2.remove(0).toString());
                        }
                        list1.addAll(list2);
                    });
    glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
} catch (IOException ex) {
    Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
}
Instead of .endsWith("$$$$\n"), it would be better to use:
.matches("(^|\n)\\$\\$\\$\\$\n")
Answer 3 (score: 0)

Here is a solution based on this previous work:
public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {
    private final Spliterator<String> source;
    private final Predicate<String> delimiter;
    private final Consumer<String> getChunk;
    private List<String> current;

    ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
        super(lineSpliterator.estimateSize(), ORDERED | NONNULL);
        source = lineSpliterator;
        delimiter = mark;
        getChunk = s -> {
            if (current == null) current = new ArrayList<>();
            current.add(s);
        };
    }

    public boolean tryAdvance(Consumer<? super List<String>> action) {
        // pull lines until the chunk ends with a delimiter line or the source is exhausted
        while (current == null || !delimiter.test(current.get(current.size() - 1)))
            if (!source.tryAdvance(getChunk)) return lastChunk(action);
        current.remove(current.size() - 1); // drop the delimiter line itself
        action.accept(current);
        current = null;
        return true;
    }

    private boolean lastChunk(Consumer<? super List<String>> action) {
        if (current == null) return false;
        action.accept(current);
        current = null;
        return true;
    }

    public static Stream<List<String>> toChunks(
            Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
        return StreamSupport.stream(
                new ChunkSpliterator(lines.spliterator(), splitAt),
                parallel);
    }
}
which you can use like
try (Stream<String> lines = Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
            lines,
            Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
            false)
        /* chain your stream operations, e.g.
        .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
        */;
}
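Since toChunks yields a Stream<List<String>> while the question asks for a Stream of Strings, one possible follow-up (my own sketch, not part of the answer above) is to join each chunk's lines:

try (Stream<String> lines = Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
            lines,
            Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
            false)
        .map(chunk -> String.join("\n", chunk)) // join each chunk's lines into one String
        .forEach(System.out::println);
}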
Answer 4 (score: 0)

You can use a Scanner as an iterator and create a stream from it:
private static Stream<String> recordStreamOf(Readable source) {
    Scanner scanner = new Scanner(source);
    // the delimiter is a regular expression, so the dollar signs must be escaped
    scanner.useDelimiter("\\$\\$\\$\\$");
    return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
}
This preserves the linefeeds in the chunks for further filtering or splitting.
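A usage sketch (the file name and the trim() of leading and trailing newlines are my assumptions, not part of the answer above):

// Scanner accepts any Readable, so a BufferedReader from Files.newBufferedReader works here.
try (Stream<String> records = recordStreamOf(Files.newBufferedReader(Paths.get("delimited.txt")))) {
    records.map(String::trim)
           .forEach(System.out::println);
}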