Question

I have many files that I need to read line by line. Each line contains a URL, followed by a timestamp, followed by some tags.

I have a class named Link that parses each line and provides methods to get the following:

Link::url
Link::timestamp
Link::tags  (returns a List of tag strings)

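URLs, along with their tags, can be duplicated across the files. I need to read the lines from all the files, collect the tags for each URL, eliminate the duplicates, and then write the result to an output file in the format url tag1,tag2,tag3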

I was able to do this in Java 7 using map/reduce, but I can't figure out how to do it with lambda expressions. I've been told it can be done in one line of code?

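This is what I have so far. I'm stuck after the filter. I think what I want to do is create a map with the URL as the key and a TreeMap as the value, where the TreeMap would contain all the unique tags. I just don't know how to write this. Any help would be appreciated.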

public static void tagUnion() throws Exception {   
    Stream<Path> fstream = Files.list(Paths.get(indir));
    fstream.forEach(path -> {
        Stream<String> lines;
        try (Stream<String> entry = Files.lines(path)) {
            entry
            .filter(s -> !s.isEmpty())
            .map(Link::parse)
            .filter(map -> inDate(map.timestamp()));
            // this is where I’m stuck
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}

Answer 1

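I suggest using Stream::flatMap instead. This method maps each object in the stream to another stream, all of the same type, and combines them into a single stream that you can keep processing. For example: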

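Files.list(somePath)
        .flatMap(path -> {
            try { return Files.lines(path); }  // Files.lines throws a checked IOException
            catch (IOException e) { throw new UncheckedIOException(e); }
        })
        .filter(s -> !s.isEmpty())
        .map(Link::parse)
        .filter(map -> inDate(map.timestamp()));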

All that remains is to write a method that processes the links and turns them into the lines you want.

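Finally, to collect a stream of strings into a single delimited string (the delimiter can be a newline or a comma), there is a method for that: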

String csvLine = stream.collect(Collectors.joining(","));
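To make the joining step concrete, here is a minimal sketch that turns one URL and its tags into the url tag1,tag2,tag3 line the question asks for. The names TagLineDemo and formatLine are illustrative assumptions, not part of the original code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TagLineDemo {
    // Hypothetical helper: builds "url tag1,tag2,tag3" from a URL and its tags.
    // Copying into a TreeSet drops duplicate tags and sorts them before joining.
    static String formatLine(String url, Collection<String> tags) {
        String joined = new TreeSet<>(tags).stream()
                .collect(Collectors.joining(","));
        return url + " " + joined;
    }

    public static void main(String[] args) {
        // prints "http://example.com java,streams"
        System.out.println(formatLine("http://example.com",
                Arrays.asList("java", "streams", "java")));
    }
}
```

Whether the tags should be sorted is a design choice; a LinkedHashSet would keep first-seen order instead.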

Answer 2

I'm not sure there is enough information here to answer your question confidently, but here's a stab at it anyway.

Given that you have something like this:

@FunctionalInterface
interface IOFunction<T, R>
{
  R apply(T t) throws IOException;

  public static <T, R> Function<T, R> unchecked(IOFunction<T, R> f)
  {
    return v -> {
      try {
        return f.apply(v);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    };
  }
}

you can get what you want with something like this:

  public static Map<String, Set<String>> tagUnion(String indir)
      throws IOException {
    try (Stream<Path> fstream = Files.list(Paths.get(indir))) {
      return fstream
          .flatMap(IOFunction.unchecked(Files::lines))
          .filter(s -> !s.isEmpty())
          .map(Link::parse)
          .filter(link -> inDate(link.timestamp()))
          .collect(Collectors.toMap(Link::url, link -> new TreeSet<>(link.tags())));
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

The complication here is that Files.lines(...) throws a checked IOException, which prevents it from being used directly in a stream pipeline.

OK, based on your comment, you need a groupingBy(...) operation. Collecting the contents of a bunch of List<String> values into a Set<String> takes a bit more code.

  return fstream
      .flatMap(IOFunction.unchecked(Files::lines))
      .filter(s -> !s.isEmpty())
      .map(Link::parse)
      .filter(link -> inDate(link.timestamp()))
      .collect(Collectors.groupingBy(Link::url,
          Collectors.mapping(Link::tags,
              Collector.of(
                  () -> new TreeSet<>(),
                  (s, l) -> s.addAll(l),
                  (s1, s2) -> {
                    s1.addAll(s2);
                    return s1;
                  }))));
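As an alternative to the hand-written collector, the same merge can probably be expressed with the four-argument Collectors.toMap, which takes a merge function for duplicate keys and a map factory. The sketch below is self-contained and works on in-memory data. The nested Link class is only a stand-in that assumes the question's Link exposes url() and tags(), and ToMapMergeDemo is a made-up name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Stand-in for the question's Link class (assumed shape: a URL plus its tags).
    static final class Link {
        private final String url;
        private final List<String> tags;

        Link(String url, String... tags) {
            this.url = url;
            this.tags = Arrays.asList(tags);
        }

        String url() { return url; }
        List<String> tags() { return tags; }
    }

    // Collects the union of tags per URL; a repeated URL triggers the merge function.
    static Map<String, Set<String>> tagUnion(List<Link> links) {
        return links.stream().collect(Collectors.toMap(
                Link::url,                             // key: the URL
                link -> new TreeSet<>(link.tags()),    // value: sorted, de-duplicated tags
                (a, b) -> { a.addAll(b); return a; },  // same URL seen again: merge the sets
                TreeMap::new));                        // keep URLs sorted as well
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
                new Link("http://b.example", "io"),
                new Link("http://a.example", "streams", "java"),
                new Link("http://a.example", "java", "lambda"));
        System.out.println(tagUnion(links));
    }
}
```

Unlike the two-argument toMap, this form does not throw an IllegalStateException when the same URL appears on more than one line.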

With Java 9, the groupingBy(...) collector can be simplified to:

  return fstream
      .flatMap(IOFunction.unchecked(Files::lines))
      .filter(s -> !s.isEmpty())
      .map(Link::parse)
      .filter(link -> inDate(link.timestamp()))
      .collect(Collectors.groupingBy(Link::url,
          Collectors.flatMapping(link -> link.tags().stream(), Collectors.toSet())));
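Collectors.toSet() makes no ordering guarantee, so if sorted output matters, one option is to pass TreeMap::new as the map factory and collect the tags with Collectors.toCollection(TreeSet::new). The sketch below (Java 9 or later) also renders the url tag1,tag2,tag3 lines from the question. The nested Link class is a stand-in with an assumed shape, and GroupAndFormatDemo and toLines are made-up names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class GroupAndFormatDemo {
    // Stand-in for the question's Link class (assumed shape: a URL plus its tags).
    static final class Link {
        private final String url;
        private final List<String> tags;

        Link(String url, String... tags) {
            this.url = url;
            this.tags = Arrays.asList(tags);
        }

        String url() { return url; }
        List<String> tags() { return tags; }
    }

    // Groups tags by URL; TreeMap and TreeSet keep both URLs and tags sorted.
    static Map<String, Set<String>> tagUnion(List<Link> links) {
        return links.stream().collect(Collectors.groupingBy(
                Link::url,
                TreeMap::new,
                Collectors.flatMapping(link -> link.tags().stream(),
                        Collectors.toCollection(TreeSet::new))));
    }

    // Renders each map entry as "url tag1,tag2,tag3".
    static List<String> toLines(Map<String, Set<String>> tagsByUrl) {
        return tagsByUrl.entrySet().stream()
                .map(e -> e.getKey() + " " + String.join(",", e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
                new Link("http://b.example", "io"),
                new Link("http://a.example", "streams", "java"),
                new Link("http://a.example", "java", "lambda"));
        toLines(tagUnion(links)).forEach(System.out::println);
    }
}
```

The resulting list can be handed to Files.write(outputPath, lines) to produce the output file.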

Answer 3

Thanks for your help. I was able to solve the problem a different way, using a TreeMap:

    // create array of files in the directory
    // make sure the files are json files only
    File[] files = new File(indir).listFiles(new FileFilter() {
        @Override
        public boolean accept(File pathname) {
            //System.out.println(pathname.getName());
            return pathname.getName().toLowerCase().endsWith(".json");
        }
    });

    // exit if no json were found (listFiles returns null if indir is not a directory)
    if (files == null || files.length == 0) {
        System.out.println("No JSON files found in directory " + indir);
        System.exit(0);
    }

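    // map each line to a String(url), Set(tags)
    Map<String, Set<String>> tagMap = new TreeMap<>();

    // Assumed reconstruction: the post omits the loop that opens each file,
    // so the for/try headers below are a guess based on the closing braces.
    for (File file : files) {
        try (Stream<String> lines = Files.lines(file.toPath())) {
            lines.filter(s -> !s.isEmpty())
                    .map(Link::parse)
                    // merge this line's tags into the set for its URL
                    .forEach(l -> tagMap.computeIfAbsent(l.url(), k -> new HashSet<>())
                            .addAll(l.tags()));
        }
    }

    // write the output to the specified file
    writeOutput(tagMap, false);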

