Question

Suppose we have two log files of comma-separated values. file1.txt holds the employee id and employee name; file2.txt holds the projects associated with each employee id. file1 has unique entries, while file2.txt can have many rows per employee. A new employee who has not been assigned any project has no entry at all in file2.txt.

File1.txt: (EmpId, EmpName)

    1,abc
    2,ac
    3,bc
    4,acc
    5,abb
    6,bbc
    7,aac
    8,aba
    9,aaa

File2.txt: (EmpId, ProjectId)

    1,102
    2,102
    1,103
    3,101
    5,102
    1,103
    2,105
    2,200
    9,102

Find the number of projects each employee has been assigned to. For new employees who don't have any projects, print 0.

Output:

    1=3
    2=3
    3=1
    4=0
    5=1
    6=0
    7=0
    8=0
    9=1

I use a BufferedReader to read a line from file1 and compare it with every line in file2.txt. Here is my code:

    public static void main(String[] args) throws IOException {
        BufferedReader file1 = new BufferedReader(new FileReader("file1.txt"));
        BufferedReader file2 = new BufferedReader(new FileReader("file2.txt"));
        BufferedReader file3 = new BufferedReader(new FileReader("file2.txt"));
        HashMap<String, Integer> empProjCount = new HashMap<String, Integer>();

        // Count the lines in file2.txt.
        int lines = 0;
        while (file2.readLine() != null)
            lines++;

        String line1 = file1.readLine();
        String[] line_1 = line1.split(",");
        String line2 = file3.readLine();
        String[] line_2 = line2.split(",");

        while (line1 != null && line2 != null) {
            int count = 0;
            // Compare the current employee against every line of file2.txt.
            for (int i = 1; i <= lines + 1 && line2 != null; i++) {
                if (line_1[0].equals(line_2[0])) {
                    count++;
                }
                line2 = file3.readLine();
                if (line2 != null) {
                    line_2 = line2.split(",");
                }
            }
            // Reopen file2.txt for the next employee.
            file3 = new BufferedReader(new FileReader("file2.txt"));
            empProjCount.put(line_1[0], count);
            line1 = file1.readLine();
            if (line1 != null)
                line_1 = line1.split(",");
            line2 = file3.readLine();
            if (line2 != null)
                line_2 = line2.split(",");
        }
        System.out.println(empProjCount);
    }

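My questions are:

1. Is there a way to do better than O(n^2) here without using any extra space?
2. I am using 3 BufferedReaders to read the files, because once a line has been read the reader moves on to the next line. Is there another way to mark the current line?
3. If we think of these files as tables, what is the best way to query this scenario?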

Answer 1

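1: Yes.

2: Yes:

I would do it in two iterations:

1. Iterate over the IDs (file1) and initialize a map of (empId, projectCounter).
2. Iterate over the projects (file2) and, for each line, update (projectCounter++) the corresponding entry in the map.

This way the execution time is almost linear in the sizes of file1 and file2.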
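The two iterations described above might look roughly like the following sketch. The class name `ProjectCounter` and the method `countProjects` are illustrative names, not part of the answer, and the code assumes every non-empty line is well formed with the employee id as its first comma-separated field. Two choices here are assumptions of this sketch: a `LinkedHashMap` is used so the result follows the order of file1, and ids that appear in file2 but not in file1 are ignored.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

class ProjectCounter {

    // Pass 1 registers every employee with a count of 0;
    // pass 2 increments the count once per assignment line.
    static Map<String, Integer> countProjects(Reader employees, Reader assignments) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps file1 order

        try (BufferedReader in = new BufferedReader(employees)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                counts.put(line.split(",")[0].trim(), 0);
            }
        }

        try (BufferedReader in = new BufferedReader(assignments)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                // Assumption: ids that never appeared in file1 are ignored.
                counts.computeIfPresent(line.split(",")[0].trim(), (id, n) -> n + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
                countProjects(new FileReader("file1.txt"), new FileReader("file2.txt"));
        counts.forEach((id, n) -> System.out.println(id + "=" + n));
    }
}
```

Each file is read exactly once with a single reader, so no line has to be revisited. The map holds one entry per employee, so the linear running time is paid for with memory proportional to the number of employees.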

Answer 2

Collect all the employee IDs from file 1 into a Map, each initialized with a project count of 0.

    // Build my map of all employees.
    Map<Integer, Integer> employeeProjectCount = Arrays.stream(file1)
            // Get empId - Split on comma, take the first field and convert to integer.
            .map(s -> Integer.valueOf(s.split(",")[0]))
            // Build a Map for the results.
            .collect(Collectors.toMap(
                    // Key is emp ID.
                    empId -> empId,
                    // Value starts at zero.
                    empId -> 0
            ));

Walk through file 2, counting the projects.

    // Walk the projects list.
    Arrays.stream(file2)
            // Get empId - Split on comma, take the first field and convert to integer (again).
            .map(s -> Integer.valueOf(s.split(",")[0]))
            // Count the projects.
            .forEach(empId -> employeeProjectCount.put(empId, employeeProjectCount.get(empId)+1));

Print it:

    // Print it.
    System.out.println(employeeProjectCount);

which gives

{1=3, 2=3, 3=1, 4=0, 5=1, 6=0, 7=0, 8=0, 9=1}

BTW: I used String[]s in place of the files:

    String[] file1 = {
            "1,abc",
            "2,ac",
            "3,bc",
            "4,acc",
            "5,abb",
            "6,bbc",
            "7,aac",
            "8,aba",
            "9,aaa",
    };
    String[] file2 = {
            "1,102",
            "2,102",
            "1,103",
            "3,101",
            "5,102",
            "1,103",
            "2,105",
            "2,200",
            "9,102",
    };

Answer 3

Using Files.lines and regular expressions:

    // Employee ids from file1.txt; the name is matched as word characters.
    Pattern employeePattern = Pattern.compile("(?<id>\\d+),(?<name>\\w+)");
    Set<String> employees = Files.lines(Paths.get("file1.txt"))
        .map(employeePattern::matcher).filter(Matcher::matches)
        .map(m -> m.group("id")).collect(Collectors.toSet());

    // Number of file2.txt lines per employee id.
    Pattern projectPattern = Pattern.compile("(?<emp>\\d+),(?<proj>\\d+)");
    Map<String,Long> projects = Files.lines(Paths.get("file2.txt"))
        .map(projectPattern::matcher).filter(Matcher::matches)
        .collect(Collectors.groupingBy(m -> m.group("emp"), Collectors.counting()));

Print the result:

    employees.stream()
        .map(emp -> emp + "=" + projects.getOrDefault(emp, 0L))
        .forEach(System.out::println);
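For anyone who wants to run this approach end to end, here is one possible self-contained sketch. It is not the answer's exact code: the class `RegexProjectReport` and the method `report` are illustrative names, the input comes from in-memory lists rather than `Files.lines` so it runs without any files, and the name is assumed to consist of word characters (`\\w+`). It also swaps `Collectors.toSet()` for a `LinkedHashSet`, because a plain `Set` makes no promise about iteration order and the employees would otherwise not necessarily print in file order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexProjectReport {
    private static final Pattern EMPLOYEE = Pattern.compile("(?<id>\\d+),(?<name>\\w+)");
    private static final Pattern PROJECT = Pattern.compile("(?<emp>\\d+),(?<proj>\\d+)");

    // Returns one "id=count" string per employee, in the order of employeeLines.
    static List<String> report(List<String> employeeLines, List<String> projectLines) {
        // Employee ids, kept in encounter order.
        Set<String> employees = employeeLines.stream()
                .map(EMPLOYEE::matcher).filter(Matcher::matches)
                .map(m -> m.group("id"))
                .collect(Collectors.toCollection(LinkedHashSet::new));

        // Number of project lines per employee id.
        Map<String, Long> projects = projectLines.stream()
                .map(PROJECT::matcher).filter(Matcher::matches)
                .collect(Collectors.groupingBy(m -> m.group("emp"), Collectors.counting()));

        // Employees without any project line default to 0.
        return employees.stream()
                .map(emp -> emp + "=" + projects.getOrDefault(emp, 0L))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("1,abc", "2,ac", "3,bc", "4,acc");
        List<String> file2 = Arrays.asList("1,102", "2,102", "1,103", "3,101", "1,103");
        report(file1, file2).forEach(System.out::println);
    }
}
```

Lines that do not match the patterns are silently dropped by the `filter(Matcher::matches)` step, which is convenient for log files but can hide malformed input.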

Parse two files and produce employee data
