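Find the most common byte in a file

Time: 2014-04-12 17:01:10

Tags: java arrays inputstream fileinputstream

So basically I have the file C:/180301.txt, which contains the numbers/bytes 1 1 2 2 3 4 5, and the output (49) is correct, I think. My question is how to print all elements that are repeated equally often; right now I only get one of them, 49.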

    private static ArrayList<Integer> list1 = new ArrayList<>();
    private static ArrayList<Integer> list2 = new ArrayList<>();
    public static void main(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        FileInputStream fileReader = new FileInputStream(br.readLine());
        while (fileReader.available() > 0)
        {
            list1.add(fileReader.read());
        }
        int element = 0;
        int count = 0;
        for (int i = 0; i < list1.size(); i++)
        {
            if (same_element(list1.get(i)))
            {
                for (int j = 0; i < list1.size(); i++)
                {
                    if (list1.get(i).equals(list1.get(j)))
                    {
                        count++;
                        element = list1.get(j);
                        list2.add(list1.get(i));
                    }
                }
            }
        }
        if (count > 1)
            System.out.println(element);
        fileReader.close();
    }
    private static boolean same_element(int list_i) {
        for (Integer aList2 : list2) if (list_i == aList2) return false;
        return true;
    }
}

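1 answer:

Answer 0 (score: 1)

In same_element you have to swap true and false. You get 49 because you never increment j (the inner loop increments i instead), so the inner loop is wrong as well. It should be removed anyway, because same_element should now do that job. The third problem is that the recent value is only added to the list of already-seen values if it is already in there, which can never happen. So with some slight rework, your code might look like this: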

List<Integer> fromFile = new ArrayList<>();
InputStream fileReader = new ByteArrayInputStream("71123456".getBytes("utf-8"), 0, 8);
while (fileReader.available() > 0) 
{
  fromFile.add(fileReader.read());
}
int element = 0;
int count = 0;
List<Integer> seen = new ArrayList<>();
for (int i = 0; i < fromFile.size(); i++) 
{
  Integer recent = fromFile.get(i);
  if (seen.contains(recent)) 
  {
    count++;
    element = recent;
  }
  seen.add(recent);
}
if (count > 1) System.out.println(element);
fileReader.close();

This prints the last duplicate, but still not the most frequent byte. Today we would write it like this:

// imports this snippet relies on
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

Map<Byte, Integer> counters = new HashMap<>();
Path path = FileSystems.getDefault().getPath(args[0]);

// build a map with the byte value as key, referring to a counter as value
for (Byte b: Files.readAllBytes(path)) {
  Integer old = counters.get(b);
  counters.put(b, (old == null ? 1 : old + 1));
}

// create a comparator that orders Map.Entry objects by their value, i.e. the
// occurrences of the respective byte. The order is ascending.
Comparator<Entry<Byte, Integer>> byVal = Comparator.comparingInt(e -> e.getValue());

// create a stream of Map.Entry objects. Streams are a new concept in Java 8.
// A stream is somewhat like a collection, but more powerful: while a collection
// stores data, a stream focuses on manipulating it
counters.entrySet().stream()
      // use the comparator in reversed form, so the number of
      // occurrences is now descending
      .sorted(byVal.reversed())
      // only use the first Map.Entry, i.e. the one with the most occurrences.
      // filter offers similar functionality:
      // .filter(e -> e.getValue() > 1) would use all duplicates
      .limit(1)
      // print out the results. Of course the argument for println can be
      // concatenated from several parts like:
      // e.getKey() +  "\tcount: " + e.getValue()
      .forEach(e -> System.out.println(e.getKey()));
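
The `.filter(e -> e.getValue() > 1)` variant mentioned in the comments is closer to what the question asks for (all repeated bytes together with their counts). A minimal sketch of that idea follows. The class `DuplicateBytes` and the method `duplicates` are hypothetical names, and the input is an in-memory sample rather than a file. It also uses `b & 0xFF` so that the printed values match the 0..255 range returned by `InputStream.read()`, whereas a `Byte` key would print as a signed value.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DuplicateBytes {

    // Hypothetical helper (not part of the original answer): maps each
    // unsigned byte value to its count, keeping only values that occur
    // more than once, ordered by descending count.
    static Map<Integer, Integer> duplicates(byte[] data) {
        // TreeMap keeps keys sorted, so ties end up in ascending byte order
        Map<Integer, Integer> counters = new TreeMap<>();
        for (byte b : data) {
            // b & 0xFF converts the signed byte to the range 0..255
            counters.merge(b & 0xFF, 1, Integer::sum);
        }
        return counters.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                // LinkedHashMap preserves the sorted encounter order
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (first, second) -> first, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Assumption: sample content from the question, single spaces, no line ending
        byte[] sample = "1 1 2 2 3 4 5".getBytes(StandardCharsets.US_ASCII);
        duplicates(sample).forEach((value, count) ->
                System.out.println(value + "\tcount: " + count));
    }
}
```

Under the stated assumption about the file content, this reports 32 (the space character) with count 6, followed by 49 and 50 with count 2 each, which suggests that the space rather than 49 is the most frequent byte in that input.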

Java 8 helps a lot with problems like this. Writing the same thing in earlier versions would take considerably more code.
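
For comparison, here is a rough sketch of what the same map-and-sort approach might look like without lambdas and streams (Java 7 style). The class `MostFrequentByteJava7` and the method `mostFrequent` are hypothetical names, and the input is an in-memory sample instead of a file; treat it as an illustration, not as the answer's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MostFrequentByteJava7 {

    // Hypothetical helper: returns the (byte value, count) entry with the
    // highest count, or null if the input is empty.
    static Map.Entry<Byte, Integer> mostFrequent(byte[] data) {
        Map<Byte, Integer> counters = new HashMap<Byte, Integer>();
        for (byte b : data) {
            Integer old = counters.get(b);
            counters.put(b, old == null ? 1 : old + 1);
        }

        // Copy the entries into a list so they can be sorted
        List<Map.Entry<Byte, Integer>> entries =
                new ArrayList<Map.Entry<Byte, Integer>>(counters.entrySet());

        // Without lambdas the comparator has to be an anonymous class
        Collections.sort(entries, new Comparator<Map.Entry<Byte, Integer>>() {
            @Override
            public int compare(Map.Entry<Byte, Integer> left,
                               Map.Entry<Byte, Integer> right) {
                // Descending by count
                return right.getValue().compareTo(left.getValue());
            }
        });

        return entries.isEmpty() ? null : entries.get(0);
    }

    public static void main(String[] args) {
        // Assumption: same sample bytes as the ByteArrayInputStream example
        byte[] sample = "71123456".getBytes(StandardCharsets.US_ASCII);
        Map.Entry<Byte, Integer> top = mostFrequent(sample);
        System.out.println(top.getKey() + "\tcount: " + top.getValue());
    }
}
```

The counting loop is unchanged; most of the extra length comes from the explicit generic type arguments and the anonymous `Comparator`.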