Question

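I have been given a university assignment: store PDF documents efficiently in a PDF store, keeping each document only once (uploading the same file several times must not duplicate its content).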

The method to implement is store(String title, File pdfFile).

Example 1:

"Fast Cars", fastcars.pdf
"Even Faster Cars", fastcars.pdf
"Not So Fast Cars", cars.pdf
"Slow Cars", slowcars.pdf

Expected result: the size should be 3, containing fastcars.pdf, cars.pdf and slowcars.pdf.

Example 2:

"Fast Cars", fastcars.pdf
"Even Faster Cars", fastcars.pdf
"Fast Cars", sportscars.pdf
"Even Faster Cars", sportscars.pdf

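The size should be 1, containing only sportscars.pdf, since both titles end up pointing to sportscars.pdf and fastcars.pdf is no longer referenced.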

My idea is to hash the PDF content, perhaps using a HashMap that maps the content digest to a random integer, which is then mapped to the PDF title?
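A minimal sketch of the content-hashing step, assuming SHA-256 via java.security.MessageDigest as the digest (ContentDigest and sha256Hex are hypothetical names). Two uploads with byte-identical content produce the same hex string, so that string can probably serve directly as the HashMap key, without a random integer in between.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns a file's bytes into a stable content key.
class ContentDigest {
    // Hex-encoded SHA-256 of the file's bytes; identical files yield identical keys
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hash = md.digest(Files.readAllBytes(file));
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        return sb.toString();
    }
}
```

Files.readAllBytes loads the whole PDF into memory; for very large files, java.security.DigestInputStream can hash the stream incrementally instead.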

The tricky part is satisfying Example 2.
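One way to handle Example 2, sketched under the assumption that storing an existing title replaces the file it previously pointed to: keep a title-to-digest map plus a reference count per digest, and discard content whose count drops to zero. PdfStore, size() and the in-memory byte[] storage are hypothetical; a real store would write the bytes to disk.

```java
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: each title points at one content digest, and each
// digest counts how many titles still reference it.
class PdfStore {
    private final Map<String, String> titleToDigest = new HashMap<>();
    private final Map<String, Integer> refCount = new HashMap<>();
    private final Map<String, byte[]> contents = new HashMap<>(); // digest -> bytes, kept once

    void store(String title, File pdfFile) throws IOException, NoSuchAlgorithmException {
        byte[] bytes = Files.readAllBytes(pdfFile.toPath());
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(bytes);
        String digest = new BigInteger(1, hash).toString(16);

        String previous = titleToDigest.put(title, digest);
        if (digest.equals(previous)) {
            return; // same title, same content: nothing changes
        }
        // one more title references this content; keep the bytes only the first time
        refCount.merge(digest, 1, Integer::sum);
        contents.putIfAbsent(digest, bytes);

        if (previous != null) {
            // the title moved to other content: drop the old content if orphaned
            int left = refCount.merge(previous, -1, Integer::sum);
            if (left == 0) {
                refCount.remove(previous);
                contents.remove(previous);
            }
        }
    }

    // number of distinct contents currently stored
    int size() {
        return contents.size();
    }
}
```

Replaying Example 2, store("Fast Cars", sportscars.pdf) lowers the count of fastcars.pdf from 2 to 1, and store("Even Faster Cars", sportscars.pdf) lowers it to 0, leaving only sportscars.pdf. Each call costs one pass over the file to hash it plus expected O(1) map operations.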

Which data structure and which approach would you recommend for this problem to make it efficient?

Thanks in advance.

Answer 1

I read the input from the console.

Test case #1 input:

  FastCars fastcars.pdf
  EvenFasterCars fastcars.pdf
  NotSoFastCars cars.pdf
  SlowCars slowcars.pdf

Output:

  slowcars.pdf
  fastcars.pdf
  cars.pdf

Test case #2 input:

  FastCars fastcars.pdf
  EvenFasterCars fastcars.pdf
  FastCars sportscars.pdf
  EvenFasterCars sportscars.pdf

Output:

  sportscars.pdf

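    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    public class Main {
        public static void main(String[] args) throws Exception {
            // title -> file name; a repeated title overwrites its earlier file
            Map<String, String> map1 = new HashMap<String, String>();
            // file name -> title; used to print each file name only once
            Map<String, String> map2 = new HashMap<String, String>();

            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

            // read four "title fileName" lines; trim so leading spaces don't break the split
            for (int i = 0; i < 4; i++) {
                String[] inpt = br.readLine().trim().split("\\s+");
                String tag = inpt[0];
                String fileName = inpt[1];
                map1.put(tag, fileName);
                map2.put(fileName, tag);
            }

            // print every file still referenced by some title, once each
            for (String key : map1.keySet()) {
                String fileName = map1.get(key);
                if (map2.containsKey(fileName)) {
                    System.out.println(fileName);
                    map2.remove(fileName);
                }
            }
        }
    }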
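Since map1 already holds each title's latest file, the files printed above are exactly the distinct values of map1. A shorter sketch of the same logic (DistinctFiles and remainingFiles are hypothetical names) uses a LinkedHashMap so the output follows input order rather than HashMap iteration order. Like the answer, it compares file names, not file contents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class DistinctFiles {
    // Each line is "title fileName"; a later line for the same title replaces its file.
    static List<String> remainingFiles(List<String> lines) {
        Map<String, String> titleToFile = new LinkedHashMap<>(); // keeps titles in first-seen order
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            titleToFile.put(parts[0], parts[1]);
        }
        // files still referenced by at least one title, each listed once
        return new ArrayList<>(new LinkedHashSet<>(titleToFile.values()));
    }
}
```

For test case #1 this returns [fastcars.pdf, cars.pdf, slowcars.pdf]; for test case #2 it returns [sportscars.pdf].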

Answer 2

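Every standards-compliant PDF file has a unique ID as part of its metadata. You might simply use that string as the file name. Most PDF libraries and tools give easy access to this metadata.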
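For illustration without a PDF library, a crude sketch that pulls the first entry of the trailer's /ID array out of the raw bytes with a regular expression. It assumes the identifier is written as a hex string and that the file has an /ID entry at all; PdfFileId and firstId are hypothetical names, and a real PDF library's parser would be more reliable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PdfFileId {
    // Matches the first hex string of a trailer entry such as /ID [<ab12...> <cd34...>]
    private static final Pattern ID = Pattern.compile("/ID\\s*\\[\\s*<([0-9A-Fa-f]+)>");

    static Optional<String> firstId(Path pdf) throws IOException {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding
        String raw = new String(Files.readAllBytes(pdf), StandardCharsets.ISO_8859_1);
        Matcher m = ID.matcher(raw);
        String last = null;
        while (m.find()) {
            last = m.group(1); // keep the last match: the newest trailer comes last
        }
        return Optional.ofNullable(last).map(String::toLowerCase);
    }
}
```

Keep in mind that the first element of /ID is meant to stay the same when a file is incrementally updated, so two files sharing it are not guaranteed to be byte-identical; a content hash is stricter.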
