Question

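I have a problem. I want to build a search engine based on an IR system. I have some files, I extract the information I need from them, and I store it in structures such as HashMaps, TreeMaps, ArrayLists, etc. Then I want to write this information to files, so I open 2 FileWriters at the same time and keep adding more and more strings to them.

But the program takes far too long, and I don't know why. Once I have put everything into a FileWriter, I close it with close().

Do you think the problem is a reallocation every time I add a new string to the buffer?

Should I follow a different strategy: open the buffer, write, close it, and then open it again to write at the end of the previous data? Would that take less time?

P.S.: The code works exactly as I want for small input files. The problem appears when I use a large number of input files.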

public static void writeWordsandDfInFile(Map<String, Word> tmpMap) throws IOException
{
    Set tmpSet = tmpMap.entrySet();//Transform to Set for quick iteration  and printing
    Iterator tmpIt = tmpSet.iterator();
    String le3h=null;
    int bytesPostingFile;
    int bytesVocabularyFile;
    String str_out = null;
    String prev_str_out = null;
    String str_out2 = null;
    String str_tmp;
    String str_tmp2;
    String Tstrt;
    int prevctr=0;
    int flag=0;
    int i=0;
    int j;
    int k;
    int flag2;
    int flag3;
    int docId;
    //////////////////
    int SIZEDocumentsFileBytes;
    int prevInDocumentsFileBytes = 0;
    int newInDocumentsFileBytes = 0;
    int prwth_kataxwrhsh;
    int ctrPostingFileBytes=0;
    int prwthMonofora=0;



    giveWrdTakeBytePos=new HashMap<String,Integer>();//given a word, returns its position in bytes within VocabularyFile.txt

    // Create file
    FileWriter fstream = new FileWriter(vocabularyFile.getPath());
    BufferedWriter out = new BufferedWriter(fstream);
    out.write("Le3h   Df   PosInPostingFile.txt\n\n");
    str_tmp=("Le3h   Df   PosInPostingFile.txt\n\n");

      // Create file
    FileWriter fstream2 = new FileWriter(postingFile.getPath());
    BufferedWriter out2 = new BufferedWriter(fstream2);
    out2.write("DocId  Tf  LineInFile       PosInDocumentsFile\n\n");
    str_tmp2=("DocId  Tf  LineInFile       PosInDocumentsFile\n\n");



    PostingFileBytes=new ArrayList<Integer>();//stores the bytes for each entry in the PostingFile



    flag=0;
    i=0;
    while(tmpIt.hasNext())
    {

         Map.Entry m = (Map.Entry) tmpIt.next();
         le3h=(String)m.getKey();

         Set s = tmpMap.get(le3h).getDocList().entrySet();
         Iterator it = s.iterator();
         Map.Entry mm =(Map.Entry)it.next();
         docId=(Integer)mm.getKey();


         Set ss=tmpMap.get(le3h).getDocList().keySet();

         Set stf=tmpMap.get(le3h).getTf().keySet();

         Iterator ssIt = ss.iterator();




         flag2=0;
         prwth_kataxwrhsh=0;
         while(ssIt.hasNext())
         {
            docId=(Integer)ssIt.next();

            out2.write(docId+"  "+tmpMap.get(le3h).getTf(docId));//write each docId and its Tf to the posting file
            if(flag2==0)
            {
                str_out2=(docId+"  "+tmpMap.get(le3h).getTf(docId));
                flag2=1;
            }
            else
            {
                str_out2=(docId+"  "+tmpMap.get(le3h).getTf(docId));
            }



            flag3=0;
            Tstrt=null;
            for(k=0;k<tmpMap.get(le3h).ByteList.get(docId).size();k++)
            {
                out2.write("  "+tmpMap.get(le3h).ByteList.get(docId).get(k));

                if(flag3==0)
                {
                    Tstrt=("  "+tmpMap.get(le3h).ByteList.get(docId).get(k));
                    flag3=1;
                }
                else
                {
                    Tstrt=Tstrt+("  "+tmpMap.get(le3h).ByteList.get(docId).get(k));
                }

            }
            str_out2=str_out2+Tstrt;
            out2.write("  ->"+DocumentsFileBytes.get(docId)+"\n");
            str_out2=str_out2+("  ->"+DocumentsFileBytes.get(docId)+"\n");
            bytesPostingFile=str_out2.toString().length();

        ////////////////////////////////////////////////////////////////////////////////////////////////



            //................................................................................................................................
          SIZEDocumentsFileBytes=PostingFileBytes.size();

          if(prwthMonofora==0)
          {
            prevInDocumentsFileBytes=str_tmp2.toString().length();

            prwthMonofora=1;

            PostingFileBytes.add(prevInDocumentsFileBytes);
            ctrPostingFileBytes=0;//i.e. there is an entry at position 0 of the posting file
            newInDocumentsFileBytes=prevInDocumentsFileBytes + bytesPostingFile;
            //System.out.println("EPOMENH: "+newInDocumentsFileBytes);
          }
          else
          {
              if(prwth_kataxwrhsh==0)//for each word only the first time, even if it has DF>1
              {
                    //System.out.println("Prohg. Timh:"+prevInDocumentsFileBytes);
                    prevInDocumentsFileBytes=newInDocumentsFileBytes;//from before
                    //System.out.println("BAZW: "+prevInDocumentsFileBytes);
                    PostingFileBytes.add(prevInDocumentsFileBytes);
                    ctrPostingFileBytes++;
                    prwth_kataxwrhsh=1;
              }
              else
              {
                prevInDocumentsFileBytes=newInDocumentsFileBytes;
              }
              newInDocumentsFileBytes=prevInDocumentsFileBytes + bytesPostingFile;
              //System.out.println("EPOMENH: "+newInDocumentsFileBytes);
          }


         }


         //------------------------------------------------------------------------------------------------------------------


         int ptr=ctrPostingFileBytes;

         out.write(le3h+"  "+tmpMap.get(le3h).getDf());//write each word and its Df to VocabularyFile.txt

         out.write("  ->"+PostingFileBytes.get(ptr)+"\n");


           if(flag==0)//the first time
            {
               str_out=(le3h+"  "+tmpMap.get(le3h).getDf()+"  ->"+PostingFileBytes.get(ptr)+"\n");
               giveWrdTakeBytePos.put(le3h, str_tmp.toString().length());
               flag=1;
               prev_str_out=str_tmp+str_out;
            }
            else
            {
                giveWrdTakeBytePos.put(le3h, prev_str_out.toString().length());

                str_out=str_out+(le3h+"  "+tmpMap.get(le3h).getDf()+"  ->"+PostingFileBytes.get(ptr)+"\n");
                prev_str_out=prev_str_out+(le3h+"  "+tmpMap.get(le3h).getDf()+"  ->"+PostingFileBytes.get(ptr)+"\n");
            }

      //................................................................................................................................


    }

    //Close the output stream
    out.close();

    //Close the output stream
    out2.close();

}

Answer 1

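From what I can see, you never append to the files; you always write them from scratch. But according to your description above (I have not read the whole code yet), you want to append data to the file.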

new FileWriter("path", true);

Does this help you?

Another suggestion is to take the file writing out and use this instead:
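For reference, a minimal sketch of what the second constructor argument changes. The class `AppendDemo`, the helper `writeTwice`, and the temp file are made up for illustration; only `FileWriter(String, boolean)` comes from the line above.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class AppendDemo {
    // Hypothetical helper: opens the same (initially empty) temp file twice,
    // writes one line per open, and returns the final file content.
    static String writeTwice(boolean append) {
        try {
            Path file = Files.createTempFile("append-demo", ".txt");
            try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toString(), append))) {
                out.write("first\n");
            }
            // append == false: this second open truncates the file.
            // append == true: this second open continues after "first".
            try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toString(), append))) {
                out.write("second\n");
            }
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `AppendDemo.writeTwice(false)` leaves only the second line in the file, while `AppendDemo.writeTwice(true)` keeps both lines.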

public static void foo()
{
    // ...

    byte[] fifeMBByteAryOne = new byte[5242880];
    ByteArrayStream bStream = new ByteArrayStream(fifeMBByteAryOne);
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(bStream));
    byte[] fifeMBByteAryTwo = new byte[5242880];
    ByteArrayStream bStream2 = new ByteArrayStream(fifeMBByteAryTwo);
    BufferedWriter out2 = new BufferedWriter(new OutputStreamWriter(bStream2));

    // ...

}

private static class ByteArrayStream extends OutputStream {
    int index = 0;
    byte[] container;

    public ByteArrayStream(byte[] container) {
        this.container = container;
    }

    @Override
    public void write(int b) throws IOException {
        container[index++] = (byte)b;
    }

}

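Then run it again and see how long it takes. If it is just as slow as before, then the files are not your problem.

After reading the code, I am pretty sure you are a student or a beginner in Java programming. That is fine, but you should say so in your question. It will also lead people to give you advice rather than solve your problem for you directly.

You can improve a lot of things. The first, and in my view a very important one: your coding style needs to improve. Really! There are standards for how to write variables (start with a lowercase letter), methods, and so on. Use them. You use far more variables than you need, and you define them all at the beginning of the method. You use sets and iterators where you do not need them, for example: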

Set s = currentWord.getDocList().entrySet();
Iterator it = s.iterator();
Map.Entry mm = (Map.Entry) it.next();
docId = (Integer) mm.getKey();

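You never use that value of docId afterwards, yet of course the operation takes time.

Rewrite this method, and this time understand what you are doing: do only what you need, when you need it. As it is now, I would not allow anyone in my company to use it for a customer.

Second: when you post code on the internet, always post code that compiles directly. I needed 15 minutes to get your code to compile. Few people around here have that much patience.

Third: for situations in which you write less than 2 MB of text, it is usually useful to build the whole text with a StringBuilder and write it out in one go at the end. This also makes debugging easier.

Fourth: before posting code on the internet, always think about the problem yourself and run tests to narrow it down. In this case you can do that with Date, simply by writing something like: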
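A sketch of iterating the same kind of data without separate Set and Iterator variables and without casts. The `Word` class is not shown in the question, so this assumes a simplified stand-in, a map from word to (docId to term frequency); `IterationSketch` and `formatPostings` are hypothetical names.

```java
import java.util.Map;

final class IterationSketch {
    // Stand-in for the asker's structure: word -> (docId -> term frequency).
    // The typed for-each loop replaces the raw Set/Iterator/Map.Entry casts,
    // and each entry is looked up once instead of calling get(...) repeatedly.
    static String formatPostings(Map<String, Map<Integer, Integer>> index) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<Integer, Integer>> wordEntry : index.entrySet()) {
            String word = wordEntry.getKey();
            for (Map.Entry<Integer, Integer> posting : wordEntry.getValue().entrySet()) {
                sb.append(word).append("  ")
                  .append(posting.getKey()).append("  ")
                  .append(posting.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

With sorted maps such as TreeMap the output order is deterministic; with HashMap it is not.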
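A minimal sketch of that approach, under assumptions: `BuildThenWrite` and `writeLines` are hypothetical names, and the recorded offsets are counted in chars, which matches bytes only for single-byte encodings. The running position comes from the builder's current length, so no string has to be re-concatenated just to measure it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

final class BuildThenWrite {
    // Builds the whole text in memory, records where each line starts,
    // then hands the text to the writer in a single write call.
    static List<Integer> writeLines(List<String> lines, String header, Writer out) {
        StringBuilder text = new StringBuilder(header);
        List<Integer> offsets = new ArrayList<>();
        for (String line : lines) {
            offsets.add(text.length());       // start position of this line, in chars
            text.append(line).append('\n');   // amortized O(length of line)
        }
        try {
            out.write(text.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }
}
```

Passing a StringWriter instead of a FileWriter makes it easy to inspect the generated text while debugging.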

// at the beginning of a loop
long startedAt = new Date().getTime();
// somewhere within the loop:
System.out.println("in situation X " + (new Date().getTime() - startedAt));

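This way you can see how long each step takes, and then start optimizing that area.

Fifth: if you still have a problem after the fourth point, always post a small piece of code that clearly demonstrates your problem. Do not rely on other users understanding your problem; show it to them. Make it easy for them by using self-explanatory variable, method, and class names in the language you are asking in. The same goes for your comments.

Sixth: the reason you should do all of this is that it makes you able to solve your problems yourself, and to ask those with long-standing skills only the questions that are worth their time.

Good luck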
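The same measurement can be wrapped in a small helper. This sketch swaps in `System.nanoTime()` for `Date`, since it is meant for measuring elapsed time; `StepTimer` and `millisFor` are hypothetical names.

```java
final class StepTimer {
    // Runs the given step once and returns the elapsed wall-clock time in milliseconds.
    static long millisFor(Runnable step) {
        long startedAt = System.nanoTime();
        step.run();
        return (System.nanoTime() - startedAt) / 1_000_000;
    }
}
```

Wrapping each suspect part of the loop body in its own `StepTimer.millisFor(...)` call and printing the totals shows which part dominates.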
