How do I calculate the total query count for each file?

Asked: 2011-03-15 13:35:44

Tags: java

for (a = 0; a < filename; a++) {

        try {
            System.out
                    .println(" _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  ");
            System.out.println("\n");
            System.out.println("The word inputted : " + word2);
            File file = new File(
                    "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                            + ".txt");
            System.out.println(" _________________");

            System.out.print("| File = abc" + a + ".txt | \t\t \n");

            for (int i = 0; i < array2.length; i++) {

                totalCount = 0;
                wordCount = 0;

                Scanner s = new Scanner(file);
                {
                    while (s.hasNext()) {
                        totalCount++;
                        if (s.next().equals(array2[i]))
                            wordCount++;

                    }

                    System.out.print(array2[i] + " --> Word count =  "
                            + "\t " + "|" + wordCount + "|");
                    System.out.print("  Total count = " + "\t " + "|"
                            + totalCount + "|");
                    System.out.printf("  Term Frequency =  | %8.4f |",
                            (double) wordCount / totalCount);

                    System.out.println("\t ");

                    double inverseTF =  Math.log10((float) numDoc
                            / (numofDoc[i]));
                    System.out.println("    --> IDF = " + inverseTF );

                    double TFIDF = (((double) wordCount / totalCount) * inverseTF);
                    System.out.println("    --> TF/IDF = " + TFIDF + "\n");





                }
            }
        } catch (FileNotFoundException e) {
            System.out.println("File is not found");
        }

    }
}

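This is my code for calculating the term frequency of each query term that is entered. Now I want to calculate the total query frequency for each file.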
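For reference, the quantities the code above prints can be restated as formulas. This is only a restatement of the code's arithmetic; the symbols are chosen here for illustration and do not appear in the code: $c_{t,f}$ stands for `wordCount` of term $t$ in file $f$, $n_f$ for `totalCount`, $N$ for `numDoc`, and $d_t$ for `numofDoc[i]`.

\[
\mathrm{tf}_{t,f} = \frac{c_{t,f}}{n_f}, \qquad
\mathrm{idf}_t = \log_{10}\frac{N}{d_t}, \qquad
\mathrm{tfidf}_{t,f} = \mathrm{tf}_{t,f} \cdot \mathrm{idf}_t
\]

With the figures for the first term of abc0.txt in the sample output ($c = 4$, $n = 957$, $N = 11$, $d = 3$), this gives $\mathrm{tf} = 4/957 \approx 0.0042$, $\mathrm{idf} = \log_{10}(11/3) \approx 0.5643$ and $\mathrm{tfidf} \approx 0.00236$. The per-file total being asked about is the sum of the word counts over the query terms, $\sum_t c_{t,f}$, for example $4 + 7 + 10 = 21$.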

Sample output:

The number of files in this folder is: 11
Please enter query: how are you
how --> Number of files that contain this term: 3
are --> Number of files that contain this term: 7
you --> Number of files that contain this term: 7


The word inputted : how are you


| File = abc0.txt |
how --> Word count = | 4 |  Total count = | 957 |  Term Frequency = | 0.0042 |
    --> IDF = 0.5642714398516419
    --> TF/IDF = 0.0023585013159943234

are --> Word count = | 7 |  Total count = | 957 |  Term Frequency = | 0.0073 |
    --> IDF = 0.1962946357308887
    --> TF/IDF = 0.00143580193324579

you --> Word count = | 10 |  Total count = | 957 |  Term Frequency = | 0.0104 |
    --> IDF = 0.1962946357308887
    --> TF/IDF = 0.002051145618922557

Example: the total frequency is 4 + 7 + 10 = 21.


The word inputted : how are you


| File = abc1.txt |
how --> Word count = | 4 |  Total count = | 959 |  Term Frequency = | 0.0042 |
    --> IDF = 0.5642714398516419
    --> TF/IDF = 0.0023535826479734803

are --> Word count = | 7 |  Total count = | 959 |  Term Frequency = | 0.0073 |
    --> IDF = 0.1962946357308887
    --> TF/IDF = 0.0014328075600794795

you --> Word count = | 10 |  Total count = | 959 |  Term Frequency = | 0.0104 |
    --> IDF = 0.1962946357308887
    --> TF/IDF = 0.002046867942970685

How can I total the WORD COUNT of the 3 query terms for each file?

Example: the total frequency is 4 + 7 + 10 = 21.

2 answers:

Answer 0: (score: 0)

You need to store the word count in an array (one entry per file), or you can add it to a "sum" variable that is initialized outside the loop.
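A minimal sketch of both suggestions, assuming the tokens of one file are already in memory. The class name `QueryTotals`, the helper names `countPerTerm` and `sum`, and the sample tokens are hypothetical and not part of the question's code:

```java
import java.util.Arrays;
import java.util.List;

public class QueryTotals {
    // Counts how often each query term occurs among the tokens of one file.
    // The result has one entry per term, in the same order as `terms`.
    static int[] countPerTerm(List<String> tokens, String[] terms) {
        int[] counts = new int[terms.length];
        for (String token : tokens) {
            for (int i = 0; i < terms.length; i++) {
                if (token.equals(terms[i])) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // Adds the per-term counts into a single "sum" variable for the file.
    static int sum(int[] counts) {
        int sum = 0; // initialized outside the loop
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] terms = {"how", "are", "you"};
        // Hypothetical file content, already split into tokens.
        List<String> tokens = Arrays.asList("how", "are", "you", "and", "how", "are", "they");
        int[] counts = countPerTerm(tokens, terms);
        System.out.println("Per-term counts = " + Arrays.toString(counts));
        System.out.println("Total frequency = " + sum(counts));
    }
}
```

Keeping the per-term array leaves the individual counts available for the TF and TF/IDF lines, while the sum gives the single per-file figure.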

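Answer 1: (score: 0)

The total has to live outside your try block. Initialize it before the try and print it afterwards. There are many issues with the design of this Java program, and I hope you will look into those as well. For the time being, this is probably what you need: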

for (a = 0; a < filename; a++) {
  int totalcount = 0;
  try{
    int wordcount = 0;
    for(...){
      ...
    }
    //print wordcount
    totalcount += wordcount;
  }catch(Exception e){
    ...
    return; //to ensure that there is no total count if something goes wrong.
  }
  //print totalcount
}
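The outline above could be filled in roughly as follows. This is a sketch under assumptions, not the asker's program: the class name `PerFileTotal`, the method `totalsPerFile`, and the temporary demo files are hypothetical, and a file that cannot be opened is skipped with `continue` rather than ending the whole run with `return`.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class PerFileTotal {
    // Returns, for each readable file, the summed count of all query terms.
    static Map<String, Integer> totalsPerFile(List<File> files, String[] terms) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (File file : files) {
            int totalCount = 0; // reset once per file, before the try
            try (Scanner s = new Scanner(file)) {
                while (s.hasNext()) {
                    String token = s.next();
                    for (String term : terms) {
                        if (token.equals(term)) {
                            totalCount++;
                        }
                    }
                }
            } catch (FileNotFoundException e) {
                System.out.println("File is not found: " + file.getName());
                continue; // no total is recorded for a file that could not be read
            }
            totals.put(file.getName(), totalCount);
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Two temporary demo files stand in for abc0.txt, abc1.txt, ...
        File first = File.createTempFile("abc", ".txt");
        File second = File.createTempFile("abc", ".txt");
        first.deleteOnExit();
        second.deleteOnExit();
        Files.write(first.toPath(), "how are you how are they".getBytes());
        Files.write(second.toPath(), "you and you".getBytes());

        String[] terms = {"how", "are", "you"};
        Map<String, Integer> totals = totalsPerFile(Arrays.asList(first, second), terms);
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            System.out.println(entry.getKey() + " --> Total frequency = " + entry.getValue());
        }
    }
}
```

Reading each file once and checking every token against all terms also avoids reopening the file for each query term, which the nested loops in the question do.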