Question

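I have a method that builds a hash table from a .txt file, and I use that table to assign scores to the words in the values passed to the Reducer. Here is how I tried to do it: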

@Override
public void setup(Context context) throws IOException {
    Path pt = new Path("hdfs:/user/jk/sentiwords.txt");
    FileSystem fs = FileSystem.get(new Configuration());
    BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
    String line = br.readLine();
    while (line!=null) {
        String[] split =  line.split("\t");
        String word = split[0].substring(0, split[0].length() - 2);
        double score = Double.parseDouble(split[1]);
        int hashCode = word.hashCode();
        sentiTable.put(hashCode, score);
        line = br.readLine();
        System.out.println("Success");
    }
}

The table is then used in this method, which is called on each value in the key/value pairs:

public double analyzeString(String str) {
    double stringScore = 0.0;
    String[] strArr = str.replaceAll("[^a-zA-Z ]", "").toLowerCase().split(" ");
    for (String segment: strArr) {
        int hashedSeg = segment.hashCode();

        if (sentiTable.containsKey(hashedSeg)) {
            double value = (double) sentiTable.get(hashedSeg);
            stringScore += value;
        }
    }
    return stringScore;
}

Ideally this should return a number between -1 and 1. In practice it always returns 0.

Edit:

I should note that sentiTable is created at the class level.

Answer 1

A result of 0 probably means that nothing was read from the file. I see two things that could be going wrong:

Wrong path: I think the HDFS path should start with {{1}} rather than hdfs://....
Wrong imports for Path and FileSystem: make sure you import the ones provided by Hadoop.
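To see how the two spellings of the path differ, it can help to look at how a plain java.net.URI splits them. This is only a sketch using the standard library, not Hadoop's own Path class, and "namenode:8020" is a made-up address used for illustration. The single-slash form from the question names no host, so the cluster would presumably have to be resolved from the job configuration instead.

```java
import java.net.URI;

public class HdfsPathForms {

    // Describes how a location string splits into scheme, authority and path.
    public static String describe(String location) {
        URI uri = URI.create(location);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // The form used in the question: a scheme but no authority (host).
        System.out.println(describe("hdfs:/user/jk/sentiwords.txt"));
        // A fully qualified form; "namenode:8020" is a hypothetical address.
        System.out.println(describe("hdfs://namenode:8020/user/jk/sentiwords.txt"));
    }
}
```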

You can always print a message in the setup method to check whether the file was found.
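One way to do that check is to log how many entries were loaded once the loop has finished, instead of printing "Success" per line. The sketch below uses only the standard library so it can be run on its own; the class name, the load helper and the sample lines are hypothetical, and in the real job the reader would wrap fs.open(pt) as in the question. A count of 0 in the task logs would point at the file rather than at analyzeString.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class SentiTableCheck {

    // Parses tab-separated "word<2-char suffix>\tscore" lines into a map.
    // Mirrors the parsing in the question: the last two characters of the
    // first column are dropped before the word is stored.
    public static Map<String, Double> load(BufferedReader reader) {
        Map<String, Double> table = new HashMap<>();
        reader.lines().forEach(line -> {
            String[] split = line.split("\t");
            if (split.length < 2 || split[0].length() < 2) {
                return; // skip malformed lines instead of failing
            }
            String word = split[0].substring(0, split[0].length() - 2);
            table.put(word, Double.parseDouble(split[1]));
        });
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for sentiwords.txt.
        String sample = "good#a\t0.75\nbad#a\t-0.625\n";
        Map<String, Double> table =
                load(new BufferedReader(new StringReader(sample)));
        // One summary line after loading shows whether anything was read.
        System.err.println("Loaded " + table.size() + " sentiment entries");
    }
}
```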

Extra: you may want to reconsider the containsKey check, because keying the table on a string's hashCode produces many collisions on large data.
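As a small illustration of the collision point: two different strings can share a hash code, so a table keyed on the int cannot tell them apart, while a map keyed on the word itself can. This is a sketch under my own assumptions; the class name and the scores are invented, and the tokenizing line is copied from the question's analyzeString.

```java
import java.util.HashMap;
import java.util.Map;

public class WordKeyedScores {

    // Sums the scores of every known word in the text, looking words up by
    // the string itself rather than by its hashCode().
    public static double analyze(Map<String, Double> table, String text) {
        double score = 0.0;
        String[] words = text.replaceAll("[^a-zA-Z ]", "").toLowerCase().split(" ");
        for (String word : words) {
            score += table.getOrDefault(word, 0.0);
        }
        return score;
    }

    public static void main(String[] args) {
        // Two different strings with the same hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true

        Map<String, Double> table = new HashMap<>();
        table.put("good", 0.75);   // hypothetical scores
        table.put("bad", -0.625);
        System.out.println(analyze(table, "Good movie, bad ending!")); // prints 0.125
    }
}
```

HashMap already hashes its keys internally and falls back to equals() when hashes collide, so nothing is gained by hashing the word by hand first.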

Why doesn't my Reducer read the file?
