Question

My Hadoop mapper emits CSV data line by line as Text objects, like this:

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

    String dataRows[] = value.toString().trim().split("\\r?\\n");

    for (int i = 0; i < dataRows.length; i++) {
        Random r = new Random();
        int partition = r.nextInt(3);
        HC.set(Integer.toString(partition));
        Data.set(dataRows[i]);
        context.write(HC, Data);
    }
}

In my reducer, I need to split each CSV line and do some further processing on the resulting strings. Here is the reducer code:

public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
    private Text data = new Text();

    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        List<Coord> coords = new ArrayList<Coord>();
        Iterator<Text> iter = values.iterator();
        while (iter.hasNext()) {
            String[] elems = iter.next().toString().split(",");
            double[] x = new double[elems.length];
            try {
                for (int i = 0; i < x.length; i++) {
                    x[i] = Integer.parseInt(elems[i]);
                }
            } catch (Exception e) {
                continue;
            }
            coords.add(new Coord(x));
        }
        try {
            Cluster cluster = runClusterer(coords);
            data.set(cluster.toNewick()+";");
            context.write(key, data);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Here is the strange part: the elems string array has length 1 and contains only the leftmost element of each line of my CSV.

For example, suppose my CSV contains two lines: the first is {1,2} and the second is {3,4}.

The elems array ends up populated as {1,3}.
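
For comparison, here is a minimal sketch of the same split run outside Hadoop. The class name CsvSplitCheck and the method parseLine are hypothetical and not part of the job above, and it assumes each value is a plain comma-separated line such as "1,2". It uses Double.parseDouble with trim() instead of Integer.parseInt, since Integer.parseInt throws NumberFormatException on surrounding whitespace or decimal values, and the reducer's catch block would silently skip such a line.

```java
import java.util.Arrays;

public class CsvSplitCheck {

    // Hypothetical helper: splits one CSV line on commas and parses every field as a double.
    static double[] parseLine(String line) {
        String[] elems = line.trim().split(",");
        double[] x = new double[elems.length];
        for (int i = 0; i < elems.length; i++) {
            // trim() guards against stray spaces or a trailing '\r' in a field
            x[i] = Double.parseDouble(elems[i].trim());
        }
        return x;
    }

    public static void main(String[] args) {
        // The two example lines from the question
        for (String line : new String[] {"1,2", "3,4"}) {
            System.out.println(line.split(",").length + " -> " + Arrays.toString(parseLine(line)));
        }
    }
}
```

On plain strings both lines split into two elements, so a length of 1 inside the reducer suggests that the value arriving there may not contain the comma at all. Logging the raw value of iter.next().toString() before splitting would confirm or rule that out.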

Any help is appreciated.

Hadoop Text object toString() split problem

0 answers: