Question

I am trying to remove duplicate strings from an ArrayList named outputList in Hadoop.

Here is my code:

List<String> newList = new ArrayList<String>();

    for( String item : outputList){
      if(!newList.contains(item))
        newList.add(item);
      else newList.add("wrong");
    }

The problem is that the strings in newList are all "wrong".

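Some facts:

1. The code above works on my local machine.
2. I can write out the strings in outputList in Hadoop.
3. Most of the strings in outputList are different (some duplicates exist).
4. I tried other ways to remove the duplicates, such as using a HashSet. But when I initialize the HashSet with outputList, the resulting HashSet is empty.
5. The Java version in Hadoop is javac 1.6.0_18.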

Thanks.

Below is my reducer code:

public static class EditReducer 
       extends Reducer<Text,Text,Text,Text> {

    private Text editor2 = new Text();

    public void reduce(Text key, Iterable<Text> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      //write the content of iterable to an array list.

     List<String> editorList =new ArrayList<String>();
     for (Text t:values) {
      editorList.add(t.toString());

     }


    //if a user appears more than once in the list, add to outputList
     int occ;
     List<String> outputList =new ArrayList<String>();

     for (int i=0;i<editorList.size();i++) {

      occ= Collections.frequency(editorList, editorList.get(i));
      if(occ>1) {
        outputList.add(editorList.get(i));
      }
    }



    //make outputList distinct
   List<String> newList = new ArrayList<String>();

   for( String item : outputList){
      if(!newList.contains(item))
        newList.add(item);
      else newList.add("wrong");
    }

      for (String val : newList) {
        editor2.set(val);
        context.write(editor2,editor2); 
      }
    }

  }
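
For reference, the two steps in the reducer above (keep the values that occur more than once, then make them distinct) can be sketched in plain Java without the Hadoop types. This is only an illustration under assumptions: the class name RepeatedValuesSketch and the helper repeatedDistinct are hypothetical and not part of the original job, and the sketch swaps the Collections.frequency loop for a single counting pass over a LinkedHashMap. It avoids Java 7+ syntax so that it compiles on the Java 1.6 mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RepeatedValuesSketch {
    // Hypothetical helper: returns the values that occur more than once,
    // each listed a single time, in the order they were first seen.
    static List<String> repeatedDistinct(List<String> values) {
        // LinkedHashMap keeps keys in insertion order.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String v : values) {
            Integer seen = counts.get(v);
            counts.put(v, seen == null ? 1 : seen + 1);
        }
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> editorList = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(repeatedDistinct(editorList)); // prints [a, b]
    }
}
```

In a reducer, editorList would be filled from the Iterable<Text> values with t.toString(), as in the code above.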

Answer 1

You can create a nested for loop inside the original for loop and compare the strings with equals():

List<String> newList = new ArrayList<String>();

    for(String item : outputList) {
        boolean contains = false;
        for(String str: newList) {
            if(str.equals(item)) {
                contains = true;
                break;
            }
        }
        if(!contains) {
            newList.add(item);
        } 
        else {
            newList.add("wrong");
        }
    }
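
If the "wrong" marker is only there for debugging and the goal is simply a list without duplicates, a set-based version is a possible alternative. The following is a minimal sketch, not a confirmed fix for the Hadoop behaviour described in the question: the class name DistinctSketch and the helper distinct are hypothetical, and it assumes the list holds ordinary java.lang.String values. It uses only Java 1.6 syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DistinctSketch {
    // Hypothetical helper: returns the distinct strings of the input
    // in the order they first appear.
    static List<String> distinct(List<String> input) {
        // LinkedHashSet drops duplicates (using equals/hashCode)
        // and preserves insertion order.
        return new ArrayList<String>(new LinkedHashSet<String>(input));
    }

    public static void main(String[] args) {
        List<String> outputList = Arrays.asList("alice", "bob", "alice", "carol", "bob");
        System.out.println(distinct(outputList)); // prints [alice, bob, carol]
    }
}
```

A LinkedHashSet is used here instead of a plain HashSet so that the output order matches the nested-loop version.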
