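Duplicates in an ArrayList when I split it

Time: 2015-03-31 12:05:26

Tags: java arraylist split

I have an ArrayList of Dico objects, and I am trying to split it into multiple ArrayLists, but this produces some duplicates.

Here is the Dico class: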

public class Dico implements Comparable<Dico> {
    private final String m_term;
    private double m_weight;
    private final int m_Id_doc;

    public Dico(int Id_Doc, String Term, double tf_ief) {
        this.m_Id_doc = Id_Doc;
        this.m_term = Term;
        this.m_weight = tf_ief;
    }

    public String getTerm() {
        return this.m_term;
    }

    public double getWeight() {
        return this.m_weight;
    }

    public void setWeight(double weight) {
        this.m_weight = weight;
    }

    public int getDocId() {
        return this.m_Id_doc;
    }

    // Orders entries by document id. Integer.compare avoids the overflow
    // that subtracting two ints can cause.
    @Override
    public int compareTo(Dico another) {
        return Integer.compare(this.getDocId(), another.getDocId());
    }

    @Override
    public String toString() {
        return "id" + getDocId() + "term" + getTerm() + "weight" + getWeight();
    }
}

The split_dico function that does the splitting:

public static void split_dico(List<Dico> list) {
    int[] changes = new int[list.size() + 1]; // allow for max changes--> contain index of subList
    Arrays.fill(changes, -1); // if an index is not used, will remain -1
    changes[0] = 0;
    int change = 1;
    int id = list.get(0).getDocId();
    for (int i = 1; i < list.size(); i++) {
        Dico dic_entry = list.get(i);
        if (id != dic_entry.getDocId()) {
            changes[change++] = i;
            id = dic_entry.getDocId();
        }
    }
    changes[change] = list.size(); // end of last change segment
    List<List<Dico>> sublists = new ArrayList<>(change);
    for (int i = 0; i < change; i++) {
        sublists.add(list.subList(changes[i], changes[i + 1]));
        System.out.println(sublists);
    }
}

Test:

List<Dico> list = Arrays.asList(new Dico(1, "foo", 1),
    new Dico(7, "zoo", 5),
    new Dico(2, "foo", 1),
    new Dico(3, "foo", 1),
    new Dico(1, "bar", 2),
    new Dico(4, "zoo", 0.5),
    new Dico(2, "bar", 2),
    new Dico(3, "baz", 3));
Collections.sort(list); // sort by doc id so equal ids are adjacent
split_dico(list);

Output:

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6]]

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6]]

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2]]

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2], [doc id : 4 term : zoo weight : 0.15]]

[[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2], [doc id : 4 term : zoo weight : 0.15], [doc id : 7 term : zoo weight : 1.5]] 

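I don't understand what is wrong with this function.

2 Answers:

Answer 0 (score: 1)

In your printing loop, you print the entire list of sublists each time a new sublist is added, so the earlier sublists appear again on every line.

Instead, given your requirement, you should print only once, after you have finished filling the list of sublists.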
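To make the difference concrete, here is a minimal sketch that uses plain integers instead of Dico objects. The class PrintOnceDemo and its two helper methods are made up for illustration and are not part of the question. Each helper returns the lines that would be printed, so the two approaches can be compared side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the original question.
class PrintOnceDemo {

    // Mimics the question's loop: records what println(sublists)
    // would show after each add.
    static List<String> printInsideLoop(List<List<Integer>> parts) {
        List<List<Integer>> sublists = new ArrayList<>();
        List<String> printed = new ArrayList<>();
        for (List<Integer> part : parts) {
            sublists.add(part);
            // The whole list so far is printed, so earlier parts repeat.
            printed.add(sublists.toString());
        }
        return printed;
    }

    // Fills the list first and records a single line afterwards.
    static List<String> printAfterLoop(List<List<Integer>> parts) {
        List<List<Integer>> sublists = new ArrayList<>();
        for (List<Integer> part : parts) {
            sublists.add(part);
        }
        List<String> printed = new ArrayList<>();
        printed.add(sublists.toString());
        return printed;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 1), Arrays.asList(2, 2), Arrays.asList(3));
        System.out.println("Inside the loop:");
        printInsideLoop(parts).forEach(System.out::println);
        System.out.println("After the loop:");
        printAfterLoop(parts).forEach(System.out::println);
    }
}
```

With three parts, the first helper yields three growing lines ending in [[1, 1], [2, 2], [3]], while the second yields only that final line. The sublists themselves contain no duplicates in either case; only the printout repeats them.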

Answer 1 (score: 0)

Sorry for the silly question. Here is the quick fix:

public static void split_dico(List<Dico> list) {
    if (list.isEmpty()) {
        return; // nothing to split
    }
    int[] changes = new int[list.size() + 1]; // allow for max changes--> contain index of subList
    Arrays.fill(changes, -1); // if an index is not used, will remain -1
    changes[0] = 0;
    int change = 1;
    int id = list.get(0).getDocId();
    for (int i = 1; i < list.size(); i++) {
        Dico dic_entry = list.get(i);
        if (id != dic_entry.getDocId()) {
            changes[change++] = i;
            id = dic_entry.getDocId();
        }
    }
    changes[change] = list.size(); // end of last change segment
    List<List<Dico>> sublists = new ArrayList<>(change);
    for (int i = 0; i < change; i++) {
        sublists.add(list.subList(changes[i], changes[i + 1]));
    }
    // Print once, after every sublist has been added.
    System.out.println(sublists);
}

Output:

 [[doc id : 1 term : foo weight : 2.2, doc id : 1 term : bar weight : 6.6], [doc id : 2 term : foo weight : 2.2, doc id : 2 term : bar weight : 6.6], [doc id : 3 term : foo weight : 2.2], [doc id : 4 term : zoo weight : 0.15], [doc id : 7 term : zoo weight : 1.5]]
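As a possible alternative to tracking change indices, the entries could be grouped by document id with a map. This is a different technique from the one in the answer above, offered only as a sketch: GroupByDocIdDemo, Entry, and groupByDocId are hypothetical names, and Entry is a simplified stand-in for Dico that keeps only the id and the term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; Entry is a simplified stand-in for Dico.
class GroupByDocIdDemo {

    static final class Entry {
        final int docId;
        final String term;

        Entry(int docId, String term) {
            this.docId = docId;
            this.term = term;
        }

        @Override
        public String toString() {
            return docId + ":" + term;
        }
    }

    // Groups entries by document id. TreeMap keeps the ids in ascending
    // order, so the input does not need to be sorted beforehand.
    static List<List<Entry>> groupByDocId(List<Entry> entries) {
        Map<Integer, List<Entry>> groups = new TreeMap<>();
        for (Entry e : entries) {
            groups.computeIfAbsent(e.docId, k -> new ArrayList<>()).add(e);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry(1, "foo"), new Entry(7, "zoo"), new Entry(2, "foo"),
                new Entry(3, "foo"), new Entry(1, "bar"), new Entry(4, "zoo"),
                new Entry(2, "bar"), new Entry(3, "baz"));
        // Print once, after grouping is complete.
        System.out.println(groupByDocId(entries));
    }
}
```

One design difference worth noting: List.subList returns a view backed by the original list, whereas each group here is an independent ArrayList, which may be closer to the stated goal of splitting into multiple ArrayLists.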