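Split a list into sublists and pass them to an algorithm one after another

Time: 2019-05-13 10:32:16

Tags: java list apache-spark arraylist collections

I have a list of strings that contains multiple items of the following form: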

[Cid:0001,Jid:439,java,unit testing]
[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]
[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]
[Cid:0002,Jid:312,team,goals,territory]

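and so on.

Because there are many items, I need to group them by Cid and Jid. For example, the first two lines above should form one group because they share the same Cid and Jid.

I need to pass each group, one at a time, to an algorithm that takes a JavaRDD as input. Each list is parallelized using Spark's parallelize function.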

// resumes and hashSet are defined elsewhere in the asker's code
List<List<String>> mainList = new ArrayList<>();
for (Resume r : resumes) {
  List<String> subList = new ArrayList<>();
  for (String temp : hashSet) {
    if (temp.equalsIgnoreCase(r.getJid() + r.getCid())) {
      subList.add(r.toString());
      mainList.add(subList);
    }
  }
}

2 Answers:

Answer 0 (score: 0)

Here is my code snippet:

        Resume r1 = new Resume();
        r1.setJid("123");
        r1.setCid("2900");
        r1.setRes("java,unit testing");

        Resume r2 = new Resume();
        r2.setJid("1232");
        r2.setCid("900");
        r2.setRes("java,jsp,xml,javascript,servlet,html");

        Resume r3 = new Resume();
        r3.setJid("123");
        r3.setCid("2900");
        r3.setRes("ui development,jquery,javascript,html,ajax");

        List<Resume> resumes = new ArrayList<Resume>();
        resumes.add(r1);
        resumes.add(r2);
        resumes.add(r3);
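        // Maps each composite key to the quoted, comma-joined entries of its group.
        // Assumes Resume.toString() returns the skills string set via setRes().
        Map<String, String> map = new HashMap<String, String>();
        for (Resume r : resumes) {
            // Labels and the "+" separator keep the two ids apart inside the key.
            String key = "JID:" + r.getJid() + "+" + "CID:" + r.getCid();
            StringBuilder subList = new StringBuilder();
            subList.append("\"" + r.toString() + "\"");
            if (map.containsKey(key)) {
                // The newest entry goes first, followed by the entries already stored.
                subList.append("," + map.get(key));
            }
            map.put(key, subList.toString());
        }

        for (String key : map.keySet()) {
            System.out.println("{" + key + map.get(key) + "}");
        }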


Output:

{JID:123+CID:2900"ui development,jquery,javascript,html,ajax","java,unit testing"}
{JID:1232+CID:900"java,jsp,xml,javascript,servlet,html"}

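I used "JID:" + r.getJid() + "+" + "CID:" + r.getCid() as the key in order to handle cases such as

Jid:212, Cid:456

and

Jid:2124, Cid:56

which should not be grouped together. Using r.getJid() + r.getCid() as the key would not distinguish them, because both pairs concatenate to 212456.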
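
To make the collision concrete, here is a minimal sketch that compares the two key styles and then groups records into a Map<String, List<String>> with Collectors.groupingBy, as one possible alternative to building a comma-joined string. The Record class and the naiveKey, groupKey and group helpers are hypothetical names introduced for illustration; Record only stands in for the Resume class, whose definition is not shown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CompositeKeyGrouping {

    // Hypothetical stand-in for the Resume class (its real definition is not shown).
    static final class Record {
        final String cid;
        final String jid;
        final String res;

        Record(String cid, String jid, String res) {
            this.cid = cid;
            this.jid = jid;
            this.res = res;
        }
    }

    // Naive key: plain concatenation, so ("212", "456") and ("2124", "56") collide.
    static String naiveKey(String jid, String cid) {
        return jid + cid;
    }

    // Delimited key: the labels and the '+' separator keep the two ids apart.
    static String groupKey(String jid, String cid) {
        return "JID:" + jid + "+CID:" + cid;
    }

    // Groups the skill strings by the delimited key, keeping first-seen key order.
    static Map<String, List<String>> group(List<Record> records) {
        return records.stream().collect(Collectors.groupingBy(
                r -> groupKey(r.jid, r.cid),
                LinkedHashMap::new,
                Collectors.mapping(r -> r.res, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(naiveKey("212", "456").equals(naiveKey("2124", "56"))); // true
        System.out.println(groupKey("212", "456").equals(groupKey("2124", "56"))); // false

        List<Record> records = Arrays.asList(
                new Record("2900", "123", "java,unit testing"),
                new Record("900", "1232", "java,jsp,xml,javascript,servlet,html"),
                new Record("2900", "123", "ui development,jquery,javascript,html,ajax"));

        group(records).forEach((key, skills) -> System.out.println(key + " -> " + skills));
    }
}
```

Keeping each group as a List<String> instead of one joined string means the group can later be handed to other code without having to split it again.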

Answer 1 (score: 0)

I used a regular expression on the list of strings you showed at the top. Please let me know whether this works for you.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            List<String> list = new ArrayList<String>();
            list.add("[Cid:0001,Jid:439,java,unit testing]");
            list.add("[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]");
            list.add("[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]");
            list.add("[Cid:0002,Jid:312,team,goals,territory]");

            // Maps each "Cid:...,Jid:...," prefix to the comma-joined entries that share it.
            Map<String, String> map = new HashMap<String, String>();

            // Matches the id pair of an entry, e.g. "Cid:0001,Jid:439,".
            final Pattern patternId = Pattern.compile("Cid:\\d*,Jid:\\d*,");
            for (String entry : list) {
                Matcher matcher = patternId.matcher(entry);
                // Entries without an id pair all end up under the null key.
                String cIdJid = matcher.find() ? matcher.group(0) : null;
                if (map.containsKey(cIdJid)) {
                    map.put(cIdJid, map.get(cIdJid) + "," + entry);
                } else {
                    map.put(cIdJid, entry);
                }
            }

            // Groups with more than one entry get an extra pair of surrounding brackets.
            Collection<String> collection = map.values();
            for (String value : collection) {
                if (value.contains("],[")) {
                    System.out.println("[" + value + "]");
                } else {
                    System.out.println(value);
                }
            }
        }
    }

Output:

[[Cid:0001,Jid:439,java,unit testing],[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]]
[Cid:0002,Jid:312,team,goals,territory]
[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]
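
If each group has to be handed to an algorithm one at a time, it may be easier to keep every group as a real List<String> rather than as one comma-joined string. The sketch below assumes that every entry has the bracketed Cid/Jid shape shown in the question. The names groupByIds and processEachGroup are hypothetical, and the Consumer is only a placeholder for the algorithm. In a Spark job the callback would presumably wrap each group with JavaSparkContext.parallelize(group) to obtain a JavaRDD<String>; Spark is left out here so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexGrouping {

    // Captures the Cid and Jid digits of an entry such as "[Cid:0001,Jid:439,java]".
    private static final Pattern IDS = Pattern.compile("Cid:(\\d+),Jid:(\\d+),");

    // Groups entries by their (Cid, Jid) pair in first-seen order.
    // Entries without an id pair are skipped (an assumption, not part of the answer above).
    static Map<String, List<String>> groupByIds(List<String> entries) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String entry : entries) {
            Matcher m = IDS.matcher(entry);
            if (!m.find()) {
                continue;
            }
            String key = "Cid:" + m.group(1) + ",Jid:" + m.group(2);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(entry);
        }
        return groups;
    }

    // Hands the groups to the algorithm one after another.
    static void processEachGroup(Map<String, List<String>> groups,
                                 Consumer<List<String>> algorithm) {
        for (List<String> group : groups.values()) {
            algorithm.accept(group);
        }
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(
                "[Cid:0001,Jid:439,java,unit testing]",
                "[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]",
                "[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]",
                "[Cid:0002,Jid:312,team,goals,territory]");

        // Placeholder algorithm: with Spark this could call sc.parallelize(group)
        // and pass the resulting JavaRDD<String> to the real algorithm.
        processEachGroup(groupByIds(list),
                group -> System.out.println(group.size() + " item(s): " + group));
    }
}
```

The LinkedHashMap keeps the groups in the order in which their ids first appear, so the processing order does not depend on hash values.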