I have a list of strings containing multiple items of the following form:
[Cid:0001,Jid:439,java,unit testing]
[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]
[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]
[Cid:0002,Jid:312,team,goals,territory]
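and so on.
Because there are many items, I need to group them by Cid and Jid. For example, the first two lines above should form one group because they share the same Cid and Jid.
I need to pass each group, one at a time, to an algorithm that takes a JavaRDD as input. Each list is parallelized with Spark's parallelize function.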
List<List<String>> mainList = new ArrayList<>();
for (Resume r : resumes) {
    List<String> subList = new ArrayList<>();
    for (String temp : hashSet) {
        // Compare each known key against this resume's Jid + Cid
        if (temp.equalsIgnoreCase(r.getJid() + r.getCid())) {
            subList.add(r.toString());
            mainList.add(subList);
        }
    }
}
Answer 0: (score: 0)
Here is my code snippet:
Resume r1 = new Resume();
r1.setJid("123");
r1.setCid("2900");
r1.setRes("java,unit testing");
Resume r2 = new Resume();
r2.setJid("1232");
r2.setCid("900");
r2.setRes("java,jsp,xml,javascript,servlet,html");
Resume r3 = new Resume();
r3.setJid("123");
r3.setCid("2900");
r3.setRes("ui development,jquery,javascript,html,ajax");
List<Resume> resumes = new ArrayList<Resume>();
resumes.add(r1);
resumes.add(r2);
resumes.add(r3);
Map<String, String> map = new HashMap<String, String>();
for (Resume r : resumes) {
    // Labelled composite key, so that Jid and Cid digits cannot run together
    String groupKey = "JID:" + r.getJid() + "+" + "CID:" + r.getCid();
    StringBuilder subList = new StringBuilder();
    subList.append("\"" + r.toString() + "\"");
    if (map.containsKey(groupKey)) {
        // Append the entries already collected for this key
        subList.append("," + map.get(groupKey));
    }
    map.put(groupKey, subList.toString());
}
for (String key : map.keySet()) {
    System.out.println("{" + key + map.get(key) + "}");
}
Output:
{JID:123+CID:2900"ui development,jquery,javascript,html,ajax","java,unit testing"}
{JID:1232+CID:900"java,jsp,xml,javascript,servlet,html"}
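I used "JID"+r.getJid()+"CID"+r.getCid() as the key to handle cases such as
JId:212, Cid:456
JId:2124, Cid:56
because these two must not be grouped together.
Using r.getJid() + r.getCid() as the key would not distinguish them: both concatenate to 212456.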
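The collision described above can be checked directly. The following is only a minimal sketch: the class name KeyCollisionDemo and the helper compositeKey are hypothetical names introduced for illustration and are not part of the original snippet. It compares plain concatenation with a labelled key for the two Jid/Cid pairs mentioned.

```java
public class KeyCollisionDemo {
    // Hypothetical helper: labels each part so the digits of Jid and Cid cannot run together.
    static String compositeKey(String jid, String cid) {
        return "JID:" + jid + "+CID:" + cid;
    }

    // Plain concatenation, as in r.getJid() + r.getCid().
    static String plainKey(String jid, String cid) {
        return jid + cid;
    }

    public static void main(String[] args) {
        // Both pairs concatenate to "212456", so they would wrongly share a group.
        System.out.println(plainKey("212", "456").equals(plainKey("2124", "56")));

        // The labelled keys differ, so the two pairs stay in separate groups.
        System.out.println(compositeKey("212", "456").equals(compositeKey("2124", "56")));
    }
}
```

Any separator that cannot appear inside the ids would serve the same purpose; the labels simply make the printed keys easier to read.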
Answer 1: (score: 0)
I applied a regular expression to the list of strings you gave at the top. Please let me know whether this works for you.
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("[Cid:0001,Jid:439,java,unit testing]");
        list.add("[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]");
        list.add("[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]");
        list.add("[Cid:0002,Jid:312,team,goals,territory]");
        Map<String, String> map = new HashMap<String, String>();
        // Matches the "Cid:...,Jid:...," prefix that identifies a group
        final Pattern patternId = Pattern.compile("Cid:\\d*,Jid:\\d*,", Pattern.MULTILINE);
        for (int i = 0; i < list.size(); i++) {
            Matcher matcher = patternId.matcher(list.get(i));
            String cIdJid = null;
            if (matcher.find()) {
                cIdJid = matcher.group(0);
            }
            if (map.containsKey(cIdJid)) {
                // Same Cid/Jid seen before: append this item to the group
                map.put(cIdJid, map.get(cIdJid) + "," + list.get(i));
            } else {
                map.put(cIdJid, list.get(i));
            }
        }
        Collection<String> collection = map.values();
        for (String value : collection) {
            // Wrap groups with more than one item in an outer pair of brackets
            if (value.contains("],[")) {
                System.out.println("[" + value + "]");
            } else {
                System.out.println(value);
            }
        }
    }
}
Output:
[[Cid:0001,Jid:439,java,unit testing],[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]]
[Cid:0002,Jid:312,team,goals,territory]
[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]
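Since the question needs each group as a list that can be parallelized, a variation that collects a List<String> per key may be easier to work with than one concatenated string. The following is a sketch under assumptions, not the answer's own method: the class name GroupByIds, the helper groupByIds, and the choice to skip lines without both ids are all hypothetical. It uses capturing groups for the two ids and a LinkedHashMap so that groups keep their first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupByIds {
    // Captures the Cid and Jid digits from a line such as "[Cid:0001,Jid:439,java]".
    private static final Pattern IDS = Pattern.compile("Cid:(\\d+),Jid:(\\d+)");

    // Hypothetical helper: groups raw lines by their Cid/Jid pair, keeping first-seen order.
    static Map<String, List<String>> groupByIds(List<String> lines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = IDS.matcher(line);
            if (!m.find()) {
                continue; // assumption of this sketch: lines without both ids are skipped
            }
            String key = "Cid:" + m.group(1) + ",Jid:" + m.group(2);
            // Create the list on first sight of the key, then append the line to it.
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[Cid:0001,Jid:439,java,unit testing]",
                "[Cid:0001,Jid:439,java,jsp,xml,javascript,servlet,html]",
                "[Cid:0001,Jid:245,ui development,jquery,javascript,html,ajax]",
                "[Cid:0002,Jid:312,team,goals,territory]");
        for (Map.Entry<String, List<String>> group : groupByIds(lines).entrySet()) {
            // Each value is a plain List<String>; Spark itself is not used in this sketch.
            System.out.println(group.getKey() + " -> " + group.getValue());
        }
    }
}
```

Each list in the returned map could then be passed, one group at a time, to Spark's parallelize function, as described in the question.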