I have a list of words in a JavaRDD<String> words. How can I convert it into another JavaRDD<String> NGram that contains the N-grams built from that word list?
Here is the code I have so far:

    // n (the n-gram size) and sc (presumably the JavaSparkContext) are defined elsewhere
    List<String> word = words.collect();
    List<String> list = new ArrayList<String>();
    for (int i = 0; i < (word.size() - n + 1); i++) {
        String nGram = word.get(i);
        for (int j = 1; j < n; j++) {
            nGram = nGram + " " + word.get(i + j);
        }
        // Add the n-gram to the list
        list.add(nGram);
    }
    JavaRDD<String> NGram = sc.parallelize(list);

However, I know that collecting an RDD into a List is slow. Is there any way to convert the RDD words into the RDD NGram directly?
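
One possible direction, offered as a sketch and not a tested Spark solution: instead of collecting, key every word by the start index of each n-gram it belongs to, then group by that key. In Spark this would presumably map onto zipWithIndex, flatMapToPair and groupByKey on the RDD, but that mapping is an assumption and is not exercised here. The plain-Java code below only demonstrates the keying and grouping logic on an ordinary List; the class name NGramSketch and the method ngrams are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NGramSketch {

    // Builds n-grams by tagging each word with the start index of every
    // n-gram it is part of, then grouping the words by that start index.
    static List<String> ngrams(List<String> words, int n) {
        // start index -> words of that n-gram, placed at their offset
        Map<Long, String[]> groups = new TreeMap<>();
        for (int i = 0; i < words.size(); i++) {           // plays the role of an index per word
            for (int offset = 0; offset < n; offset++) {   // one (start, word) pair per n-gram
                long start = (long) i - offset;
                if (start < 0) {
                    continue; // no n-gram starts before the first word
                }
                groups.computeIfAbsent(start, k -> new String[n])[offset] = words.get(i);
            }
        }
        List<String> result = new ArrayList<>();
        for (String[] slots : groups.values()) {
            // Windows near the end of the input are incomplete; skip them.
            if (slots[n - 1] != null) {
                result.add(String.join(" ", slots));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d");
        System.out.println(ngrams(words, 2)); // prints [a b, b c, c d]
    }
}
```

The point of the keying step is that each word only needs its own index to decide which n-grams it contributes to, so no single machine has to hold the whole word list. Grouping does imply a shuffle in a distributed setting, so whether this is actually faster than collect() for a given data size would need to be measured.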