How do I cross-combine (is that the right way to describe it?) two RDDs?
Input:
rdd1 = [a, b]
rdd2 = [c, d]
Output:
rdd3 = [(a, c), (a, d), (b, c), (b, d)]
I tried rdd3 = rdd1.flatMap(lambda x: rdd2.map(lambda y: (x, y))), and it complained: It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation. I guess this means you can't nest these calls the way you can nest list comprehensions, and that a single statement can only perform one action.
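Answer 0: (score: 3)
As you noticed, you cannot perform a transformation inside another transformation. (Note that flatMap and map are transformations, not actions, because they return RDDs.) Thankfully, what you are trying to accomplish is directly supported by another transformation in the Spark API, namely cartesian (see http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD).
So you want rdd1.cartesian(rdd2).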
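A minimal sketch of the cartesian approach, assuming you already have a SparkContext (called sc in the comments below, which is an assumption and not shown in the question). The helper names cross_combine_rdd and cross_combine_local are made up for illustration and are not part of Spark. The local helper uses itertools.product only to show which pairs to expect. The order of elements returned by collect() on a real RDD may differ.

```python
from itertools import product


def cross_combine_rdd(rdd1, rdd2):
    """Pair every element of rdd1 with every element of rdd2.

    Both arguments are expected to be PySpark RDDs. RDD.cartesian is a
    single transformation, so no RDD is referenced inside another RDD's
    lambda (which is what triggered the error in the question).
    """
    return rdd1.cartesian(rdd2)


def cross_combine_local(xs, ys):
    """Plain-Python equivalent for small lists, handy for checking results."""
    return list(product(xs, ys))


# Hypothetical usage with Spark (assumes an existing SparkContext `sc`):
#   rdd1 = sc.parallelize(["a", "b"])
#   rdd2 = sc.parallelize(["c", "d"])
#   rdd3 = cross_combine_rdd(rdd1, rdd2)
#   rdd3.collect()  # triggers the computation; this is the action

# Local check of the pairs the question asks for:
pairs = cross_combine_local(["a", "b"], ["c", "d"])
print(pairs)
```

Note that the result has len(rdd1) * len(rdd2) elements, so cartesian can get expensive quickly on large RDDs.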
Answer 1: (score: 1)