I have an RDD of type:
RDD[Array[(String, Int)]]
with contents:
Array(Array((yellow,1), (green,1), (orange,1), (red,1)), Array((banana,1), (orange,1), (green,1), (apple,2), (kiwi,1), (pear,1), (red,1)), Array((salad,1), (potato,1), (carrot,1), (green,1), (leek,1)))
and another RDD of type:
RDD[(String, Double)]
with contents:
Array((pear,1.0986122886681098), (orange,0.0), (kiwi,1.0986122886681098), (apple,0.0), (yellow,1.0986122886681098), (banana,1.0986122886681098), (green,0.0), (carrot,1.0986122886681098), (leek,1.0986122886681098), (salad,1.0986122886681098), (red,0.0), (potato,1.0986122886681098))
I would like to map the elements of the first RDD by multiplying each word's value by the value for the same word in the second RDD.
The result should have this type:
RDD[Array[(String, Double)]]
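Answer 0 (score: 0)
Since the actual parallel computation happens on the elements of the inner arrays rather than on the arrays themselves, I think you should parallelize those elements (the arrays of words and values) and keep them in a driver-side collection. Instead of an RDD of arrays, you will have an array of RDDs. This will come in handy later.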
val setOneRaw =
  Array(
    Array(
      ( "kiwi", 1 ),
      ( "green", 1 ),
      ( "orange", 1 ),
      ( "red", 1 )
    ),
    Array(
      ( "banana", 1 ),
      ( "orange", 1 ),
      ( "green", 1 )
    ),
    Array(
      ( "kiwi", 1 ),
      ( "pear", 1 ),
      ( "carrot", 1 )
    )
  )
// One RDD per inner array; the outer Array stays on the driver.
val setOneRDDs =
  setOneRaw
    .map( sc.parallelize( _ ) )
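If you do this, the second RDD is the same kind of object as each RDD in the main collection: a pair RDD keyed by word (only the value type differs, Double rather than Int).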
val setTwo =
  sc.parallelize(
    Array(
      ( "pear", 1.0986122886681098 ),
      ( "orange", 0.0 ),
      ( "kiwi", 1.0986122886681098 ),
      ( "apple", 0.0 )
    )
  )
With both sides represented as RDDs, you can join them and then multiply the two values in each tuple the join produces.
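val mixedSet =
  setOneRDDs
    // Join each per-group RDD with the weights; a missing match yields None.
    .map( _.leftOuterJoin( setTwo ) )
    .map(
      _.map {
        // Multiply count by weight; words without a weight keep their count (factor 1.0).
        case ( word, ( count, weight ) ) => ( word, count * weight.getOrElse( 1.0 ) )
      }
    )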
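To see what the join-and-multiply step computes for a single group without starting Spark, here is a minimal sketch on plain Scala collections. The object name `JoinSketch` and the helper `weigh` are hypothetical, and the `Map` lookup with `getOrElse(word, 1.0)` only stands in for `leftOuterJoin` followed by `getOrElse( 1.0 )`; it assumes each word appears at most once in the weights.

```scala
object JoinSketch {
  // Multiply each (word, count) by the word's weight. A word with no weight
  // keeps its count (factor 1.0), mirroring leftOuterJoin + getOrElse(1.0).
  def weigh(group: Array[(String, Int)],
            weights: Map[String, Double]): Array[(String, Double)] =
    group.map { case (word, count) => (word, count * weights.getOrElse(word, 1.0)) }

  def main(args: Array[String]): Unit = {
    // Sample weights copied from setTwo in the answer.
    val weights = Map(
      "pear"   -> 1.0986122886681098,
      "orange" -> 0.0,
      "kiwi"   -> 1.0986122886681098,
      "apple"  -> 0.0
    )
    val group = Array(("kiwi", 1), ("green", 1), ("orange", 1), ("red", 1))
    weigh(group, weights).foreach(println)
  }
}
```

Unlike the Spark join, this plain-collection version preserves the input order of each group.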
Using leftOuterJoin handles the case where the first collection has a value for a given word but the second one does not.
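The difference between an inner join and a left outer join can be sketched on plain collections. This is only an illustration of the semantics, not Spark's implementation: `OuterJoinSketch`, `innerJoin`, and `leftOuterJoin` here are hypothetical local helpers, and the right-hand side is modeled as a `Map`, which assumes at most one weight per word.

```scala
object OuterJoinSketch {
  // Inner join: keep only the words that are present on both sides.
  def innerJoin(left: Seq[(String, Int)],
                right: Map[String, Double]): Seq[(String, (Int, Double))] =
    left.collect {
      case (word, count) if right.contains(word) => (word, (count, right(word)))
    }

  // Left outer join: keep every left-hand word; a missing match becomes None.
  def leftOuterJoin(left: Seq[(String, Int)],
                    right: Map[String, Double]): Seq[(String, (Int, Option[Double]))] =
    left.map { case (word, count) => (word, (count, right.get(word))) }

  def main(args: Array[String]): Unit = {
    val counts  = Seq(("kiwi", 1), ("red", 1))
    val weights = Map("kiwi" -> 1.0986122886681098)
    println(innerJoin(counts, weights))     // "red" is dropped
    println(leftOuterJoin(counts, weights)) // "red" is kept, paired with None
  }
}
```

The `Option` on the right-hand value is what makes `getOrElse( 1.0 )` possible in the multiplication step.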
Results for the sample data above (input pairs before the -=- separator, output pairs after it):
(orange,1)
(green,1)
(red,1)
(kiwi,1)
(banana,1)
(orange,1)
(green,1)
(carrot,1)
(pear,1)
(kiwi,1)
-=-
(green,1.0)
(orange,0.0)
(red,1.0)
(kiwi,1.0986122886681098)
(green,1.0)
(banana,1.0)
(orange,0.0)
(kiwi,1.0986122886681098)
(carrot,1.0)
(pear,1.0986122886681098)
Words such as "red" or "banana", which have no entry in the second set, keep their values unchanged.
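The question asks for a result of type RDD[Array[(String, Double)]], whereas the approach above produces an array of RDDs. As a different technique from the one in this answer, the sketch below keeps the first RDD as it is and broadcasts the weights as a lookup map. It assumes the weights are small enough to collect on the driver and that Spark is on the classpath; the object name `BroadcastWeights`, the helper `weighGroup`, and the local master setting are illustrative choices, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastWeights {
  // Pure per-group step: multiply each count by its weight, defaulting to 1.0.
  def weighGroup(group: Array[(String, Int)],
                 lookup: scala.collection.Map[String, Double]): Array[(String, Double)] =
    group.map { case (word, count) => (word, count * lookup.getOrElse(word, 1.0)) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("broadcast-weights")
      .getOrCreate()
    val sc = spark.sparkContext

    // RDD[Array[(String, Int)]], as in the question.
    val groups = sc.parallelize(Seq(
      Array(("kiwi", 1), ("green", 1), ("orange", 1), ("red", 1)),
      Array(("banana", 1), ("orange", 1), ("green", 1))
    ))
    // RDD[(String, Double)], as in the question.
    val weights = sc.parallelize(Seq(("orange", 0.0), ("kiwi", 1.0986122886681098)))

    // Assumption: the weights fit in driver memory.
    val lookup = sc.broadcast(weights.collectAsMap())

    // The result keeps the shape RDD[Array[(String, Double)]].
    val weighted = groups.map(group => weighGroup(group, lookup.value))

    weighted.collect().foreach(group => println(group.mkString(", ")))
    spark.stop()
  }
}
```

If the second RDD is too large to collect, the join-based approach in the answer is the safer choice.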