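How do I add a new column to a Spark RDD?

Time: 2015-04-30 08:48:00

Tags: apache-spark rdd

My RDD contains MANY columns (e.g., hundreds). How do I add one more column at the end of this RDD?

For example, if my RDD looks like this: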

    123, 523, 534, ..., 893
    536, 98, 1623, ..., 98472
    537, 89, 83640, ..., 9265
    7297, 98364, 9, ..., 735
    ......
    29, 94, 956, ..., 758

How can I add a column to it whose value is the sum of the second and third columns?

Thank you very much.

2 answers:

Answer 0 (score: 8)

You do not have to use Tuple* objects at all to add a new column to an RDD.

It can be done by mapping each row: take its original contents and append the elements you want, for example:

    import org.apache.spark.sql.Row

    val rdd = ...  // assumed to be an RDD[Row]
    val withAppendedColumnsRdd = rdd.map(row => {
      val originalColumns = row.toSeq.toList
      // Column indices are zero-based: 1 is the second column, 2 is the third
      val secondColValue = originalColumns(1).asInstanceOf[Int]
      val thirdColValue = originalColumns(2).asInstanceOf[Int]
      val newColumnValue = secondColValue + thirdColValue
      Row.fromSeq(originalColumns :+ newColumnValue)
      // Row.fromSeq(originalColumns ++ List(newColumnValue1, newColumnValue2, ...)) // or add several new columns
    })
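The asInstanceOf[Int] casts in this approach throw a ClassCastException if a column holds anything other than an Int. As a minimal sketch of a more defensive per-row step, the hypothetical helper below (appendSum is an illustrative name, not part of Spark) works on a plain Seq[Any] and returns None when a column is missing or is not an Int. It uses only the Scala standard library, so it can be tried without a Spark cluster.

```scala
// Hypothetical helper (not a Spark API): appends the sum of columns i and j
// to the row. Returns None if either column is missing or is not an Int.
def appendSum(columns: Seq[Any], i: Int, j: Int): Option[Seq[Any]] =
  (columns.lift(i), columns.lift(j)) match {
    case (Some(a: Int), Some(b: Int)) => Some(columns :+ (a + b))
    case _                            => None
  }

// Made-up sample rows: the first is all Ints, the second has a String column.
val ok  = appendSum(Seq(123, 523, 534, 893), 1, 2)
val bad = appendSum(Seq(123, "523", 534), 1, 2)

println(ok)
println(bad)
```

Inside a Spark job, one possible way to use it would be to pass row.toSeq to the helper and wrap a successful result again with Row.fromSeq. How to treat rows that yield None (drop, log, or fail) is a design choice the original answer does not address.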

Answer 1 (score: 4)

You have an RDD of Tuple4; apply map and convert it to Tuple5:

    val rddTuple4RDD = ...........
    // Refer to the lambda parameter r, not the RDD, when reading the tuple fields
    val rddTuple5RDD = rddTuple4RDD.map(r => Tuple5(r._1, r._2, r._3, r._4, r._2 + r._3))
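Note that Scala's TupleN classes stop at Tuple22, so the tuple approach does not scale to the hundreds of columns mentioned in the question. A minimal sketch of an alternative, assuming each row is stored as a Vector[Int]: the local Seq below only stands in for the RDD (RDD.map takes the same kind of function), and the sample rows are shortened from the question's data with the elided columns left out.

```scala
// Stand-in for the RDD: a local collection of rows, each row a Vector[Int].
val rows: Seq[Vector[Int]] = Seq(
  Vector(123, 523, 534, 893),
  Vector(536, 98, 1623, 98472)
)

// Append (second column + third column) to every row; indices are zero-based.
val widened: Seq[Vector[Int]] = rows.map(cols => cols :+ (cols(1) + cols(2)))

// Print each widened row as comma-separated values.
widened.foreach(row => println(row.mkString(", ")))
```

Because the row is an ordinary collection, the same map works for any number of columns, at the cost of requiring all columns to share one element type.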