How do I update row/column values in an Apache Spark DataFrame?

Asked: 2015-07-15 18:11:01

Tags: apache-spark apache-spark-sql spark-dataframe

Hi, I have an ordered Spark DataFrame and I would like to change a few rows while iterating over it with the code below, but there does not seem to be any way to update a Row object.

orderedDataFrame.foreach(new Function1<Row, BoxedUnit>() {
    @Override
    public BoxedUnit apply(Row v1) {
        // how do I change the Row here?
        // I want to change column no. 2 using v1.get(2)
        // also, what is BoxedUnit, and how do I use it?
        return null;
    }
});

The code above also produces the compile error "myclassname is not abstract and it does not override abstract method apply$mcVJ$sp(long) in scala.Function1". Please advise; I am new to Spark and am using version 1.4.0.
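A note on what the question runs into: `foreach` exists for side effects only and returns nothing (Scala's `Unit`, which appears in Java as `BoxedUnit`), so it cannot transform the rows of a DataFrame; transformations go through `map`, which produces a new collection. The same distinction can be sketched with plain Java collections, with no Spark dependency:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ForeachVsMap {
    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);

        // foreach-style: side effects only; the elements of xs cannot be changed here
        xs.forEach(x -> System.out.println(x));

        // map-style: build a NEW collection holding the transformed values
        List<Integer> doubled = xs.stream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
        System.out.println(doubled); // [2, 4, 6]
    }
}
```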

2 Answers:

Answer 0 (score: 7):

Try this:

final DataFrame withoutCurrency = sqlContext.createDataFrame(
        somedf.javaRDD().map(row ->
                RowFactory.create(row.get(0), row.get(1), someMethod(row.get(2)))),
        somedf.schema());
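The key idea in this answer is that `Row` objects are immutable, so "updating" column 2 means constructing a brand-new row with the changed value and building a new DataFrame from the results. The same pattern in plain Java, with `Object[]` standing in for `Row` and a hypothetical `someMethod` (not part of any Spark API):

```java
import java.util.List;
import java.util.stream.Collectors;

public class RebuildRows {
    // hypothetical stand-in for someMethod in the answer above
    public static Object someMethod(Object value) {
        return ((String) value).toUpperCase();
    }

    public static void main(String[] args) {
        // each Object[] plays the role of an immutable Row
        List<Object[]> rows = List.of(
                new Object[]{"a", 1, "usd"},
                new Object[]{"b", 2, "eur"});

        // "updating" column 2 means building a new row, never mutating the old one
        List<Object[]> updated = rows.stream()
                .map(r -> new Object[]{r[0], r[1], someMethod(r[2])})
                .collect(Collectors.toList());

        System.out.println(updated.get(0)[2]); // USD
    }
}
```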

Answer 1 (score: 0):

Dataset<Row> ds = spark.createDataFrame(Collections.singletonList(data), SellerAsinAttribute.class);
ds.map(i -> {
    Object arrayObj = Array.newInstance(Object.class, i.length());
    for (int n = 0; n < i.length(); ++n) {
        // change 'i.get(n)' to anything you want; if you change the type, remember to update the schema
        Array.set(arrayObj, n, i.get(n));
    }
    Method create = RowFactory.class.getMethod("create", Object[].class);
    return (Row) create.invoke(null, arrayObj);
}, RowEncoder.apply(ds.schema())).show();
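A side note on this answer: the reflective lookup of `RowFactory.create` is not actually required, since `create` is a public varargs method and can be called directly when Spark is on the compile-time classpath. The equivalence can be sketched in plain Java with a hypothetical `create` standing in for `RowFactory.create` (note the `Object` static type matters for how `Method.invoke` packs its arguments):

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class VarargsDemo {
    // hypothetical stand-in for RowFactory.create(Object... values)
    public static String create(Object... values) {
        return Arrays.toString(values);
    }

    public static void main(String[] args) throws Exception {
        // static type Object, as returned by Array.newInstance in the answer above
        Object arrayObj = new Object[]{"x", 42};

        // reflective call, as in the answer: the whole array is passed as the single varargs parameter
        Method m = VarargsDemo.class.getMethod("create", Object[].class);
        String viaReflection = (String) m.invoke(null, arrayObj);

        // direct call; the cast tells the compiler the array IS the varargs array
        String direct = create((Object[]) arrayObj);

        System.out.println(viaReflection.equals(direct)); // true
    }
}
```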