Casting a List<Row> into a List<T> in Java

Date: 2017-05-26 14:37:22

Tags: java apache-spark

I am selecting values from a Cassandra table and storing them in a Dataset as follows:

Dataset<Row> query = spark.sql("select url,sourceip,destinationip from traffic_data");
List<Row> rows = query.collectAsList();

Now I have a POJO class GroupClass with the fields url, sourceip and destinationip. Can I cast the List<Row> directly to a List<GroupClass>?
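
For reference, GroupClass looks roughly like this (a sketch only; the String field types and bean-style accessors are assumptions):

// Plain Java bean holding one row of traffic_data
public class GroupClass implements java.io.Serializable {
    private String url;
    private String sourceip;
    private String destinationip;

    public GroupClass() { }  // no-arg constructor, useful for bean encoders

    public GroupClass(String url, String sourceip, String destinationip) {
        this.url = url;
        this.sourceip = sourceip;
        this.destinationip = destinationip;
    }

    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
    public String getSourceip() { return sourceip; }
    public void setSourceip(String sourceip) { this.sourceip = sourceip; }
    public String getDestinationip() { return destinationip; }
    public void setDestinationip(String destinationip) { this.destinationip = destinationip; }
}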

3 Answers:

Answer 0 (score: 0)

Technically you can, but it will throw a ClassCastException at runtime.

The best practice in this case is to use a copy constructor.
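
A minimal sketch of that idea, assuming GroupClass has a constructor taking the three String fields named in the question:

// needs java.util.ArrayList, java.util.List and org.apache.spark.sql.Row
List<Row> rows = query.collectAsList();
List<GroupClass> result = new ArrayList<>();
for (Row row : rows) {
    // build a new GroupClass from each Row instead of casting the list
    result.add(new GroupClass(
            row.getString(0),   // url
            row.getString(1),   // sourceip
            row.getString(2))); // destinationip
}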

Answer 1 (score: 0)

I come from Scala, but I believe there is a similar way in Java.

A possible solution is the following:

val query = spark.sql("select url,sourceip,destinationip from traffic_data").as[GroupClass]

Now the query value is of type Dataset[GroupClass], so calling the collectAsList() method returns a List[GroupClass]:

val list = query.collectAsList();

Another solution (in Java I think you would have to do the same thing with streams) is to map each Row of the list to a GroupClass, like this:

val query = spark.sql("select url,sourceip,destinationip from traffic_data")
val list = query.collectAsList()
val mappedList = list.map {
  case Row(url: String, sourceip: String, destinationip: String) =>
    GroupClass(url, sourceip, destinationip)
}

Here I have assumed that all the attributes of GroupClass (url, sourceip, destinationip) are of type String, and that you create a constructor GroupClass(url: String, sourceip: String, destinationip: String).
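
For the Java side, a sketch of the same mapping with streams might look like this (assuming the GroupClass(String, String, String) constructor mentioned above and the Dataset<Row> called query from the question):

// needs java.util.List, java.util.stream.Collectors and org.apache.spark.sql.Row
List<GroupClass> mappedList = query.collectAsList().stream()
        .map(row -> new GroupClass(
                row.<String>getAs("url"),
                row.<String>getAs("sourceip"),
                row.<String>getAs("destinationip")))
        .collect(Collectors.toList());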

Hope that helps.

Answer 2 (score: 0)

You should use encoders:

Dataset<University> schools = context
    .read()
    .json("/schools.json")
    .as(Encoders.bean(University.class));
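
Applied to the question, that would look roughly like this (a sketch, assuming GroupClass is a Java bean with a no-arg constructor and getters/setters for url, sourceip and destinationip):

// needs java.util.List, org.apache.spark.sql.Dataset and org.apache.spark.sql.Encoders
Dataset<GroupClass> typed = spark
    .sql("select url,sourceip,destinationip from traffic_data")
    .as(Encoders.bean(GroupClass.class));
List<GroupClass> list = typed.collectAsList();  // List<GroupClass> instead of List<Row>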

More information can be found here: https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html or here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-Encoder.html