Map from String to List&lt;String&gt; in a Dataset

Time: 2018-11-14 03:44:55

Tags: java apache-spark

I'm trying to map from a String to a List&lt;String&gt; inside a Dataset. How do I come up with the encoder for the List&lt;String&gt;? This is how far I've got:

Row data = RowFactory.create("123");
StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
});
// createDataFrame needs a List<Row>; the map function receives a Row,
// so the single String column is pulled out with getString(0).
Dataset<List<String>> df = spark.createDataFrame(Collections.singletonList(data), schema)
    .map((MapFunction<Row, List<String>>) row -> Arrays.asList(row.getString(0)), ???);

I found two answers myself. In both cases you end up building an Encoder&lt;List&lt;String&gt;&gt; with the static Encoders.bean() method. The first solution simply passes it List.class:

Row data = RowFactory.create("123");
StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
});
Dataset<List> df = spark.createDataFrame(Collections.singletonList(data), schema)
    .map((MapFunction<Row, List>) row -> Arrays.asList(row.getString(0)),
         Encoders.bean(List.class));

The second solution is more specific, but somewhat ugly:

Row data = RowFactory.create("123");
StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
});
Dataset<List<String>> df = spark.createDataFrame(Collections.singletonList(data), schema)
    .map((MapFunction<Row, List<String>>) row -> Arrays.asList(row.getString(0)),
         Encoders.bean((Class<List<String>>) Collections.<String>emptyList().getClass()));

Both solutions compile, but each one fails at runtime with

Exception in thread "main" java.lang.AssertionError: assertion failed

pointing at the .map() line.

The only way around this that I've found is to map to a dummy bean class instead:

Row data = RowFactory.create("123");
StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
});
Dataset<DummyList> df = spark.createDataFrame(Collections.singletonList(data), schema)
    .map((MapFunction<Row, DummyList>) row -> new DummyList(row),
         Encoders.bean(DummyList.class));

where DummyList is just a small bean class that wraps the value.
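A minimal bean along these lines is enough to satisfy Encoders.bean(); the field and constructor below are only an illustration, not necessarily the exact DummyList used above:

// Illustrative only: a trivial bean wrapper so that Encoders.bean() has
// a no-arg constructor and a getter/setter pair to work with.
public static class DummyList implements Serializable {
    private List<String> values;

    public DummyList() { }

    public DummyList(Row row) {
        this.values = Arrays.asList(row.getString(0));
    }

    public List<String> getValues() { return values; }

    public void setValues(List<String> values) { this.values = values; }
}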

Of course, this is just a hack. I'm not posting it as an answer because I hope someone can come up with an elegant way to solve this.
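One direction that might avoid the bean encoder entirely is Spark's Kryo-based encoder; the sketch below is only an untested illustration, and the trade-off is that the list ends up stored as opaque binary rather than as a typed column:

// Untested sketch: Encoders.kryo() serializes the whole List<String> with Kryo,
// so no bean-style schema inference is involved. The double cast is needed
// because kryo(List.class) yields a raw Encoder<List>.
@SuppressWarnings("unchecked")
Encoder<List<String>> listEncoder =
        (Encoder<List<String>>) (Encoder<?>) Encoders.kryo(List.class);

Dataset<List<String>> lists = spark.createDataFrame(Collections.singletonList(data), schema)
    .map((MapFunction<Row, List<String>>) row -> Arrays.asList(row.getString(0)),
         listEncoder);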

0 Answers:

There are no answers yet.