I am trying to map from String to List<String>. How should I come up with an Encoder<List<String>>?

This is as far as I have gotten:

    Row data = RowFactory.create("123");
    StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
    });
    Dataset<Row> df = spark.createDataFrame(data, schema)
        .map(s -> Arrays.<String>asList(s), ???);
I found two candidate answers myself. In both cases you use the static method Encoders.bean(). In the first solution, you pass it List.class:

    Row data = RowFactory.create("123");
    StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
    });
    Dataset<Row> df = spark.createDataFrame(data, schema)
        .map(s -> Arrays.<String>asList(s), Encoders.bean(List.class));

In the second solution (more specific, but somewhat ugly):

    Row data = RowFactory.create("123");
    StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
    });
    Dataset<Row> df = spark.createDataFrame(data, schema)
        .map(s -> Arrays.<String>asList(s), Encoders.bean((Class<List<String>>) Collections.<String>emptyList().getClass()));

Both solutions compile, but both fail with the same runtime error:

    Exception in thread "main" java.lang.AssertionError: assertion failed

It points to the .map() line.

The only way I found to get around this is:
    Row data = RowFactory.create("123");
    StructType schema = new StructType(new StructField[]{
        new StructField("text", DataTypes.StringType, false, Metadata.empty())
    });
    Dataset<Row> df = spark.createDataFrame(data, schema)
        .map(s -> new DummyList(s), Encoders.bean(DummyList.class));
where DummyList is a custom bean class.
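The definition of DummyList does not appear in the post as it stands. As a rough illustration only, a bean-style wrapper of this kind might look like the sketch below. The field name items, the varargs String constructor, and the Serializable marker are all assumptions made for the example, not the original class. The only firm requirement reflected here is the usual JavaBean shape (a no-argument constructor plus a getter/setter pair), which is what a bean encoder inspects.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the missing DummyList class.
// In a real project this would likely be a public class in DummyList.java.
class DummyList implements Serializable {
    // Assumed property name; the original field is unknown.
    private List<String> items = new ArrayList<>();

    // No-argument constructor, as expected of a JavaBean.
    public DummyList() {
    }

    // Convenience constructor (an assumption) wrapping the given values.
    public DummyList(String... values) {
        this.items = new ArrayList<>(Arrays.asList(values));
    }

    public List<String> getItems() {
        return items;
    }

    public void setItems(List<String> items) {
        this.items = items;
    }
}
```

With a class of this shape, the list is carried as a property of a concrete bean rather than as a bare List, which is the essence of the workaround.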
Of course, this is just a hack. I won't post it as an answer, because I hope someone can come up with an elegant solution to this problem.