I need to push specific rows from dataframe1 into an empty dataframe2, but I'm having trouble getting it to work. My code is below.
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;
// JaroWinkler comes from a third-party string-similarity library; its import is not shown here.

public class PrintValue {
    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "C:\\winutils");
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("JoinFunctions").setMaster("local[*]"));
        SQLContext sqlContext = new SQLContext(sc);
        // getOrCreate() reuses the SparkContext created above
        SparkSession spark = SparkSession.builder().appName("JavaTokenizerExample").getOrCreate();
        JaroWinkler jw = new JaroWinkler();

        // Source DataFrame with three rows
        List<Row> data = Arrays.asList(
            RowFactory.create(1, "Hi I heard about Spark"),
            RowFactory.create(2, "I wish Java could use case classes"),
            RowFactory.create(3, "Logistic,regression,models,are,neat"));
        StructType schema = new StructType(new StructField[] {
            new StructField("label", DataTypes.IntegerType, false, Metadata.empty()),
            new StructField("sentence", DataTypes.StringType, false, Metadata.empty()) });
        Dataset<Row> DataFrame1 = spark.createDataFrame(data, schema);
        DataFrame1.show();

        // Empty target DataFrame with its own column names
        List<Row> data1 = Arrays.asList();
        StructType schema2 = new StructType(new StructField[] {
            new StructField("label2", DataTypes.IntegerType, false, Metadata.empty()),
            new StructField("sentence2", DataTypes.StringType, false, Metadata.empty()) });
        JavaRDD<Row> emptyRows = sc.emptyRDD();
        Dataset<Row> DataFrame2 = spark.createDataFrame(emptyRows, schema2);
        DataFrame2.show();
    }
}
The expected output is:
DataFrame1
+-----+--------------------+
|label|            sentence|
+-----+--------------------+
|    1|Hi I heard about ...|
|    2|I wish Java could...|
|    3|Logistic,regressi...|
+-----+--------------------+
DataFrame2
+------+--------------------+
|label2|           sentence2|
+------+--------------------+
|      |                    |
|     2|I wish Java could...|
|      |                    |
+------+--------------------+
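
For reference, here is a minimal sketch of one way the copy step might look, assuming Spark 2.x or later on the classpath. The class name `CopyRowsSketch` and the helpers `copyMatching` and `demo` are hypothetical names made up for illustration; they are not part of the code above. The sketch filters the source on `label`, renames the columns to match the target schema, and unions the result into the empty DataFrame. Note that it yields only the matching row; the blank rows shown in the expected output would additionally need nullable columns, which this sketch does not attempt.

```java
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class CopyRowsSketch {

    // Hypothetical helper: select rows of `source` whose label equals
    // `wantedLabel`, rename the columns to the target's names, and append
    // them to `target`. union() matches columns by position, not by name.
    static Dataset<Row> copyMatching(Dataset<Row> source, Dataset<Row> target, int wantedLabel) {
        Dataset<Row> picked = source
            .filter(col("label").equalTo(wantedLabel))
            .select(col("label").as("label2"), col("sentence").as("sentence2"));
        return target.union(picked);
    }

    // Hypothetical driver: builds both DataFrames as in the question and
    // returns the rows that end up in the target.
    static List<Row> demo() {
        SparkSession spark = SparkSession.builder()
            .appName("CopyRowsSketch")
            .master("local[*]")
            .getOrCreate();
        try {
            List<Row> data = Arrays.asList(
                RowFactory.create(1, "Hi I heard about Spark"),
                RowFactory.create(2, "I wish Java could use case classes"),
                RowFactory.create(3, "Logistic,regression,models,are,neat"));
            StructType schema = new StructType(new StructField[] {
                new StructField("label", DataTypes.IntegerType, false, Metadata.empty()),
                new StructField("sentence", DataTypes.StringType, false, Metadata.empty()) });
            StructType schema2 = new StructType(new StructField[] {
                new StructField("label2", DataTypes.IntegerType, false, Metadata.empty()),
                new StructField("sentence2", DataTypes.StringType, false, Metadata.empty()) });

            Dataset<Row> dataFrame1 = spark.createDataFrame(data, schema);
            // An empty list is enough to create an empty DataFrame with a schema.
            Dataset<Row> dataFrame2 = spark.createDataFrame(Collections.<Row>emptyList(), schema2);

            Dataset<Row> result = copyMatching(dataFrame1, dataFrame2, 2);
            result.show();
            return result.collectAsList();
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        demo();
    }
}
```

Because DataFrames are immutable, nothing is "pushed" into `dataFrame2` in place; `union` returns a new DataFrame, so the result has to be assigned to a variable and used from there.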