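I've run into a strange problem. I want to take all the data from a DataFrame, insert it into a permanent Hive table, and index it into Elasticsearch. The query is simple: select * from result
For ES, I loop over every row and insert it. For Hive, it is simply insert into <hive_table> select * from result
But the two give different results. To check, I created 3 different temporary tables like this: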
spark.sql("select * from QtyContribution").join(getRevenueContribution(spark,table2), "item").join(finalUniqueItem(spark), "item").registerTempTable("hola");
spark.sql("select * from QtyContribution").join(getRevenueContribution(spark,table2), "item").join(finalUniqueItem(spark), "item").registerTempTable("hola1");
spark.sql("select * from QtyContribution").join(getRevenueContribution(spark,table2), "item").join(finalUniqueItem(spark), "item").registerTempTable("hola2");
All three queries are identical; only the temp table name differs. Then I collect each one and count the rows:
Dataset<Row> dframe1 = spark.sql("select * from hola");
Row[] row1 = (Row[]) dframe1.collect();
int q=1;
for(Row s : row1){
System.out.println(s.get(0)+" =======df1======= "+ q++);
}
Dataset<Row> dframe2 = spark.sql("select * from hola1");
Row[] row2 = (Row[]) dframe2.collect();
int w=1;
for(Row s : row2){
System.out.println(s.get(0)+" =======df2======= "+ w++);
}
Dataset<Row> dframe3 = spark.sql("select * from hola2");
Row[] row3 = (Row[]) dframe3.collect();
int e=1;
for(Row s : row2){ // note: this iterates row2, not row3, so the df3 counter repeats the df2 count
System.out.println(s.get(0)+" =======df3======= "+ e++);
}
These are the last lines printed by each loop:
BM8942 =======df1======= 1723
BM8942 =======df2======= 1733
BM8942 =======df3======= 1733
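The counters above only show that the totals differ, not which items account for the difference. One way to narrow it down would be to compare how often each item key appears in two collected results. Below is a minimal sketch in plain Java, assuming the first column of each row has already been extracted into a list of strings (for example with String.valueOf(s.get(0))). The class KeyCountDiff and its methods are hypothetical helpers for illustration, not part of Spark.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper (not a Spark API): compares per-key row counts
// between two collected results.
final class KeyCountDiff {

    // Counts how many times each key occurs in the list.
    static Map<String, Integer> countByKey(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    // Returns key -> {count in a, count in b} for every key whose
    // count differs between the two lists; keys with equal counts are omitted.
    static Map<String, int[]> diff(List<String> a, List<String> b) {
        Map<String, Integer> ca = countByKey(a);
        Map<String, Integer> cb = countByKey(b);
        Set<String> allKeys = new HashSet<>(ca.keySet());
        allKeys.addAll(cb.keySet());

        Map<String, int[]> out = new TreeMap<>(); // sorted for stable printing
        for (String k : allKeys) {
            int na = ca.getOrDefault(k, 0);
            int nb = cb.getOrDefault(k, 0);
            if (na != nb) {
                out.put(k, new int[] {na, nb});
            }
        }
        return out;
    }
}
```

If the diff is empty while the totals still differ, the extraction step is suspect. If some items appear more often in one result than in the other, that would point at the joins (or the functions feeding them) producing a different number of rows per item on each evaluation.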
For ES I did this:
Dataset<Row> dframe = spark.sql("select * from hola1");
Row[] row = (Row[]) dframe.collect();
int i = 1;
for (Row r : row) {
bulkRequest.add(client.prepareIndex("twitter1234", "use1", String.valueOf(i))
.setSource(jsonBuilder()
.startObject()
.field("item", r.get(0))
.field("qty_contrib", r.get(1))
.field("division", r.get(2))
.field("rev_contrib", r.get(3))
.field("bp", r.get(4))
.endObject()
)
);
System.out.println(i++ +" ==== "+r.get(0));
}
and the last line printed was:
1534 ==== BM8942
What is going on here?