I have two datasets that were already partitioned with the same partitioner and stored in HDFS. These datasets are the output of two different Spark jobs which we have no control over. Now I want to join these two datasets to produce different information.
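To make the intent concrete, here is a rough sketch of the kind of keyed join I have in mind (the paths, the tab-separated layout, the choice of the first field as join key, and the partition count are only placeholders):

import org.apache.spark.HashPartitioner

// Placeholder paths and layout: tab-separated lines whose first field is the join key.
val partitioner = new HashPartitioner(200) // partition count is an assumption

val left = sc.textFile("hdfs:///data/job1/output")
  .map(_.split("\t"))
  .map(fields => (fields(0), fields))
  .partitionBy(partitioner)

val right = sc.textFile("hdfs:///data/job2/output")
  .map(_.split("\t"))
  .map(fields => (fields(0), fields))
  .partitionBy(partitioner)

// With both RDDs keyed and partitioned the same way, the join itself
// needs no further shuffle; only the two partitionBy calls move data.
val joined = left.join(right)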
Answer 0 (score: 0)
You can try creating two DataFrames and joining them with SQL. Please find the code below.
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
// For implicit conversions from RDDs to DataFrames
import spark.implicits._
case class struc_dataset(ORDER_ID: String,CUSTOMER_ID: String, ITEMS:String)
//Read file1
val File1DF = spark.sparkContext
.textFile("temp/src/file1.txt")
.map(_.split("\t"))
.map(attributes => struc_dataset(attributes(0), attributes(1),attributes(3))).toDF()
//Register as Temp view - Dataset1
File1DF.createOrReplaceTempView("Dataset1")
//Read file2
val File2DF = spark.sparkContext
.textFile("temp/src/file2.txt")
.map(_.split("\t"))
.map(attributes => struc_dataset(attributes(0),attributes(1),attributes(3))).toDF()
//Register as Temp view - Dataset2
File2DF.createOrReplaceTempView("Dataset2")
// SQL statement to create final dataframe (JOIN)
val finalDF = spark.sql("SELECT * FROM Dataset1 ds1 JOIN Dataset2 ds2 on ds1.ORDER_ID=ds2.ORDER_ID AND ds1.CUSTOMER_ID=ds2.CUSTOMER_ID")
finalDF.show()
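Since both inputs were written with the same partitioner, it may also be worth skipping the temp views and joining the DataFrames directly. The sketch below (untested against your data) repartitions both sides on the join keys; depending on the planner, the join can then reuse that layout instead of shuffling again.

// Same join through the DataFrame API, without temp views.
// repartition() is optional; it just lays both sides out by the join keys.
val keyed1 = File1DF.repartition($"ORDER_ID", $"CUSTOMER_ID")
val keyed2 = File2DF.repartition($"ORDER_ID", $"CUSTOMER_ID")

val joinedDF = keyed1.join(keyed2, Seq("ORDER_ID", "CUSTOMER_ID"))
joinedDF.show()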