Question

I am using the code snippet below to load data into a table, but the data is not being loaded into the table.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import java.text.SimpleDateFormat
import java.util.Calendar
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, DoubleType}
import org.apache.spark.sql.functions.rand
import scala.io.Source

val sqlContext = new SQLContext(sc)
// Import the implicits only after sqlContext has been defined
import sqlContext.implicits._

val TextFiledata= sc.textFile("wasb://Test.txt")

val schema = StructType(
    Array(
      StructField("ABC", StringType, true),
      StructField("XYZ", StringType, true)
    )
)

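val mapped = TextFiledata
  // Assuming fields are separated by the literal text #|#: the pipe must be
  // escaped, because an unescaped "#|#" is a regex that splits on every "#"
  .map(_.split("#\\|#"))
  // Skip the header row
  .filter(r => r(0) != "ABC")
  .map(p => Row(p(0), p(1)))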


val DF = sqlContext.createDataFrame(mapped ,schema)
DF.registerTempTable("Table")

Answer 1

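Based on the line val TextFiledata= sc.textFile("wasb://Test.txt") in your code, I think the path to the file on the Azure Blob Storage-backed HDFS is incorrect. With only two slashes, Test.txt is read as the authority part of the URI (where the container and account belong), not as a file path.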

The WASB URI syntax is:

wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>

So you should refer to the file as wasbs:///Test.txt or as wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net/Test.txt.
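As a rough illustration of the fully qualified form, here is a minimal sketch in plain Scala that assembles such a URI from its parts. The object WasbPaths and the function wasbUri are hypothetical helpers written for this answer, not part of Spark or any Azure library, and the container and account names in the usage note are placeholders.

```scala
// Hypothetical helper for illustration only; not part of Spark or the Azure SDK.
object WasbPaths {
  /** Builds wasb[s]://<container>@<account>.blob.core.windows.net/<path>. */
  def wasbUri(container: String,
              account: String,
              path: String,
              secure: Boolean = true): String = {
    require(container.nonEmpty, "container name must not be empty")
    require(account.nonEmpty, "storage account name must not be empty")
    // wasbs uses HTTPS, wasb uses plain HTTP
    val scheme = if (secure) "wasbs" else "wasb"
    // Drop a leading slash so the result never contains "//" before the path
    val cleanPath = path.stripPrefix("/")
    s"${scheme}://${container}@${account}.blob.core.windows.net/${cleanPath}"
  }
}
```

With placeholder names, WasbPaths.wasbUri("mycontainer", "myaccount", "Test.txt") gives a string that could then be passed to sc.textFile in place of "wasb://Test.txt".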

With the wasb:// URI scheme, Spark accesses data at the Azure Storage Blob endpoint over unencrypted HTTP. Use wasbs:// instead to make sure the data is accessed over HTTPS.
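If existing paths already use wasb://, switching them to the encrypted scheme is a plain string rewrite. The sketch below assumes nothing beyond the scheme prefix changes. WasbScheme and toSecure are hypothetical names made up for this example.

```scala
// Hypothetical helper for illustration only.
object WasbScheme {
  /** Rewrites a leading wasb:// to wasbs://; any other URI is returned unchanged. */
  def toSecure(uri: String): String =
    if (uri.startsWith("wasb://")) "wasbs://" + uri.stripPrefix("wasb://")
    else uri
}
```

A URI that already starts with wasbs:// does not match the wasb:// prefix check, so it passes through untouched.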
