I am new to SparkSQL. Please help me.
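My specific question is this: we can convert the RDD hospitalDataText to a DataFrame (using .toDF()), where hospitalDataText was read from a CSV file with the Spark context (not with sqlContext.read.csv("path")). If that works, why can't we write header.toDF()? If I try to convert the variable header RDD to a DataFrame, it throws the error: value toDF is not a member of String. Why? My main purpose is to view the data of the variable header RDD using the .show() function. So why can't I convert that RDD to a DataFrame? Please check the code given below! It looks like a DOUBLE-STANDARD :'(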
scala> val hospitalDataText = sc.textFile("/Users/TheBhaskarDas/Desktop/services.csv")
hospitalDataText: org.apache.spark.rdd.RDD[String] = /Users/TheBhaskarDas/Desktop/services.csv MapPartitionsRDD[39] at textFile at <console>:33
scala> val header = hospitalDataText.first() //Remove the header
header: String = uhid,locationid,doctorid,billdate,servicename,servicequantity,starttime,endtime,servicetype,servicecategory,deptname
scala> header.toDF()
<console>:38: error: value toDF is not a member of String
       header.toDF()
              ^
scala> val hospitalData = hospitalDataText.filter(a => a != header)
hospitalData: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[40] at filter at <console>:37
scala> val m = hospitalData.toDF()
m: org.apache.spark.sql.DataFrame = [value: string]
scala> println(m)
[value: string]
scala> m.show()
+--------------------+
|               value|
+--------------------+
|32d84f8b9c5193838...|
|32d84f8b9c5193838...|
|213d66cb9aae532ff...|
|222f8f1766ed4e7c6...|
|222f8f1766ed4e7c6...|
|993f608405800f97d...|
|993f608405800f97d...|
|fa14c3845a8f1f6b0...|
|6e2899a575a534a1d...|
|6e2899a575a534a1d...|
|1f1603e3c0a0db5e6...|
|508a4fbea4752771f...|
|5f33395ae7422c3cf...|
|5f33395ae7422c3cf...|
|4ef07783ce800fc5d...|
|70c13902c9c9ccd02...|
|70c13902c9c9ccd02...|
|a950feff6911ab5e4...|
|b1a0d427adfdc4f7e...|
|b1a0d427adfdc4f7e...|
+--------------------+
only showing top 20 rows
scala> m.show(1)
+--------------------+
|               value|
+--------------------+
|32d84f8b9c5193838...|
+--------------------+
only showing top 1 row
scala> m.show(1,true)
+--------------------+
|               value|
+--------------------+
|32d84f8b9c5193838...|
+--------------------+
only showing top 1 row
scala> m.show(1,2)
+-----+
|value|
+-----+
|   32|
+-----+
only showing top 1 row
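The transcript shows why the resulting DataFrame is not very useful yet: toDF() on an RDD[String] produces a single column named value that holds each whole line, and show(1,2) cuts every cell down to 2 characters. Below is a minimal sketch of splitting each line into named columns instead, assuming Spark 2.x or later with a SparkSession. The object HospitalCsvColumns, its toColumns helper and the all-StringType schema are hypothetical choices for illustration. The split on "," is naive: it does not handle quoted fields that contain commas, and it assumes every line has the same number of fields as header. For real CSV files, spark.read.option("header", "true").csv(path) is the built-in alternative that the question deliberately avoided.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object HospitalCsvColumns {
  // Hypothetical helper: turns an RDD of CSV lines (first line = header) into a
  // DataFrame with one StringType column per header field.
  def toColumns(spark: SparkSession, lines: RDD[String]): DataFrame = {
    // first() is an action that returns a plain String, not an RDD.
    val header: String = lines.first()
    val data = lines.filter(line => line != header)

    // Naive split: quoted fields containing commas are not handled.
    // The -1 limit keeps trailing empty fields so each row keeps its full length.
    val rows: RDD[Row] = data.map(line => Row.fromSeq(line.split(",", -1).toSeq))
    val schema = StructType(
      header.split(",").map(name => StructField(name, StringType, nullable = true)))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // In spark-shell a session already exists as `spark`; here one is created locally.
    val spark = SparkSession.builder().appName("hospital-csv").master("local[*]").getOrCreate()
    // Path from the question; another path can be passed as the first argument.
    val path = args.headOption.getOrElse("/Users/TheBhaskarDas/Desktop/services.csv")
    val df = toColumns(spark, spark.sparkContext.textFile(path))

    df.printSchema()
    // truncate = false prints whole cells instead of cutting them at 20 characters.
    df.show(5, false)
    spark.stop()
  }
}
```

In the same spirit, m.show(1, false) on the original single-column DataFrame would print the first line untruncated.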
Answer (score: 3)
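You keep saying that header is an RDD, but the output you posted clearly shows that header is a String. first() does not return an RDD. You cannot use show() on a String, but you can use println.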
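Because header is just a String, println(header) is enough to see it. If a DataFrame is really wanted (for example to call show()), one option, sketched here assuming Spark 2.x or later where spark.implicits._ provides toDF() on a local Seq, is to wrap the String in a Seq first. The object HeaderAsDataFrame and its two helpers are hypothetical names; the header value is copied from the transcript.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HeaderAsDataFrame {
  // Header line copied from the transcript (the value returned by hospitalDataText.first()).
  val header: String =
    "uhid,locationid,doctorid,billdate,servicename,servicequantity,starttime,endtime,servicetype,servicecategory,deptname"

  // Hypothetical helper: a single-row DataFrame holding the whole header line.
  def headerLineDf(spark: SparkSession, line: String): DataFrame = {
    import spark.implicits._
    // String has no toDF(); a Seq[String] does, via the implicits import.
    Seq(line).toDF("header")
  }

  // Hypothetical helper: one row per column name in the header.
  def columnNamesDf(spark: SparkSession, line: String): DataFrame = {
    import spark.implicits._
    line.split(",").toSeq.toDF("column")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("header-demo").master("local[*]").getOrCreate()
    // Simplest way to look at a String: just print it.
    println(header)
    headerLineDf(spark, header).show(false)
    columnNamesDf(spark, header).show()
    spark.stop()
  }
}
```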