I read a Parquet file from HDFS:
path<-"hdfs://part_2015"
AppDF <- parquetFile(sqlContext, path)
printSchema(AppDF)
root
|-- app: binary (nullable = true)
|-- category: binary (nullable = true)
|-- date: binary (nullable = true)
|-- user: binary (nullable = true)
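Note that every column is read back as binary rather than string. One workaround I have seen suggested (the config key is standard Spark SQL; whether it resolves this particular failure is my assumption, untested) is to tell Spark SQL to interpret Parquet binary columns as UTF-8 strings before loading:
# Must be set before the Parquet file is read
sql(sqlContext, "SET spark.sql.parquet.binaryAsString=true")
AppDF <- parquetFile(sqlContext, path)
printSchema(AppDF)  # columns should now appear as string, not binary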
class(AppDF)
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
collect(AppDF)
Error: arguments imply differing number of rows: 46021, 39175, 62744, 27137
head(AppDF)
Error: arguments imply differing number of rows: 36, 30, 48
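For reference, this error message comes straight from base R's data.frame() constructor (which SparkR presumably hits while materializing the collected columns): it is raised whenever the column vectors have mismatched lengths. A minimal base-R reproduction:
# data.frame() refuses columns of different lengths with exactly this message
data.frame(a = 1:3, b = 1:2)
# Error in data.frame(a = 1:3, b = 1:2) :
#   arguments imply differing number of rows: 3, 2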
I have read some posts about this error, but they do not explain my case: all I did was read a table from a Parquet file and call head() or collect() on it. My Parquet table looks like this:
app   category  date      user
aaa   test      20150101  123
aaa   test      20150102  345
aaa   test      20150103  678
aaaa  testA     20150104  123
aaaa  testA     20150105  234
aaaa  testA     20150106  4345
bbbb  testB     20150101  5435
I am using spark-1.4.0-bin-hadoop2.6 and run it on a cluster with ./sparkR --master yarn-client. I also tried it locally, and the same problem occurs.
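For completeness, locally I start SparkR roughly like this (a minimal sketch; the app name is arbitrary):
library(SparkR)
# SparkR 1.4-style initialization
sc <- sparkR.init(master = "local[*]", appName = "parquet-test")
sqlContext <- sparkRSQL.init(sc)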
showDF(AppDF)
+-----------+-----------+-----------+-----------+
| app| category| date| user|
+-----------+-----------+-----------+-----------+
|[B@217fa749|[B@43bfbacd|[B@60810b7a|[B@3818a815|
|[B@5ac31778|[B@3e39f5d5|[B@4f3a92dd| [B@e8013ce|
|[B@7a9440d1|[B@1b2b9836|[B@4b160f29|[B@153d7342|
|[B@7559fcf2|[B@66edb00e|[B@7ec19bec|[B@58e3e3f7|
|[B@598b9ab8|[B@5c5ad3f5|[B@4f11a931|[B@107af885|
|[B@7951ec36|[B@716b0b73|[B@2abce531|[B@576b09e2|
|[B@34560144|[B@7a6d3233|[B@16faf110|[B@34e85d39|
| [B@3406452|[B@787a4528|[B@235282e3|[B@7e0f1732|
|[B@10bc1446|[B@2bd7083f|[B@325e7695|[B@57bb4a08|
|[B@48f98037|[B@7450c04e|[B@61817c8a|[B@7c177a08|
|[B@694ce2dd|[B@36c2512d| [B@f5f7d71|[B@46248d99|
|[B@479dee25|[B@517de3de|[B@1ffb2d9e|[B@236ff079|
|[B@52ac196f|[B@20b9f0d0| [B@f70f879|[B@41c8d7da|
|[B@68d34af3| [B@7ddcd49|[B@72d077a7|[B@545fafd4|
|[B@5610b292|[B@623bbb62|[B@3f8b5150|[B@53877bc7|
|[B@63cf70a8|[B@47ed58c9|[B@2f601903|[B@4e0a2c41|
|[B@7ddf876d|[B@5e3445aa|[B@39c9cc37|[B@6f7e4c84|
|[B@4cd1a74b|[B@583e5453|[B@64124267|[B@6ac5ab84|
|[B@577f9ddf|[B@7b55c859|[B@3cd48a51|[B@25c4eb0a|
|[B@2322f0e5|[B@4af55c68|[B@3285d64a|[B@70b7ae2f|
+-----------+-----------+-----------+-----------+
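The [B@... values are the default Java toString() of a byte array, which confirms that the columns arrive in R as raw byte[] rather than strings. As a workaround sketch (my assumption, untested on this data, using SparkR's cast/alias column functions), the binary columns can be converted to string on the Spark side before collecting:
# Cast each binary column to string before bringing the data into R
AppDF2 <- select(AppDF,
                 alias(cast(AppDF$app, "string"), "app"),
                 alias(cast(AppDF$category, "string"), "category"),
                 alias(cast(AppDF$date, "string"), "date"),
                 alias(cast(AppDF$user, "string"), "user"))
head(AppDF2)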
I tried reading this Parquet file in Scala and running collect(); everything worked fine there, so this appears to be a SparkR-specific problem.