Skipping the header row of a Hive table loaded from sparklyr

Posted: 2018-04-18 05:36:09

Tags: r apache-spark hive sparklyr

I have the following data:

 "ElemUID   ElemName    Kind    Number  DaySecFrom(UTC) DaySecTo(UTC)"
"399126817  A648/13FKO-66   DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492732  A661/18FRS-97   DEZ   120.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126819  A648/12FKO-2    DEZ    60.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126818  A648/12FKO-1    DEZ   180.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126816  A648/13FKO-65   DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"398331142  A661/31OFN-1    DEZ   120.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"398331143  A661/31OFN-2    DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492739  A5/28FKN-65 DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492735  A661/23FRS-97   DEZ    60.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492740  B44/104FSN-33   DEZ   180.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"

I loaded it into HDFS and then defined an external table in Hive:

CREATE EXTERNAL TABLE IF NOT EXISTS deg
(
ElemUID int,
ElemName string,
Kind string,
Number float,
timefromdeg string,
timetodeg string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
TBLPROPERTIES ("skip.header.line.count"="1");

I then loaded the dataset into that schema with LOAD DATA INPATH.. Now I want to load it into sparklyr with tbl(); my R code is sketched after the glimpse() output below. Every time I do this, the header ends up as data in the first row. Output of glimpse():

Variables: 6
$ elemuid  <int> NA, 399126817, 483492732, 399126819, 399126818, 399126816, 39...
$ elemname <chr> "ElemName", "A648/13FKO-66", "A661/18FRS-97", "A648/12FKO-2",...
$ kind     <chr> "Kind", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ...
$ number   <dbl> NaN, NaN, 120, 60, 180, NaN, 120, NaN, NaN, 60, 180, NaN, NaN...
$ timefrom <dttm> NA, 2017-07-01 23:58:00, 2017-07-01 23:58:00, 2017-07-01 23:...
$ timeto   <dttm> NA, 2017-07-01 23:59:00, 2017-07-01 23:59:00, 2017-07-01 23:...
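For reference, this is roughly the sparklyr code that produces the output above; the Spark master and the plain table name are placeholders for my actual setup:

library(sparklyr)
library(dplyr)

# Connect to the cluster (master/config are placeholders for my environment)
sc <- spark_connect(master = "yarn-client")

# Reference the external Hive table defined above
deg <- tbl(sc, "deg")

# The header line shows up here as the first data row
glimpse(deg)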

I am afraid this will mess up my later analysis. When creating the external table I already set TBLPROPERTIES ("skip.header.line.count"="1").
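To rule out a typo, the property can be checked from R through the DBI interface that sparklyr exposes; a minimal sketch, assuming the table lives in the default database:

library(DBI)

# Should list skip.header.line.count = 1 among the table properties
dbGetQuery(sc, "SHOW TBLPROPERTIES deg")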

Is it possible to skip the first row in sparklyr?
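The only workarounds I have come up with so far are to drop the header row on the Spark side, or to bypass Hive and read the file directly; a rough sketch (the HDFS path and the registered table name are placeholders), though I would prefer a cleaner solution:

# Workaround 1: filter out the header row after loading via tbl()
deg_clean <- deg %>%
  filter(elemname != "ElemName")

# Workaround 2: skip Hive and let Spark treat the first line as a header
# (the path and the table name are placeholders)
deg_csv <- spark_read_csv(
  sc,
  name      = "deg_csv",
  path      = "hdfs:///path/to/deg.tsv",
  delimiter = "\t",
  header    = TRUE
)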

Thanks!

0 Answers:

No answers yet.