Spark does not detect DateType, and StringType cannot be converted to DateType

Time: 2017-07-17 15:21:54

Tags: scala apache-spark machine-learning apache-spark-mllib

Here is my code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.ml.feature.VectorAssembler
object Import {
  def main(args:Array[String]): Unit ={

    val spark = SparkSession.builder.master("local[*]").config("spark.eventLog.enabled", "true").config("spark.eventLog.dir", "file:///C:/Users/me/spark/logs").appName("S1").getOrCreate()
    val df = spark.read.format("csv").option("header", true).option("dateFormat", "HH:mm:ss").csv("moyenexport.csv")
    var dff = df.select("Fc_0004","Fc_0008","Fc_0009","Fc_0010","Fc_0011","Fc_0013","Fc_0015","Fc_0047","Fc_0055","Fc_0063")
    dff.cache()
    dff.show()
    //dff.withColumn("Fc_0004",dff("Fc_0004").cast(TimestampType))
    dff.printSchema()

Here is the DataFrame printed by the first show call:

+--------+-------+-------+-------+-------+-------+-------+-------+----------------+----------------+
| Fc_0004|Fc_0008|Fc_0009|Fc_0010|Fc_0011|Fc_0013|Fc_0015|Fc_0047|         Fc_0055|         Fc_0063|
+--------+-------+-------+-------+-------+-------+-------+-------+----------------+----------------+
|00:06:27|      1|     45|     31|      2|    116|      2|      0|5373.92999999997|         1040.47|
|00:08:53|      1|     23|     17|      3|     19|      1|     19|         1889.18|               0|
|01:11:40|      4|     21|     11|      1|      1|      0|      1|               0|               0|
|00:12:16|      1|     48|     33|      1|     39|      1|      0|            5430|             580|
|00:16:54|      1|      8|      6|      0|     11|      0|     11|          215.03|               0|
|00:30:14|      1|    296|    212|    137|    175|     31|     11|21655.5500000013|12785.9099999999|
|00:32:45|      1|     25|     14|      0|     24|      0|     24|            3000|               0|
|22:41:15|      9|      7|      5|      0|      7|      0|      0|          996.03|               0|

As you can see, the first column should have a date/time type. However, even when I uncomment the cast line, the schema looks like this:

root
 |-- Fc_0004: string (nullable = true)
 |-- Fc_0008: string (nullable = true)
 |-- Fc_0009: string (nullable = true)
 |-- Fc_0010: string (nullable = true)
 |-- Fc_0011: string (nullable = true)
 |-- Fc_0013: string (nullable = true)
 |-- Fc_0015: string (nullable = true)
 |-- Fc_0047: string (nullable = true)
 |-- Fc_0055: string (nullable = true)
 |-- Fc_0063: string (nullable = true)

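So neither my cast nor the dateFormat option on the file read applies the type. I'm sure I'm missing something, but I can't see it; any help would be much appreciated.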

Edit: a sample of the csv file:

    Cible,"Fc_0000","Fc_0001","Fc_0002","Fc_0003","Fc_0004","Fc_0005","Fc_0006","Fc_0007","Fc_0008","Fc_0009","Fc_0010","Fc_0011","Fc_0012","Fc_0013","Fc_0014","Fc_0015","Fc_0016","Fc_0017","Fc_0018","Fc_0019","Fc_0020","Fc_0021","Fc_0022","Fc_0023","Fc_0024","Fc_0025","Fc_0026","Fc_0027","Fc_0028","Fc_0029","Fc_0030","Fc_0031","Fc_0032","Fc_0033","Fc_0034","Fc_0035","Fc_0036","Fc_0037","Fc_0038","Fc_0039","Fc_0040","Fc_0041","Fc_0042","Fc_0043","Fc_0044","Fc_0045","Fc_0046","Fc_0047","Fc_0048","Fc_0049","Fc_0050","Fc_0051","Fc_0052","Fc_0053","Fc_0054","Fc_0055","Fc_0056","Fc_0057","Fc_0058","Fc_0059","Fc_0060","Fc_0061","Fc_0062","Fc_0063","Fc_0064","Fc_0065","Fc_0066","Fc_0067","Fc_0068","Fc_0069","Fc_0070","Fc_0071","Fc_0072","Fc_0073","Fc_0074","Fc_0075","Fc_0076","Fc_0077","Fc_0078","Fc_0079","Fc_0080","Fc_0081","Fc_0082","Fc_0083","Fc_0084","Fc_0085","Fc_0086","Fc_0087","Fc_0088","Fc_0089","Fc_0090","Fc_0091","Fc_0092","Fc_0093","Fc_0094","Fc_0095","Fc_0096","Fc_0097","Fc_0098","Fc_0099","Fc_0100","Fc_0101","Fc_0102","Fc_0103","Fc_0104","Fc_0105","Fc_0106","Fc_0107","Fc_0108","Fc_0109","Fc_0110","Fc_0111","Fc_0112","Fc_0113","Fc_0114","Fc_0115","Fc_0116","Fc_0117","Fc_0118","Fc_0119","Fc_0120","Fc_0121","Fc_0122","Fc_0123","Fc_0124","Fc_0125","Fc_0126","Fc_0127","Fc_0128","Fc_0129","Fc_0130","Fc_0131","Fc_0132","Fc_0133","Fc_0134","Fc_0135","Fc_0136","Fc_0137","Fc_0138","Fc_0139","Fc_0140","Fc_0141","Fc_0142","Fc_0143","Fc_0144","Fc_0145","Fc_0146","Fc_0147","Fc_0148","Fc_0149","Fc_0150","Fc_0151","Fc_0152","Fc_0153","Fc_0154","Fc_0155","Fc_0156","Fc_0157","Fc_0158","Fc_0159","Fc_0160","Fc_0161","Fc_0162","Fc_0163","Fc_0164","Fc_0165","Fc_0166","Fc_0167","Fc_0168","Fc_0169","Fc_0170","Fc_0171","Fc_0172","Fc_0173","Fc_0174","Fc_0175","Fc_0176","Fc_0177","Fc_0178","Fc_0179","Fc_0180","Fc_0181","Fc_0182","Fc_0183","Fc_0184","Fc_0185","Fc_0186","Fc_0187","Fc_0188","Fc_0189","Fc_0190","Fc_0191","Fc_0192","Fc_0193","Fc_0194","Fc_0195","Fc_0196","Fc_0197","Fc_0198",
"Fc_0199","Fc_0200","Fc_0201","Fc_0202","Fc_0203","Fc_0204","Fc_0205","Fc_0206","Fc_0207","Fc_0208","Fc_0209","Fc_0210","Fc_0211","Fc_0212","Fc_0213","Fc_0214","Fc_0215","Fc_0216","Fc_0217","Fc_0218","Fc_0219","Fc_0220","Fc_0221","Fc_0222","Fc_0223","Fc_0224","Fc_0225","Fc_0226","Fc_0227","Fc_0228","Fc_0229","Fc_0230","Fc_0231","Fc_0232","Fc_0233","Fc_0234","Fc_0235","Fc_0236","Fc_0237","Fc_0238","Fc_0239","Fc_0240","Fc_0241","Fc_0242","Fc_0243","Fc_0244","Fc_0245","Fc_0246","Fc_0247","Fc_0248","Fc_0249","Fc_0250","Fc_0251","Fc_0252","Fc_0253","Fc_0254","Fc_0255","Fc_0256","Fc_0257","Fc_0258","Fc_0259","Fc_0260","Fc_0261","Fc_0262","Fc_0263","Fc_0264","Fc_0265","Fc_0266","Fc_0267","Fc_0268","Fc_0269","Fc_0270","Fc_0271","Fc_0272","Fc_0273","Fc_0274","Fc_0275","Fc_0276","Fc_0277","Fc_0278","Fc_0279","Fc_0280","Fc_0281","Fc_0282","Fc_0283","Fc_0284","Fc_0285","Fc_0286","Fc_0287","Fc_0288","Fc_0289","Fc_0290","Fc_0291","Fc_0292","Fc_0293","Fc_0294","Fc_0295","Fc_0296","Fc_0297","Fc_0298","Fc_0299","Fc_0300","Fc_0301","Fc_0302","Fc_0303","Fc_0304","Fc_0305","Fc_0306","Fc_0307","Fc_0308","Fc_0309","Fc_0310","Fc_0311","Fc_0312","Fc_0313","Fc_0314","Fc_0315","Fc_0316","Fc_0317","Fc_0318","Fc_0319","Fc_0320","Fc_0321","Fc_0322","Fc_0323","Fc_0324","Fc_0325","Fc_0326","Fc_0327","Fc_0328","Fc_0329","Fc_0330","Fc_0331","Fc_0332","Fc_0333"
0,2,1,1,1,00:06:27,0,0,0,1,45,31,2,0,116,2,2,2,2,10,264,808,125,2,2,2,2,3081,9906,3851,2,,114977,0,1,59.02,0,0,0,0,69.720959999996,,,69.7209599999957,61.8586822460233,59.9042630728827,106.948393378773,116,0,16,2,3,4,31,31,1000,5373.92999999997,0,0,0,0,8715.1199999995,1012.1,1586.85,1040.47,0,0,0,0,7128.26999999957,0,0,7128.26999999957,59.02,59.02,59.02,59.02,0,8715.11999999947,59.02,59.02,59.02,59.02,329508,4,4,0,0,0,4,4,4,4,0,0,4,0,0,2,125,0,105,0,10,6,0,0,0,0,0,115,0,0,115,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,70,70,42,2316.56,2182.25,0,0,0,1,45,0,45,43,0,42.8,1,2,0.032,0.032,31,0,,,,,,,,0.0365414543681721,0.0347447071036541,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2742,0,0,0,0,70,2,0,0,0,0,,2,0,0,0,2,1,0,0,0,0,00:05:35,1,2,0,0,1,0,0,0,0,0,2,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,0,0,18,2,2,2,2,2,2,2,0,0,0,2,2,2,0,0,3,3,3,2,2,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0
0,4,0,,0,00:08:53,1,1,0,1,23,17,3,0,19,2,1,1,2,0,28,90,31,3,1,1,3,976,2855,2097,2,,3417852,0,1,260,0,0,0,0,,65.759677419355,66.3458369098709,65.759677419355,96.9586168032786,106.447152364273,34.6977459016393,0,19,0,0,0,0,0,31,289.51,1889.18,260,130,130,260,0,0,-1.70530256582424e-13,0,0,0,0,0,0,0,0,2038.55,260,130,130,260,2038.55,2038.55,260,130,130,260,33865,0,0,0,0,0,0,0,11,0,0,11,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,3,1,1,3,0,31,0,0,31,0,0,0,0,0,0,0,0,0,0,11,1,3,11,1,3,0,0,0,7,0,0,0,0,0,0,0,0,0,0,0,0,130,1,1,0,0.354838709677419,31,0,,,,,,,,0.11994773989887,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2038.55,0,42764.2100000002,0,0,0,0,0,0,0,0,361,31,0,5,0,0,0,0,0,1,2,0,4,0,0,0,2,1,0,0,0,1,00:09:41,1,2,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,0,0,9,2,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,2,0,,0,01:11:40,1,1,0,4,21,11,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,21,50,50,1,,437891,1,0,151.96,0,0,0,0,,39.6,41.785,39.6,72.0523809523809,70.855,0.952380952380952,0,1,0,0,0,0,0,14,252.24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,39.6,0,0,0,0,39.6,39.6,0,0,0,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,39.6,1,3,0,0,0,0,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,39.6,0,786.9,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,6,0,0,1,0,0,0,0,0,1,00:10:07,1,0,0,0,1,0,0,0,0,0,5,0,0,0,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,3,1,1,18,2,2,2,2,2,2,2,1,1,1,2,2,2,1,1,3,3,3,2,2,1,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0

1 Answer:

Answer 0: (score: 1)

Instead of casting to a timestamp this way

dff.withColumn("Fc_0004",dff("Fc_0004").cast(TimestampType))

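use the unix_timestamp function (from org.apache.spark.sql.functions): pass it the column and the format, which returns a long value, and then cast that value to a timestamp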

dff.withColumn("Fc_0004", unix_timestamp(dff("Fc_0004"), "HH:mm:ss").cast(TimestampType))

The DataFrame returned by this call has the schema Fc_0004: timestamp (nullable = true)
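To make the steps above concrete, here is a minimal sketch. The object name CastTimeColumn, the helper parseTime, and the in-memory sample rows are made up for illustration; they stand in for the CSV from the question, which is read without inferSchema and therefore arrives with every column typed as string. One thing worth noting: withColumn returns a new DataFrame rather than changing dff, so the result has to be kept in a value before calling printSchema on it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, unix_timestamp}
import org.apache.spark.sql.types.TimestampType

object CastTimeColumn {
  // Hypothetical helper: parses a string column holding "HH:mm:ss" values
  // into a timestamp column and returns the resulting (new) DataFrame.
  def parseTime(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, unix_timestamp(col(name), "HH:mm:ss").cast(TimestampType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("CastTimeColumn")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the CSV data: a single string column, as in the question.
    val dff = Seq("00:06:27", "00:08:53", "01:11:40").toDF("Fc_0004")

    // Keep the DataFrame that withColumn returns; dff itself is unchanged.
    val parsed = parseTime(dff, "Fc_0004")

    dff.printSchema()    // Fc_0004 is still a string in the original
    parsed.printSchema() // Fc_0004 is a timestamp in the returned DataFrame

    spark.stop()
  }
}
```

Because the pattern has no date part, the resulting timestamps should carry a placeholder date alongside the time of day, so this is probably only suitable if the time of day is all that matters downstream.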

Hope this helps!