SparkSQL - Difference between two timestamps in minutes

Time: 2020-02-25 01:21:18

Tags: sql apache-spark apache-spark-sql

I am trying to compute the difference in minutes between two timestamps of the form MM/dd/yyyy hh:mm:ss AM/PM. I am new to SparkSQL and tried the basic datediff function that other SQL dialects support, i.e.:

datediff(minute, start_time, end_time)

but that produced the error:

org.apache.spark.sql.AnalysisException: cannot resolve '`minute`' given input columns: [taxisub.tpep_dropoff_datetime, taxisub.DOLocationID, taxisub.improvement_surcharge, taxisub.VendorID, taxisub.trip_distance, taxisub.tip_amount, taxisub.tolls_amount, taxisub.payment_type, taxisub.fare_amount, taxisub.tpep_pickup_datetime, taxisub.total_amount, taxisub.store_and_fwd_flag, taxisub.extra, taxisub.passenger_count, taxisub.PULocationID, taxisub.mta_tax, taxisub.RatecodeID]; line 1 pos 153;

So it seems that SparkSQL's datediff does not support a minute argument. The query I currently have is:

spark.sqlContext.sql("Select to_timestamp(tpep_pickup_datetime,'MM/dd/yyyy hh:mm:ss') as pickup,to_timestamp(tpep_dropoff_datetime,'MM/dd/yyyy hh:mm:ss') as dropoff, datediff(to_timestamp(tpep_pickup_datetime,'MM/dd/yyyy hh:mm:ss'),to_timestamp(tpep_dropoff_datetime,'MM/dd/yyyy hh:mm:ss')) as diff from taxisub ").show()

My result is:

+-------------------+-------------------+----+
|             pickup|            dropoff|diff|
+-------------------+-------------------+----+
|2018-12-15 08:53:20|2018-12-15 08:57:57|   0|
|2018-12-15 08:03:08|2018-12-15 08:07:30|   0|
|2018-12-15 08:28:34|2018-12-15 08:33:31|   0|
|2018-12-15 08:37:53|2018-12-15 08:43:47|   0|
|2018-12-15 08:51:02|2018-12-15 08:55:54|   0|
|2018-12-15 08:03:47|2018-12-15 08:03:50|   0|
|2018-12-15 08:45:21|2018-12-15 08:57:08|   0|
|2018-12-15 08:04:47|2018-12-15 08:29:05|   0|
|2018-12-15 08:01:22|2018-12-15 08:12:15|   0|
+-------------------+-------------------+----+

Given the results are all 0, I assume datediff's default unit is days. Is there a different argument/function I should use to determine the difference in minutes between these two timestamps?
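For intuition, the day-vs-minute behavior can be checked in plain Python (an illustration only, not Spark) using the first row above: a day-granularity difference of two same-day timestamps is 0, while the minute-level difference is not.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
pickup = datetime.strptime("2018-12-15 08:53:20", fmt)
dropoff = datetime.strptime("2018-12-15 08:57:57", fmt)

# Day-level difference (day granularity): same calendar date -> 0
day_diff = (dropoff.date() - pickup.date()).days
print(day_diff)  # 0

# Minute-level difference: elapsed seconds divided by 60
minute_diff = (dropoff - pickup).total_seconds() / 60
print(round(minute_diff, 2))  # 4.62
```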

Thanks.

1 Answer:

Answer 0: (score: 3)

There are two ways to do this in Spark SQL. You can cast the timestamp columns to bigint, subtract them, and divide by 60; or you can convert them directly with unix_timestamp, subtract, and divide by 60. I used the pickup and dropoff columns from your dataframe above. (In PySpark/Scala Spark, bigint corresponds to long.)

spark.sqlContext.sql("""select pickup, dropoff, ((bigint(to_timestamp(dropoff)))-(bigint(to_timestamp(pickup))))/(60) as diff from taxisub""").show()

OR

spark.sqlContext.sql("""select pickup, dropoff, (unix_timestamp(dropoff)-unix_timestamp(pickup))/(60) as diff from taxisub""").show()
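Both variants reduce to the same arithmetic: convert each timestamp to epoch seconds, subtract, and divide by 60. A plain-Python sketch of that arithmetic on two rows from the question (an illustration of the calculation, not Spark code):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
rows = [
    ("2018-12-15 08:53:20", "2018-12-15 08:57:57"),
    ("2018-12-15 08:45:21", "2018-12-15 08:57:08"),
]

for pickup, dropoff in rows:
    # Equivalent of bigint(to_timestamp(...)) or unix_timestamp(...): epoch seconds
    p = datetime.strptime(pickup, fmt).timestamp()
    d = datetime.strptime(dropoff, fmt).timestamp()
    diff_minutes = (d - p) / 60
    print(pickup, dropoff, diff_minutes)
```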