I am trying to compute the difference in minutes between two timestamps of the form MM/dd/yyyy hh:mm:ss AM/PM. I am new to SparkSQL and tried the basic datediff function supported by other SQL dialects, i.e.:

datediff(minute, start_time, end_time)

but this produces the error:

org.apache.spark.sql.AnalysisException: cannot resolve '`minute`' given input columns: [taxisub.tpep_dropoff_datetime, taxisub.DOLocationID, taxisub.improvement_surcharge, taxisub.VendorID, taxisub.trip_distance, taxisub.tip_amount, taxisub.tolls_amount, taxisub.payment_type, taxisub.fare_amount, taxisub.tpep_pickup_datetime, taxisub.total_amount, taxisub.store_and_fwd_flag, taxisub.extra, taxisub.passenger_count, taxisub.PULocationID, taxisub.mta_tax, taxisub.RatecodeID]; line 1 pos 153;

Spark SQL's datediff does not appear to support a minute parameter. The query I currently have is:

spark.sqlContext.sql("Select to_timestamp(tpep_pickup_datetime,'MM/dd/yyyy hh:mm:ss') as pickup,to_timestamp(tpep_dropoff_datetime,'MM/dd/yyyy hh:mm:ss') as dropoff, datediff(to_timestamp(tpep_pickup_datetime,'MM/dd/yyyy hh:mm:ss'),to_timestamp(tpep_dropoff_datetime,'MM/dd/yyyy hh:mm:ss')) as diff from taxisub ").show()

My result is:

+-------------------+-------------------+----+
| pickup| dropoff|diff|
+-------------------+-------------------+----+
|2018-12-15 08:53:20|2018-12-15 08:57:57| 0|
|2018-12-15 08:03:08|2018-12-15 08:07:30| 0|
|2018-12-15 08:28:34|2018-12-15 08:33:31| 0|
|2018-12-15 08:37:53|2018-12-15 08:43:47| 0|
|2018-12-15 08:51:02|2018-12-15 08:55:54| 0|
|2018-12-15 08:03:47|2018-12-15 08:03:50| 0|
|2018-12-15 08:45:21|2018-12-15 08:57:08| 0|
|2018-12-15 08:04:47|2018-12-15 08:29:05| 0|
|2018-12-15 08:01:22|2018-12-15 08:12:15| 0|
+-------------------+-------------------+----+
Given that diff is 0 for every row, I assume datediff defaults to a unit of days. Should I use a different parameter or function to determine the difference in minutes between these two timestamps?

Thanks.
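For context on why every row comes back 0: Spark's datediff(end, start) counts whole days between the two dates, so two timestamps on the same calendar day always differ by 0. The following is a minimal plain-Python sketch of that arithmetic using one row from the table above (illustrative only, not Spark itself):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
pickup = datetime.strptime("2018-12-15 08:53:20", fmt)
dropoff = datetime.strptime("2018-12-15 08:57:57", fmt)

# datediff-style result: whole days between the two calendar dates
day_diff = (dropoff.date() - pickup.date()).days
print(day_diff)  # 0 -- both timestamps fall on the same day

# What the question actually wants: the difference in minutes
minute_diff = (dropoff - pickup).total_seconds() / 60
print(round(minute_diff, 2))  # 4.62
```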
Answer 0 (score: 3)
There are two ways to do this in Spark SQL. You can cast the timestamp columns to bigint (epoch seconds) and divide the difference by 60, or you can convert them with unix_timestamp and divide the difference by 60. I used the pickup and dropoff columns from the dataframe above. (In PySpark/Scala Spark, bigint corresponds to long.)
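Both approaches reduce to the same arithmetic: convert each timestamp to epoch seconds, subtract, and divide by 60. A plain-Python sketch of that calculation, using sample values from the table above (the helper name minutes_between is hypothetical, for illustration only; Spark's unix_timestamp/bigint cast does the equivalent conversion on the cluster):

```python
from datetime import datetime, timezone

def minutes_between(start: str, end: str) -> float:
    """Epoch-seconds difference divided by 60, mirroring
    (unix_timestamp(end) - unix_timestamp(start)) / 60 in Spark SQL."""
    fmt = "%Y-%m-%d %H:%M:%S"
    s = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    e = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc)
    return (e.timestamp() - s.timestamp()) / 60

print(minutes_between("2018-12-15 08:45:21", "2018-12-15 08:57:08"))  # about 11.78 minutes
```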
spark.sqlContext.sql("""select pickup, dropoff, (unix_timestamp(dropoff)-unix_timestamp(pickup))/(60) as diff from taxisub""").show()
输出:
spark.sqlContext.sql("""select pickup, dropoff, ((bigint(to_timestamp(dropoff)))-(bigint(to_timestamp(pickup))))/(60) as diff from taxisub""").show()