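I have a CSV file containing about a million rows of dateTime, longitude, latitude values. So far I have extracted the minimum longitude and latitude values. My goal is to form cells of size 0.01 * 0.01 over this dataset. I would like to do the following in Scala.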
Create a two-dimensional array: grid[rows][cols]
for r <- 0 until rows
  for c <- 0 until cols
    if ( (tuple(longitude) >= minimumlongitude + (0.01 * c) AND tuple(longitude) < minimumlongitude + (0.01 * (c + 1)) ) AND (tuple(lattitude) >= minimumlattitude + (0.01 * r) AND tuple(lattitude) < minimumlattitude + (0.01 * (r + 1)) ) )
    then
      grid[r][c].append(tuple)
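Basically, determine which cell of the grid a point (x, y) belongs to, and group together all the points that fall into the same cell.
Edit 1: sample input looks like this: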
(10/23/2015, -73.1111112212, 45.2)
(10/23/2015, -73.1555555121, 45.20005011)
(10/23/2015, -73.1112232113, 45.20000051)
(10/20/2015, -73.1121243113, 45.20100011)
(10/20/2015, -73.1234123412, 45.20004011)
(10/23/2015, -73.1521233123, 45.20000211)
(10/23/2015, -73.1531231233, 45.20000011)
... up to about 10 million rows.
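What I have done so far is extract the minimum longitude and minimum latitude as well as the maximum longitude and maximum latitude. Together these form the outer bounding rectangle. Now I want to divide this rectangle into cells of size 0.01 * 0.01. For example, the first cell would span (minlattitude, minlongitude) to (minlattitude + 0.01, minlongitude + 0.01). Then I want to map each row of data to the cell it belongs to, based on the condition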
rowOfData.longitude >= cell.minLongitude && rowOfData.longitude < cell.minLongitude + 0.01 &&
rowOfData.lattidude >= cell.minLattitude && rowOfData.lattidude < cell.minLattitude + 0.01
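Could someone tell me how to go about this? Given the size of the dataset, an efficient approach would be preferable. Any help is much appreciated.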
Answer 0 (score: 2)
To group a coordinate dataset into cells of a given resolution, measured from the minimum coordinates of the bounding box:
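import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{min, udf}
import sparkSession.implicits._ // enables the $"column" syntax

val resolution = 0.01
val sampleData = "/.../sampleGeoCoordinatesWithTs.csv"
// Column names are kept as in this answer; note that in the question's sample rows
// the second field is the longitude, so "lat" here holds those values.
val data = sparkSession.read.option("inferSchema", "true").csv(sampleData).toDF("date","lat","long")
// Minimum corner of the bounding box
val Row(minLat:Double, minLong:Double) = data.select(min($"lat"),min($"long")).head
// Builds a UDF that maps a coordinate to its 0-based cell index along one axis
def cellUdf(minValue:Double, res:Double) = udf((x:Double) => ((x-minValue)/res).toInt)
val latCoordsUdf = cellUdf(minLat, resolution)
val longCoordsUdf = cellUdf(minLong, resolution)
// Attach both cell indices to every row
val relData = data.withColumn("cellx",latCoordsUdf($"lat")).withColumn("celly", longCoordsUdf($"long"))
relData.show(10)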
+----------+--------------+-----------+-----+-----+
| date| lat| long|cellx|celly|
+----------+--------------+-----------+-----+-----+
|10/23/2015|-73.1111112212| 45.2| 4| 0|
|10/23/2015|-73.1555555121|45.20005011| 0| 0|
|10/23/2015|-73.1112232113|45.20000051| 4| 0|
|10/20/2015|-73.1121243113|45.20100011| 4| 0|
|10/20/2015|-73.1234123412|45.20004011| 3| 0|
|10/23/2015|-73.1521233123|45.20000211| 0| 0|
|10/23/2015|-73.1531231233|45.20000011| 0| 0|
|10/23/2015|-73.1114423304|45.21100003| 4| 1|
|10/23/2015|-73.1443144233|45.22130002| 1| 2|
|10/23/2015|-73.1283500011|45.21900001| 2| 1|
+----------+--------------+-----------+-----+-----+
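
The cell index computed by the UDF is plain arithmetic, so it can be checked without Spark. The following is a minimal sketch in plain Scala that assigns a few of the sample rows to cells and then groups them by cell, which is the grouping the question asks for. The names `Point`, `GridSketch`, `cellIndex` and `groupByCell` are hypothetical and not part of the original answer. On the DataFrame itself, the analogous step would presumably be a `groupBy` on the `cellx` and `celly` columns.

```scala
// Hypothetical record type for illustration; the field names are assumptions.
final case class Point(date: String, x: Double, y: Double)

object GridSketch {
  val resolution = 0.01

  // 0-based cell index along one axis: the same arithmetic as cellUdf.
  def cellIndex(value: Double, minValue: Double, res: Double): Int =
    ((value - minValue) / res).toInt

  // Groups points by (cellx, celly), measured from the minimum corner of the data.
  def groupByCell(points: Seq[Point], res: Double): Map[(Int, Int), Seq[Point]] = {
    val minX = points.map(_.x).min
    val minY = points.map(_.y).min
    points.groupBy(p => (cellIndex(p.x, minX, res), cellIndex(p.y, minY, res)))
  }

  // A few rows taken from the sample output table.
  val sample: Seq[Point] = Seq(
    Point("10/23/2015", -73.1111112212, 45.2),
    Point("10/23/2015", -73.1555555121, 45.20005011),
    Point("10/23/2015", -73.1112232113, 45.20000051),
    Point("10/20/2015", -73.1234123412, 45.20004011),
    Point("10/23/2015", -73.1114423304, 45.21100003)
  )

  def main(args: Array[String]): Unit = {
    val cells = groupByCell(sample, resolution)
    // Print each occupied cell with the number of points it contains.
    cells.toSeq.sortBy(_._1).foreach { case ((cx, cy), pts) =>
      println(s"cell ($cx, $cy): ${pts.size} point(s)")
    }
  }
}
```

Only occupied cells appear as keys in the resulting map, so no full grid[rows][cols] array has to be allocated. Note that `toInt` truncates toward zero; this matches a floor here only because every value is at or above the minimum it is measured from.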