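I have two sets of coordinates and am trying to find, for each coordinate in the first set, the closest match in the second. One dataset has 1 million records and the other nearly 0.5 million, so I am looking for a better way to do this and would appreciate suggestions.
The input for the first dataset is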
structure(list(longitude = c(-2.5168477762, -2.5972432832, -2.5936692407,
-2.5943475677, -2.5923214528, -2.5919014869, -2.5913454553, -2.5835739992,
-2.5673150195, -2.5683356381), latitude = c(51.4844052488, 51.45278562,
51.4978889752, 51.4979844501, 51.4983813479, 51.4982126232, 51.4964350456,
51.4123728037, 51.4266239227, 51.4265740193)), .Names = c("longitude",
"latitude"), row.names = c(NA, 10L), class = "data.frame")
The input for the second dataset is
structure(list(longitude = c(-3.4385392589, -3.4690321528, -3.2723981534,
-3.3684012246, -3.329625956, -3.3093349806, 0.8718409198, 0.8718563602,
0.8643998472, 0.8644153057), latitude = c(51.1931124311, 51.206897181,
51.1271423704, 51.1618047221, 51.1805971356, 51.1663567178, 52.896084336,
52.896092955, 52.9496082626, 52.9496168824)), .Names = c("longitude",
"latitude"), row.names = 426608:426617, class = "data.frame")
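I have looked at the approx and findInterval functions in R, but I don't fully understand how they work. What I want to do is take each coordinate from dataset 1 and compare it against all coordinates in dataset 2 to find the closest match. At the moment I am using two for loops, but because of the size of the data it takes forever.
The code I have tried is below: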
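library(geosphere)  # provides distm() and distVincentyEllipsoid()

# For each row of x, return the row of y at the smallest ellipsoidal distance.
cns <- function(x, y)
{
  a = NULL
  b = NULL
  for (i in 1:nrow(x))
  {
    for (j in 1:nrow(y))
    {
      # distance from point i of x to point j of y (one pair per call)
      a[j] = distm(c(x$longitude[i], x$latitude[i]),
                   c(y$longitude[j], y$latitude[j]),
                   fun = distVincentyEllipsoid)
    }
    # index (or indices, if tied) of the nearest point in y
    b[i] = which(a == min(a))
  }
  return(y[b, ])
}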
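The function above takes one point from dataset 1, computes its distance to every point in dataset 2, finds the minimum distance, and returns the coordinates at that distance.
I am looking for something like parallel processing to complete this task in a reasonable time. Any suggestions are welcome.
Regards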
Answer 0 (score: 2)
In R, vectorized code is usually more efficient than explicit for loops.
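The benchmarks in this answer time a function called cns2 whose definition does not appear in the text. The following is only a guess at what it might look like, assuming it keeps the outer loop over x but replaces the inner loop with a single vectorized distm() call per point. The small x and y data frames are a subset of the sample data from the question, included so the sketch runs on its own.

```r
library(geosphere)  # distm(), distVincentyEllipsoid()

# Assumed reconstruction of cns2 (not taken from the original answer):
# one vectorized distm() call per point of x instead of an inner loop over y.
cns2 <- function(x, y) {
  xm <- as.matrix(x[, c("longitude", "latitude")])
  ym <- as.matrix(y[, c("longitude", "latitude")])
  b <- integer(nrow(xm))
  for (i in seq_len(nrow(xm))) {
    # 1 x nrow(y) matrix of distances from point i to every point of y
    a <- distm(xm[i, , drop = FALSE], ym, fun = distVincentyEllipsoid)
    b[i] <- which.min(a)  # first index of the smallest distance
  }
  y[b, ]
}

# Small subset of the question's sample data
x <- data.frame(longitude = c(-2.5168477762, -2.5972432832),
                latitude  = c(51.4844052488, 51.45278562))
y <- data.frame(longitude = c(-3.4385392589, -3.3093349806, 0.8718409198),
                latitude  = c(51.1931124311, 51.1663567178, 52.896084336))
res <- cns2(x, y)
```

Compared with the double loop, this calls distm() nrow(x) times rather than nrow(x) * nrow(y) times, which is likely where most of the saving comes from.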
Let's evaluate the difference:
Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval
  cns(x, y) 42.46518 45.16829 46.61517 46.45560 47.09023 80.25171   100
 cns2(x, y) 26.09484 27.33122 28.21505 28.07837 29.10225 30.74004   100
That already cuts the median time by about 40% (46.5 ms down to 28.1 ms) without any parallel computing. Can we improve on that?
cns3 <- function(x, y) {
  # full nrow(x) by nrow(y) distance matrix in a single call
  a <- distm(x = x, y = y, fun = distVincentyEllipsoid)
  # for each row (point of x), index of the nearest point of y
  b <- apply(X = a, MARGIN = 1, which.min)
  return(y[b, ])
}
Results:
Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
  cns(x, y) 43.38928 45.69135 48.72223 46.70839 48.56951 135.80555   100
 cns2(x, y) 25.96674 27.15066 28.86999 28.43569 29.99138  35.86383   100
 cns3(x, y) 23.90187 24.84592 26.68738 25.87950 27.99075  34.71469   100
So cns3 seems slightly faster, but cns2 can easily be parallelised by replacing for with foreach.
Is this right? The three methods give the same output:
> cns(x,y)
         longitude latitude
426613   -3.309335 51.16636
426613.1 -3.309335 51.16636
426613.2 -3.309335 51.16636
426613.3 -3.309335 51.16636
426613.4 -3.309335 51.16636
426613.5 -3.309335 51.16636
426613.6 -3.309335 51.16636
426613.7 -3.309335 51.16636
426613.8 -3.309335 51.16636
426613.9 -3.309335 51.16636
> cns2(x,y)
         longitude latitude
426613   -3.309335 51.16636
426613.1 -3.309335 51.16636
426613.2 -3.309335 51.16636
426613.3 -3.309335 51.16636
426613.4 -3.309335 51.16636
426613.5 -3.309335 51.16636
426613.6 -3.309335 51.16636
426613.7 -3.309335 51.16636
426613.8 -3.309335 51.16636
426613.9 -3.309335 51.16636
> cns3(x,y)
         longitude latitude
426613   -3.309335 51.16636
426613.1 -3.309335 51.16636
426613.2 -3.309335 51.16636
426613.3 -3.309335 51.16636
426613.4 -3.309335 51.16636
426613.5 -3.309335 51.16636
426613.6 -3.309335 51.16636
426613.7 -3.309335 51.16636
426613.8 -3.309335 51.16636
426613.9 -3.309335 51.16636
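The foreach suggestion above might be sketched as follows. This is an illustration under assumptions rather than code from the answer: the name cns2_par, the workers argument, and the choice of the doParallel backend are all hypothetical, and the loop body assumes cns2 computes one distm() call per point of x. The sample x and y are a subset of the question's data.

```r
library(geosphere)   # distm(), distVincentyEllipsoid()
library(foreach)     # foreach(), %dopar%
library(doParallel)  # registerDoParallel()

# Hypothetical parallel variant; name and default worker count are assumptions.
cns2_par <- function(x, y, workers = 2) {
  xm <- as.matrix(x[, c("longitude", "latitude")])
  ym <- as.matrix(y[, c("longitude", "latitude")])
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always release the workers
  registerDoParallel(cl)
  # Each iteration returns the index of the nearest y point for one x point;
  # .combine = c collects the indices into a single vector, in order.
  b <- foreach(i = seq_len(nrow(xm)), .combine = c,
               .packages = "geosphere") %dopar% {
    which.min(distm(xm[i, , drop = FALSE], ym, fun = distVincentyEllipsoid))
  }
  y[b, ]
}

# Small subset of the question's sample data
x <- data.frame(longitude = c(-2.5168477762, -2.5972432832),
                latitude  = c(51.4844052488, 51.45278562))
y <- data.frame(longitude = c(-3.4385392589, -3.3093349806, 0.8718409198),
                latitude  = c(51.1931124311, 51.1663567178, 52.896084336))
res_par <- cns2_par(x, y)
```

Dispatching one task per point keeps the sketch short, but the scheduling overhead is probably significant at this granularity; with a million points it would likely pay to hand each worker a block of rows instead.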
The way you wrote it, you keep all ties, which can cause trouble, because b may at some point be coerced to a list.
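A small base R illustration of the difference, using made-up distances: which(a == min(a)), as in cns, returns every index that attains the minimum, while which.min(a), as in cns3, returns only the first one.

```r
a <- c(3.2, 1.5, 4.0, 1.5)  # made-up distances with a tie for the minimum

all_ties  <- which(a == min(a))  # every index attaining the minimum
first_min <- which.min(a)        # only the first such index

print(all_ties)   # [1] 2 4
print(first_min)  # [1] 2
```

With which.min the result always has length one, so it can be stored in b[i] without surprises.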