Question

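I have two datasets. The first has a person's location and their distance (in miles) to different destinations. The second contains a list of all the destinations. I'd like R to create a column that pulls out the name of each destination that is less than 1000 miles away.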

Here is an example of the first dataset:

library(tidyverse)
start_location <- tibble(location = c("Anhui China", "Amersfoort Utrecht Netherlands", "Akita Akita Japan"),
lon = c(117.92, 5.38, 140.1),
lat = c(30.60, 52.16, 39.71),
dist_beijing = c(658, 5686, 1250),
dist_shanghai = c(241, 5510, 1200),
dist_tokyo = c(1300, 5775, 280),
dist_prague = c(5173, 417, 5415), 
dist_pomezia = c(5555, 474, 5927),
dist_antwerp = c(5498, 77, 5612))

Here is the second dataset:

library(tidyverse)
destinations <- tibble(destinations = c("beijing china", "shanghai china", "tokyo japan", "prague czech republic", "pomezia italy", "antwerp belgium"),
lon = c(116.4, 121.47, 139.65, 14.43, 12.50, 4.40),
lat = c(39.90, 31.23, 35.67, 50.07, 41.67, 51.22))

This is what I want the dataset to look like:

library(tidyverse)
solution <- tibble(location = c("Anhui China", "Amersfoort Utrecht Netherlands", "Akita Akita Japan"),
lon = c(117.92, 5.38, 140.1),
lat = c(30.60, 52.16, 39.71),
nearest1 = c("shanghai china", "antwerp belgium", "tokyo japan"),
nearest2 = c("beijing china", "prague czech republic", NA),
nearest3 = c(NA, "pomezia italy", NA))

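I know how to make it find the shortest distance, but I'm struggling to get it to produce the name for each column. Also, although this example shows the three nearest, I don't necessarily want to limit it to only 3. I just want it to create a column for every destination under 1000 miles.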

I assume I should use case_when and pmap, but I don't know how to add the if statement and allow it to create multiple columns.

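If it can't easily create the columns, I'm also fine with a single column that lists, in order, all the destinations under 1000 miles (e.g. "beijing china, shanghai china"), because then I can at least split it with separate from tidyr.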

If possible, I'd also like a tidyverse solution.

Thanks!

Answer 1

Here is a tidyverse solution:

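# Reshape to long form: one row per (location, destination) pair
result <- start_location %>% gather("destination", "distance", -(1:3)) %>%
  filter(distance <= 1000) %>%
  group_by(location) %>%
  arrange(distance) %>%
  # Rank destinations within each location: nearest1, nearest2, ...
  mutate(id = paste0("nearest", row_number())) %>%
  select(-5)  # drop the distance column (column 5)
# Strip the "dist_" prefix, then look up the full name in destinations
result$destination <- gsub("dist_", "", result$destination)
result$destination <- sapply(result$destination, function(x) grep(x, destinations$destinations, value = TRUE))
# Spread the ranked destinations into one column per rank
result <- result %>% spread(id, destination)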

# A tibble: 3 x 6
# Groups:   location [3]
  location                     lon   lat nearest1       nearest2          nearest3   
  <chr>                      <dbl> <dbl> <chr>          <chr>             <chr>      
1 Akita Akita Japan         140.    39.7 tokyo japan    NA                NA         
2 Amersfoort Utrecht Nethe~   5.38  52.2 antwerp belgi~ prague czech rep~ pomezia it~
3 Anhui China               118.    30.6 shanghai china beijing china     NA

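The key is to arrange the destinations by distance (with the rows already grouped by starting location) and then assign id labels according to their order. You can then spread the destinations into separate columns based on those id labels.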
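The same idea can also be sketched with pivot_longer and pivot_wider, the newer tidyr counterparts of gather and spread. This is only an illustration under a few assumptions: the data is trimmed to two rows and three distance columns from the question, the filter uses a strict `< 1000` to match "under 1000 miles", and the names `long`, `wide`, `listed` and `nearby` are made up for the example. It keeps the short destination names and does not do the lookup into the destinations table.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of start_location from the question (two rows, three distances)
start_location <- tibble(
  location = c("Anhui China", "Akita Akita Japan"),
  lon = c(117.92, 140.1),
  lat = c(30.60, 39.71),
  dist_beijing = c(658, 1250),
  dist_shanghai = c(241, 1200),
  dist_tokyo = c(1300, 280)
)

long <- start_location %>%
  # One row per (location, destination) pair; drop the "dist_" prefix
  pivot_longer(starts_with("dist_"), names_to = "destination",
               values_to = "distance", names_prefix = "dist_") %>%
  filter(distance < 1000) %>%
  group_by(location) %>%
  arrange(distance, .by_group = TRUE) %>%
  # Rank within each location: nearest1, nearest2, ...
  mutate(id = paste0("nearest", row_number())) %>%
  ungroup()

# Wide form: one nearestN column per rank, NA where a location has fewer hits
wide <- long %>%
  select(-distance) %>%
  pivot_wider(names_from = id, values_from = destination)

# Fallback from the question: a single comma-separated column, nearest first
listed <- long %>%
  group_by(location) %>%
  summarise(nearby = paste(destination, collapse = ", "))
```

Because the number of nearestN columns comes from the data, nothing here limits the result to three destinations.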

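Before the spread, I added a couple of steps that replace the destination column names with the actual destination names from the destinations data frame. This could introduce errors if a destination is a city whose name is also a country name (e.g. Mexico City) and that country also appears in another destination, so keep that in mind.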
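One way to guard against that ambiguity is to anchor the pattern at the start of the string and to fail when a lookup does not return exactly one match. The sketch below is hypothetical: the destination list is invented to trigger the problem, `lookup_destination` is not part of any package, and it assumes the short names contain only plain letters (no regex metacharacters) and that each full name starts with the short name followed by a space.

```r
# Hypothetical destination list: "mexico" appears in two entries
destinations <- c("mexico city mexico", "cancun mexico", "tokyo japan")
short_names <- c("mexico", "cancun", "tokyo")

# Unanchored grep returns every entry that contains the pattern
ambiguous <- grep("mexico", destinations, value = TRUE)

# Anchored lookup: the short name must start the string and be followed by a space
lookup_destination <- function(short, full) {
  hits <- grep(paste0("^", short, " "), full, value = TRUE)
  # Stop rather than silently return zero or several names
  stopifnot(length(hits) == 1)
  hits
}

full_names <- vapply(short_names, lookup_destination, character(1),
                     full = destinations, USE.NAMES = FALSE)
```

Using vapply with `character(1)` instead of sapply also means an unexpected match count surfaces as an error rather than as a list column.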
