Question

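Here is what I am trying to do:

Write a for loop that checks whether the values in a column of data frame 1 appear in a specific column of data frame 2, and then adds two columns from data frame 2 to data frame 1. Sounds simple, right?

This is what I have so far: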

ID <- c(seq(1:5))

zip_codes <- c("47304", "46011", "47305", "46033", "46044")

data <- data.frame(ID, zip_codes)

library(zipcode)

data("zipcode")


data_zip <- zipcode[1:25000, c("zip", "latitude", "longitude")]

data$lat <- 0
data$long <- 0

for (i in data$zip_codes){
  if (i %in% data_zip[,1]) {
    data$lat <- data_zip[i, 2]
    data$long <- data_zip[i, 3]
  }
}

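The for loop runs without errors and fills the two columns in the 'data' data frame, but it only puts NAs there. I checked the for loop and the if statement and they work as expected, which makes me think it is an indexing problem with [i, 2] and [i, 3]. Here is what the data frame looks like before and after the loop:

Before: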

 ID zip_codes lat long
1  1     47304   0    0
2  2     46011   0    0
3  3     47305   0    0
4  4     46033   0    0
5  5     46044   0    0

After:

  ID zip_codes lat long
1  1     47304  NA   NA
2  2     46011  NA   NA
3  3     47305  NA   NA
4  4     46033  NA   NA
5  5     46044  NA   NA

I would appreciate any pointers. Maybe I am overthinking this and there is a simpler solution...

Answer 1

Here are a few options, among many others:

Base R - no loop

You can use match to avoid the loop:

cbind(data, data_zip[match(data$zip_codes, data_zip$zip), ])

      ID zip_codes   zip latitude longitude
21464  1     47304 47304 40.21540 -85.43636
20815  2     46011 46011 40.11291 -85.73700
21465  3     47305 47305 40.19229 -85.38494
20826  4     46033 46033 39.97373 -86.08875
20835  5     46044 46044 40.22121 -85.77612

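Or even: cbind(data, data_zip[match(data$zip_codes, data_zip$zip), -1]) to get rid of the duplicate zip column (once you have "checked" that match did its job). This option needs no extra package and is probably much faster than the loop options.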
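
As a minimal sketch of how match behaves here, the snippet below uses a small made-up lookup table (lookup, a stand-in for data_zip with coordinates copied from the output above) and a hypothetical ZIP code, "99999", that has no entry. match returns the row position of each ZIP code in the lookup, or NA when there is none, so looking for NA is one way to check that it did its job:

```r
# Small stand-in for `data`; "99999" is a made-up ZIP with no lookup entry
data <- data.frame(ID = 1:3,
                   zip_codes = c("47304", "99999", "46011"),
                   stringsAsFactors = FALSE)

# Small stand-in for `data_zip` (values taken from the output above)
lookup <- data.frame(zip = c("46011", "47304"),
                     latitude = c(40.11291, 40.21540),
                     longitude = c(-85.73700, -85.43636),
                     stringsAsFactors = FALSE)

# Position of each ZIP code in the lookup; NA where it is missing
idx <- match(data$zip_codes, lookup$zip)
idx  # 2 NA 1

# Unmatched rows keep their place and get NA coordinates
result <- cbind(data, lookup[idx, c("latitude", "longitude")])
rownames(result) <- NULL  # drop the row names inherited from `lookup`
result
```

With this approach the result always has as many rows as data, in the same order, which is why cbind can be used safely.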

Base R - loop

If you really want a loop (your code indeed does not assign the values of data$lat/long correctly), here are two:

# this one around your original code
for (i in 1:nrow(data)){
    data$lat[i]  <- data_zip[data_zip$zip == data$zip_codes[i], "latitude"]
    data$long[i] <- data_zip[data_zip$zip == data$zip_codes[i], "longitude"]
}

# shorter alternative
for (i in 1:nrow(data)){
  data[i, 3:4]  <- data_zip[data_zip$zip == data$zip_codes[i], c("latitude", "longitude")]
}
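
For context on why the original loop filled both columns with NA, here is a minimal sketch using a made-up two-row table (lookup, a stand-in for data_zip). In the original loop, i is the ZIP code itself as a character string, so data_zip[i, 2] looks for a row named "47304" rather than a row whose zip column equals "47304". No such row name exists, so the result is NA, and data$lat <- ... without an index then overwrites the whole column with it.

```r
# Made-up stand-in for `data_zip`; row names default to "1" and "2"
lookup <- data.frame(zip = c("46011", "47304"),
                     latitude = c(40.11291, 40.21540),
                     stringsAsFactors = FALSE)

i <- "47304"

# Character row index: R searches the row names, not the zip column
by_name <- lookup[i, "latitude"]
is.na(by_name)

# Logical row index: select the row whose zip column equals i
by_value <- lookup[lookup$zip == i, "latitude"]
by_value
```

The two loops above avoid both problems: they select rows by comparing against the zip column, and they assign to a single element or row of data on each iteration.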

dplyr

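The *_join functions from the dplyr package (e.g. left_join) are a more readable and possibly simpler alternative (but they require an additional package). A graphical explanation of the joins can be found there.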
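
A left_join version might look like the sketch below. It assumes dplyr is installed and uses a small made-up lookup table (lookup, a stand-in for data_zip) plus a hypothetical ZIP code, "99999", with no entry, to show that unmatched rows are kept with NA coordinates:

```r
library(dplyr)

# Small stand-in for `data`; "99999" is a made-up ZIP with no lookup entry
data <- data.frame(ID = 1:3,
                   zip_codes = c("47304", "99999", "46011"),
                   stringsAsFactors = FALSE)

# Small stand-in for `data_zip` (values taken from the output above)
lookup <- data.frame(zip = c("46011", "47304"),
                     latitude = c(40.11291, 40.21540),
                     longitude = c(-85.73700, -85.43636),
                     stringsAsFactors = FALSE)

# Keep every row of `data`; match zip_codes (left) against zip (right)
joined <- left_join(data, lookup, by = c("zip_codes" = "zip"))
joined
```

Both key columns are character vectors here. If one of them were a factor or a number, converting it with as.character first would likely be needed for the keys to line up.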

Answer 2

You can also use the merge function instead of a for loop:

library(dplyr)
df <- merge(data, data_zip, by.x = "zip_codes", by.y = "zip", all.x = TRUE) %>%
      arrange(ID) %>% select(ID, zip_codes, lat = latitude, long = longitude)

> df
  ID zip_codes      lat      long
1  1     47304 40.21540 -85.43636
2  2     46011 40.11291 -85.73700
3  3     47305 40.19229 -85.38494
4  4     46033 39.97373 -86.08875
5  5     46044 40.22121 -85.77612
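
Since merge itself is part of base R, the same result can probably be reached without dplyr. The sketch below is one way to do it, using a small made-up lookup table (lookup, a stand-in for data_zip) and a hypothetical ZIP code, "99999", with no entry. merge sorts its result by the key column by default, so the rows are put back in ID order afterwards:

```r
# Small stand-in for `data`; "99999" is a made-up ZIP with no lookup entry
data <- data.frame(ID = 1:3,
                   zip_codes = c("47304", "99999", "46011"),
                   stringsAsFactors = FALSE)

# Small stand-in for `data_zip` (values taken from the output above)
lookup <- data.frame(zip = c("46011", "47304"),
                     latitude = c(40.11291, 40.21540),
                     longitude = c(-85.73700, -85.43636),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps rows of `data` that have no match in `lookup`
merged <- merge(data, lookup, by.x = "zip_codes", by.y = "zip", all.x = TRUE)

# Restore the original row order and pick/rename the columns
merged <- merged[order(merged$ID), c("ID", "zip_codes", "latitude", "longitude")]
names(merged)[3:4] <- c("lat", "long")
rownames(merged) <- NULL
merged
```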

R loop to see if a column value in dataframe1 matches a column value in dataframe2
