如何使用行号列表在数据框列中查找值

时间:2018-10-14 01:41:03

标签: r lapply

我有一个很大的数据框,其中包含各列,一个数据框是一个名为“ code”的ID代码,另一个是两个火车站之间的斜杠“ name”所组成的名称。

我想搜索与站点名称相关的所有代码(并能够一次查找多个站点),以便为我提供包含每个站点多个代码的向量列表。

我曾经用lapply获取每个站点的行,但是现在我无法在与行号关联的“代码”列中查找值。

SearchFor <- c("Chicago", "New York", "Atlanta")
lapply(c(SearchFor,grep,x=datastations$name)

我有以下列表:

$`Chicago`
 [1]  29  64 135 160 164 167 176 186 225 247 248 

$New York
 [1]  51  53 109 111 112 164 


$Atlanta
[1]   4  78 168 237 291

基本上,我想将这些数字中的每一个都更改为这些行上“代码”列的值。

在使用dput之后,这是我的数据表“数据站”:

structure(list(code = c(6000L, 6001L, 6002L, 6003L, 6004L, 6005L, 
6006L, 6007L, 6008L, 6009L, 6010L, 6011L, 6012L, 6013L, 6014L, 
6015L, 6016L, 6017L, 6018L, 6019L, 6020L, 6021L, 6022L, 6023L, 
6024L, 6025L, 6026L, 6027L, 6028L, 6029L, 6030L, 6031L, 6032L, 
6033L, 6034L, 6035L, 6036L, 6037L, 6038L, 6039L, 6040L, 6041L, 
6042L, 6043L, 6044L, 6045L, 6046L, 6047L, 6048L, 6049L, 5000L, 
5001L, 5002L, 5003L, 5004L, 5005L, 5006L, 5007L, 5008L, 6050L, 
6051L, 6052L, 6053L, 6054L, 6055L, 6056L, 6057L, 6058L, 6059L, 
6060L, 6061L, 6062L, 6063L, 6064L, 6065L, 6066L, 6067L, 6068L, 
6069L, 6070L, 6071L, 6072L, 6073L, 6074L, 6075L, 6076L, 6077L, 
6078L, 6079L, 6080L, 6081L, 6082L, 6083L, 6084L, 6085L, 6086L, 
6087L, 6088L, 6089L, 6090L, 6091L, 5009L, 5010L, 5011L, 5012L, 
6092L, 6093L, 6094L, 6095L, 6096L, 6097L), name = c("Atlanta / New York", 
"Atlanta / Chicago", "Atlanta / Miami", "Atlanta / Los Angeles", 
"Atlanta / Toronto", "Atlanta / Washington", "Atlanta / Cleveland", 
"Atlanta / Raleigh", "Atlanta / Newark", "Atlanta / Ottawa", 
"Atlanta / Detroit", "Atlanta / Albany", "Atlanta / Hartford", 
"Atlanta / Providence", "New York / Chicago", "New York / Miami", 
"New York / Los Angeles", "New York / Toronto", "New York / Washington", 
"New York / Cleveland", "New York / Raleigh", "New York / Newark", 
"New York / Ottawa", "New York / Detroit", "New York / Albany", 
"New York / Hartford", "New York / Providence", "Chicago / Miami", 
"Chicago / Los Angeles", "Chicago / Toronto", "Chicago / Washington", 
"Chicago / Cleveland", "Chicago / Raleigh", "Chicago / Newark", 
"Chicago / Ottawa", "Chicago / Detroit", "Chicago / Albany", 
"Chicago / Hartford", "Chicago / Providence", "Miami / Los Angeles", 
"Miami / Toronto", "Miami / Washington", "Miami / Cleveland", 
"Miami / Raleigh", "Miami / Newark", "Miami / Ottawa", "Miami / Detroit", 
"Miami / Albany", "Miami / Hartford", "Miami / Providence", "Toronto /             Washington", 
"Toronto / Cleveland", "Toronto / Raleigh", "Toronto / Newark", 
"Toronto / Ottawa", "Toronto / Detroit", "Toronto / Albany", 
"Toronto / Hartford", "Toronto / Providence", "Los Angeles / Toronto", 
"Los Angeles / Washington", "Los Angeles / Cleveland", "Los Angeles /         Raleigh", 
"Los Angeles / Newark", "Los Angeles / Ottawa", "Los Angeles / Detroit", 
"Los Angeles / Albany", "Los Angeles / Hartford", "Los Angeles / Providence", 
"Washington / Washington", "Washington / Cleveland", "Washington / Raleigh", 
"Washington / Newark", "Washington / Ottawa", "Washington / Detroit", 
"Washington / Hartford", "Washington / Providence", "Raleigh / Newark", 
"Raleigh / Ottawa", "Raleigh / Detroit", "Raleigh / Albany", 
"Raleigh / Hartford", "Raleigh / Providence", "Cleveland / Raleigh", 
"Cleveland / Newark", "Cleveland / Ottawa", "Cleveland / Detroit", 
"Cleveland / Albany", "Cleveland / Hartford", "Cleveland / Providence", 
"New York / Newark", "New York / Ottawa", "New York / Detroit", 
"New York / Albany", "New York / Hartford", "New York / Providence", 
"Newark / Ottawa", "Newark / Detroit", "Newark / Albany", "Newark /         Hartford", 
"Newark / Providence", "Ottawa / Detroit", "Ottawa / Albany", 
"Ottawa / Hartford", "Ottawa / Providence", "Detroit / Albany", 
"Detroit / Hartford", "Detroit / Providence", "Albany / Hartford", 
"Albany / Providence", "Hartford / Providence")), class = "data.frame",     row.names = c(NA, 
-111L))

我通过使用此代码读取.csv文件来获得此数据库

read.csv(file, colClasses = 
c(rep("integer",1),rep("character",1),rep("NULL",2)))

我想应用以下内容:

List[1] <- datastations$code[List[[1]]]

但是在列表的每个向量上,无论有多少(基本上没有循环)

2 个答案:

答案 0 :(得分:0)

也许这就是您想要的?在阅读问题时,您需要列出与特定城市或一组城市相对应的所有车站代码。如果这看起来很有趣,可能是您的dput中输入了错误的站号。

library(dplyr)
codelist <- df %>% filter(grepl("Chicago",name)) %>% select(code)

> unlist(codelist)
 code1  code2  code3  code4  code5  code6  code7  code8  code9 code10 code11 code12 code13 code14 
  6001   6014   6027   6028   6029   6030   6031   6032   6033   6034   6035   6036   6037   6038 

或对于多个电台:

> codelist <- df %>% filter(grepl("Chicago|New York|Atlanta",name)) %>% select(code)
> unlist(codelist)
 code1  code2  code3  code4  code5  code6  code7  code8  code9 code10 code11 code12 code13 code14 code15 
  6000   6001   6002   6003   6004   6005   6006   6007   6008   6009   6010   6011   6012   6013   6014 
code16 code17 code18 code19 code20 code21 code22 code23 code24 code25 code26 code27 code28 code29 code30 
  6015   6016   6017   6018   6019   6020   6021   6022   6023   6024   6025   6026   6027   6028   6029 
code31 code32 code33 code34 code35 code36 code37 code38 code39 code40 code41 code42 code43 code44 code45 
  6030   6031   6032   6033   6034   6035   6036   6037   6038   6081   6082   6083   6084   6085   6086 

答案 1 :(得分:0)

就像其他人在上面的评论中所说的那样,最终结果是什么样子还不清楚。但是,如果我理解正确,我想这可能就是您想要的。

我在这里使用map包中的purrr遍历一个城市名称的向量,并为每个城市获取一个代码向量,使用set_names来命名元素最终名单中的城市。

library(dplyr)
library(stringr)
library(purrr)

# load data as df (see below) 

cities <- c("Chicago", "New York", "Atlanta")

get_city_stations <- function(city, station_data) {
  station_data %>% 
    filter(str_detect(name, city)) %>% 
    pull(code)
}

codes <- map(cities, get_city_stations, station_data = df) %>% set_names(cities)

codes
#> $Chicago
#>  [1] 6001 6014 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038
#> 
#> $`New York`
#>  [1] 6000 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026
#> [15] 6081 6082 6083 6084 6085 6086
#> 
#> $Atlanta
#>  [1] 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013

reprex package(v0.2.0)于2018-10-14创建。

df <- structure(list(code = c(6000L, 6001L, 6002L, 6003L, 6004L, 6005L, 
6006L, 6007L, 6008L, 6009L, 6010L, 6011L, 6012L, 6013L, 6014L, 
6015L, 6016L, 6017L, 6018L, 6019L, 6020L, 6021L, 6022L, 6023L, 
6024L, 6025L, 6026L, 6027L, 6028L, 6029L, 6030L, 6031L, 6032L, 
6033L, 6034L, 6035L, 6036L, 6037L, 6038L, 6039L, 6040L, 6041L, 
6042L, 6043L, 6044L, 6045L, 6046L, 6047L, 6048L, 6049L, 5000L, 
5001L, 5002L, 5003L, 5004L, 5005L, 5006L, 5007L, 5008L, 6050L, 
6051L, 6052L, 6053L, 6054L, 6055L, 6056L, 6057L, 6058L, 6059L, 
6060L, 6061L, 6062L, 6063L, 6064L, 6065L, 6066L, 6067L, 6068L, 
6069L, 6070L, 6071L, 6072L, 6073L, 6074L, 6075L, 6076L, 6077L, 
6078L, 6079L, 6080L, 6081L, 6082L, 6083L, 6084L, 6085L, 6086L, 
6087L, 6088L, 6089L, 6090L, 6091L, 5009L, 5010L, 5011L, 5012L, 
6092L, 6093L, 6094L, 6095L, 6096L, 6097L), name = c("Atlanta / New York", 
"Atlanta / Chicago", "Atlanta / Miami", "Atlanta / Los Angeles", 
"Atlanta / Toronto", "Atlanta / Washington", "Atlanta / Cleveland", 
"Atlanta / Raleigh", "Atlanta / Newark", "Atlanta / Ottawa", 
"Atlanta / Detroit", "Atlanta / Albany", "Atlanta / Hartford", 
"Atlanta / Providence", "New York / Chicago", "New York / Miami", 
"New York / Los Angeles", "New York / Toronto", "New York / Washington", 
"New York / Cleveland", "New York / Raleigh", "New York / Newark", 
"New York / Ottawa", "New York / Detroit", "New York / Albany", 
"New York / Hartford", "New York / Providence", "Chicago / Miami", 
"Chicago / Los Angeles", "Chicago / Toronto", "Chicago / Washington", 
"Chicago / Cleveland", "Chicago / Raleigh", "Chicago / Newark", 
"Chicago / Ottawa", "Chicago / Detroit", "Chicago / Albany", 
"Chicago / Hartford", "Chicago / Providence", "Miami / Los Angeles", 
"Miami / Toronto", "Miami / Washington", "Miami / Cleveland", 
"Miami / Raleigh", "Miami / Newark", "Miami / Ottawa", "Miami / Detroit", 
"Miami / Albany", "Miami / Hartford", "Miami / Providence", "Toronto /             Washington", 
"Toronto / Cleveland", "Toronto / Raleigh", "Toronto / Newark", 
"Toronto / Ottawa", "Toronto / Detroit", "Toronto / Albany", 
"Toronto / Hartford", "Toronto / Providence", "Los Angeles / Toronto", 
"Los Angeles / Washington", "Los Angeles / Cleveland", "Los Angeles /         Raleigh", 
"Los Angeles / Newark", "Los Angeles / Ottawa", "Los Angeles / Detroit", 
"Los Angeles / Albany", "Los Angeles / Hartford", "Los Angeles / Providence", 
"Washington / Washington", "Washington / Cleveland", "Washington / Raleigh", 
"Washington / Newark", "Washington / Ottawa", "Washington / Detroit", 
"Washington / Hartford", "Washington / Providence", "Raleigh / Newark", 
"Raleigh / Ottawa", "Raleigh / Detroit", "Raleigh / Albany", 
"Raleigh / Hartford", "Raleigh / Providence", "Cleveland / Raleigh", 
"Cleveland / Newark", "Cleveland / Ottawa", "Cleveland / Detroit", 
"Cleveland / Albany", "Cleveland / Hartford", "Cleveland / Providence", 
"New York / Newark", "New York / Ottawa", "New York / Detroit", 
"New York / Albany", "New York / Hartford", "New York / Providence", 
"Newark / Ottawa", "Newark / Detroit", "Newark / Albany", "Newark /         Hartford", 
"Newark / Providence", "Ottawa / Detroit", "Ottawa / Albany", 
"Ottawa / Hartford", "Ottawa / Providence", "Detroit / Albany", 
"Detroit / Hartford", "Detroit / Providence", "Albany / Hartford", 
"Albany / Providence", "Hartford / Providence")), class = "data.frame",     row.names = c(NA, 
-111L))