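Dplyr select_ and starts_with on multiple values in a variable list part 2

Time: 2017-07-28 13:58:57

Tags: r select dplyr purrr multiple-matches

This is a continuation of a question I asked earlier: Dplyr select_ and starts_with on multiple values in a variable list

I collect data from different sensors at different locations, and the data output looks something like this: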

df <- data.frame(date = c(2011, 2012, 2013, 2014, 2015),
                 "Sensor1 Temp" = c(15, 18, 15, 14, 19),
                 "Sensor1 Pressure" = c(1001, 1000, 1002, 1004, 1000),
                 "Sensor1a Temp" = c(15, 18, 15, 14, 19),
                 "Sensor1a Pressure" = c(1001, 1000, 1002, 1004, 1000),
                 "Sensor2 Temp" = c(15, 18, 15, 14, 19),
                 "Sensor2 Pressure" = c(1001, 1000, 1002, 1004, 1000),
                 "Sensor2 DewPoint" = c(10, 11, 10, 9, 12),
                 "Sensor2 Humidity" = c(90, 100, 90, 100, 80))
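
One detail that matters for any name-based selection on this data: with its default `check.names = TRUE`, `data.frame()` rewrites names that are not syntactically valid, so `"Sensor1 Temp"` is stored as `Sensor1.Temp`. A minimal sketch of that behaviour, where `df_small` and `df_raw` are hypothetical two-column stand-ins rather than the full data above:

```r
# Default behaviour: spaces in column names are replaced by dots
df_small <- data.frame("Sensor1 Temp" = c(15, 18),
                       "Sensor1a Pressure" = c(1001, 1000))
names(df_small)
# "Sensor1.Temp" "Sensor1a.Pressure"

# check.names = FALSE keeps the names exactly as written
df_raw <- data.frame("Sensor1 Temp" = c(15, 18),
                     "Sensor1a Pressure" = c(1001, 1000),
                     check.names = FALSE)
names(df_raw)
# "Sensor1 Temp" "Sensor1a Pressure"
```

This is why the column names are joined or split on a dot rather than a space when selecting from `df`.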

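The problem is (I think) similar to: Using select_ and starts_with R or select columns based on multiple strings with dplyr

I want to search for sensors by location, so I have a list that is used to search the data frame and that also includes the timestamp. But the search breaks down as soon as I look for more than one sensor (or sensor type, etc.). Is there a way to do this with dplyr (NSE or SE)?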

FindLocation = c("date", "Sensor1", "Sensor2")
df %>% select(matches(paste(FindLocation, collapse="|"))) # works but picks up "Sensor1a" and "DewPoint" and "Humidity" data from Sensor2 

In addition, I would like to add mixed searches, for example:

 FindLocation = c("Sensor1", "Sensor2") # without selecting "Sensor1a"
 FindSensor = c("Temp", "Pressure") # without selecting "DewPoint" or "Humidity"

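I would like select to combine FindSensor with FindLocation and pick the Temp and Pressure data for Sensor1 and Sensor2 (without selecting Sensor1a), returning a data frame with the data and these column headers:

date, Sensor1 Temp, Sensor1 Pressure, Sensor2 Temp, Sensor2 Pressure

Thanks again!

3 Answers:

Answer 0: (score: 2)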

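Some functions from purrr are very useful here. First, use cross2 to compute the Cartesian product of FindLocation and FindSensor; this gives you a list of pairs. Then use map_chr to apply paste to each pair, joining the location and sensor strings with a dot (.). Finally, select the columns with the one_of helper.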

library(purrr)
library(dplyr)  # select() and one_of() come from dplyr

FindLocation = c("Sensor1", "Sensor2")
FindSensor = c("Temp", "Pressure")

columns = cross2(FindLocation, FindSensor) %>%
  map_chr(paste, collapse = ".")

df %>% select(one_of(columns))
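
In recent releases `cross2()` is deprecated (as far as I know, since purrr 1.0.0) and `one_of()` has been superseded by `any_of()` and `all_of()`. The sketch below is one possible equivalent, not the answerer's code: it builds the same pairs with base R's `expand.grid()` and also keeps the `date` column that the question asks for. The data frame is a reduced stand-in for the question's `df`, and `pairs` and `result` are names introduced here for illustration.

```r
library(dplyr)

# Reduced stand-in for the question's data (names become e.g. "Sensor1.Temp")
df <- data.frame(date = c(2011, 2012),
                 "Sensor1 Temp" = c(15, 18),
                 "Sensor1 Pressure" = c(1001, 1000),
                 "Sensor1a Temp" = c(15, 18),
                 "Sensor2 Temp" = c(15, 18),
                 "Sensor2 Pressure" = c(1001, 1000),
                 "Sensor2 DewPoint" = c(10, 11))

FindLocation <- c("Sensor1", "Sensor2")
FindSensor <- c("Temp", "Pressure")

# One row per sensor/location combination; the first argument varies fastest,
# so the columns come out grouped by location
pairs <- expand.grid(sensor = FindSensor, location = FindLocation,
                     stringsAsFactors = FALSE)
columns <- paste(pairs$location, pairs$sensor, sep = ".")

# any_of() skips names that are absent from df instead of raising an error
result <- df %>% select(date, any_of(columns))
names(result)
```

Because the selection uses exact names, `Sensor1a.Temp` and `Sensor2.DewPoint` are never picked up by accident.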

Answer 1: (score: 2)

We can use

# First skip the 25 lines
data = read.csv(file, skip = 25, header = T)

# Then remove all other empty rows
data[rowSums(is.na(data)) != ncol(data),]

Answer 2: (score: 1)

As follows:

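library(tidyverse)
# Split each column name at the dot; keep a column when its first part is a
# requested location and its second part is a requested sensor
which_col <- df %>% names %>% strsplit("[.]") %>%
  map_lgl(function(x) x[1] %in% FindLocation & x[2] %in% FindSensor)
df[which_col]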
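
The same idea can also be written as a single anchored regular expression passed to `matches()`. This is a sketch under two assumptions: the column names have the `Location.Sensor` form that `data.frame()` produces, and the search strings contain no regex metacharacters. The data frame is a reduced stand-in for the question's `df`, and `pattern` and `result` are illustrative names.

```r
library(dplyr)

# Reduced stand-in for the question's data (names become e.g. "Sensor1.Temp")
df <- data.frame(date = c(2011, 2012),
                 "Sensor1 Temp" = c(15, 18),
                 "Sensor1 Pressure" = c(1001, 1000),
                 "Sensor1a Temp" = c(15, 18),
                 "Sensor2 Temp" = c(15, 18),
                 "Sensor2 Pressure" = c(1001, 1000),
                 "Sensor2 DewPoint" = c(10, 11))

FindLocation <- c("Sensor1", "Sensor2")
FindSensor <- c("Temp", "Pressure")

# Builds "^(Sensor1|Sensor2)\\.(Temp|Pressure)$":
# ^ and $ anchor the whole name, \\. is the literal dot between the two parts
pattern <- paste0("^(", paste(FindLocation, collapse = "|"), ")\\.(",
                  paste(FindSensor, collapse = "|"), ")$")

result <- df %>% select(date, matches(pattern))
names(result)
```

The anchors are what keep `Sensor1a.Temp` out: after `Sensor1` the pattern requires a literal dot, so the extra `a` makes the match fail.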