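This is a follow-up to a question I asked earlier: Dplyr select_ and starts_with on multiple values in a variable list
I collect data from different sensors at different locations, and the output looks like this: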
df<-data.frame(date=c(2011,2012,2013,2014,2015),"Sensor1 Temp"=c(15,18,15,14,19),"Sensor1 Pressure"=c(1001, 1000, 1002, 1004, 1000),"Sensor1a Temp"=c(15,18,15,14,19),"Sensor1a Pressure"=c(1001, 1000, 1002, 1004, 1000), "Sensor2 Temp"=c(15,18,15,14,19),"Sensor2 Pressure"=c(1001, 1000, 1002, 1004, 1000), "Sensor2 DewPoint"=c(10,11,10,9,12),"Sensor2 Humidity"=c(90, 100, 90, 100, 80))
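The problem is (I think) similar to: Using select_ and starts_with R, or: select columns based on multiple strings with dplyr
I want to search for sensors by location, so I have a list that I use to search the data frame while also keeping the timestamp. But the search breaks down when I look for more than one sensor (or sensor type, etc.). Is there a way to do this with dplyr (NSE or SE)?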
FindLocation = c("date", "Sensor1", "Sensor2")
df %>% select(matches(paste(FindLocation, collapse="|"))) # works but picks up "Sensor1a" and "DewPoint" and "Humidity" data from Sensor2
I would also like to add mixed searches, for example:
FindLocation = c("Sensor1", "Sensor2") # without selecting "Sensor1a"
FindSensor = c("Temp", "Pressure") # without selecting "DewPoint" or "Humidity"
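I would like select to combine FindSensor with FindLocation and pick the Temp and Pressure data for Sensor1 and Sensor2 (without picking Sensor1a), returning a data frame with the data and these column headers:
date, Sensor1 Temp, Sensor1 Pressure, Sensor2 Temp, Sensor2 Pressure
Thanks again!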
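Answer 0 (score: 2)
Some functions from purrr come in handy here. First, use cross2 to compute the Cartesian product of FindLocation and FindSensor, which gives you a list of pairs. Then use map_chr to apply paste to each pair, joining the location and sensor strings with a dot (.), because data.frame() replaces the spaces in the column names with dots. Finally, use the one_of helper to select the columns.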
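library(purrr)
library(dplyr)  # needed for select() and one_of()
FindLocation = c("Sensor1", "Sensor2")
FindSensor = c("Temp", "Pressure")
# All location/sensor pairs, pasted into names such as "Sensor1.Temp"
columns = cross2(FindLocation, FindSensor) %>%
  map_chr(paste, collapse = ".")
df %>% select(one_of(columns))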
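For readers who would rather avoid cross2, the same Cartesian-product idea can be sketched in base R with outer. This is only an illustration, not part of the original answer: the shortened sample data frame and the result variable are assumptions made for the example, and "date" is added by hand so that the timestamp is kept.

```r
# Shortened sample data mirroring the question;
# data.frame() turns "Sensor1 Temp" into "Sensor1.Temp"
df <- data.frame(
  date = 2011:2015,
  "Sensor1 Temp" = c(15, 18, 15, 14, 19),
  "Sensor1 Pressure" = c(1001, 1000, 1002, 1004, 1000),
  "Sensor1a Temp" = c(15, 18, 15, 14, 19),
  "Sensor2 Temp" = c(15, 18, 15, 14, 19),
  "Sensor2 Pressure" = c(1001, 1000, 1002, 1004, 1000),
  "Sensor2 DewPoint" = c(10, 11, 10, 9, 12)
)
FindLocation <- c("Sensor1", "Sensor2")
FindSensor <- c("Temp", "Pressure")

# Every location/sensor pair joined with "."; t() orders them location by location
columns <- as.vector(t(outer(FindLocation, FindSensor, paste, sep = ".")))

# Keep the timestamp plus the exact column names built above
result <- df[c("date", columns)]
print(names(result))
```

Because the names are matched exactly, Sensor1a and DewPoint columns are left out. Note that indexing with a name that does not exist in df raises an error, whereas one_of only warns.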
Answer 1 (score: 2)
We can use
# First skip the 25 lines
data = read.csv(file, skip = 25, header = T)
# Then remove all other empty rows
data[rowSums(is.na(data)) != ncol(data),]
Answer 2 (score: 1)
As follows:
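library(tidyverse)
# Split each column name on "." and keep columns whose location and sensor parts both match
which_col <- df %>% names %>% strsplit("[.]") %>%
  map_lgl(function(x) x[1] %in% FindLocation & x[2] %in% FindSensor)
df[which_col]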
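An anchored regular expression is another option, closer in spirit to the matches() attempt in the question. The sketch below is an assumption-laden illustration rather than anything from the answers above: it uses a shortened sample data frame, hypothetical pattern and keep variables, and it assumes the location and sensor names contain no regex metacharacters.

```r
# Shortened sample data mirroring the question
df <- data.frame(
  date = 2011:2015,
  "Sensor1 Temp" = c(15, 18, 15, 14, 19),
  "Sensor1a Temp" = c(15, 18, 15, 14, 19),
  "Sensor2 Pressure" = c(1001, 1000, 1002, 1004, 1000),
  "Sensor2 Humidity" = c(90, 100, 90, 100, 80)
)
FindLocation <- c("Sensor1", "Sensor2")
FindSensor <- c("Temp", "Pressure")

# Builds "^(Sensor1|Sensor2)\\.(Temp|Pressure)$"; the ^ and $ anchors and the
# literal dot prevent "Sensor1a.Temp" from matching "Sensor1"
pattern <- paste0(
  "^(", paste(FindLocation, collapse = "|"), ")\\.(",
  paste(FindSensor, collapse = "|"), ")$"
)

# Keep the timestamp column plus every column whose full name matches
keep <- names(df) == "date" | grepl(pattern, names(df))
result <- df[keep]
print(names(result))
```

The same pattern should also work with dplyr, for example df %>% select(date, matches(pattern)).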