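I am trying to load the text files in a folder (more than 1000 of them). I can read them into one big list.
Now I want to check whether a specific column name exists, and I do it like this:
sapply(my_list, function(x) all(c("Transmittance: F112: Light, Sample ") %in% names(x)))
Many other columns share part of that name, but I specifically want the columns that start with Transmittance: F*
Is there a way to do this? In the end I want to be able to extract these columns along with other columns.
Here is a small part of one file: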
ldf<- list(structure(list(`Transmitance Ratio: (F648, Light) / (F648, Heavy)` = c(NA,
100, 0.768, NA, 0.676, NA, NA, 0.538, 0.482), `Transmitance Ratio (log2): (F648, Light) / (F648, Heavy)` = c(NA,
6.64, -0.38, NA, -0.56, NA, NA, -0.89, -1.05), `Transmitance s (Scaled): F648: Light, Sample` = c(NA,
200, 86.9, NA, 80.7, NA, NA, 69.9, 65), `Transmitance s (Scaled): F648: Heavy, Sample` = c(NA,
NA, 113.1, NA, 119.3, NA, NA, 130.1, 135), `Transmitance s (Normalized): F648: Light, Sample` = c(NA,
2e+05, 6.46e+08, NA, 2720000, NA, NA, 25800000, 5380000), `Transmitance s (Normalized): F648: Heavy, Sample` = c(NA,
NA, 8.42e+08, NA, 4030000, NA, NA, 4.8e+07, 11200000), `Transmitance : F648: Light, Sample` = c(NA,
2e+05, 6.46e+08, NA, 2720000, NA, NA, 25800000, 5380000), `Transmitance : F648: Heavy, Sample` = c(NA,
NA, 3.47e+08, NA, 1660000, NA, NA, 19700000, 4600000), `Transmitance s Count: F648: Light, Sample` = c(NA,
1L, 44L, NA, 4L, NA, NA, 4L, 2L), `Transmitance s Count: F648: Heavy, Sample` = c(NA,
NA, 44L, NA, 3L, NA, NA, 3L, 2L)), .Names = c("Transmitance Ratio: (F648, Light) / (F648, Heavy)",
"Transmitance Ratio (log2): (F648, Light) / (F648, Heavy)",
"Transmitance s (Scaled): F648: Light, Sample", "Transmitance s (Scaled): F648: Heavy, Sample",
"Transmitance s (Normalized): F648: Light, Sample", "Transmitance s (Normalized): F648: Heavy, Sample",
"Transmitance : F648: Light, Sample", "Transmitance : F648: Heavy, Sample",
"Transmitance s Count: F648: Light, Sample", "Transmitance s Count: F648: Heavy, Sample"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
)))
I am only interested in the columns whose names start with Transmitance : F, followed by any suffix.
Answer 0 (score: 2)
You can try this:
lapply(ldf, function(x) grep("^Transmitance : F.+", names(x), value = TRUE))
# [[1]]
# [1] "Transmitance : F648: Light, Sample" "Transmitance : F648: Heavy, Sample"
#
# [[2]]
# [1] "Transmitance : F648: Light, Sample1" "Transmitance : F648: Heavy, Sample1"
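If only a TRUE/FALSE answer per file is needed, as in the question's sapply call, the same pattern can be passed to grepl. This is a minimal sketch on a hypothetical two-element list (toy, with made-up element names a and b), not the OP's data:

```r
# Hypothetical toy list: element "a" has a matching column, element "b" does not
toy <- list(
  a = data.frame(`Transmitance : F1: Light` = 1, check.names = FALSE),
  b = data.frame(`Transmitance s Count: F1: Light` = 1, check.names = FALSE)
)

# One logical per list element: does any column name start with the prefix?
has_col <- vapply(
  toy,
  function(x) any(grepl("^Transmitance : F", names(x))),
  logical(1)
)
print(has_col)
```

vapply is used instead of sapply so the result is guaranteed to be a logical vector, even for an empty list.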
To actually extract the columns, not just their names:
library(dplyr)
lapply(ldf, function(x) select(x, starts_with("Transmitance : F")))
# [[1]]
# Transmitance : F648: Light, Sample Transmitance : F648: Heavy, Sample
# 1 NA NA
# 2 2.00e+05 NA
# 3 6.46e+08 3.47e+08
# 4 NA NA
# 5 2.72e+06 1.66e+06
# 6 NA NA
# 7 NA NA
# 8 2.58e+07 1.97e+07
# 9 5.38e+06 4.60e+06
#
# [[2]]
# Transmitance : F648: Light, Sample1 Transmitance : F648: Heavy, Sample1
# 1 NA NA
# 2 2.00e+05 NA
# 3 6.46e+08 3.47e+08
# 4 NA NA
# 5 2.72e+06 1.66e+06
# 6 NA NA
# 7 NA NA
# 8 2.58e+07 1.97e+07
# 9 5.38e+06 4.60e+06
If you want to combine all the extracted columns into a single data frame, you can use map_dfc from purrr:
library(purrr)
map_dfc(ldf, function(x) select(x, starts_with("Transmitance : F")))
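map_dfc applies a function to each element of the supplied list and column-binds the outputs into a single data frame (it uses dplyr::bind_cols, which behaves much like cbind).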
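For readers without purrr, roughly the same result can be sketched in base R with lapply and cbind. The toy list and the keep helper below are hypothetical, and this is only an approximation of what map_dfc does (for example, it does not repair duplicated column names):

```r
# Hypothetical toy list of two data frames with the same number of rows
toy <- list(
  data.frame(`Transmitance : F1: Light` = c(1, 2),
             `Other` = c(3, 4),
             check.names = FALSE),
  data.frame(`Transmitance : F2: Light` = c(5, 6),
             `Transmitance Ratio: x` = c(7, 8),
             check.names = FALSE)
)

# Keep only the columns whose names start with the literal prefix
keep <- function(x) x[startsWith(names(x), "Transmitance : F")]

# Apply to every element, then bind the pieces column-wise
combined <- do.call(cbind, lapply(toy, keep))
print(names(combined))
```

All list elements must have the same number of rows for the column-bind to succeed.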
Data: the OP's ldf, modified for a better demonstration:
ldf[[2]] = ldf[[1]]
names(ldf[[2]]) = paste0(names(ldf[[1]]), 1)
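Following the OP's additional request in the comments to also extract the "Transmitance Ratio" columns, just change the regular expression passed to grep: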
lapply(ldf, function(x) grep("^Transmitance (: F|Ratio).+", names(x), value = TRUE))
starts_with inside select does not accept a regular expression, so use matches instead:
library(dplyr)
lapply(ldf, function(x) select(x, matches("^Transmitance (: F|Ratio).+")))
library(purrr)
map_dfc(ldf, function(x) select(x, matches("^Transmitance (: F|Ratio).+")))
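The difference between literal-prefix matching and regex matching can be illustrated in base R, where startsWith treats its argument as a literal string and grepl treats it as a regular expression. This is only an analogy to starts_with and matches in dplyr; the column names are taken from the OP's sample:

```r
nms <- c(
  "Transmitance : F648: Light, Sample",
  "Transmitance Ratio: (F648, Light) / (F648, Heavy)",
  "Transmitance s Count: F648: Light, Sample"
)
pattern <- "^Transmitance (: F|Ratio).+"

# Literal comparison: no name begins with the characters "^Transmitance ("
literal_hits <- startsWith(nms, pattern)

# Regex comparison: the alternation (: F|Ratio) is interpreted
regex_hits <- grepl(pattern, nms)

print(literal_hits)
print(regex_hits)
```

The third name is rejected by the regex because "s Count" follows the prefix rather than ": F" or "Ratio".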
Answer 1 (score: 1)
This matches the pattern against the start of each column name and returns the full matching names:
lapply(ldf, function(x) grep(names(x), pattern = "^Transmitance : F", value = TRUE))
[[1]]
[1] "Transmitance : F648: Light, Sample" "Transmitance : F648: Heavy, Sample"
To extract these columns, use grepl and subsetting:
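# as.data.frame() makes x[<logical>] select columns even if data.table is loaded,
# where a single logical index would otherwise subset rows
lapply(ldf, function(x) as.data.frame(x)[grepl("^Transmitance : F", names(x))])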
[[1]]
Transmitance : F648: Light, Sample Transmitance : F648: Heavy, Sample
1 NA NA
2 2.00e+05 NA
3 6.46e+08 3.47e+08
4 NA NA
5 2.72e+06 1.66e+06
6 NA NA
7 NA NA
8 2.58e+07 1.97e+07
9 5.38e+06 4.60e+06
Answer 2 (score: 0)
This should work:
lapply(ldf, function(x) grep("Transmitance : F", names(x), value = TRUE))
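Note that this pattern is not anchored with ^, so it matches the text anywhere in a column name. On the OP's sample the result is the same, but a sketch with one hypothetical extra name (the "Log ..." entry below is made up for illustration) shows where the two patterns diverge:

```r
nms <- c(
  "Transmitance : F648: Light, Sample",
  "Log Transmitance : F648: Light, Sample"  # hypothetical name, not in the OP's data
)

# Unanchored: matches the text anywhere in the name
unanchored <- grep("Transmitance : F", nms, value = TRUE)

# Anchored: matches only names that start with the text
anchored <- grep("^Transmitance : F", nms, value = TRUE)

print(unanchored)
print(anchored)
```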