Find one type of column in a list of tibbles

Time: 2019-03-07 05:23:25

Tags: r lapply rbind

I have four Excel files, which I located with list.files and read into R with lapply. My code is:

 library(readxl)  # provides read_excel
 # pattern is a regular expression, not a shell glob
 my_files <- list.files(pattern = "\\.xlsx$")
 my_list <- lapply(my_files, read_excel)

The files contain many different columns:

 > lapply(my_list, colnames)
 [[1]]
 [1] "JobCard Branch" "Customer Name" "Primary Contact No" "Alt No 1"          
 [5] "Alt No 2" "Reg No"            
 [[2]]
 [1] "CUSTOMER" "Primary Contact No"  "Alt No 1" "REG NO#"            
 [5] "VehModel" "Last Service Outlet"
 [[3]]
 [1] "Company Name" "JobCard Branch" "Service_Branch"          
 [4] "HUB" "Customer Code" "Address"                 
 [7] "Address Line2" "Primary Contact No" "Alt No 1"                
 [10] "Alt No 2" "Alt No 3" "Zip"                     
 [13] "Source" "City" "Vehicle Model"           
 [16] "Make" "Reg No" "Chasis No"               
 [[4]]
 [1] "Last Call Date" "Reg.No" "Model" "Customer Name"  "Contact Number" "Booked Outlet" 
 > 

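Can someone tell me whether it is possible, using rbind or any other function, to extract only the registration number column ("Reg No", "REG NO#", "Reg No", "Reg.No") from all of these tibbles?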

2 answers:

Answer 0 (score: 1)

You can try using grep in case-insensitive mode:

lapply(my_list, function(x) {
    y <- colnames(x)
    y[grep("\\breg\\b", y, ignore.case=TRUE)]
})

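This uses the regular expression \breg\b in case-insensitive mode to find the column names that match what you want. The \b word boundaries restrict the match to "reg" as a whole word, so "Reg No", "REG NO#" and "Reg.No" all match. Note that the code returns the matching column names, not the column values.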
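To pull out the column values rather than just the names, the same pattern can be used to subset each data frame. The following is a minimal sketch on toy data: `toy_list`, `reg_pattern`, `reg_values`, `reg_df` and the sample values are made up for illustration and stand in for `my_list`. It assumes each data frame has at most one registration column of interest and takes the first match.

```r
# Toy stand-ins for the tibbles read from Excel (hypothetical data).
toy_list <- list(
  data.frame(`Customer Name` = c("A", "B"), `Reg No` = c("KA01", "KA02"),
             check.names = FALSE, stringsAsFactors = FALSE),
  data.frame(CUSTOMER = "C", `REG NO#` = "KA03",
             check.names = FALSE, stringsAsFactors = FALSE),
  data.frame(`Reg.No` = c("KA04", "KA05"), Model = c("X", "Y"),
             check.names = FALSE, stringsAsFactors = FALSE)
)

reg_pattern <- "\\breg\\b"

# For each data frame, return the values of the first matching column.
reg_values <- lapply(toy_list, function(x) {
  idx <- grep(reg_pattern, colnames(x), ignore.case = TRUE)
  if (length(idx) == 0) return(NULL)  # no registration column here
  x[[idx[1]]]
})

# Stack everything into a single one-column data frame.
reg_df <- data.frame(reg_no = unlist(reg_values, use.names = FALSE),
                     stringsAsFactors = FALSE)
print(reg_df)
```

Data frames without a matching column contribute nothing, because unlist drops the NULL entries.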

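Answer 1 (score: 0)

We can create a vector of the column names we want to extract (cols), then use lapply to loop over the list of data frames and subset the columns that match cols.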

cols <- c("Reg No","REG NO#","Reg No","Reg.No")
data.frame(unlist(lapply(my_list, function(x) 
           x[names(x) %in% cols]), use.names = FALSE))

Reproducible example

df1 <- data.frame(a = 1:5, b = 2:6)
df2 <- data.frame(a1 = 1:4, new_s = 2:5)
df3 <- data.frame(abc = 1:4)
list_df <- list(df1, df2, df3)


cols <- c("a", "a1", "abc")
data.frame(new = unlist(lapply(list_df, function(x) 
                 x[names(x) %in% cols]),use.names = FALSE))

#     new
# 1     1
# 2     2
# 3     3
# 4     4
# 5     5
# 6     1
# 7     2
# 8     3
# 9     4
#10     1
#11     2
#12     3
#13     4
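
Since the question mentions rbind, here is a variant of the same idea that also records which list element each value came from. It is only a sketch on the toy data from the reproducible example; the names `pieces`, `combined`, `source` and `value` are my own choices, and it assumes at most one matching column per data frame.

```r
df1 <- data.frame(a = 1:5, b = 2:6)
df2 <- data.frame(a1 = 1:4, new_s = 2:5)
df3 <- data.frame(abc = 1:4)
list_df <- list(df1, df2, df3)
cols <- c("a", "a1", "abc")

# Build one small data frame per list element, tagged with its position.
pieces <- lapply(seq_along(list_df), function(i) {
  x <- list_df[[i]]
  hit <- names(x)[names(x) %in% cols]
  if (length(hit) == 0) return(NULL)  # no matching column in this element
  data.frame(source = i, value = x[[hit[1]]])
})

# Row-bind the pieces into one data frame.
combined <- do.call(rbind, pieces)
print(combined)
```

Keeping the `source` index makes it possible to trace a value back to its file, for example through `my_files[combined$source]` in the original setting.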