Question

I have some street traffic data. I ran the following R code to read the contents of 38 csv files into a list (more files will be added later):

    common_path <- "0_data/source_data/DB/Speed/"
    csv_files <- list.files(
      path = common_path,        # directory to search within
      pattern = ".*(1|2).*csv$", # file names containing a 1 or 2 and ending in "csv"
      recursive = TRUE,          # search subdirectories
      full.names = TRUE          # return the full path
    )
    data_lst <- lapply(csv_files, read.csv2)  # read each semicolon-separated file into a data frame
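list.files() applies pattern as a regular expression to the file names it finds. Running grepl() with the same pattern on a few names shows which files would be kept. The file names below are hypothetical and only illustrate the pattern:

```r
# Hypothetical file names, used only to illustrate the regular expression
files <- c("site1_north.csv", "site2_south.csv", "site3_east.csv",
           "site12.csv", "site1_notes.txt")

# Same pattern as in the list.files() call: a 1 or 2 somewhere, "csv" at the end
kept <- files[grepl(".*(1|2).*csv$", files)]
print(kept)
```

Since regular-expression matching is not anchored, the leading .* is redundant, and "csv$" would also accept a name like "file1_csv"; "\\.csv$" would require a real .csv extension.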

The first rows of each data frame look like this:

[Image: Data Example]

Here is the head of one data frame in reproducible form:

    structure(list(typ = c(100L, 100L, 100L, 100L, 100L, 100L, 100L, 
    100L, 100L, 1L, 1L, 1L, 1L, 1L, 1L), date.and.time = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("2019/11/07 18:07:27.000", 
    "2019/11/07 18:07:36.290", "2019/11/07 18:07:40.030", "2019/11/07 18:07:41.930", 
    "2019/11/07 18:07:43.720", "2019/11/07 18:07:46.380", "2019/11/07 18:07:54.010"
    ), class = "factor"), speed..km.h. = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, 42L, 44L, 43L, 42L, 41L, 43L), length..m. = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 3.2, 4.2, 3.2, 3.9, 3.7, 3.2), 
        range..m. = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 
        0L, 0L, 0L, 0L), notes = c("Serial No = 1", "Direction = NORTH", 
        "Counting type = SINGLE LANE", "Ref count sense = IN", "Install height = 42 decimeter", 
        "Axis distance = 58 decimeter", "Road type = STANDARD", "Road slope = FLAT", 
        "Start of campain", "", "", "", "", "", "")), row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
    "14", "15"), class = "data.frame")

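What I want to do is:

1. Take the information in the first 9 rows of the "notes" column.
2. Add that information as separate variables (columns).
3. Then delete the first 9 rows, or equivalently in my data, all rows where the column "typ" == 100.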
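If every metadata note has the form "Key = Value" (true for the sample above; an assumption for the other files), the notes can be picked out by name instead of by position. This is a minimal sketch using only base R; the names parts, keys, values and meta are illustrative:

```r
# Sample 'notes' values copied from the dput() output above
notes <- c("Serial No = 1", "Direction = NORTH", "Counting type = SINGLE LANE",
           "Ref count sense = IN", "Install height = 42 decimeter",
           "Axis distance = 58 decimeter", "Road type = STANDARD",
           "Road slope = FLAT", "Start of campain")

# Keep only "Key = Value" entries (this drops "Start of campain")
kv <- notes[grepl(" = ", notes, fixed = TRUE)]

# Split each entry into its key and its value
parts  <- strsplit(kv, " = ", fixed = TRUE)
keys   <- vapply(parts, `[`, character(1), 1)
values <- vapply(parts, `[`, character(1), 2)

# Named list; make.names() turns "Serial No" into the syntactic name "Serial.No"
meta <- setNames(as.list(values), make.names(keys))
str(meta)
```

Picking by name keeps working if a file lists its notes in a different order, which position-based indexing such as notes[5] would silently get wrong.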

I can do this by hand for a single object in the list, as in the code below:

    # copy single notes into new columns (each value is recycled down the column)
    data_lst[[1]]$serial <- data_lst[[1]]$notes[1]
    data_lst[[1]]$direction <- data_lst[[1]]$notes[2]
    data_lst[[1]]$lane <- data_lst[[1]]$notes[3]
    data_lst[[1]]$install_height <- data_lst[[1]]$notes[5]
    data_lst[[1]]$axis <- data_lst[[1]]$notes[6]
    data_lst[[1]]$notes <- NULL               # drop the notes column
    data_lst[[1]] <- data_lst[[1]][-(1:9), ]  # drop the 9 header rows

But I run into problems when I try to do this in a loop, because I have very little experience with loops. I tried something like this

    for(i in data_lst){
      data_lst[[i]]$serial <- data_lst[[i]]$notes[1]
    }

to get the "serial" information from my data, but I get this error:

    Error in data_lst[[i]] : invalid subscript type 'list'

Any help is very welcome :)

Answer 1

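The way you wrote the for loop, i takes each element of the list in turn (here a whole data frame, which is itself a list), not an index. That is why data_lst[[i]] complains about a subscript of type 'list'. If you need indices, use the function seq_along, which returns the vector of indices for a list. Compare the following example: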

    > l = list("apple", "banana", "carrot")
    > l
    [[1]]
    [1] "apple"
    
    [[2]]
    [1] "banana"
    
    [[3]]
    [1] "carrot"
    
    > for(p in l) print(p)
    [1] "apple"
    [1] "banana"
    [1] "carrot"
    > for(i in seq_along(l)) print(i)
    [1] 1
    [1] 2
    [1] 3
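Applied to the question, the same idea reproduces the error and then fixes it. The two small data frames below are hypothetical stand-ins shaped like the question's data:

```r
# Two hypothetical stand-ins for the data frames read from the csv files
data_lst <- list(
  data.frame(typ = c(100L, 1L), notes = c("Serial No = 1", ""), stringsAsFactors = FALSE),
  data.frame(typ = c(100L, 1L), notes = c("Serial No = 2", ""), stringsAsFactors = FALSE)
)

# Looping over elements: i is a whole data frame, so data_lst[[i]] fails
msg <- tryCatch({
  for (i in data_lst) data_lst[[i]]$serial <- data_lst[[i]]$notes[1]
  "no error"
}, error = conditionMessage)
print(msg)

# Looping over positions with seq_along(): i is 1, 2, ...
for (i in seq_along(data_lst)) {
  data_lst[[i]]$serial <- data_lst[[i]]$notes[1]
}
print(data_lst[[2]]$serial)
```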

Answer 2

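A for loop in R runs over whatever vector you give it, so to visit every element of the list by position you need for (i in seq_along(data_lst)). Running seq_along(data_lst) creates the sequence from 1 to the number of elements in the list.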
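One practical reason to prefer seq_along(x) over the common 1:length(x): if the list is ever empty (for example, if list.files() finds no matching files), 1:length(x) still produces 1 0, so the loop body runs and indexing fails, while seq_along(x) produces an empty sequence and the loop simply does nothing:

```r
empty <- list()
print(seq_along(empty))   # integer(0): a loop over this runs zero times
print(1:length(empty))    # 1 0: a loop over this would try empty[[1]] and fail

three <- list("a", "b", "c")
print(seq_along(three))   # 1 2 3
```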

Answer 3

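If you want to perform a fairly complex operation on each entry of a list, it is best to write a function that isolates the logic you want to apply to each entry. This makes your code more readable and more modular, and easier to debug or modify later.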

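In your case, you could write a function that operates on each data frame in the list and returns a named list with separate components: all the named notes you need, plus the modified data frame. Perhaps something like this: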

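    change_data_frame_to_named_list <- function(old_frame)
    {
      # keep the measurement rows (typ != 100) and drop the notes column;
      # a logical index still works when a file has no typ == 100 rows,
      # whereas -which(...) would then select zero rows
      data <- old_frame[old_frame$typ != 100,
                        names(old_frame) != "notes", drop = FALSE]
      return(list(serial         = old_frame$notes[1],
                  direction      = old_frame$notes[2],
                  lane           = old_frame$notes[3],
                  install_height = old_frame$notes[5],
                  xaxis          = old_frame$notes[6],
                  data           = data
                  ))
    }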

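Now all you need to do is apply this function to every element of the list. In R, the most idiomatic way to do that is not a loop at all but lapply (short for "list apply"). It takes the list as its first argument and the function to apply to each element as its second argument.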

This means you can do:

    result <- lapply(data_lst, change_data_frame_to_named_list)

This is equivalent to the loop version, but shorter and tidier.
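To check the equivalence claim, here is a minimal sketch that runs lapply and an explicit loop side by side. The function first_note and the list dfs are hypothetical stand-ins for change_data_frame_to_named_list and data_lst:

```r
# Hypothetical stand-in for change_data_frame_to_named_list
first_note <- function(df) df$notes[1]

dfs <- list(
  data.frame(notes = c("Serial No = 1", ""), stringsAsFactors = FALSE),
  data.frame(notes = c("Serial No = 2", ""), stringsAsFactors = FALSE)
)

# lapply version
via_lapply <- lapply(dfs, first_note)

# explicit loop version, filling a preallocated list
via_loop <- vector("list", length(dfs))
for (i in seq_along(dfs)) via_loop[[i]] <- first_note(dfs[[i]])

print(identical(via_lapply, via_loop))
```

One difference worth knowing: lapply() keeps the names of the input list, while a loop that fills vector("list", n) leaves the result unnamed unless you copy the names yourself.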

If you really want to use a loop, the equivalent is:

    result <- vector("list", length(data_lst))  # one slot per data frame
    for (i in seq_along(data_lst))
    {
      result[[i]] <- change_data_frame_to_named_list(data_lst[[i]])
    }

In either case, result is a list of the same length as data_lst, where each entry is itself a named list holding the new data frame and its associated named notes.
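A minimal sketch of what result looks like and how to reach its parts, using a hypothetical stand-in for one file (9 header rows with typ == 100 followed by 2 measurements) and a version of the function above that filters on typ:

```r
change_data_frame_to_named_list <- function(old_frame) {
  list(serial         = old_frame$notes[1],
       direction      = old_frame$notes[2],
       lane           = old_frame$notes[3],
       install_height = old_frame$notes[5],
       xaxis          = old_frame$notes[6],
       # measurement rows only, without the notes column
       data           = old_frame[old_frame$typ != 100,
                                  names(old_frame) != "notes", drop = FALSE])
}

# Hypothetical stand-in for one csv file
one_file <- data.frame(
  typ   = c(rep(100L, 9), 1L, 1L),
  speed = c(rep(NA, 9), 42L, 44L),
  notes = c("Serial No = 1", "Direction = NORTH", "Counting type = SINGLE LANE",
            "Ref count sense = IN", "Install height = 42 decimeter",
            "Axis distance = 58 decimeter", "Road type = STANDARD",
            "Road slope = FLAT", "Start of campain", "", ""),
  stringsAsFactors = FALSE
)

result <- lapply(list(one_file, one_file), change_data_frame_to_named_list)
print(result[[1]]$direction)   # a single note
print(result[[2]]$data)        # the cleaned data frame of the second file
```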

Edit

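The OP asked for a similar approach that returns the data in the same format as his hand-written code. Here is one way to do it. Because the logic is isolated in a function, only the function itself needs to change: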

    change_data_frame <- function(old_frame)
    {
      old_frame$serial         <- old_frame$notes[1]
      old_frame$direction      <- old_frame$notes[2]
      old_frame$lane           <- old_frame$notes[3]
      old_frame$install_height <- old_frame$notes[5]
      old_frame$xaxis          <- old_frame$notes[6]
      old_frame$notes          <- NULL

      # keep the measurement rows; a logical index is safe even if no row has typ == 100
      return(old_frame[old_frame$typ != 100, , drop = FALSE])
    }

    # Now you just do as you did before
    result <- lapply(data_lst, change_data_frame)

    # and to get all dfs into one big data frame...
    all_data <- do.call(rbind, result)
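A small end-to-end sketch of the combining step. To keep it short, the hypothetical helper make_file builds stand-in files with only 2 header rows (the real files have 9), and change_data_frame is trimmed to two notes:

```r
change_data_frame <- function(old_frame) {
  old_frame$serial    <- old_frame$notes[1]
  old_frame$direction <- old_frame$notes[2]
  old_frame$notes     <- NULL
  old_frame[old_frame$typ != 100, , drop = FALSE]
}

# Hypothetical helper: one stand-in file with 2 header rows and 2 measurements
make_file <- function(serial, direction) {
  data.frame(typ   = c(100L, 100L, 1L, 1L),
             speed = c(NA, NA, 42L, 44L),
             notes = c(serial, direction, "", ""),
             stringsAsFactors = FALSE)
}
data_lst <- list(make_file("Serial No = 1", "Direction = NORTH"),
                 make_file("Serial No = 2", "Direction = SOUTH"))

# Clean every data frame, then stack them into one
combined <- do.call(rbind, lapply(data_lst, change_data_frame))
rownames(combined) <- NULL  # rbind makes row names unique; reset them to 1..n
print(combined)
```

Because serial and direction are carried on every row, the combined data frame still records which file each measurement came from.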
