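I have data from an actigraphy device in multiple CSV files. I want to subset three rows from each of these files and then merge them into one data frame.
The problem is that the first row only has data in the first column (cell A1 in Excel), while some rows have data in 13 columns. As a result, several column names are missing.
I first tried merging all the csv files as follows: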
file.list <- list.files(pattern='*.csv')
df.list <- sapply(file.list, read.csv, simplify=FALSE)
library(dplyr)
df <- bind_rows(df.list, .id = "id")
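The second command here gives a "duplicate 'row.names' are not allowed" error. I tried adding the row.names=NULL argument, but that results in a "no lines available in input" error message.
For a single data file, I can get the desired result by first naming the columns and then subsetting the data: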
test <- read.csv("test3.csv",header=FALSE)
names(test) <-c("Column.A","Column.B","Column.C","Column.D","Column.E","Column.F","Column.G",
"Column.H","Column.I","Column.J","Column.K", "Column.L", "Column.M")
bar <- subset(test, Column.A =="Identity:" | Column.A == "Interval Type"| Column.A == "Sleep Summary" & Column.B == "Average(n)")
How can I repeat a similar process for all csv files in a given folder?
Thanks!
Answer 0 (score: 0)
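We can do this by defining a helper function that performs the multiple operations on each file. This assumes that every file has the same number of columns and that those columns match the names defined in the names vector.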
library(tidyverse)
# Read one file, name its 13 columns, and keep only the rows of interest
readFile <- function(file){
  df <- read.csv(file, header=FALSE)
  names(df) <- c("Column.A","Column.B","Column.C","Column.D","Column.E","Column.F","Column.G",
                 "Column.H","Column.I","Column.J","Column.K","Column.L","Column.M")
  df <- subset(df, Column.A == "Identity:" | Column.A == "Interval Type" |
                 (Column.A == "Sleep Summary" & Column.B == "Average(n)"))
  return(df)
}
# Apply the helper to every csv file in the working directory and stack the results
file.list <- list.files(pattern="\\.csv$")
df.list <- sapply(file.list, readFile, simplify=FALSE) %>% bind_rows()
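The question's first attempt passed .id = "id" to bind_rows so that each row records the file it came from. A minimal sketch of keeping that with a helper function follows. The folder, file names, file contents, and the two-column layout are made up for the demo, and read_rows is a hypothetical stand-in for the helper above.

```r
library(dplyr)

# Hypothetical demo files written to a temporary folder; the contents are made up.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Identity:,A01", "Interval Type,Rest", "Other,x"),
           file.path(demo_dir, "a.csv"))
writeLines(c("Identity:,B02", "Other,y"),
           file.path(demo_dir, "b.csv"))

# Hypothetical helper: read one file and keep only the rows of interest.
read_rows <- function(file) {
  df <- read.csv(file, header = FALSE, col.names = c("Column.A", "Column.B"))
  subset(df, Column.A %in% c("Identity:", "Interval Type"))
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)  # the list names become the values of the id column

# lapply keeps the names, and bind_rows copies them into the "id" column.
combined <- bind_rows(lapply(files, read_rows), .id = "id")
print(combined)
```

Naming the list by basename(files) keeps the id column short even when full paths are used to read the files.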
Answer 1 (score: 0)
Simply extend your function inside sapply. In fact, use the col.names argument of read.csv, building the names from the LETTERS vector in a paste0 call. Also, use the %in% operator in subset:
df.list <- sapply(file.list, function(f) {
  # Read the data and assign the column names
  tmp <- read.csv(f, header=FALSE, col.names = paste0("Column.", LETTERS[1:13]))
  # Subset the data; the result of this call is the function's return value
  subset(tmp, Column.A %in% c("Identity:", "Interval Type") |
           (Column.A == "Sleep Summary" & Column.B == "Average(n)"))
}, simplify=FALSE)
# Stack the per-file results into a single data frame
final_df <- do.call(rbind, df.list)
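Passing col.names also helps with the ragged layout described in the question. According to the read.table documentation, the number of columns is taken from the first five lines of input, or from the length of col.names if that is longer. The sketch below illustrates the difference on a made-up file whose only wide row appears on line 7; the file contents and the four-column layout are assumptions for the demo.

```r
# Hypothetical ragged file: narrow rows first, a 4-field row only on line 7.
path <- tempfile(fileext = ".csv")
writeLines(c("Title", "a", "b", "c", "d", "e", "k,1,2,3"), path)

# Without col.names, the column count is inferred from the first five lines only.
guessed <- read.csv(path, header = FALSE)

# With col.names, the number of names sets the column count;
# rows with fewer fields are padded because read.csv uses fill = TRUE.
named <- read.csv(path, header = FALSE,
                  col.names = paste0("Column.", LETTERS[1:4]))

print(dim(guessed))
print(dim(named))
```

If the wide rows in the real files only start after the first few lines, supplying all 13 names up front avoids relying on that inference.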
Answer 2 (score: 0)