I have many .csv files saved in a folder on my PC. I create a list of these datasets as follows:
> file_list <- list.files()
> file_list
[1] "ABWAbwut50.csv" "ABWEinfam50.csv" "ABWFeldwaldasph50.csv" "ABWGarage50.csv"
[5] "ABWGemeindestr50.csv" "ABWHotel50.csv" "ABWInd50.csv" "ABWIntflaechen50.csv"
[9] "ABWKantonsstr50.csv" "ABWMehrfam50.csv" "ABWNutzwald50.csv" "ABWSchutzwald50.csv"
[13] "ABWstahlmitvieh50.csv" "ABWStromut50.csv" "ABWWeideland50.csv"
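The .csv files all have the same columns, use a period (.) as the decimal mark, and separate columns with semicolons (;). I tried to combine these datasets with the following code: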
for (file in file_list){
if (!exists("dataset")){
dataset <- read_delim(file, ";", escape_double = FALSE, trim_ws = TRUE)
}
}
dataset
But it only reads the first file. How can I combine all 15 .csv files into a single data frame?
When I run different code, I get the following error message:
> View(dataset)
> dataset <- do.call("rbind",lapply(file_list,
+ FUN=function(files){read.table(files,
+ header=TRUE, sep=";")}))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 103 did not have 8 elements
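I think the problem is that one of the files has rows with only 7 columns instead of 8 (in fact, I know that only a few rows in the file are affected). I don't want to go through each file individually looking for the anomalies. How can I automatically drop the rows that don't follow the pattern?
My data looks like this: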
> dput(dataset[1:10,])
structure(list(Berechnung = c("EconoMe original", "Berechnung 1",
"Berechnung 2", "Berechnung 3", "Berechnung 4", "Berechnung 5",
"Berechnung 6", "Berechnung 7", "Berechnung 8", "Berechnung 9"
), Situation = c("Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach"), NK = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), PID = c(2639L, 2639L, 2639L, 2639L,
2639L, 2639L, 2639L, 2639L, 2639L, 2639L), Case = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), Differenz = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), Prozess = c("Murgang", "Murgang", "Murgang", "Murgang",
"Murgang", "Murgang", "Murgang", "Murgang", "Murgang", "Murgang"
), Objektart = c("Abwasser unter Terrain", "Abwasser unter Terrain",
"Abwasser unter Terrain", "Abwasser unter Terrain", "Abwasser unter Terrain",
"Abwasser unter Terrain", "Abwasser unter Terrain", "Abwasser unter Terrain",
"Abwasser unter Terrain", "Abwasser unter Terrain")), .Names = c("Berechnung",
"Situation", "NK", "PID", "Case", "Differenz", "Prozess", "Objektart"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
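Answer 0 (score: 1)
One of the files probably contains a ; inside a text field. This solution adapts your first code example to report which files cause the problem.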
file_list <- list.files()
# set up the dataset from the first file
dataset <- read.table(file_list[1], sep = ";", header = TRUE)
# cycle through all other files
for (file in file_list[-1]) {
  # try() returns an object of class "try-error" instead of stopping the loop
  temp <- try(read.table(file, sep = ";", header = TRUE))
  # check whether the file could be read as a table
  if (inherits(temp, "try-error")) {
    message(paste("One file skipped. Correct mistakes in file", file))
    print(temp)
    next
  }
  # append the rows of this file to the combined data frame
  dataset <- rbind(dataset, temp)
}
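The loop above skips a whole file when it cannot be parsed. If you would rather drop only the malformed rows, as the question asks, a minimal sketch could count the fields on every line with base R's count.fields() and keep only the lines that match the header. The helper names read_clean() and combine_csv() are hypothetical, and the sketch assumes that every file starts with a correct header line, that text fields are quoted with double quotes (if at all), and that no quoted field spans several lines.

```r
# Hypothetical helper: read one ';'-separated file and drop every line whose
# field count differs from the header line's field count.
read_clean <- function(path, sep = ";") {
  lines <- readLines(path, warn = FALSE)
  idx <- which(nzchar(trimws(lines)))      # original line numbers of non-blank lines
  lines <- lines[idx]
  if (length(lines) == 0) return(NULL)     # empty file: nothing to contribute

  con <- textConnection(lines)
  on.exit(close(con))
  # Number of fields per line (NA if a quoted field spans several lines)
  n_fields <- count.fields(con, sep = sep, quote = "\"", comment.char = "")

  expected <- n_fields[1]                  # assumption: the header line is correct
  keep <- !is.na(n_fields) & n_fields == expected
  if (any(!keep)) {
    message(sprintf("%s: dropped %d line(s): %s", basename(path),
                    sum(!keep), paste(idx[!keep], collapse = ", ")))
  }

  read.table(text = lines[keep], sep = sep, header = TRUE, quote = "\"",
             comment.char = "", stringsAsFactors = FALSE)
}

# Hypothetical wrapper: apply read_clean() to every .csv file in a folder
# and stack the results into one data frame.
combine_csv <- function(dir = ".", sep = ";") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read_clean, sep = sep))
}
```

Calling dataset <- combine_csv() in the folder that holds the files should then return the combined data frame, with one message per file listing the line numbers that were dropped. Note that this discards rows with too many fields as well as rows with too few, so it is worth looking at the reported lines before trusting the result.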