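I have to import an Excel file into R, but every tutorial I have found so far deals with simple data tables, and mine is a bit more complicated. Can you help me with this?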
https://drive.google.com/file/d/1R5sVaP20MDLlaY6TLesrCj664wJYUhDG/view?usp=sharing
Thank you very much!
Answer 0 (score: 1)
library(xlsx)
library(zoo)
# Read the dataset starting from the 3rd line
df <- read.xlsx("SO.xlsx", 1, header=TRUE,startRow=3, stringsAsFactors=FALSE)
# Drop the rows that do not hold data (for example rows 4 and 66 of this sheet).
# This can be done in many ways. Here I assume that every data row has a value
# in the third column. The column is selected by position, so its accented
# name does not have to be typed.
df <- df[!is.na(df[[3]]), ]
# Assign the names to the first 2 columns
names(df)[1:2] <- c( "year", "type")
# The last 2 rows are summaries, so we probably want to remove them.
# They are dropped by position here.
df <- head(df, -2)
# The first column "year" is filled in only on the first row of each year.
# Carry the last known year forward into the empty cells:
df$year <- na.locf(df$year)
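Note: because of the limitations of this text box, the results below are missing some accented characters. In the R environment, the column names and the symbols in the type column display correctly.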
# Result
head(df)
#   year type hallos slyos. knny sszesen meghalt slyosan knnyen sszesen.1
# 2 2013    J     28    255  622     905      33     300    870      1203
# 3 2013    F     31    223  527     781      34     248    764      1046
# 4 2013    M     34    274  691     999      34     320    971      1325
# 5 2013    A     36    349  757    1142      42     392   1090      1524
# 6 2013   Mj     52    436  902    1390      54     501   1241      1796
# 7 2013    J     39    455 1004    1498      41     509   1414      1964
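
To try the cleanup steps without the Excel file, here is a minimal sketch on a made-up data frame. The toy values, the column name `count`, and the helper `fill_down` are assumptions for illustration only and are not part of the original workbook. The helper is a base R stand-in for `zoo::na.locf`, so the sketch runs without extra packages.

```r
# Toy stand-in for the sheet after reading it; all values are invented.
df <- data.frame(
  year  = c(2013, NA, NA, NA, 2014, NA, NA, NA),
  type  = c("J", "F", NA, "M", "J", "F", "total", "total"),
  count = c(28, 31, NA, 34, 30, 29, 152, 152),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-missing value forward (base R only).
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the most recent non-NA value
  idx[idx == 0] <- NA        # leading NAs have nothing to copy from
  x[!is.na(x)][idx]
}

# 1. Keep only rows that have a value in the third column.
df <- df[!is.na(df[[3]]), ]

# 2. Drop the two summary rows at the bottom.
df <- head(df, -2)

# 3. Fill the year down into the rows where it is missing.
df$year <- fill_down(df$year)

print(df)
```

With the real file, `zoo::na.locf(df$year, na.rm = FALSE)` should behave the same way as `fill_down`, keeping any leading missing values as `NA` so the vector length still matches the data frame.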