Question

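I have to import an Excel file into R, but every tutorial I have found so far deals with simple data tables, and mine is a bit more complex. Could you help me with this?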

https://drive.google.com/file/d/1R5sVaP20MDLlaY6TLesrCj664wJYUhDG/view?usp=sharing

Thank you very much!

Answer 1

library(xlsx)
library(zoo)

# Read the dataset starting from the 3rd line
df <- read.xlsx("SO.xlsx", 1, header=TRUE,startRow=3, stringsAsFactors=FALSE)

# Clean the data to remove the lines that should not be there
# like the lines 4 and 66 in this dataset
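# this could be done many ways. Here I assume that all columns starting from the third
# should have some values, so rows with NA in the third column are dropped.
# The column is selected by position because its name contains accented characters.
df <- df[!is.na(df[[3]]),]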

# Assign the names to the first 2 columns
names(df)[1:2] <- c( "year", "type")

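# The last 2 rows are summaries, so we probably want to remove them.
# The pattern passed to grepl() appears to have lost its accented characters when posted,
# and an empty pattern matches every row, so the rows are dropped by position instead.
df <- head(df, -2)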

# The first column "year" has many missing values. We need to add year values to each cell:
df$year <- na.locf(df$year)

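Warning: because of the formatting limitations of this text box, the result below is missing some accented characters. In the R environment, the column names and the symbols in the type column are displayed correctly.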

# Result
head(df)
#   year type  hallos  slyos.   knny sszesen  meghalt  slyosan  knnyen  sszesen.1
# 2 2013    J      28     255    622      905      33      300     870       1203
# 3 2013    F      31     223    527      781      34      248     764       1046
# 4 2013    M      34     274    691      999      34      320     971       1325
# 5 2013    A      36     349    757     1142      42      392    1090       1524
# 6 2013   Mj      52     436    902     1390      54      501    1241       1796
# 7 2013    J      39     455   1004     1498      41      509    1414       1964
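
The forward-fill step applied to the year column can be tried on its own without the Excel file. The sketch below uses only base R instead of zoo::na.locf; the helper name fill_forward and the sample values are hypothetical and serve only as an illustration.

```r
# Toy stand-in for the "year" column after import: each year appears once,
# followed by NA for the remaining rows of that year (made-up values).
year <- c(2013, NA, NA, 2014, NA, NA)

# Hypothetical helper: carry the last non-NA value forward, base R only.
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))      # position of the most recent non-NA value
  # Leading NAs have idx 0 and map to the NA placed in front
  c(NA, x[!is.na(x)])[idx + 1]
}

filled <- fill_forward(year)
print(filled)
```

Every NA is replaced by the closest year above it, which is the effect the answer relies on for the year column.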

Importing an Excel file into R
