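When I load my CSV file into R, one extra row and one extra column are added to the data frame, even though they are empty in the CSV file. The data frame should have 2005 obs. of 49 variables. After loading, however, I get a data frame with 2006 obs. of 50 variables. In addition, some fields are filled with NA by R after loading.
This is the code I use to load the file into R: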
Dev_REITs_MTBV <- read.csv2("Developed_REITS_MTBV.csv", na="NA")
This is the CSV file:
Code run before loading the file:
pkgs <- c("readxl","akima","rgl","scatterplot3d","car","MASS","ISLR","stargazer","urca","rpart","ggplot2","e1071","randomForest",
"quantreg","mgcv","gamlss","rlang","gplots","psych","ggridges","viridis","caTools","caret","forecast", "shape", "diagram",
"writexl", "openxlsx", "maptools", "ggridges", "calibrate", "modelr", "XLConnect")
for (pkg in pkgs) {if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }}
lapply(pkgs, require, character.only = TRUE)
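Below are pictures of the rows and columns I entered in the CSV file and of the resulting data frame:
Thank you very much for your help!
Answer 0 (score: 0)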
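Have you tried read_csv() from the tidyverse? Without the actual file it is hard to troubleshoot, but simply trying a different package may solve the problem. You could also try fread() from the data.table package.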
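If the file itself cannot be shared, one way to narrow things down is to count the fields on each line before switching readers. The following is a minimal sketch in base R. The toy file contents are made up to mimic the symptoms described in the question (a trailing ';' on every line and a final line that holds only separators); they are an assumption, not the asker's data.

```r
# Hypothetical toy file: trailing ';' on each line (extra empty column)
# and a last line containing only separators (extra empty row).
path <- tempfile(fileext = ".csv")
writeLines(c("name;1980;1981;",
             "A;1,5;2,5;",
             "B;3,0;4,25;",
             ";;;"), path)

# Fields per line: one more than the number of columns that carry data
n_fields <- count.fields(path, sep = ";")
n_fields

# read.csv2() expects ';' as separator and ',' as decimal mark
raw <- read.csv2(path, check.names = FALSE, na.strings = c("NA", ""))
dim(raw)

# Keep only rows and columns that contain at least one non-NA value
keep_rows <- rowSums(!is.na(raw)) > 0
keep_cols <- colSums(!is.na(raw)) > 0
clean <- raw[keep_rows, keep_cols, drop = FALSE]
dim(clean)
```

If the counts from count.fields() are larger than the number of real columns, the extra column comes from the file rather than from R. Dropping all-NA rows and columns afterwards is only one possible cleanup; it would also remove a genuine column that happens to be entirely empty.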
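Later edit/addition:
Your data is quite messy: ',' is used instead of '.' as the decimal separator, ';' is the column separator, the last column carries a bunch of trailing commas, and the variable names are numbers (years). The following code should nevertheless fix it: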
library(tidyverse) # across() requires dplyr 1.0.0 or later
# load data; columns are separated by ';'
dataset <- read_delim("Developed_REITS_MTBV.csv", delim = ";") %>%
  # rename final column, whose header carries the trailing commas
  rename(`2019` = `2019,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,`) %>%
  # delete the trailing commas in the last column (but keep the first comma, the decimal separator)
  mutate(`2019` = gsub("^,*|(?<=,),|,*$", "", `2019`, perl = TRUE)) %>%
  # switch the decimal commas to points, then convert the year columns to numeric
  mutate(across(c(`1980`:`2019`), ~as.numeric(gsub(",", ".", .))))
Part of the code comes from: Removing multiple commas and trailing commas using gsub
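To see what the regular expression in the mutate() step does, it can be tried on a few strings in base R. This is only an illustrative sketch: the sample values below are hypothetical and merely imitate how entries in the last column might look.

```r
# Hypothetical values imitating the last column of the file
x <- c("1,25,,,,,", ",,,,", "3,,,", "0,5")

# Remove leading commas, any comma that directly follows another comma,
# and trailing commas; a single decimal comma between digits survives.
cleaned <- gsub("^,*|(?<=,),|,*$", "", x, perl = TRUE)
cleaned

# Replace the decimal comma with a point and convert to numeric;
# empty strings become NA.
nums <- as.numeric(gsub(",", ".", cleaned, fixed = TRUE))
nums
```

The look-behind `(?<=,)` is why perl = TRUE is needed: R's default regular expression engine does not support look-behind assertions.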