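I've been trying to merge and tidy up a couple of csv files (links below). I've managed to merge the files and can sort the result manually in Excel, but I'd like to automate this and get the sorted result directly.
Problem: In the last step I try to convert the factor "rankingGDP" in the merged data frame to numeric so that I can sort it by value in descending order. When I pass the resulting data frame to the order function, the GDP ranking value for each country is completely different. The data is aligned. Can anyone tell me what I'm doing wrong? Thanks heaps.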
#Fetch the files
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
download.file(fileUrl, destfile="./fgdp.csv")
fileUrl <-"https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
download.file(fileUrl, destfile="./fed.csv")
#Read the files
fgdp <- read.csv("fgdp.csv", skip = 4, header = TRUE)
fed <- read.csv("fed.csv", header = TRUE)
#subset relevant columns
fgdp <- fgdp[,c(1,2,4,5)]
#remove rows where every column is NA
fed <- fed[rowSums(is.na(fed))<ncol(fed),]
fgdp <- fgdp[rowSums(is.na(fgdp))<ncol(fgdp),]
#name the columns for fgdp to match fed
colnames(fgdp) <- c("CountryCode", "rankingGDP",
                    "Long.Name", "gdp")
#merge the files based on Country Code
dt <- merge(fgdp, fed, by.x ="CountryCode", by.y = "CountryCode", all = TRUE)
#Remove rows where the relevant columns are empty
dt <- dt[!dt$CountryCode=="" ,]
dt <- dt[!(dt$rankingGDP=="" | is.na(dt$rankingGDP)) ,]
#subset the columns used for analysis
dt1 <- dt[,1:4]
#remove NAs
dt1 <- dt1[!(is.na(dt1$rankingGDP)),]
#Convert factor to numeric so that rankingGDP can be sorted in descending order
#THE ISSUE IS HERE: the result gives me different values for the
#rankingGDP column (column 2). In most cases the factor labels (which look
#like numbers) are not converted to the numbers they display.
dt1[,2]<- as.numeric(dt1[,2])
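A likely explanation, offered as a sketch rather than a definitive answer: calling as.numeric() on a factor returns the factor's internal integer codes (the position of each value among its sorted levels), not the numbers shown as labels. Because the levels of a text column are sorted as text ("1", "10", "100", ... "2", ...), those codes rarely match the actual ranking, which fits the symptom described. Converting through the labels first, with as.numeric(as.character(f)) or the equivalent as.numeric(levels(f))[f], should give the intended values, and order(..., decreasing = TRUE) then sorts them in descending order. The snippet uses a hypothetical factor rank_f and a made-up data frame df, not the real GDP data. Note also that since R 4.0.0 read.csv() defaults to stringsAsFactors = FALSE, so on a recent R the column may be read as character instead, in which case as.numeric() works directly.

```r
# Hypothetical ranking column stored as a factor, as read.csv() produced
# by default before R 4.0.0 (values are made up for illustration)
rank_f <- factor(c("1", "10", "2", "190"))

# Levels are sorted as text, so they come out as "1" "10" "190" "2"
print(levels(rank_f))

# Wrong: as.numeric() on a factor returns the internal level codes
wrong <- as.numeric(rank_f)
print(wrong)   # 1 2 4 3

# Right: go through the labels first
right <- as.numeric(as.character(rank_f))
print(right)   # 1 10 2 190

# Equivalent idiom: convert the levels once, then index them by the factor
right2 <- as.numeric(levels(rank_f))[rank_f]

# Sort a small data frame by the converted column, largest value first
df <- data.frame(CountryCode = c("AAA", "BBB", "CCC", "DDD"),
                 rankingGDP  = rank_f,
                 stringsAsFactors = FALSE)
df$rankingGDP <- as.numeric(as.character(df$rankingGDP))
df_sorted <- df[order(df$rankingGDP, decreasing = TRUE), ]
print(df_sorted)
```

In the script above, this would presumably mean changing the last line to dt1[,2] <- as.numeric(as.character(dt1[,2])) and then sorting with dt1 <- dt1[order(dt1$rankingGDP, decreasing = TRUE), ]. Any entries in that column that are not plain numbers would become NA (with a warning) during the conversion.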