Question

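I have been trying to merge and tidy up a couple of CSV files (links below). I have managed to merge the files and can sort the result manually in Excel. However, I would like to automate this and get the sorted result directly from the code.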

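The problem is in the last step, where I try to convert the factor "rankingGDP" in the merged DF so that I can sort it by value in descending order. When I pass the resulting DF to the order function, the GDP ranking values for each country are completely different. The data is aligned. Can anyone tell me what I am doing wrong? Thanks heaps.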

    #Fetch the files
    fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
    download.file(fileUrl, destfile="./fgdp.csv")
    fileUrl <-"https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
    download.file(fileUrl, destfile="./fed.csv")

    #Read the files
    fgdp <- read.csv("fgdp.csv",skip = 4, header = T)
    fed <- read.csv("fed.csv" ,header = T)

    #subset relevant columns
    fgdp <- fgdp[,c(1,2,4,5)]

    #remove rows that are empty
    fed <- fed[rowSums(is.na(fed))<ncol(fed),]
    fgdp <- fgdp[rowSums(is.na(fgdp))<ncol(fgdp),]

    #name the columns for fgdp to match fed
    colnames(fgdp) <- c("CountryCode","rankingGDP", 
                        "Long.Name", "gdp")

    #merge the files based on Country Code
    dt <- merge(fgdp, fed, by.x ="CountryCode", by.y = "CountryCode", all = TRUE)

    #Remove  rows where the relevant columns are empty
    dt <- dt[!dt$CountryCode=="" ,]
    dt <- dt[!(dt$rankingGDP=="" | is.na(dt$rankingGDP)) ,]

    #subset the columns used for analysis
    dt1 <- dt[,1:4]

    #remove NAs
    dt1 <- dt1[!(is.na(dt1$rankingGDP)),]

    #Convert factor to numeric to be able to sort rankingGDP descending.
    #THE ISSUE IS HERE: the result gives me different values for the
    #rankingGDP column (2). By that I mean the factor values (stored as
    #characters) are not converted to the associated number in most cases.

    dt1[,2]<- as.numeric(dt1[,2])
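
For what it's worth, the symptom described in the comments above matches a well-known R behaviour: calling `as.numeric()` directly on a factor returns the internal integer level codes rather than the numbers shown in the column. Below is a minimal sketch of that behaviour and one common workaround, using a small made-up factor and a hypothetical `toy` data frame rather than the real GDP data. Whether `rankingGDP` really is a factor in your session is an assumption here (since R 4.0.0, `read.csv` reads strings as character by default), so it may be worth checking with `class(dt1$rankingGDP)` first.

```r
# Toy stand-in for the rankingGDP column (hypothetical values, not the real data)
ranking <- factor(c("10", "2", "1", "190"))

# as.numeric() on a factor returns the internal level codes, which follow
# the sorted order of the labels, not the numbers the labels spell out
codes <- as.numeric(ranking)

# Converting to character first recovers the numbers shown in the column
values <- as.numeric(as.character(ranking))

# An equivalent form: convert the levels once, then index them by the factor
values2 <- as.numeric(levels(ranking))[ranking]

# Sort a toy data frame by the converted ranking in descending order
toy <- data.frame(CountryCode = c("AAA", "BBB", "CCC", "DDD"),
                  rankingGDP = values)
toy_desc <- toy[order(toy$rankingGDP, decreasing = TRUE), ]
```

Applied to the code in the question, the same idea would be `dt1[,2] <- as.numeric(as.character(dt1[,2]))`. Any entries that are not plain numbers become `NA` with a warning, which would also be a hint as to why the column was not read as numeric in the first place.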

R Dataframe: problem converting a factor to numeric

0 answers: