Question

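I'm having trouble using an apply function (which I think is the right way to do the following) across multiple data frames.

Some sample data (3 different data frames here, but the real problem I'm working on has more than 50):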

biz <- data.frame(
    country = c("england","canada","australia","usa"),
    businesses = sample(1000:2500,4))

pop <- data.frame(
    country = c("england","canada","australia","usa"),
    population = sample(10000:20000,4))

restaurants <- data.frame(
    country = c("england","canada","australia","usa"),
    restaurants = sample(500:1000,4))

Here is what I ultimately want to do:

1) Sort each data frame from largest to smallest, based on the variable it contains:

dataframe <- dataframe[order(dataframe$VARIABLE, decreasing = TRUE), ]

2) Then create a vector variable giving the rank of each country:

dataframe$rank <- 1:nrow(dataframe)

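3) Then create another data frame with one column of countries and the rank for each variable of interest in the other columns. Something that looks like this (the rankings here are not real):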

country.rankings <- structure(list(country = structure(c(5L, 1L, 6L, 2L, 3L, 4L), .Label = c("brazil", 
"canada", "england", "france", "ghana", "usa"), class = "factor"), 
    restaurants = 1:6, businesses = c(4L, 5L, 6L, 3L, 2L, 1L), 
    population = c(4L, 6L, 3L, 2L, 5L, 1L)), .Names = c("country", 
"restaurants", "businesses", "population"), class = "data.frame", row.names = c(NA, 
-6L))

So I'm guessing there is a way to put each of the data frames in a list, e.g.:

lib <- c(biz, pop, restaurants)

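and then use lapply to 1) sort, 2) create the rank variable, and 3) build the matrix or data frame of rankings for each variable (number of businesses, population size, number of restaurants) by country. The problem is that when I write an lapply function to sort each data frame by its variable, it fails: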

sort <- lapply(lib, 
    function(x){
        x <- x[order(x[,2]),]
        })

which returns the error message:

Error in `[.default`(x, , 2) : incorrect number of dimensions

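because I'm trying to apply column headings to a list. But how else can I tackle this when the variable names differ between data frames (keeping in mind that the country names are consistent)?

(I'd also love to know how to do this with plyr.)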

Answer 1

Ideally I would recommend data.table for this. However, here is a quick solution using data.frame. Try this:

Step 1: Create a list of all the data.frames

varList <- list(biz,pop,restaurants)
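The use of list() rather than c() in Step 1 matters, and it is the likely cause of the "incorrect number of dimensions" error in the question: c() on data.frames concatenates their columns into one flat list, so lapply hands the function a bare vector instead of a data.frame. A minimal sketch, using made-up fixed values in place of the question's sample() calls (the names flat, nested and sorted are illustrative only):

```r
# Illustrative data only; the values are arbitrary, not the question's random draws
biz <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    businesses = c(1200, 2400, 1800, 1500))
pop <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    population = c(15000, 12000, 19000, 11000))

flat <- c(biz, pop)       # concatenates the columns: a plain list of 4 vectors
nested <- list(biz, pop)  # keeps each data.frame intact: a list of 2 data.frames

length(flat)
length(nested)
is.data.frame(flat[[1]])
is.data.frame(nested[[1]])

# With a proper list, sorting each data.frame on its second column works
sorted <- lapply(nested, function(x) x[order(x[[2]], decreasing = TRUE), ])
```

Indexing the second column by position (x[[2]]) sidesteps the fact that the variable name differs in every data.frame.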

Step 2: Combine them all into one data.frame

temp <- varList[[1]]
for(i in 2:length(varList))  temp <- merge(temp,varList[[i]],by = "country")

Step 3: Get the ranks:

cbind(temp,apply(temp[,-1],2,rank))

You can drop the columns you do not need, if you like:

cbind(temp[,1:2],apply(temp[,-1],2,rank))[,-2]
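Note that rank() assigns 1 to the smallest value, whereas the question asks for largest-to-smallest. One possible variation, sketched here with made-up fixed values and not part of the original answer, is to rank the negated values and to replace the merge loop with Reduce (the names merged and ranks are illustrative only):

```r
# Illustrative data only; the values are arbitrary
biz <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    businesses = c(1200, 2400, 1800, 1500))
pop <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    population = c(15000, 12000, 19000, 11000))
restaurants <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    restaurants = c(600, 900, 750, 550))

varList <- list(biz, pop, restaurants)

# Merge all data.frames on the shared country column
merged <- Reduce(function(a, b) merge(a, b, by = "country"), varList)

# Replace every non-country column with its rank; negating makes rank 1 the largest
ranks <- merged
ranks[-1] <- lapply(merged[-1], function(v) rank(-v, ties.method = "min"))
ranks
```

Here ties.method = "min" gives tied countries the same (best) rank; whether that is the right tie rule depends on how the rankings will be used.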

Hope this helps!

Answer 2

totaldatasets <- c('biz','pop','restaurants')
totaldatasetslist <- vector(mode = "list",length = length(totaldatasets))
for ( i in seq(length(totaldatasets)))
{
  totaldatasetslist[[i]]  <- get(totaldatasets[i])
}

totaldatasetslist2 <- lapply(
  totaldatasetslist,
  function(x)
  {
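    # Use the function argument x here, not totaldatasetslist[[i]]: i is left
    # over from the for loop above, so it would always pick the last dataset
    temp <- data.frame(
      country = x[,1],
      countryrank  = rank(x[,2])
    )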

    colnames(temp) <- c('country', colnames(x)[2])

    return(temp)
  }
    )


Reduce(
  merge,
  totaldatasetslist2
)

Output (the actual ranks depend on the random sample):

    country businesses population restaurants
1 australia          3          3           3
2    canada          2          2           2
3   england          1          1           1
4       usa          4          4           4
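With 50+ data frames, the get() loop at the top of this answer could also be written with mget(), which looks up several objects by name and returns them as a named list. A minimal sketch under the assumption that all data frames live in the current environment and share a country column; the data values and the names dataset_names, datasets, ranked and country_rankings are made up for illustration:

```r
# Illustrative data only; the values are arbitrary
biz <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    businesses = c(1200, 2400, 1800, 1500))
pop <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    population = c(15000, 12000, 19000, 11000))
restaurants <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    restaurants = c(600, 900, 750, 550))

dataset_names <- c("biz", "pop", "restaurants")

# Fetch all named objects at once as a named list of data.frames
datasets <- mget(dataset_names, envir = environment())

# Overwrite the second column of each data.frame with its descending rank,
# keeping the original column name
ranked <- lapply(datasets, function(x) {
    x[[2]] <- rank(-x[[2]])
    x
})

# Merge the ranked data.frames on the shared country column
country_rankings <- Reduce(function(a, b) merge(a, b, by = "country"), ranked)
country_rankings
```

Because each element keeps its own column name, no colnames() fix-up is needed before the merge.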

Using the apply function across multiple data frames
