Computing a formula over all rows and specific columns of a data frame

Asked: 2018-04-13 11:00:21

Tags: r dataframe row apply col

I have the following sample data frame, which contains toy prices from different shops:

dfData <- data.frame(article = c("Fix", "Foxi", "Stan", "Olli", "Barbie", "Ken", "Hulk"),
                     priceToys1 = c(10, NA, 10.5, NA, 10.7, 11.2, 12.0),
                     priceAllToys = c(NA, 11.4, NA, 11.9, 11.7, 11.1, NA),
                     price123Toys = c(12, 12.4, 12.7, NA, NA, 11.0, 12.1))

In addition, I generate a minimum-price column by adding:

dfData$MinPrice <- apply(dfData[, grep("price", colnames(dfData))], 1, FUN=min, na.rm = TRUE)

So I now have this data frame:

#  article priceToys1 priceAllToys price123Toys MinPrice
#1     Fix       10.0           NA         12.0     10.0
#2    Foxi         NA         11.4         12.4     11.4
#3    Stan       10.5           NA         12.7     10.5
#4    Olli         NA         11.9           NA     11.9
#5  Barbie       10.7         11.7           NA     10.7
#6     Ken       11.2         11.1         11.0     11.0
#7    Hulk       12.0           NA         12.1     12.0

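How can I add extra columns to the data frame that express each price as a percentage of the minimum price in its row? The new column names should also include the shop name.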

The result should look like this:

#  article priceToys1 PercToys1 priceAllToys PercAllToys price123Toys Perc123Toys MinPrice
#1     Fix       10.0     100.0           NA          NA         12.0       120.0     10.0
#2    Foxi         NA        NA         11.4       100.0         12.4       108.8     11.4
#3    Stan       10.5     100.0           NA          NA         12.7       121.0     10.5
#4    Olli         NA        NA         11.9       100.0           NA          NA     11.9
#5  Barbie       10.7     100.0         11.7       109.3           NA          NA     10.7
#6     Ken       11.2     101.8         11.1       100.9         11.0       100.0     11.0
#7    Hulk       12.0     100.0           NA          NA         12.1       100.8     12.0

2 Answers:

Answer 0 (score: 3)

Two possible solutions:

1) Using the data.table package:

# load the 'data.table'-package
library(data.table)

# get the column names on which to operate
cols <- names(dfData)[2:4] # or: grep("price", names(dfData), value = TRUE)

# convert dfData to a 'data.table'
setDT(dfData)

# compute the 'fraction'-columns
dfData[, paste0('Perc', gsub('price','',cols)) := lapply(.SD, function(x) round(100 * x / MinPrice, 1))
       , .SDcols = cols][]

which gives:

   article priceToys1 priceAllToys price123Toys MinPrice PercToys1 PercAllToys Perc123Toys
1:     Fix       10.0           NA         12.0     10.0     100.0          NA       120.0
2:    Foxi         NA         11.4         12.4     11.4        NA       100.0       108.8
3:    Stan       10.5           NA         12.7     10.5     100.0          NA       121.0
4:    Olli         NA         11.9           NA     11.9        NA       100.0          NA
5:  Barbie       10.7         11.7           NA     10.7     100.0       109.3          NA
6:     Ken       11.2         11.1         11.0     11.0     101.8       100.9       100.0
7:    Hulk       12.0           NA         12.1     12.0     100.0          NA       100.8
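
The output above appends the new columns at the end, whereas the question shows each percentage column directly after its price column. The following is a minimal sketch of one way to get that layout with data.table's setcolorder(). The names priceCols, percCols and newOrder are made up for this illustration, and the minimum is computed with pmin() instead of the apply() call from the question.

```r
library(data.table)

# same sample data as in the question, built directly as a data.table
dfData <- data.table(
  article      = c("Fix", "Foxi", "Stan", "Olli", "Barbie", "Ken", "Hulk"),
  priceToys1   = c(10, NA, 10.5, NA, 10.7, 11.2, 12.0),
  priceAllToys = c(NA, 11.4, NA, 11.9, 11.7, 11.1, NA),
  price123Toys = c(12, 12.4, 12.7, NA, NA, 11.0, 12.1)
)

priceCols <- grep("^price", names(dfData), value = TRUE)
percCols  <- sub("^price", "Perc", priceCols)

# row-wise minimum over the price columns, ignoring NA
dfData[, MinPrice := do.call(pmin, c(.SD, na.rm = TRUE)), .SDcols = priceCols]

# each price as a percentage of the row minimum
dfData[, (percCols) := lapply(.SD, function(x) round(100 * x / MinPrice, 1)),
       .SDcols = priceCols]

# interleave price and percentage columns; rbind() pairs them up and
# as.vector() reads the pairs out column by column
newOrder <- c("article", as.vector(rbind(priceCols, percCols)), "MinPrice")
setcolorder(dfData, newOrder)

dfData
```

setcolorder() reorders by reference, so no copy of the table is made and no reassignment is needed.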

2) Base R:

cols <- names(dfData)[2:4] # or: grep("price", names(dfData), value = TRUE)

dfData[, paste0('Perc', gsub('price','',cols))] <- round(100 * dfData[, cols] / dfData$MinPrice, 1)

This gives the same result.
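
The base R one-liner works because dividing a data frame by a vector recycles the vector down each column, and dfData$MinPrice has exactly one value per row. The sketch below compares that with an explicit per-column loop. It assumes dfData is a plain data.frame (that is, setDT() from solution 1 has not been applied to it); priceCols, percCols, viaRecycling and viaLoop are illustrative names.

```r
dfData <- data.frame(
  article      = c("Fix", "Foxi", "Stan", "Olli", "Barbie", "Ken", "Hulk"),
  priceToys1   = c(10, NA, 10.5, NA, 10.7, 11.2, 12.0),
  priceAllToys = c(NA, 11.4, NA, 11.9, 11.7, 11.1, NA),
  price123Toys = c(12, 12.4, 12.7, NA, NA, 11.0, 12.1)
)

priceCols <- grep("^price", names(dfData), value = TRUE)
percCols  <- sub("^price", "Perc", priceCols)

# row-wise minimum, as in the question
dfData$MinPrice <- apply(dfData[, priceCols], 1, min, na.rm = TRUE)

# data frame divided by a vector: MinPrice is recycled down every column
viaRecycling <- round(100 * dfData[, priceCols] / dfData$MinPrice, 1)

# the same computation spelled out one column at a time
viaLoop <- as.data.frame(
  lapply(dfData[priceCols], function(x) round(100 * x / dfData$MinPrice, 1))
)

# compare the two results, ignoring attributes
all.equal(viaRecycling, viaLoop, check.attributes = FALSE)

# attach the percentages under the requested names
dfData[, percCols] <- viaRecycling
dfData
```

If the vector were shorter than the number of rows, recycling would still run but would silently pair prices with the wrong minimum, so it is worth keeping MinPrice as a column of the same data frame.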

Answer 1 (score: 1)

We can use mutate_at from dplyr:

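library(dplyr)
library(magrittr) # provides the %<>% assignment pipe

# adds one column per price column, named <column>_Perc (e.g. priceToys1_Perc)
# note: funs() is deprecated as of dplyr 0.8.0; list(Perc = ~ ...) is the newer form
dfData %<>%
      mutate_at(vars(matches("^price")), funs(Perc = round(100 * . / MinPrice, 1)))
dfData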
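
For newer dplyr versions, here is a sketch of the same idea using across(), followed by a rename step so the columns get the Perc<Shop> names asked for in the question. It assumes dplyr 1.0.0 or later; computing MinPrice with pmin() and the renaming regex are choices made for this illustration, not part of the answer above.

```r
library(dplyr)

dfData <- data.frame(
  article      = c("Fix", "Foxi", "Stan", "Olli", "Barbie", "Ken", "Hulk"),
  priceToys1   = c(10, NA, 10.5, NA, 10.7, 11.2, 12.0),
  priceAllToys = c(NA, 11.4, NA, 11.9, 11.7, 11.1, NA),
  price123Toys = c(12, 12.4, 12.7, NA, NA, 11.0, 12.1)
)

dfData <- dfData %>%
  # row-wise minimum over the three price columns, ignoring NA
  mutate(MinPrice = pmin(priceToys1, priceAllToys, price123Toys, na.rm = TRUE)) %>%
  # one new column per price column, temporarily named <column>_Perc
  mutate(across(starts_with("price"),
                ~ round(100 * .x / MinPrice, 1),
                .names = "{.col}_Perc")) %>%
  # priceToys1_Perc -> PercToys1, and so on
  rename_with(~ sub("^price(.*)_Perc$", "Perc\\1", .x),
              .cols = ends_with("_Perc"))

dfData
```

The rename is kept as a separate step so that the across() call stays close to the mutate_at() version it replaces.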