How to access a data frame column whose name contains a function argument, and sort by it in descending order

Time: 2018-02-03 04:57:47

Tags: r

I am working with a data frame that contains the columns company, division, all_production_2017, bad_production_2017, ... and so on, going back many years.

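Now I am writing a function that takes a company name and a year as arguments and summarizes that company's production for that year, then sorts the result in descending order of all_production_<year>.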

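I have converted the year to a string and filtered the rows and columns I need. But how do I sort by that specific column? I don't know how to refer to the column name, because the argument year is its suffix.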

Here is a sketch of the structure of my data frame.

structure(list(company = c("DLT", "DLT", "DLT", "MSF", "MSF", "MSF"),
               division = c("Marketing", "CHANG1", "CAHNG2", "MARKETING", "CHANG1M", "CHANG2M"),
               all_production_2000 = c(15, 25, 25, 10, 25, 18),
               good_production_2000 = c(10, 24, 10, 8, 10, 10),
               bad_production_2000 = c(2, 1, 2, 1, 3, 5)))

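The data runs from 2000 to 2017. I want to write a function that is given a company name and a year, keeps only the rows for that company and the columns for that year, and sorts the result by all_production_<year> in descending order.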

This is what I have so far.

ExportCompanyYear <- function(company.name, year){
   year.string <- toString(year)
   x <- filter(company.data, company == company.name) %>%
      select(company, division, contains(year.string))
}

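I just don't know how to sort in descending order, because I don't know how to refer to a column whose name contains the year argument.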

2 answers:

Answer 0: (score: 0)

The OP has provided very simple sample data that only contains the year 2000.

A workaround could be:
1. Convert the list to a data.frame.
2. Use gather from tidyr to arrange the data frame so that a filter can be applied.

ll <- structure(list(company = c("DLT", "DLT", "DLT", "MSF", "MSF", "MSF"),
  division = c("Marketing", "CHANG1", "CAHNG2", "MARKETING", "CHANG1M", "CHANG2M"),
  all_production_2000 = c(15, 25, 25, 10, 25, 18),
  good_production_2000 = c(10, 24, 10, 8, 10, 10),
  bad_production_2000 = c(2, 1, 2, 1, 3, 5)))

df <- as.data.frame(ll)
library(tidyr)
gather(df, key = "key", value = "value", -c("company",  "division"))

#result:
# company  division                  key value
#1      DLT Marketing  all_production_2000    15
#2      DLT    CHANG1  all_production_2000    25
#3      DLT    CAHNG2  all_production_2000    25
#4      MSF MARKETING  all_production_2000    10
#5      MSF   CHANG1M  all_production_2000    25
#6      MSF   CHANG2M  all_production_2000    18
#7      DLT Marketing good_production_2000    10
#8      DLT    CHANG1 good_production_2000    24
#9      DLT    CAHNG2 good_production_2000    10
#10     MSF MARKETING good_production_2000     8
#11     MSF   CHANG1M good_production_2000    10
#12     MSF   CHANG2M good_production_2000    10
#13     DLT Marketing  bad_production_2000     2
#14     DLT    CHANG1  bad_production_2000     1
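#15     DLT    CAHNG2  bad_production_2000     2
#16     MSF MARKETING  bad_production_2000     1
#17     MSF   CHANG1M  bad_production_2000     3
#18     MSF   CHANG2M  bad_production_2000     5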

Now a filter can easily be applied to the data.frame above.
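As a rough sketch of that last step, the filter can match the key column against a name built from the year argument. The helper name production_by_year and the variable long are made up for this illustration and are not part of the original answer; the sample data is the one from the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  company = c("DLT", "DLT", "DLT", "MSF", "MSF", "MSF"),
  division = c("Marketing", "CHANG1", "CAHNG2", "MARKETING", "CHANG1M", "CHANG2M"),
  all_production_2000 = c(15, 25, 25, 10, 25, 18),
  good_production_2000 = c(10, 24, 10, 8, 10, 10),
  bad_production_2000 = c(2, 1, 2, 1, 3, 5)
)

# Long format: one row per company / division / measure
long <- gather(df, key = "key", value = "value", -company, -division)

# Hypothetical helper: keep one company and one year's all_production rows,
# then sort by the value in descending order
production_by_year <- function(data, company.name, year) {
  target <- paste0("all_production_", year)  # e.g. "all_production_2000"
  data %>%
    filter(company == company.name, key == target) %>%
    arrange(desc(value))
}

res <- production_by_year(long, "DLT", 2000)
print(res)
```

Because the year only ever appears inside a string that is compared with key, the function never has to refer to a column by a computed name.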

Answer 1: (score: 0)

You definitely need to reshape your data in such a way that year values could be passed as a parameter.

To create a reproducible example, I have added another year 2001 in the data.

df = data.frame(
  company = c("DLT", "DLT", "DLT", "MSF", "MSF", "MSF"),
  division = c("Marketing", "CHANG1", "CAHNG2", "MARKETING", "CHANG1M", "CHANG2M"),
  all_production_2000 = c(15, 25, 25, 10, 25, 18),
  good_production_2000 = c(10, 24, 10, 8, 10, 10),
  bad_production_2000 = c(2, 1, 2, 1, 3, 5),
  all_production_2001 = 2*c(15, 25, 25, 10, 25, 18),
  good_production_2001 = 2*c(10, 24, 10, 8, 10, 10),
  bad_production_2001 = 2*c(2, 1, 2, 1, 3, 5))

Now you can reshape the data using the reshape function in R. Here, the variables "all_production","good_production","bad_production" are varying with time, and year values are changing for those variables.

So we specify v.names = c("all_production","good_production","bad_production").

years = c(2000, 2001)
df2 = reshape(df, direction = "long",
  v.names = c("all_production", "good_production", "bad_production"),
  # one element per variable, in the same order as v.names
  varying = list(paste0("all_production_", years),
                 paste0("good_production_", years),
                 paste0("bad_production_", years)),
  idvar = c("company", "division"),
  timevar = "year", times = years)

varying is given as a list here because reshape groups a plain vector of names in alphabetical order of v.names and then assigns the groups positionally, which would swap the good_production and bad_production values. For your data.frame you can set years = 2000:2017.

>df2
                   company  division year all_production good_production bad_production
DLT.Marketing.2000     DLT Marketing 2000             15              10              2
DLT.CHANG1.2000        DLT    CHANG1 2000             25              24              1
DLT.CAHNG2.2000        DLT    CAHNG2 2000             25              10              2
MSF.MARKETING.2000     MSF MARKETING 2000             10               8              1
MSF.CHANG1M.2000       MSF   CHANG1M 2000             25              10              3
MSF.CHANG2M.2000       MSF   CHANG2M 2000             18              10              5
DLT.Marketing.2001     DLT Marketing 2001             30              20              4
DLT.CHANG1.2001        DLT    CHANG1 2001             50              48              2
DLT.CAHNG2.2001        DLT    CAHNG2 2001             50              20              4
MSF.MARKETING.2001     MSF MARKETING 2001             20              16              2
MSF.CHANG1M.2001       MSF   CHANG1M 2001             50              20              6
MSF.CHANG2M.2001       MSF   CHANG2M 2001             36              20             10

Now you can filter and sort like this:

library(dplyr)
somefunc<-function(company.name,yearval){
    df2%>%filter(company==company.name,year==yearval)%>%arrange(desc(all_production))
}

>somefunc("DLT",2001)
  company  division year all_production good_production bad_production
1     DLT    CHANG1 2001             50              48              2
2     DLT    CAHNG2 2001             50              20              4
3     DLT Marketing 2001             30              20              4
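If the data has to stay in its wide layout, one possible alternative is to build the column name as a string and index with it. This is only a sketch and not part of the answer above: it reuses the question's function name ExportCompanyYear and its global company.data, and sort.col is a name introduced here for illustration.

```r
library(dplyr)

# Sample data in the original wide layout (from the question)
company.data <- data.frame(
  company = c("DLT", "DLT", "DLT", "MSF", "MSF", "MSF"),
  division = c("Marketing", "CHANG1", "CAHNG2", "MARKETING", "CHANG1M", "CHANG2M"),
  all_production_2000 = c(15, 25, 25, 10, 25, 18),
  good_production_2000 = c(10, 24, 10, 8, 10, 10),
  bad_production_2000 = c(2, 1, 2, 1, 3, 5)
)

ExportCompanyYear <- function(company.name, year) {
  # Assumed naming scheme: the sort column is "all_production_" plus the year
  sort.col <- paste0("all_production_", year)
  x <- company.data %>%
    filter(company == company.name) %>%
    select(company, division, contains(as.character(year)))
  # [[ ]] accepts a column name held in a variable; order() does the sorting
  x[order(x[[sort.col]], decreasing = TRUE), ]
}

res <- ExportCompanyYear("DLT", 2000)
print(res)
```

This avoids the reshape step, at the cost of relying on the column names following the all_production_<year> pattern exactly.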