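Simplest way to rearrange a data frame in R

Time: 2017-05-01 04:47:26

Tags: r dataframe grep transform

There are a million and one sites that teach data wrangling and organisation in R, but I'm not sure which approach is most effective for my problem. I know how to do this easily in Python, but what is the equivalent simple way to do it in R?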

Say I have a data frame with an ROI column holding values such as a_01 and a_02, plus the value columns summer_1, winter_1, summer_2 and winter_2. I want to rearrange the columns so that it looks like this:

ROI   no  season value
a     1   summer 81.33328
a     2   summer 15.34663
...

and so on.

So far I have this:

library(stringr)
df$new <- str_split_fixed(df$ROI, "_", 2)

How can I best approach this?

1 Answer:

Answer 0 (score: 1)

We can do this with the tidyverse:

library(tidyverse)
#split the 'ROI' into two columns
res <- separate(df, ROI, into = c("ROI", 'no'), convert = TRUE) %>% 
          #reshape from wide to long format 
          gather(season, value, summer_1:winter_2) %>%
          #split the season column into two
          separate(season, into = c('season', 'n')) %>%
          #remove the columns that are not needed
          select(-n)

head(res)
#  ROI no season    value
#1   a  1 summer 29.25740
#2   a  2 summer 22.48911
#3   a  3 summer 70.42230
#4   b  1 summer 51.88971
#5   b  2 summer 66.26196
#6   b  3 summer 92.04438
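
In current tidyr releases, gather() is marked as superseded in favour of pivot_longer(). As a minimal sketch of the same reshape, assuming a tidyr version that provides pivot_longer() (1.0.0 or later), the season can be taken straight from the column names. The name res3 is only illustrative, and the sample data is repeated from the "Data" section so that the block runs on its own.

```r
library(tidyr)

# Same sample data as in the "Data" section of this answer
set.seed(24)
ROI <- c("a_01", "a_02", "a_03", "b_01", "b_02", "b_03")
summer_1 <- runif(6, min = 0, max = 100)
winter_1 <- runif(6, min = 0, max = 100)
summer_2 <- runif(6, min = 0, max = 100)
winter_2 <- runif(6, min = 0, max = 100)
df <- data.frame(ROI, summer_1, winter_1, summer_2, winter_2)

# Reshape every column except ROI to long format.
# Each column name is split on "_": the first piece becomes 'season',
# and the NA in names_to discards the second piece (the 1 or 2 suffix).
long <- pivot_longer(df,
                     cols = -ROI,
                     names_to = c("season", NA),
                     names_sep = "_",
                     values_to = "value")

# Split 'ROI' into the letter and the number; convert = TRUE turns "01" into 1
res3 <- separate(long, ROI, into = c("ROI", "no"), sep = "_", convert = TRUE)

head(res3)
```

The rows come out in a different order from the gather() result above (all four values for a_01 first, then a_02, and so on), but the columns and values are the same.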

Another option is to split the column with cSplit and then convert the result to "long" format with melt from data.table:

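# melt(), setnames() and := come from data.table; attach it explicitly
# in case splitstackshape does not attach it for you
library(data.table)
library(splitstackshape)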
res2 <- setnames(melt(cSplit(df, "ROI", sep="_"), id.var = c("ROI_1", "ROI_2"), 
  variable.name = "season"), 1:2, c("ROI", "no"))[, season := sub("_\\d+", "", season)][]
head(res2)
#   ROI no season    value
#1:   a  1 summer 29.25740
#2:   a  2 summer 22.48911
#3:   a  3 summer 70.42230
#4:   b  1 summer 51.88971
#5:   b  2 summer 66.26196
#6:   b  3 summer 92.04438
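
If installing extra packages is not an option, the same idea (split ROI, stack the value columns, strip the numeric suffix from the column names) can be sketched in base R. This is only an illustration under the assumption that every value column is named season_number. The names parts, value_cols and res_base are made up for this example, and the sample data is repeated from the "Data" section so that the block runs on its own.

```r
# Same sample data as in the "Data" section of this answer
set.seed(24)
ROI <- c("a_01", "a_02", "a_03", "b_01", "b_02", "b_03")
summer_1 <- runif(6, min = 0, max = 100)
winter_1 <- runif(6, min = 0, max = 100)
summer_2 <- runif(6, min = 0, max = 100)
winter_2 <- runif(6, min = 0, max = 100)
df <- data.frame(ROI, summer_1, winter_1, summer_2, winter_2)

# Split "a_01" into "a" and "01"; one matrix row per data frame row
parts <- do.call(rbind, strsplit(as.character(df$ROI), "_", fixed = TRUE))

# Every column except ROI holds values to be stacked
value_cols <- setdiff(names(df), "ROI")

res_base <- data.frame(
  # the identifier columns repeat once per stacked value column
  ROI    = rep(parts[, 1], times = length(value_cols)),
  no     = rep(as.integer(parts[, 2]), times = length(value_cols)),
  # drop the "_1" / "_2" suffix; each name covers nrow(df) consecutive rows
  season = rep(sub("_\\d+$", "", value_cols), each = nrow(df)),
  # unlist() concatenates the value columns one after another
  value  = unlist(df[value_cols], use.names = FALSE)
)

head(res_base)
```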

Data

set.seed(24)
ROI <- c("a_01","a_02","a_03","b_01","b_02","b_03")
summer_1 <- runif(6, min=0, max=100)
winter_1 <- runif(6, min=0, max=100)
summer_2 <- runif(6, min=0, max=100)
winter_2 <- runif(6, min=0, max=100)
df <- data.frame(ROI,summer_1,winter_1,summer_2,winter_2)