Grouping two variables (by row) in R to create one variable

Time: 2020-10-10 21:21:19

Tags: r variables group-by

I have a data frame:

Disease      Genemutation  Mean.  Total No of pateints  No.of pateints.
cancertype1  BRCA1         1      10                    2
cancertype2  BRCA2         5      10                    3
cancertype3  BRCA2         7      10                    4
cancertype1  BRCA1         8      10                    1
cancertype3  BRCA2         4      10                    4
cancertype2  BRCA1         6      10                    1

How can I create a new variable called cancertype4 (from cancertype2 and cancertype3) that includes the number of patients who have it after merging these two?

2 answers:

Answer 0: (score: 2)

We can use replace together with %in% to replace those values (assuming "Disease" is of class character)

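library(dplyr)  # provides %>%, group_by() and summarise()

# df1 is assumed to hold the question's data with the column named TotalNoofpateints
df1 %>% 
   group_by(Disease = replace(Disease,
        Disease %in% c("cancertype2", "cancertype3"), "cancertype4")) %>%
   summarise(TotalNoofpateints = sum(TotalNoofpateints))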

-output

# A tibble: 2 x 2
#  Disease     TotalNoofpateints
#  <chr>                   <int>
#1 cancertype1                20
#2 cancertype4                40
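
To see what the replace / %in% step does on its own, before any grouping, here is a minimal base R sketch. The vector `disease` and the helper names `to_merge`, `merged` and `merged_from_factor` are illustrative assumptions, not part of the answer above. It also shows one way to handle the case the answer sets aside, where the column is a factor rather than character.

```r
# Illustrative vector; the values mirror the Disease column in the question.
disease <- c("cancertype1", "cancertype2", "cancertype3", "cancertype1")

# %in% returns a logical vector marking the entries to overwrite.
to_merge <- disease %in% c("cancertype2", "cancertype3")

# replace() returns a copy with the marked entries set to the new label.
merged <- replace(disease, to_merge, "cancertype4")
print(merged)

# If the column were a factor, converting to character first avoids NA values,
# because "cancertype4" is not an existing factor level.
disease_factor <- factor(disease)
merged_from_factor <- replace(as.character(disease_factor), to_merge, "cancertype4")
print(merged_from_factor)
```

Grouping on the relabelled vector then treats cancertype2 and cancertype3 as a single group.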

Answer 1: (score: 1)

Here is a base R option using aggregate

aggregate(
  Total.No.of.pateints ~ Disease,
  transform(
    df,
    Disease = replace(Disease, Disease %in% c("cancertype2", "cancertype3"), "cancertype4")
  ),
  sum
)

which gives

      Disease Total.No.of.pateints
1 cancertype1                   20
2 cancertype4                   40

Data

> dput(df)
structure(list(Disease = c("cancertype1", "cancertype2", "cancertype3", 
"cancertype1", "cancertype3", "cancertype2"), Genemutation = c("BRCA1",
"BRCA2", "BRCA2", "BRCA1", "BRCA2", "BRCA1"), Mean. = c(1L, 5L, 
7L, 8L, 4L, 6L), Total.No.of.pateints = c(10L, 10L, 10L, 10L,
10L, 10L), No.of.pateints. = c(2L, 3L, 4L, 1L, 4L, 1L)), class = "data.frame", row.names = c(NA,
-6L))
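
Both answers sum only the total-patients column. If the per-row patient count (No.of.pateints.) should be combined as well, which is one possible reading of the question and an assumption here, a self-contained base R sketch could look like the following. It rebuilds only the three relevant columns of the data above and uses cbind() on the left-hand side of the formula so that aggregate() sums several columns at once. The name `result` is illustrative.

```r
# Rebuild the relevant columns of the data shown above.
df <- data.frame(
  Disease = c("cancertype1", "cancertype2", "cancertype3",
              "cancertype1", "cancertype3", "cancertype2"),
  Total.No.of.pateints = c(10L, 10L, 10L, 10L, 10L, 10L),
  No.of.pateints. = c(2L, 3L, 4L, 1L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Relabel cancertype2 and cancertype3 as the merged category.
df$Disease <- replace(
  df$Disease,
  df$Disease %in% c("cancertype2", "cancertype3"),
  "cancertype4"
)

# Sum both patient-count columns within each (relabelled) disease.
result <- aggregate(
  cbind(Total.No.of.pateints, No.of.pateints.) ~ Disease,
  data = df,
  FUN = sum
)
print(result)
```

Whether summing No.of.pateints. is meaningful depends on how that column was recorded, so treat this as a starting point rather than a definitive answer.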