R: Detecting and removing a constant variable in a long-format data frame

Date: 2018-04-05 03:33:03

Tags: r dataframe melt

Suppose I have a data frame like the one below, in which one column always holds the same value:

> set.seed(1)
> mydf <- data.frame(name=LETTERS[1:10], treatment1=rnorm(10, 2, 1), treatment2=1.35, treatment3=rnorm(10, 5, 2))
> mydf
   name treatment1 treatment2 treatment3
1     A   1.373546       1.35  8.0235623
2     B   2.183643       1.35  5.7796865
3     C   1.164371       1.35  3.7575188
4     D   3.595281       1.35  0.5706002
5     E   2.329508       1.35  7.2498618
6     F   1.179532       1.35  4.9101328
7     G   2.487429       1.35  4.9676195
8     H   2.738325       1.35  6.8876724
9     I   2.575781       1.35  6.6424424
10    J   1.694612       1.35  6.1878026

In the wide (short) format, I know how to detect this column as follows (negating the condition would discard it instead):

> mydf[sapply(mydf, function(x) length(unique(na.omit(x)))) == 1]
   treatment2
1        1.35
2        1.35
3        1.35
4        1.35
5        1.35
6        1.35
7        1.35
8        1.35
9        1.35
10       1.35
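
As a small sketch of the discarding step in the wide format, assuming the same `mydf` as above: the logical mask can be stored and negated. The names `is_constant` and `mydf_varying` are introduced here only for illustration.

```r
# Same wide-format data frame as in the question.
set.seed(1)
mydf <- data.frame(name = LETTERS[1:10], treatment1 = rnorm(10, 2, 1),
                   treatment2 = 1.35, treatment3 = rnorm(10, 5, 2))

# TRUE for every column that holds a single distinct non-missing value.
is_constant <- sapply(mydf, function(x) length(unique(na.omit(x))) == 1)

# Negating the mask keeps everything except the constant columns.
mydf_varying <- mydf[!is_constant]
names(mydf_varying)
# [1] "name"       "treatment1" "treatment3"
```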

However, I am now facing a data frame in long format:

> library(reshape2)  # assumption: melt() here is the reshape2 version
> mymelt <- melt(mydf, id.vars="name")
> mymelt
   name   variable     value
1     A treatment1 1.3735462
2     B treatment1 2.1836433
3     C treatment1 1.1643714
4     D treatment1 3.5952808
5     E treatment1 2.3295078
6     F treatment1 1.1795316
7     G treatment1 2.4874291
8     H treatment1 2.7383247
9     I treatment1 2.5757814
10    J treatment1 1.6946116
11    A treatment2 1.3500000
12    B treatment2 1.3500000
13    C treatment2 1.3500000
14    D treatment2 1.3500000
15    E treatment2 1.3500000
16    F treatment2 1.3500000
17    G treatment2 1.3500000
18    H treatment2 1.3500000
19    I treatment2 1.3500000
20    J treatment2 1.3500000
21    A treatment3 8.0235623
22    B treatment3 5.7796865
23    C treatment3 3.7575188
24    D treatment3 0.5706002
25    E treatment3 7.2498618
26    F treatment3 4.9101328
27    G treatment3 4.9676195
28    H treatment3 6.8876724
29    I treatment3 6.6424424
30    J treatment3 6.1878026

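I would rather not dcast mymelt again. Is there an easy way to detect treatment2 and remove it from mymelt? (Note that in my real data frame I have 2 variable columns that identify the treatment.) Thanks!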

1 answer:

Answer 0: (score: 0)

You can try the following:

library(dplyr)

# df is the long-format data frame (mymelt in the question)
df <- mymelt

# count the distinct values within each variable
df1 <- df %>% 
    group_by(variable) %>% 
    summarise(counts = n_distinct(value))

# names of the variables that have just one distinct value
to_remove <- df1$variable[df1$counts == 1] # treatment2

# drop those rows; %in% also works when zero or several variables match
df <- df[!df$variable %in% to_remove, ]
df

   name   variable     value
1     A treatment1 1.3735462
2     B treatment1 2.1836433
3     C treatment1 1.1643714
4     D treatment1 3.5952808
5     E treatment1 2.3295078
6     F treatment1 1.1795316
7     G treatment1 2.4874291
8     H treatment1 2.7383247
9     I treatment1 2.5757814
10    J treatment1 1.6946116
21    A treatment3 8.0235623
22    B treatment3 5.7796865
23    C treatment3 3.7575188
24    D treatment3 0.5706002
25    E treatment3 7.2498618
26    F treatment3 4.9101328
27    G treatment3 4.9676195
28    H treatment3 6.8876724
29    I treatment3 6.6424424
30    J treatment3 6.1878026
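
The question mentions that the real data has two variable columns identifying the treatment. A possible one-step variant, sketched here under that assumption, groups by both columns and filters whole groups at once. The data frame `long_df` and the column names `drug` and `dose` are made up for illustration and are not from the question. Passing `na.rm = TRUE` to `n_distinct()` mirrors the `na.omit()` used in the wide-format check.

```r
library(dplyr)

# Hypothetical long-format data with two identifying columns
# (drug and dose are made-up names, not from the question).
set.seed(1)
long_df <- data.frame(
  name  = rep(LETTERS[1:5], times = 4),
  drug  = rep(c("a", "a", "b", "b"), each = 5),
  dose  = rep(c("low", "high", "low", "high"), each = 5),
  value = c(rnorm(5), rep(1.35, 5), rnorm(5), rnorm(5))
)

# Keep only the groups whose non-missing values are not all identical.
filtered <- long_df %>%
  group_by(drug, dose) %>%
  filter(n_distinct(value, na.rm = TRUE) > 1) %>%
  ungroup()

# The constant (a, high) group is dropped; the other three groups remain.
count(filtered, drug, dose)
```

Note that a group consisting only of `NA` values has zero distinct non-missing values, so this filter would drop it as well; whether that is desirable depends on the data.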