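I'm new to R and can't find an answer on how to split a column that holds several variables (Samples 1-4) into separate columns while carrying the associated data along. Here is an example: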
Samples Content
Sample 1 70.7
Sample 1 91.6
Sample 1 92.6
Sample 1 65.2
Sample 1 80.0
Sample 1 82.1
Sample 1 88.1
Sample 1 92.2
Sample 1 53.3
Sample 1 80.0
Sample 1 60.3
Sample 1 89.7
Sample 1 84.8
Sample 1 94.0
Sample 1 71.8
Sample 1 76.9
Sample 1 91.4
Sample 1 57.9
Sample 1 61.9
Sample 1 71.5
Sample 2 88.7
Sample 2 67.6
Sample 2 61.7
Sample 2 70.8
Sample 2 45.3
Sample 2 55.6
Sample 2 64.6
Sample 2 62.7
Sample 2 72.4
Sample 2 46.8
Sample 2 59.0
Sample 2 63.7
Sample 2 67.0
Sample 2 71.6
Sample 2 48.3
Sample 2 55.6
Sample 2 62.5
Sample 2 60.0
Sample 2 72.9
Sample 2 47.4
Sample 3 42.3
Sample 3 48.2
Sample 3 64.0
Sample 3 33.3
Sample 3 19.0
Sample 3 41.0
Sample 3 53.1
Sample 3 46.5
Sample 3 30.0
Sample 3 43.4
Sample 3 43.7
Sample 3 92.0
Sample 3 53.0
Sample 3 33.0
Sample 3 48.4
Sample 3 43.2
Sample 3 41.8
Sample 3 62.5
Sample 3 33.3
Sample 3 49.3
Sample 4 51.8
Sample 4 57.3
Sample 4 43.3
Sample 4 42.3
Sample 4 37.6
Sample 4 54.9
Sample 4 71.1
Sample 4 33.8
Sample 4 43.1
Sample 4 39.1
Sample 4 63.0
Sample 4 74.0
Sample 4 31.0
Sample 4 48.3
Sample 4 42.9
Sample 4 62.2
Sample 4 35.4
Sample 4 33.8
Sample 4 40.7
Sample 4 41.2
I tried tidyr without success. I would like the output to look like this:
Sample 1 Sample 2 Sample 3 Sample 4
70.7 88.7 42.3 51.8
91.6 67.6 48.2 57.3
92.6 61.7 64.0 43.3
65.2 70.8 33.3 42.3
80.0 45.3 19.0 37.6
82.1 55.6 41.0 54.9
88.1 64.6 53.1 71.1
92.2 62.7 46.5 33.8
53.3 72.4 30.0 43.1
80.0 46.8 43.4 39.1
60.3 59.0 43.7 63.0
89.7 63.7 92.0 74.0
84.8 67.0 53.0 31.0
94.0 71.6 33.0 48.3
71.8 48.3 48.4 42.9
76.9 55.6 43.2 62.2
91.4 62.5 41.8 35.4
57.9 60.0 62.5 33.8
61.9 72.9 33.3 40.7
71.5 47.4 49.3 41.2
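Many thanks. Once a solution is found, is there also a way to do the reverse?
Extra: is there a way to run a t-test on data stacked in one column, as in the first example, without having to convert it?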
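Answer 0 (score: 2)
You are probably running into the "duplicate identifiers" problem with tidyr::spread. You first need to generate a unique combination of Sample + identifier, which you can do like this (assuming the data frame is named df1):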
library(tidyverse) # for dplyr + tidyr
df1 %>%
group_by(Samples) %>%
mutate(id = row_number()) %>%
spread(Samples, Content) %>%
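select(-id)
"If I wanted to do the reverse"
Do you mean going the other way, from the wide form back to the original long form? Then use gather. Append this to the end of the code above and see what happens: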
%>% gather(Samples, Content)
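In current tidyr releases, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The following is a minimal sketch of the same round trip with those functions, assuming tidyr 1.0 or later. The toy df1 is an assumption for illustration: it holds only the first three values of each sample from the question.

```r
library(dplyr)
library(tidyr)

# Toy data (assumption): first three values of each sample from the question
df1 <- data.frame(
  Samples = rep(paste("Sample", 1:4), each = 3),
  Content = c(70.7, 91.6, 92.6,
              88.7, 67.6, 61.7,
              42.3, 48.2, 64.0,
              51.8, 57.3, 43.3)
)

# Long -> wide: a within-sample row index makes every Sample + id pair unique
wide <- df1 %>%
  group_by(Samples) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = Samples, values_from = Content) %>%
  select(-id)

# Wide -> long: stack every column back into Samples / Content,
# then sort so the rows of each sample sit together again
long <- wide %>%
  pivot_longer(everything(), names_to = "Samples", values_to = "Content") %>%
  arrange(Samples)

wide
long
```

The id column plays the same role as in the spread() version: without it, each sample has several rows that cannot be told apart, so they cannot be placed in distinct rows of the wide table.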
t-test: there are many ways to run a t-test on long-format data. For example, a base R way to compare Samples 1 and 2 might be:
# extract plain numeric vectors so this also works if df1 is a tibble or data.table
t.test(df1$Content[df1$Samples == "Sample 1"],
       df1$Content[df1$Samples == "Sample 2"])
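Another option that keeps the data in long form is the formula interface of t.test(), which splits the values by the grouping column for you. A minimal sketch, using a toy df1 (an assumption for illustration) built from the first five values of Samples 1-3 in the question:

```r
# Toy long data (assumption): first five values of Samples 1-3 from the question
df1 <- data.frame(
  Samples = rep(paste("Sample", 1:3), each = 5),
  Content = c(70.7, 91.6, 92.6, 65.2, 80.0,
              88.7, 67.6, 61.7, 70.8, 45.3,
              42.3, 48.2, 64.0, 33.3, 19.0)
)

# The formula method needs exactly two groups, so keep only the pair of interest
two <- subset(df1, Samples %in% c("Sample 1", "Sample 2"))
res <- t.test(Content ~ Samples, data = two)  # Welch two-sample t-test by default
res$p.value

# All pairwise comparisons at once; p-values are Holm-adjusted by default
pairwise.t.test(df1$Content, df1$Samples)
```

With more than two samples, running every pair separately inflates the false-positive rate, which is why pairwise.t.test() adjusts the p-values.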
Answer 1 (score: 1)
Since every 'Sample' has the same number of elements, we can use unstack from base R:
unstack(df1, Content~Samples)
# Sample.1 Sample.2 Sample.3 Sample.4
#1 70.7 88.7 42.3 51.8
#2 91.6 67.6 48.2 57.3
#3 92.6 61.7 64.0 43.3
#4 65.2 70.8 33.3 42.3
#5 80.0 45.3 19.0 37.6
#6 82.1 55.6 41.0 54.9
#7 88.1 64.6 53.1 71.1
#8 92.2 62.7 46.5 33.8
#9 53.3 72.4 30.0 43.1
#10 80.0 46.8 43.4 39.1
#11 60.3 59.0 43.7 63.0
#12 89.7 63.7 92.0 74.0
#13 84.8 67.0 53.0 31.0
#14 94.0 71.6 33.0 48.3
#15 71.8 48.3 48.4 42.9
#16 76.9 55.6 43.2 62.2
#17 91.4 62.5 41.8 35.4
#18 57.9 60.0 62.5 33.8
#19 61.9 72.9 33.3 40.7
#20 71.5 47.4 49.3 41.2
This uses no external packages.
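For the reverse direction, base R's stack() undoes unstack(). A minimal sketch, assuming a toy df1 with the first three values of Samples 1 and 2 from the question:

```r
# Toy data (assumption): first three values of Samples 1 and 2 from the question
df1 <- data.frame(
  Samples = rep(c("Sample 1", "Sample 2"), each = 3),
  Content = c(70.7, 91.6, 92.6, 88.7, 67.6, 61.7)
)

wide <- unstack(df1, Content ~ Samples)  # columns are named Sample.1, Sample.2
long <- stack(wide)                       # columns are named values and ind
names(long) <- c("Content", "Samples")    # rename to match the original layout
long
```

Note that the sample labels come back as Sample.1 and Sample.2, because unstack() converts them to syntactic column names.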
If the number of elements per 'Sample' differs, you can use dcast from data.table instead (it works in both cases):
library(data.table)
dcast(setDT(df1), rowid(Samples)~Samples, value.var = "Content")
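To see how the dcast approach behaves when the groups are unequal, here is a minimal sketch. The toy df1 is an assumption for illustration: it uses values from the question, with Sample 2 cut to two rows.

```r
library(data.table)

# Toy data (assumption): values from the question, Sample 2 truncated to two rows
df1 <- data.table(
  Samples = c(rep("Sample 1", 3), rep("Sample 2", 2)),
  Content = c(70.7, 91.6, 92.6, 88.7, 67.6)
)

# rowid(Samples) numbers the rows within each sample, giving dcast a unique row key
wide <- dcast(df1, rowid(Samples) ~ Samples, value.var = "Content")
wide
```

The shorter sample is padded with NA in the rows it cannot fill, so the result stays rectangular.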