This is my data.frame, named test:
  strain variable     value L1
1    AB1        n 582.00000  1
2    AB4        n  12.00000  1
3 CB4852        n 375.00000  1
4 CB4853        n 113.00000  1
5 CB4854        n 160.00000  1
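This is a melted data.frame in which L1 runs from 1 to 30, with 78 variables for each L1 across 96 strains, for a total of 219,552 rows.
What I want to do is take this data.frame (test) and create L1 (30) x variable (78) new data.frames with the following orientation: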
L1_variable (this would be the name of one df)
             strain1  strain2  ....  strainN
row.name     value    value          value
variable x   value    value          value
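So a new df is created for each combination of L1 and variable, holding the values of that variable with one column per strain.
These data frames will then be passed to a function.
I was thinking I need to write a function and then use ddply on my df test, but I don't know how to implement it.
Any and all help is appreciated.
Answer 0 (score: 0)
There is no need to create separate data frames. You can reshape the data frame as follows:
# creating sample data (extending your sample in order to be able to illustrate the method)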
df <- structure(list(strain = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("AB1", "AB4", "CB4852", "CB4853", "CB4854"), class = "factor"), variable = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("m", "n"), class = "factor"), value = c(582, 12, 375, 113, 160, 753, 92, 115, 163, 189, 462, 72, 305, 183, 360, 142, 132, 75, 308, 216), L1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("strain", "variable", "value", "L1"), class = "data.frame", row.names = c(NA, -20L))
# transforming the data with the reshape2 package
library(reshape2)
df2 <- dcast(df, L1 + variable ~ strain, value.var="value")
# creating a variable with unique identifiers
df2$L1var <- paste0(df2$L1, df2$variable)
This results in the following data frame:
df2 <- structure(list(L1 = c(1L, 1L, 2L, 2L), variable = structure(c(1L, 2L, 1L, 2L), .Label = c("m", "n"), class = "factor"), AB1 = c(753, 582, 142, 462), AB4 = c(92, 12, 132, 72), CB4852 = c(115, 375, 75, 305), CB4853 = c(163, 113, 308, 183), CB4854 = c(189, 160, 216, 360), L1var = c("1m", "1n", "2m", "2n")), .Names = c("L1", "variable", "AB1", "AB4", "CB4852", "CB4853", "CB4854", "L1var"), row.names = c(NA, -4L), class = "data.frame")
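If reshape2 is not available, the same long-to-wide step can be sketched with base R's reshape(). This is an alternative technique, not the method used above, and it runs on a cut-down stand-in for the sample data (two strains only). Note that reshape() prefixes the new columns with "value.", which the sketch strips, and that it orders rows and columns differently from dcast().

```r
# Cut-down stand-in for the long sample data (two strains, two variables, two L1 levels)
df_long <- data.frame(
  strain   = rep(c("AB1", "AB4"), times = 4),
  variable = rep(c("n", "m", "n", "m"), each = 2),
  value    = c(582, 12, 753, 92, 462, 72, 142, 132),
  L1       = rep(c(1L, 2L), each = 4),
  stringsAsFactors = FALSE
)

# One row per (L1, variable), one column per strain
wide <- reshape(df_long,
                idvar     = c("L1", "variable"),
                timevar   = "strain",
                direction = "wide")

# reshape() names the new columns "value.AB1", "value.AB4"; drop the prefix
names(wide) <- sub("^value\\.", "", names(wide))
```
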
If you want a separate file for each unique identifier, you can split df2 like this:
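# split the data frame into a list of data frames, one per identifier
dfs <- split(df2, df2$L1var)
# save each data frame in the list to a separate file
# (dfs[[i]] extracts the data frame itself; dfs[i] would pass a one-element list)
lapply(seq_along(dfs), function(i) write.csv(dfs[[i]], file = paste0(names(dfs)[i], ".csv")))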
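Since the question mentions passing each piece to a function, the split list can also be used directly without writing files. The following is a minimal sketch: summarise_piece is a hypothetical placeholder for the real function, and the small df2 here is a stand-in with only two strain columns copied from the example above.

```r
# Stand-in for the wide data frame df2 (two strain columns only)
df2 <- data.frame(
  L1       = c(1L, 1L, 2L, 2L),
  variable = c("m", "n", "m", "n"),
  AB1      = c(753, 582, 142, 462),
  AB4      = c(92, 12, 132, 72),
  stringsAsFactors = FALSE
)
df2$L1var <- paste0(df2$L1, df2$variable)

# Hypothetical analysis function: receives one single-row piece and
# returns the mean over its strain columns. Replace with the real function.
summarise_piece <- function(piece) {
  strain_cols <- setdiff(names(piece), c("L1", "variable", "L1var"))
  mean(unlist(piece[strain_cols]))
}

# One list element per L1/variable combination, named by the identifier
pieces  <- split(df2, df2$L1var)
# Apply the function to every piece; the result is a named numeric vector
results <- vapply(pieces, summarise_piece, numeric(1))
```

Each element of results is named after its identifier (for example "1m"), so individual values can be looked up by name.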