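I am working in R with 10 lists (files1, files2, files3, ... files10). Each list contains several data frames.
Now I want to extract some values from every data frame in every list.
I intended to use a for loop: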
nt = c("A", "C", "G", "T")
for (i in files1) {
for (j in nt) {
name = paste(j, i, sep = "-") # here I want as output name = "files1-A". However this doesn't work. How can I get the name of the list "files1"?
colname = paste("percentage", j, sep = "") # here I want as output colname = "percentageA". This works
assign(name, unlist(lapply(i, function(x) x[here I want to use the column with the name "percentageA", so 'colname'][x$position==1000])))
}
}
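So I am having trouble using the list names and assigning them to variables.
I know this only loops over the first list; is it also possible to iterate over all of my lists at once?
In other words: how can I put the code below into a for loop?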
A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==1000]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==1000]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==1000]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==1000]))
A_files2 = unlist(lapply(files2, function(x) x$percentageA[x$position==1000]))
C_files2 = unlist(lapply(files2, function(x) x$percentageC[x$position==1000]))
G_files2 = unlist(lapply(files2, function(x) x$percentageG[x$position==1000]))
T_files2 = unlist(lapply(files2, function(x) x$percentageT[x$position==1000]))
....
A_files10 = unlist(lapply(files10, function(x) x$percentageA[x$position==1000]))
C_files10 = unlist(lapply(files10, function(x) x$percentageC[x$position==1000]))
G_files10 = unlist(lapply(files10, function(x) x$percentageG[x$position==1000]))
T_files10 = unlist(lapply(files10, function(x) x$percentageT[x$position==1000]))
Answer 0 (score: 0)
To answer your question, I created a dummy list containing data frames:
n = data.frame(andrea=c(1983, 11, 8),paja=c(1985, 4, 3))
s = data.frame(col1=c("aa", "bb", "cc", "dd", "ee"))
b = data.frame(col1=c(TRUE, FALSE, TRUE, FALSE, FALSE))
x = list(n, s, b, 3) # x contains copies of n, s, b and the number 3
names(x) <- c("dataframe1","dataframe2","dataframe3","dataframe4")
files1 = x
Now, on to what happens inside the loop:
i = files1
j = "A"
If you want the names of the data frames with a suffix taken from nt (in this case j = "A"), you have to use names(i):
name_wrong = paste(j, i, sep = "-")
name = paste(names(i),j,sep = "-")
This gives:
> name
[1] "dataframe1-A" "dataframe2-A" "dataframe3-A" "dataframe4-A"
I hope this is what you need.
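To tie this back to the loop in the question, here is a minimal sketch of iterating over the list names rather than the list contents. It rests on several assumptions: the toy data, the helper make_df, and the containers all_files and results are hypothetical and not part of the original code. mget() collects existing objects by name, and x[[colname]] selects a column whose name is stored in a variable.

```r
# Toy stand-ins for the real lists (hypothetical data, 2 lists instead of 10)
make_df <- function(scale) {
  data.frame(position    = c(999, 1000, 1001),
             percentageA = c(1, 2, 3) * scale,
             percentageC = c(4, 5, 6) * scale,
             percentageG = c(7, 8, 9) * scale,
             percentageT = c(10, 11, 12) * scale)
}
files1 <- list(item1 = make_df(1), item2 = make_df(2))
files2 <- list(item1 = make_df(3), item2 = make_df(4))

# Collect the lists into one named list; its names are "files1" and "files2"
all_files <- mget(paste0("files", 1:2), envir = environment())

nt <- c("A", "C", "G", "T")
results <- list()
for (listname in names(all_files)) {
  for (j in nt) {
    colname <- paste0("percentage", j)
    # [[colname]] picks the column named by the string held in colname
    results[[paste(j, listname, sep = "_")]] <- unlist(lapply(
      all_files[[listname]],
      function(x) x[[colname]][x$position == 1000]
    ))
  }
}

results[["A_files1"]]
```

Storing the vectors in one named list avoids assign() altogether. If separate variables such as A_files1 are really needed, list2env(results, envir = globalenv()) should create them, though keeping everything in the list is usually easier to work with.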
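Answer 1 (score: 0)
I think this data would be easier to work with if you flattened the data structure. Instead of 10 lists of data frames, you could use a single data frame in which every observation is indexed by its item name and file name.
Simplified data with only 10 or 11 points per item. I guess the items in your lists have different numbers of rows?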
files1 <- list(item1 = data.frame(position = 1:10,
percentageA = 1:10/10,
percentageC = 1:10/10,
percentageG = 1:10/10,
percentageT = 1:10/10),
item2 = data.frame(position = 1:11,
percentageA = 1:11/20,
percentageC = 1:11/20,
percentageG = 1:11/20,
percentageT = 1:11/20))
str(files1)
# Select the 9th position using your code
A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==9]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==9]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==9]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==9]))
# Add name to each data frame
# Inspired by this answer
# http://stackoverflow.com/a/18434780/2641825
# For information l[1] creates a single list item
# l[[1]] extracts the data frame from the list
#' @param i index
#' @param listoffiles list of data frames
addname <- function(i, listoffiles){
dtf <- listoffiles[[i]] # Extract the dataframe from the list
dtf$name <- names(listoffiles[i]) # Add the name inside the data frame
return(dtf)
}
# Add the name inside each data frame
files1 <- lapply(seq_along(files1), addname, files1)
str(files1) # look at the structure of the list
files1table <- Reduce(rbind,files1)
# Get the values of interest with
files1table$percentageA[files1table$position == 9]
# [1] 0.90 0.45
# Get all Letters of interest with
subset(files1table,position==9)
# position percentageA percentageC percentageG percentageT name
# 9 9 0.90 0.90 0.90 0.90 item1
# 19 9 0.45 0.45 0.45 0.45 item2
# Now create another list, files2, duplicate just for the sake of the example
files2 <- files1
# files1 and files2 both have a name column inside their dataframes already
# Create a list of list of dataframes
lolod <- list(files1 = files1, files2 = files2)
str(lolod) # a list of lists
# Flatten to a list of dataframes
# Use sapply to keep names based on this answer http://stackoverflow.com/a/9469981/2641825
lod <- sapply(lolod, Reduce, f=rbind, simplify = FALSE, USE.NAMES = TRUE)
# Add the name inside each data frame again
addfilename <- function(i, listoffiles){
dtf <- listoffiles[[i]] # Extract the dataframe from the list
dtf$filename <- names(listoffiles[i]) # Add the name inside the data frame
return(dtf)
}
lod <- lapply(seq_along(lod), addfilename, lod)
# Flatten to a dataframe
d <- Reduce(rbind, lod)
# Now the data structure is flattened and much easier to deal with
subset(d,position==9)
# position percentageA percentageC percentageG percentageT name filename
# 9 9 0.90 0.90 0.90 0.90 item1 files1
# 19 9 0.45 0.45 0.45 0.45 item2 files1
# 30 9 0.90 0.90 0.90 0.90 item1 files2
# 40 9 0.45 0.45 0.45 0.45 item2 files2
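This answer turned out much longer than I expected. I hope I didn't scare you off. Inspired by tidy data, simplifying the data structure will help your later work. If you had provided names in the raw data, this complicated list-renaming business might not be needed.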
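For comparison, the same flattening can probably be written more compactly with Map() and do.call(rbind, ...). This is only a sketch under assumptions: the toy data is trimmed to one percentage column, and the helper tag and the intermediate tagged are hypothetical names introduced here.

```r
# Hypothetical toy data: two lists of data frames, as in the example above
files1 <- list(item1 = data.frame(position = 1:10, percentageA = 1:10 / 10),
               item2 = data.frame(position = 1:11, percentageA = 1:11 / 20))
files2 <- files1
lolod <- list(files1 = files1, files2 = files2)

# Tag one data frame with its item name and the name of the list it came from
tag <- function(dtf, name, filename) {
  dtf$name <- name
  dtf$filename <- filename
  dtf
}

# For each list, tag every data frame it contains
tagged <- Map(function(lst, filename) Map(tag, lst, names(lst), filename),
              lolod, names(lolod))

# Flatten one level (list of lists -> list of data frames), then stack row-wise
d <- do.call(rbind, unlist(tagged, recursive = FALSE))
rownames(d) <- NULL

subset(d, position == 9)
```

Both columns are added in a single pass here, so the separate addname and addfilename steps are not needed. The resulting d has the same name and filename columns as above.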