
Question

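I am working in R with 10 lists (files1, files2, files3, ... files10). Each list contains multiple data frames.

Now I want to extract some values from each data frame in each list.

I was planning to use a for loop: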

nt = c("A", "C", "G", "T")
for (i in files1) {
    for (j in nt) {
        name = paste(j, i, sep = "-") # here I want as output name = "files1-A". However this doesn't work. How can I get the name of the list "files1"?
        colname = paste("percentage", j, sep = "") # here I want as output colname = "percentageA". This works
        assign(name, unlist(lapply(i, function(x) x[here I want to use the column with the name "percentageA", so 'colname'][x$position==1000])))
    }
}

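So I am having trouble using the list names and assigning them to variables.

I know this only loops over the first list, but is it also possible to loop over all my lists at once?

In other words: how can I put the code below into a for loop?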

A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==1000]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==1000]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==1000]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==1000]))

A_files2 = unlist(lapply(files2, function(x) x$percentageA[x$position==1000]))
C_files2 = unlist(lapply(files2, function(x) x$percentageC[x$position==1000]))
G_files2 = unlist(lapply(files2, function(x) x$percentageG[x$position==1000]))
T_files2 = unlist(lapply(files2, function(x) x$percentageT[x$position==1000]))

....

A_files10 = unlist(lapply(files10, function(x) x$percentageA[x$position==1000]))
C_files10 = unlist(lapply(files10, function(x) x$percentageC[x$position==1000]))
G_files10 = unlist(lapply(files10, function(x) x$percentageG[x$position==1000]))
T_files10 = unlist(lapply(files10, function(x) x$percentageT[x$position==1000]))

Answer 1

To answer your question, I created a dummy list containing data frames:

n = data.frame(andrea=c(1983, 11, 8),paja=c(1985, 4, 3)) 
s = data.frame(col1=c("aa", "bb", "cc", "dd", "ee")) 
b = data.frame(col1=c(TRUE, FALSE, TRUE, FALSE, FALSE)) 
x = list(n, s, b, 3)   # x contains copies of n, s, b
names(x) <- c("dataframe1","dataframe2","dataframe3","dataframe4")
files1 = x

Now, on to what happens inside the loop:

i = files1
j = "A"

If you want the names of the data frames with a suffix taken from nt (in this case nt = "A"), you have to use names(i):

name_wrong = paste(j, i, sep = "-") 
name       = paste(names(i),j,sep = "-")

So you get:

> name
[1] "dataframe1-A" "dataframe2-A" "dataframe3-A" "dataframe4-A"
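
To carry this idea over to the loop in the question, one possible sketch is to collect the lists in a named list and iterate over its names, so that the list name is available as a string. The toy data, the helper `make_df`, the names `all_files` and `results`, and the use of position 9 instead of 1000 are all assumptions made for illustration.

```r
# Toy stand-ins for files1 and files2 (assumed layout: position + percentageX columns)
make_df <- function(n, scale) {
  data.frame(position = 1:n,
             percentageA = 1:n / scale, percentageC = 1:n / scale,
             percentageG = 1:n / scale, percentageT = 1:n / scale)
}
files1 <- list(item1 = make_df(10, 10), item2 = make_df(11, 20))
files2 <- list(item1 = make_df(10, 10))

# Put the lists into one named list so their names can be looped over
all_files <- list(files1 = files1, files2 = files2)

nt <- c("A", "C", "G", "T")
results <- list()
for (listname in names(all_files)) {
  for (j in nt) {
    name <- paste(j, listname, sep = "_")   # e.g. "A_files1"
    colname <- paste0("percentage", j)      # e.g. "percentageA"
    # x[[colname]] selects a column by its name stored in a string
    results[[name]] <- unlist(lapply(all_files[[listname]],
                                     function(x) x[[colname]][x$position == 9]))
  }
}
results$A_files1
```

Storing the vectors in a list such as `results` rather than calling `assign()` keeps them together and makes later loops over them easier.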

I hope this is what you need.

Answer 2

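I think these data would be easier to manipulate if you flattened the data structure. Instead of 10 lists of data frames, you could use a single data frame in which every observation is indexed by its item name and its file name.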

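Generate sample data and use the code from the question

Simplified data with only 10 or 11 points per item. I suppose the items in your list have different numbers of rows?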

files1 <- list(item1 = data.frame(position = 1:10,
                                  percentageA = 1:10/10,
                                  percentageC = 1:10/10,
                                  percentageG = 1:10/10,
                                  percentageT = 1:10/10),
               item2 = data.frame(position = 1:11,
                                  percentageA = 1:11/20,
                                  percentageC = 1:11/20,
                                  percentageG = 1:11/20,
                                  percentageT = 1:11/20))
str(files1) # look at the structure of the sample list

# Select the 9th position using your code
A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==9]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==9]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==9]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==9]))

Flatten the list of data frames into one data frame

# Add name to each data frame
# Inspired by this answer
# http://stackoverflow.com/a/18434780/2641825


# For information l[1] creates a single list item
# l[[1]] extracts the data frame from the list
#' @param i index
#' @param listoffiles list of data frames
addname <- function(i, listoffiles){
     dtf <- listoffiles[[i]] # Extract the dataframe from the list
     dtf$name <- names(listoffiles[i]) # Add the name inside the data frame
     return(dtf)
}
# Add the name inside each data frame
files1 <- lapply(seq_along(files1), addname, files1)
str(files1) # look at the structure of the list
files1table <-  Reduce(rbind,files1) 

# Get the values of interest with
files1table$percentageA[files1table$position == 9]
# [1] 0.90 0.45

# Get all Letters of interest with
subset(files1table,position==9)

#   position percentageA percentageC percentageG percentageT  name
# 9         9        0.90        0.90        0.90        0.90 item1
# 19        9        0.45        0.45        0.45        0.45 item2
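
The same flattening can also be sketched more compactly with `Map()` over the data frames and their names, followed by `do.call(rbind, ...)`. This is an alternative to `addname()`, not the code used in this answer; `files1` is re-created here with fewer columns so that the block runs on its own, and `named` is a hypothetical variable name.

```r
files1 <- list(item1 = data.frame(position = 1:10, percentageA = 1:10 / 10),
               item2 = data.frame(position = 1:11, percentageA = 1:11 / 20))

# Pair each data frame with its list name and store that name in a column
named <- Map(function(dtf, nm) { dtf$name <- nm; dtf }, files1, names(files1))

# Bind all rows; unname() keeps the list names out of the row names
files1table <- do.call(rbind, unname(named))

subset(files1table, position == 9)
```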

Flatten a list of lists of data frames into a single data frame

# Now create another list, files2, a duplicate just for the sake of the example
files2 <- files1 
# files1 and files2 both have a name column inside their data frames already
# Create a list of list of dataframes
lolod <- list(files1 = files1, files2 = files2) 
str(lolod) # a list of lists
# Flatten to a list of dataframes
# Use sapply to keep names based on this answer http://stackoverflow.com/a/9469981/2641825
lod <- sapply(lolod,  Reduce, f=rbind, simplify = FALSE, USE.NAMES = TRUE) 
# Add the name inside each data frame again
addfilename <- function(i, listoffiles){
     dtf <- listoffiles[[i]] # Extract the dataframe from the list
     dtf$filename <- names(listoffiles[i]) # Add the name inside the data frame
     return(dtf)
}
lod <- lapply(seq_along(lod), addfilename, lod)


# Flatten to a dataframe
d <- Reduce(rbind, lod)
# Now the data structure is flattened and much easier to deal with

subset(d,position==9)
#    position percentageA percentageC percentageG percentageT  name filename
# 9         9        0.90        0.90        0.90        0.90 item1   files1
# 19        9        0.45        0.45        0.45        0.45 item2   files1
# 30        9        0.90        0.90        0.90        0.90 item1   files2
# 40        9        0.45        0.45        0.45        0.45 item2   files2
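
Once the data are flat, the per-list vectors asked for in the question (A_files1, A_files2, ...) can be recovered by subsetting and splitting. A small sketch, using a hand-made stand-in for `d` that holds only the rows at position 9; the names `at9` and `A_by_file` are hypothetical.

```r
# Stand-in for the flattened data frame d (values copied from the output above)
d <- data.frame(position = 9,
                percentageA = c(0.90, 0.45, 0.90, 0.45),
                percentageC = c(0.90, 0.45, 0.90, 0.45),
                name = c("item1", "item2", "item1", "item2"),
                filename = c("files1", "files1", "files2", "files2"))

at9 <- subset(d, position == 9)

# One vector per original list, corresponding to A_files1, A_files2, ...
A_by_file <- split(at9$percentageA, at9$filename)
A_by_file$files1
```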

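This answer is much longer than I expected. I hope I didn't scare you. It is inspired by tidy data: simplifying the data structure will help you later on. If the names were already present in your original data, you might not need this complicated list renaming business.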

R - Using list names in a for loop
