Question

I have a list of 4 data frames in R. Each element of the list contains data ordered by atom number (a factor), e.g.:

$P1
AtomNum  Moiety  Energy
1519     P1      -1.2
1519     P1      -1.6
1520     P1      -2.3
1520     P1      -2.4
1521     P1      -3.6
1521     P1      -3.1 

$P1' 
AtomNum   Moiety   Energy
1522      P1'      -3.5
1522      P1'      -3.1    
1523      P1'      -2.5
1523      P1'      -2.9
1524      P1'      -1.8
1524      P1'      -1.5

$P2 
AtomNum   Moiety    Energy
1525      P2        -1.1
1525      P2        -1.9
1526      P2        -1.8 
1526      P2        -1.7
1527      P2        -3.1
1527      P2        -2.9

$P2'
AtomNum   Moiety    Energy
1528       P2'      -3.4 
1528       P2'      -3.6 
1529       P2'      -2.7 
1529       P2'      -2.5 
1530       P2'      -1.7 
1530       P2'      -1.2
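The example data used in the answers covers only P1 and P2. To try the approaches on all four elements, the list could be rebuilt from the printed tables roughly like this. The helper make_df and the name ll4 are hypothetical, and the values are copied from the tables above.

```r
# Hypothetical helper: builds one data frame in the layout shown above
make_df <- function(atoms, moiety, energy) {
  data.frame(
    AtomNum = rep(atoms, each = 2),                  # each atom appears twice
    Moiety  = factor(rep(moiety, length(energy))),   # single-level factor
    Energy  = energy
  )
}

# Names containing a quote (P1', P2') need backticks in R
ll4 <- list(
  P1    = make_df(1519:1521, "P1",  c(-1.2, -1.6, -2.3, -2.4, -3.6, -3.1)),
  `P1'` = make_df(1522:1524, "P1'", c(-3.5, -3.1, -2.5, -2.9, -1.8, -1.5)),
  P2    = make_df(1525:1527, "P2",  c(-1.1, -1.9, -1.8, -1.7, -3.1, -2.9)),
  `P2'` = make_df(1528:1530, "P2'", c(-3.4, -3.6, -2.7, -2.5, -1.7, -1.2))
)
```

Elements with primed names are then accessed as ll4[["P1'"]].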

I don't know if this is possible, but I would like to take the mean of the energy values per atom (group) and then sum these values for each element of the list. Something to the effect of

sum(mean(x$AtomNum)) where x is the list

Can I do this with the data in a list form?
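Written out, the intended quantity for one list element m averages the Energy rows of each atom and then adds those atom means. Note that mean() applied to a whole column returns a single number, so the grouping has to happen before the mean is taken. In the notation below (introduced here for illustration), A_m is the set of atoms in element m, n_a is the number of rows for atom a, and E_i is the Energy in row i; the second line works P1 by hand from the first table.

```latex
S_m = \sum_{a \in A_m} \frac{1}{n_a} \sum_{i \,:\, \mathrm{AtomNum}_i = a} E_i

S_{P1} = \frac{-1.2 - 1.6}{2} + \frac{-2.3 - 2.4}{2} + \frac{-3.6 - 3.1}{2}
       = -1.40 - 2.35 - 3.35 = -7.10
```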

Answer 1

If you need the mean of Energy by AtomNum and Moiety, and then the sum of those means by Moiety, you can do this with data.table:

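library(data.table)
l_dt <- rbindlist(ll)   # stack the list elements into one data.table
# mean Energy per atom (column V1), then the sum of those means per Moiety
l_dt[, mean(Energy), by=.(AtomNum, Moiety)][, .(Energy=sum(V1)), by=Moiety]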
#   Moiety Energy
#1:     P1  -7.10
#2:     P2  -6.25
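If the data frames did not carry a Moiety column, the list names could serve as the group label instead: rbindlist() has an idcol argument that adds a column holding each element's name. The sketch below assumes such a list; the name ll_nomoiety and the column name "element" are arbitrary choices.

```r
library(data.table)

# Hypothetical list without a Moiety column; the list names carry the label
ll_nomoiety <- list(
  P1 = data.frame(AtomNum = c(1519, 1519, 1520, 1520, 1521, 1521),
                  Energy  = c(-1.2, -1.6, -2.3, -2.4, -3.6, -3.1)),
  P2 = data.frame(AtomNum = c(1525, 1525, 1526, 1526, 1527, 1527),
                  Energy  = c(-1.1, -1.9, -1.8, -1.7, -3.1, -2.9))
)

# idcol = "element" adds a column with each list element's name
dt <- rbindlist(ll_nomoiety, idcol = "element")

# Mean per atom within each element, then the sum of those means per element
res <- dt[, .(Energy = mean(Energy)), by = .(element, AtomNum)][
          , .(Energy = sum(Energy)), by = element]
```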

Data (P1 and P2 only):

ll <- structure(list(P1 = structure(list(AtomNum = c(1519L, 1519L, 
1520L, 1520L, 1521L, 1521L), Moiety = structure(c(1L, 1L, 1L, 
1L, 1L, 1L), .Label = "P1", class = "factor"), Energy = c(-1.2, 
-1.6, -2.3, -2.4, -3.6, -3.1)), .Names = c("AtomNum", "Moiety", 
"Energy"), class = "data.frame", row.names = c(NA, -6L)), P2 = structure(list(
    AtomNum = c(1525L, 1525L, 1526L, 1526L, 1527L, 1527L), Moiety = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "P2", class = "factor"), Energy = c(-1.1, 
    -1.9, -1.8, -1.7, -3.1, -2.9)), .Names = c("AtomNum", "Moiety", 
"Energy"), class = "data.frame", row.names = c(NA, -6L))), .Names = c("P1", 
"P2"))

EDIT

If you have a single data.frame df rather than a list, you can do:

library(data.table)
# setDT() converts df to a data.table in place; the chain is the same as above
setDT(df)[, mean(Energy), by=.(AtomNum, Moiety)][, .(Energy=sum(V1)), by=Moiety]

or, as mentioned by @DavidArenburg, using dplyr:

library(dplyr)
df %>% 
   group_by(Moiety, AtomNum) %>% 
   summarise(Energy = mean(Energy), .groups = "drop_last") %>%  # one row per atom, still grouped by Moiety
   summarise(Energy = sum(Energy))                              # one row per Moiety
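Without any add-on packages, the same two-step aggregation can be sketched in base R with aggregate() and a formula. The compact df below is an assumption: it re-creates the P1 and P2 example data as one data frame.

```r
# P1 and P2 example data stacked into one data frame
df <- data.frame(
  AtomNum = rep(c(1519, 1520, 1521, 1525, 1526, 1527), each = 2),
  Moiety  = rep(c("P1", "P2"), each = 6),
  Energy  = c(-1.2, -1.6, -2.3, -2.4, -3.6, -3.1,
              -1.1, -1.9, -1.8, -1.7, -3.1, -2.9)
)

# Step 1: mean Energy for every AtomNum/Moiety combination present in df
atom_means <- aggregate(Energy ~ AtomNum + Moiety, data = df, FUN = mean)

# Step 2: sum those per-atom means within each Moiety
moiety_sums <- aggregate(Energy ~ Moiety, data = atom_means, FUN = sum)
```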

Answer 2

Sure, use sapply to iterate over the list and tapply to iterate over groups:

sapply(ll, function(x) sum(with(x, tapply(Energy, AtomNum, mean))))

Using @CathG's example data, this returns

   P1    P2 
-7.10 -6.25

I'd advocate binding the data.frames together instead, as covered in the other answer.
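A variant of the list approach, sketched with vapply() instead of sapply(): vapply() requires every call to return exactly one numeric value and stops with an error otherwise, which guards against a data frame with an unexpected layout. The helper sum_of_atom_means and the list ll_demo are hypothetical.

```r
# Hypothetical helper: mean Energy per AtomNum, then the sum of those means
sum_of_atom_means <- function(x) {
  sum(tapply(x$Energy, x$AtomNum, mean))
}

ll_demo <- list(
  P1 = data.frame(AtomNum = c(1519, 1519, 1520, 1520, 1521, 1521),
                  Energy  = c(-1.2, -1.6, -2.3, -2.4, -3.6, -3.1)),
  P2 = data.frame(AtomNum = c(1525, 1525, 1526, 1526, 1527, 1527),
                  Energy  = c(-1.1, -1.9, -1.8, -1.7, -3.1, -2.9))
)

# numeric(1) declares the expected return type and length for each element
totals <- vapply(ll_demo, sum_of_atom_means, numeric(1))
```

The result is a named numeric vector, so the list names carry through to the output just as with sapply().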

Sub-indexing a list to apply a function in R
