I have a list of 4 data frames in R, where each element of the list contains data ordered by atom number (a factor), e.g.:
$P1
AtomNum Moiety Energy
1519 P1 -1.2
1519 P1 -1.6
1520 P1 -2.3
1520 P1 -2.4
1521 P1 -3.6
1521 P1 -3.1
$P1'
AtomNum Moiety Energy
1522 P1' -3.5
1522 P1' -3.1
1523 P1' -2.5
1523 P1' -2.9
1524 P1' -1.8
1524 P1' -1.5
$P2
AtomNum Moiety Energy
1525 P2 -1.1
1525 P2 -1.9
1526 P2 -1.8
1526 P2 -1.7
1527 P2 -3.1
1527 P2 -2.9
$P2'
AtomNum Moiety Energy
1528 P2' -3.4
1528 P2' -3.6
1529 P2' -2.7
1529 P2' -2.5
1530 P2' -1.7
1530 P2' -1.2
I don't know if this is possible, but I would like to take the mean of the energy values per atom (group) and then sum these means for each element of the list. Something to the effect of:
sum(mean(x$AtomNum)) where x is the list
Can I do this with the data in a list form?
Answer 0 (score: 6)
If you need the mean of Energy by AtomNum and Moiety, and then the sum by Moiety, you can use data.table:
require(data.table)
l_dt <- rbindlist(ll)
l_dt[, mean(Energy), by=.(AtomNum, Moiety)][, .(Energy=sum(V1)), by=Moiety]
# Moiety Energy
#1: P1 -7.10
#2: P2 -6.25
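As a sanity check on the P1 figure above, the two steps can be worked by hand in base R. This is only an illustrative sketch: the energy values are copied from the question, and the names p1_means and p1_total are made up for this example.

```r
# Step 1: mean energy per atom for moiety P1 (values from the question)
p1_means <- c(
  "1519" = mean(c(-1.2, -1.6)),  # -1.40
  "1520" = mean(c(-2.3, -2.4)),  # -2.35
  "1521" = mean(c(-3.6, -3.1))   # -3.35
)

# Step 2: sum the per-atom means to get the value for the moiety
p1_total <- sum(p1_means)
print(p1_total)  # about -7.1, matching the P1 row above
```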
Data:
ll <- structure(list(P1 = structure(list(AtomNum = c(1519L, 1519L,
1520L, 1520L, 1521L, 1521L), Moiety = structure(c(1L, 1L, 1L,
1L, 1L, 1L), .Label = "P1", class = "factor"), Energy = c(-1.2,
-1.6, -2.3, -2.4, -3.6, -3.1)), .Names = c("AtomNum", "Moiety",
"Energy"), class = "data.frame", row.names = c(NA, -6L)), P2 = structure(list(
AtomNum = c(1525L, 1525L, 1526L, 1526L, 1527L, 1527L), Moiety = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "P2", class = "factor"), Energy = c(-1.1,
-1.9, -1.8, -1.7, -3.1, -2.9)), .Names = c("AtomNum", "Moiety",
"Energy"), class = "data.frame", row.names = c(NA, -6L))), .Names = c("P1",
"P2"))
EDIT
In case you have a data.frame df and not a list, you can do:
require(data.table)
setDT(df)[, mean(Energy), by=.(AtomNum, Moiety)][, .(Energy=sum(V1)), by=Moiety]
or, as mentioned by @DavidArenburg, using dplyr:
require(dplyr)
df %>%
  group_by(Moiety, AtomNum) %>%
  summarise(Energy = mean(Energy)) %>%
  summarise(sum(Energy))
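If the data is still in list form, the same dplyr pipeline can probably be applied after stacking the list with bind_rows. The sketch below assumes dplyr 1.0 or later (for the .groups argument) and rebuilds a small list from the question's values. The name result and the explicit Energy column name in the last step are choices made for this example, not part of the original answer.

```r
library(dplyr)

# Stand-in for the list of data frames (values copied from the question)
ll <- list(
  P1 = data.frame(AtomNum = c(1519, 1519, 1520, 1520, 1521, 1521),
                  Moiety  = "P1",
                  Energy  = c(-1.2, -1.6, -2.3, -2.4, -3.6, -3.1)),
  P2 = data.frame(AtomNum = c(1525, 1525, 1526, 1526, 1527, 1527),
                  Moiety  = "P2",
                  Energy  = c(-1.1, -1.9, -1.8, -1.7, -3.1, -2.9))
)

result <- bind_rows(ll) %>%                 # stack the list into one data frame
  group_by(Moiety, AtomNum) %>%
  # mean per atom; "drop_last" keeps the grouping by Moiety for the next step
  summarise(Energy = mean(Energy), .groups = "drop_last") %>%
  # sum of the per-atom means within each moiety
  summarise(Energy = sum(Energy))

print(result)
```

The first summarise drops only the innermost grouping level (AtomNum), which is why the second summarise returns one row per Moiety.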
Answer 1 (score: 3)
Sure, use sapply to iterate over the list and tapply to iterate over groups:
sapply(ll, function(x) sum( with(x, tapply(Energy,AtomNum,mean)) ) )
Using @CathG's example data, this returns
P1 P2
-7.10 -6.25
I'd advocate binding the data.frames together instead, as covered in the other answer.
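For anyone who wants to bind the data frames without extra packages, here is a minimal base R sketch using do.call(rbind, ...) and aggregate. It rebuilds a small list from the question's values; the names df, per_atom and per_moiety are made up for this example.

```r
# Stand-in for the list of data frames (values copied from the question)
ll <- list(
  P1 = data.frame(AtomNum = c(1519, 1519, 1520, 1520, 1521, 1521),
                  Moiety  = "P1",
                  Energy  = c(-1.2, -1.6, -2.3, -2.4, -3.6, -3.1)),
  P2 = data.frame(AtomNum = c(1525, 1525, 1526, 1526, 1527, 1527),
                  Moiety  = "P2",
                  Energy  = c(-1.1, -1.9, -1.8, -1.7, -3.1, -2.9))
)

# Bind all list elements into a single data frame
df <- do.call(rbind, ll)

# Mean energy per atom within each moiety
per_atom <- aggregate(Energy ~ AtomNum + Moiety, data = df, FUN = mean)

# Sum of the per-atom means for each moiety
per_moiety <- aggregate(Energy ~ Moiety, data = per_atom, FUN = sum)
print(per_moiety)
```

Once everything is in one data frame, adding more moieties only means adding rows; the grouping code stays the same.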