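I have a fitting process that runs 100 times. Each run outputs a data frame, and I collect these into a list of 100 data frames.
The first two data frames might look like this (the real ones have more than one column):
n1 = c(4, 5, 6)
df1 = data.frame(n1, row.names = c("height", "weight", "favcolor"))
n2 = c(2, 3, 5, 7)
df2 = data.frame(n2, row.names = c("height", "weight", "inseam", "favcolor"))
I would like to combine these data frames (for example, take the mean of the height values or the standard deviation of the weights).
My first thought was to convert the list into a single data frame, but that fails with:
arguments imply differing number of rows
My second thought was to insert NAs for every row name that does not appear in a given data frame, so for df1 I would add:
new_row <- c(NA, NA)
row.names(new_row) <- "inseam"
But I could not get this to work, so I cannot even test whether appending an "inseam" row of NAs to the end of df1 gives the right answer when I take the mean of "inseam".
In this case, the correct means are:
height 3
weight 4
inseam 5
favcolor 6.5
For reference, three of the real data frames in my list look like this: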
> mega_df[1]
[[1]]
coeff error pval
rf2 -1.15099200 0.5752430 4.540538e-02
rf3 -0.53430218 0.4928974 2.783635e-01
rf4 0.08784138 0.4933079 8.586711e-01
rf5 0.96002601 0.5070957 5.833327e-02
rm2 -0.36188368 0.4626464 4.340949e-01
rm3 0.01805873 0.4355164 9.669251e-01
rm4 0.45008373 0.4319557 2.974268e-01
rm5 1.04056503 0.4441024 1.912556e-02
rc_cat1 0.86231928 0.2827566 2.290799e-03
rc_cat2 1.21335473 0.2448206 7.192321e-07
rc_cat3 0.96196637 0.2044198 2.528247e-06
rc_cat4 1.04477290 0.3302644 1.559142e-03
rc_cat5 -0.58902623 1.5893867 7.109357e-01
rc_cat6 0.62569607 0.1720676 2.765407e-04
rc_cat7 0.29381724 0.4115594 4.752815e-01
rs2 0.12333678 0.7186019 8.637250e-01
rs3 1.22018613 0.6423970 5.750837e-02
rs4 1.96075220 0.6454184 2.381892e-03
rs5 2.58404946 0.6543862 7.853927e-05
1|3 0.01561497 0.4851330 9.743229e-01
3|4 1.82853786 0.4937675 2.128663e-04
4|5 3.73480100 0.5023435 1.047616e-13
> mega_df[2]
[[1]]
coeff error pval
rf2 -0.23364248 5.849338e-01 6.895734e-01
rf3 0.24054894 5.219730e-01 6.449094e-01
rf4 0.84072979 5.208259e-01 1.064788e-01
rf5 1.47867154 5.346970e-01 5.684640e-03
rm2 -0.29555400 4.465509e-01 5.080612e-01
rm3 0.31147504 4.131024e-01 4.508553e-01
rm4 0.73696523 4.141224e-01 7.514424e-02
rm5 1.14273148 4.271863e-01 7.472508e-03
rc_cat1 1.27479299 3.094432e-01 3.794740e-05
rc_cat2 1.10917318 2.619011e-01 2.284654e-05
rc_cat3 0.65782540 2.161602e-01 2.340525e-03
rc_cat4 0.40512225 3.301662e-01 2.198131e-01
rc_cat5 12.78797722 5.612311e-08 0.000000e+00
rc_cat6 0.41622889 1.677804e-01 1.310894e-02
rc_cat7 0.16833629 3.806498e-01 6.583198e-01
rs2 -0.02279305 7.225878e-01 9.748360e-01
rs3 0.68299485 6.759050e-01 3.122608e-01
rs4 1.36149302 6.780722e-01 4.465519e-02
rs5 2.18484594 6.863696e-01 1.456612e-03
1|3 0.35419237 5.844931e-01 5.445266e-01
3|4 2.12603072 5.928308e-01 3.354863e-04
4|5 3.97564508 5.999369e-01 3.431064e-11
> mega_df[3]
[[1]]
coeff error pval
rf2 -0.2733408 0.5884741 6.422961e-01
rf3 0.1764257 0.5257697 7.372050e-01
rf4 0.6504428 0.5248386 2.152271e-01
rf5 1.3967757 0.5356706 9.119879e-03
rm2 -0.2361284 0.4870015 6.277745e-01
rm3 0.2078729 0.4609270 6.519977e-01
rm4 0.6390950 0.4622065 1.667555e-01
rm5 1.1738653 0.4730686 1.308730e-02
rc_cat1 0.9337627 0.2958630 1.599133e-03
rc_cat2 1.0292916 0.2493133 3.651281e-05
rc_cat3 0.7088285 0.2012026 4.267587e-04
rc_cat4 0.6296966 0.3664883 8.576193e-02
rc_cat6 0.5475018 0.1720841 1.464662e-03
rc_cat7 0.4521113 0.3588440 2.077017e-01
rs2 -0.4663666 0.7031265 5.071541e-01
rs3 0.7810059 0.6489673 2.287985e-01
rs4 1.5178641 0.6522175 1.995271e-02
rs5 2.1916080 0.6578769 8.643075e-04
1|3 0.2569225 0.4659460 5.813597e-01
3|4 2.0648302 0.4769118 1.493906e-05
4|5 3.9312070 0.4855339 5.648509e-16
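So, my question:
If you have a list of data frames whose row names are meaningful and need to be aggregated as in the example above, what is the best way to do it?
################ EDIT
The three data frames printed above are the full data for 3 of my datasets. I am hoping to do some basic aggregation that ultimately returns:
avcoef averror avpval std(coef)
rf2 W X Y Z
rf3 ...
.
.
.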
Answer 0 (score: 3)
I guess you could create a new column in each data set that holds the row names, and then merge on that column, like this:
l <- lapply(list(df1, df2), function(x) {x$New <- row.names(x) ; x})
Res <- Reduce(function(...) merge(..., by = "New", all = TRUE), l)
cbind(Res[1], Means = rowMeans(Res[-1], na.rm = TRUE))
# New Means
# 1 favcolor 6.5
# 2 height 3.0
# 3 inseam 5.0
# 4 weight 4.0
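The question also asks for a standard deviation, so here is a minimal base R sketch of one possible alternative, not necessarily the best one. Instead of merging side by side, it stacks the data frames into one long table with the row name as an ordinary column and then summarises per row name. `df1` and `df2` are the toy data frames from the question; the helper `stack_with_names` and the column names `RowName`, `value`, `Mean` and `SD` are made up for illustration.

```r
# Toy data from the question
n1 <- c(4, 5, 6)
df1 <- data.frame(n1, row.names = c("height", "weight", "favcolor"))
n2 <- c(2, 3, 5, 7)
df2 <- data.frame(n2, row.names = c("height", "weight", "inseam", "favcolor"))

# Hypothetical helper: turn a one-column data frame into long form,
# keeping the row names as a regular column
stack_with_names <- function(x) {
  data.frame(RowName = row.names(x), value = x[[1]], stringsAsFactors = FALSE)
}

# Stack all data frames; rows missing from one data frame are simply absent,
# so no NA padding is needed
long <- do.call(rbind, lapply(list(df1, df2), stack_with_names))

# Group the values by row name and summarise each group
groups <- split(long$value, long$RowName)
summary_df <- data.frame(
  Mean = vapply(groups, mean, numeric(1)),
  SD = vapply(groups, sd, numeric(1)),
  row.names = names(groups)
)
summary_df
```

Note that `sd()` returns `NA` for a row name that occurs in only one data frame (here `inseam`), since a standard deviation needs at least two values.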
This is probably closely related to this.
Edit: for the new data set
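# Store the row names in a regular column so they can serve as the merge key
l <- lapply(list(mega_df1, mega_df2, mega_df3), function(x) {x$RowName <- row.names(x) ; x})
# Full outer join on RowName; rows missing from a data frame are filled with NA
Res <- Reduce(function(...) merge(..., by = "RowName", all = TRUE), l)
library(data.table) ## v1.9.6+
# Melt to long form, strip the .x/.y suffixes added by merge(), then average per row name
dcast(melt(setDT(Res), "RowName"),
      RowName ~ sub("\\..*", "", variable),
      mean,
      na.rm = TRUE,
      value.var = "value")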
# RowName coeff error pval
# 1: cat1 1.0236250 0.2960209 1.309293e-03
# 2: cat2 1.1172732 0.2520117 2.002619e-05
# 3: cat3 0.7762068 0.2072609 9.232706e-04
# 4: cat4 0.6931972 0.3423063 1.023781e-01
# 5: cat5 6.0994755 0.7946934 3.554678e-01
# 6: cat6 0.5298089 0.1706440 4.950048e-03
# 7: cat7 0.3047549 0.3836844 4.471010e-01