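I am just starting out with R and my question is probably very simple, but I have spent a lot of time trying to work out what I am doing wrong, to no avail.
I want to thank you all, because I found this site last week while searching for other questions. As a newcomer, though, I often find it hard to interpret other people's code.
My RStudio version is 1.1.442.
My problem is that I have two data frames: one with the years and trawls, and one with the items found in each trawl. I need to sum the items and create another variable that holds the total per year and trawl.
So I wrote a loop that matches the same bottom trawl and the same year in order to sum the items.
I have simplified the data set.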
BT<- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)
YEAR<- c(2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009)
W<- c(95, 6, 60, 50, 4, 21, 56, 44, 23, 4)
Data1= data.frame(BT,YEAR,W)
Trawl<- c(1, 2, 3)
Year<- c(2007, 2008, 2009)
Data2= data.frame(Trawl,Year)
peso = cbind()
for (i in 1:length(Data1$BT)) {
  ind = which(Data2$Trawl == Data1$BT[i] & Data2$Year == Data1$YEAR[i])
  print(i)
  print(ind)
  print(Data1$W[ind])
  peso[i] = Data1$W[ind]
  sumaGr[i] = colSums(peso[i])
}
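I get:
Error in colSums(peso[i]) : 'x' must be an array of at least two dimensions
But I don't know how to fix it. I would greatly appreciate your help and suggestions. Thanks in advance.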
Answer 0 (score: 1)
You seem to be implementing a split-apply-combine computation. There are several ways to do this.
Base R
Data3 <- aggregate(Data1$W, by = list(Data1$BT, Data1$YEAR), sum)
colnames(Data3) <- c("Trawl", "YEAR", "sumaGr")
Data3
dplyr
library(dplyr)
Data3 <- Data1 %>%
  group_by(BT, YEAR) %>%
  summarise(sumaGr = sum(W)) %>%
  rename(Trawl = BT)
Data3
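data.table
library(data.table)
# setDT() converts Data1 to a data.table by reference
Data3 <- setDT(Data1)[, .(sumaGr = sum(W)), by = .(BT, YEAR)]
# rename the grouping column in the result (Data3, not Data2)
setnames(Data3, "BT", "Trawl")
Data3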
Here is the output of the base R solution:
#   Trawl YEAR sumaGr
# 1     1 2007    101
# 2     2 2008    114
# 3     3 2009    148
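As for the error itself: `peso[i]` is a single element of a vector, whereas `colSums()` expects an object with at least two dimensions (a matrix or a data frame), hence the message. If you would rather keep an explicit loop, one possible sketch is to iterate over the rows of Data2 and add up the matching weights with `sum()`. It assumes the simplified Data1 and Data2 from the question; storing the result in a new column called sumaGr is just one illustrative choice.

```r
# Simplified data from the question
Data1 <- data.frame(
  BT   = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  YEAR = c(2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009),
  W    = c(95, 6, 60, 50, 4, 21, 56, 44, 23, 4)
)
Data2 <- data.frame(Trawl = c(1, 2, 3), Year = c(2007, 2008, 2009))

# Pre-allocate one total per trawl/year combination listed in Data2
Data2$sumaGr <- numeric(nrow(Data2))

for (i in seq_len(nrow(Data2))) {
  # Rows of Data1 that belong to this trawl and this year
  ind <- which(Data1$BT == Data2$Trawl[i] & Data1$YEAR == Data2$Year[i])
  # sum() works on a plain vector; colSums() needs a matrix or data frame
  Data2$sumaGr[i] <- sum(Data1$W[ind])
}

Data2
```

With the sample data this should fill sumaGr with the same totals as the solutions above. The vectorised versions are still preferable for larger data sets, since the loop rescans Data1 once per row of Data2.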
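Answer 1 (score: 0)
library(dplyr)
# Join on the year; the key column keeps its name from Data1 (YEAR),
# so rename it to Year before grouping
df <- inner_join(Data1, Data2, by = c("YEAR" = "Year")) %>%
  rename(Year = YEAR)
df %>% group_by(Year, Trawl) %>% mutate(sum = sum(W), avg = mean(W))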
# A tibble: 10 x 6
# Groups:   Year, Trawl [3]
      BT  Year     W Trawl   sum   avg
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    1. 2007.   95.    1.  101.  50.5
 2    1. 2007.    6.    1.  101.  50.5
 3    2. 2008.   60.    2.  114.  38.0
 4    2. 2008.   50.    2.  114.  38.0
 5    2. 2008.    4.    2.  114.  38.0
 6    3. 2009.   21.    3.  148.  29.6
 7    3. 2009.   56.    3.  148.  29.6
 8    3. 2009.   44.    3.  148.  29.6
 9    3. 2009.   23.    3.  148.  29.6
10    3. 2009.    4.    3.  148.  29.6
df %>% group_by(Year, Trawl) %>% summarise(sum = sum(W), avg = mean(W))
# A tibble: 3 x 4
# Groups:   Year [?]
   Year Trawl   sum   avg
  <dbl> <dbl> <dbl> <dbl>
1 2007.    1.  101.  50.5
2 2008.    2.  114.  38.0
3 2009.    3.  148.  29.6
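Note that the join above matches on the year only, which works here because each year contains exactly one trawl. If a year could contain several trawls, matching on both keys would probably be safer. Below is a base R sketch using `merge()` and `aggregate()`; it assumes the column names from the question, and the object names joined and totals are only illustrative.

```r
# Simplified data from the question
Data1 <- data.frame(
  BT   = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  YEAR = c(2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009),
  W    = c(95, 6, 60, 50, 4, 21, 56, 44, 23, 4)
)
Data2 <- data.frame(Trawl = c(1, 2, 3), Year = c(2007, 2008, 2009))

# Match rows on both the trawl and the year
joined <- merge(Data1, Data2,
                by.x = c("BT", "YEAR"),
                by.y = c("Trawl", "Year"))

# Total weight per trawl/year combination
totals <- aggregate(W ~ BT + YEAR, data = joined, FUN = sum)
names(totals)[names(totals) == "W"] <- "sumaGr"

totals
```

Rows of Data1 whose trawl/year pair does not appear in Data2 are dropped by `merge()` by default, which may or may not be what you want.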
Answer 2 (score: 0)
if (!require(dplyr)) {
  install.packages("dplyr")
  require(dplyr)
} # for 'inner_join()': install and/or load package dplyr
# Rename so that the two data frames can be joined
colnames(Data1) <- c("BT", "Year", "W")
# the column named in 'by =' must have the same name in both data frames!
data1.new <- inner_join(Data1, Data2, by = "Year")
# inspect data1.new
data1.new
# split by "Trawl"
df.list <- split(data1.new, data1.new$Trawl)
# summarize each of the data frames in this list
summaries.list <- lapply(df.list, summary)
# But I think what you want is colMeans, colSums etc.
colMeans.list <- lapply(df.list, colMeans)
colSums.list <- lapply(df.list, colSums)
# colMeans(df) is essentially function(df) {apply(df, 2, FUN=mean)}
# In this way any function that accepts a whole vector of values
# ("variadic" in the sense used here: it can take any number of
# values) can be applied to every column.
# For a function that is not variadic, Reduce() helps. As an example,
# pretend that max() took only two arguments (that's not true,
# but let's assume it); then
# function(your.vector) Reduce(max, your.vector) makes it variadic,
# e.g. maximum of a column:
colMax <- function(df) {apply(df, 2, FUN=function(vec) Reduce(max, vec))}
colMax.list <- lapply(df.list, colMax)
# inspect those objects
colMeans.list
colSums.list
colMax.list
# you can reduce the results using Reduce(rbind, ...)
means.by.trawl.mat <- Reduce(rbind, colMeans.list)
sums.by.trawl.mat <- Reduce(rbind, colSums.list)
maxs.by.trawl.mat <- Reduce(rbind, colMax.list)
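# use the BT column as row names
# (in the sums matrix BT itself is summed, so its row names become 2, 6, 15;
# names(df.list) would give the trawl IDs instead)
rownames(means.by.trawl.mat) <- means.by.trawl.mat[, "BT"]
rownames(sums.by.trawl.mat) <- sums.by.trawl.mat[, "BT"]
rownames(maxs.by.trawl.mat) <- maxs.by.trawl.mat[, "BT"]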
# result
> means.by.trawl.mat
  BT Year    W Trawl
1  1 2007 50.5     1
2  2 2008 38.0     2
3  3 2009 29.6     3
> sums.by.trawl.mat
   BT  Year   W Trawl
2   2  4014 101     2
6   6  6024 114     6
15 15 10045 148    15
> maxs.by.trawl.mat
  BT Year  W Trawl
1  1 2007 95     1
2  2 2008 60     2
3  3 2009 56     3
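To make the Reduce() idea concrete, here is a small base R sketch. It uses only the W and BT vectors from the question; `pairwise_max` is a hypothetical two-argument helper introduced purely for illustration, standing in for a function that really can compare only two values at a time.

```r
# Weights and trawl IDs from the question
W  <- c(95, 6, 60, 50, 4, 21, 56, 44, 23, 4)
BT <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)

# Hypothetical helper that only accepts two values at a time
pairwise_max <- function(a, b) if (a >= b) a else b

# Reduce() folds the two-argument function over the whole vector
overall_max <- Reduce(pairwise_max, W)

# The same fold per trawl, after splitting W by BT
max_by_trawl <- sapply(split(W, BT), function(w) Reduce(pairwise_max, w))

# Summing only the column of interest avoids adding up ID columns
sum_by_trawl <- sapply(split(W, BT), sum)

overall_max
max_by_trawl
sum_by_trawl
```

Splitting just the W vector, rather than the whole data frame, sidesteps the issue visible in sums.by.trawl.mat, where BT, Year and Trawl are summed along with W.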