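I have a data frame like the one below, with values ranging from -100 to +100, about 2000 columns and 60000 rows. I want to sum the negative values and the positive values separately, but the values I get are wrong. My data frame looks like this: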
-40.11814993 -42.32948849 -43.60532899 -44.5204376 -41.63980543
-22.37778647 -25.46700883 -27.81140156 -28.82498654 -25.35257089
7.686002395 5.269545374 2.654357646 1.929572443 4.904498013
17.73773603 15.68456051 14.07506837 13.69786317 14.47200364
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
56.72239048 53.39504497 52.53564149 47.32158272 43.41294833
59.792111 54.45067217 52.77495286 48.27796907 44.49740268
63.99660216 56.19211308 53.75103818 50.25612484 46.40220142
59.43969877 50.4747657 47.5165962 44.37596015 40.85330564
52.78043922 42.79880307 39.09252048 36.4381745 33.07448607
I am using the following code. I want the sums per column; could someone correct it?
trialdata=read.table('trial.csv',header=TRUE,sep=',')
frame=data.frame(trialdata[3:20])
negativesum=apply(frame<0, MARGIN=2, FUN = sum, na.rm = TRUE)
positivesum=apply(frame>0, MARGIN=2, FUN = sum, na.rm = TRUE)
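Why the values come out wrong: `frame<0` is a logical matrix, so passing it to `sum` counts how many entries are negative instead of adding the values themselves. A minimal sketch on hypothetical toy data (the `frame` below stands in for trial.csv) that contrasts the count with per-column sums, where each column is subset before summing:

```r
# Hypothetical toy data with an NA row, standing in for trial.csv
frame <- data.frame(V1 = c(-2, 3, NA, 5), V2 = c(1, -4, NA, -1))

# What the original code computes: the NUMBER of negative values per column
negative_count <- apply(frame < 0, MARGIN = 2, FUN = sum, na.rm = TRUE)

# What was intended: the SUM of the negative (or positive) values per column
negativesum <- apply(frame, MARGIN = 2, FUN = function(x) sum(x[x < 0], na.rm = TRUE))
positivesum <- apply(frame, MARGIN = 2, FUN = function(x) sum(x[x > 0], na.rm = TRUE))

negative_count  # V1 = 1, V2 = 2
negativesum     # V1 = -2, V2 = -5
positivesum     # V1 = 8, V2 = 1
```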
Answer 0 (score: 1)
There is no need for apply; try the following:
#dummy dataframe
df <-
read.table(text="-40.11814993 -42.32948849 -43.60532899 -44.5204376 -41.63980543
-22.37778647 -25.46700883 -27.81140156 -28.82498654 -25.35257089
7.686002395 5.269545374 2.654357646 1.929572443 4.904498013
17.73773603 15.68456051 14.07506837 13.69786317 14.47200364
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
56.72239048 53.39504497 52.53564149 47.32158272 43.41294833
59.792111 54.45067217 52.77495286 48.27796907 44.49740268
63.99660216 56.19211308 53.75103818 50.25612484 46.40220142
59.43969877 50.4747657 47.5165962 44.37596015 40.85330564
52.78043922 42.79880307 39.09252048 36.4381745 33.07448607")
#result
sum(df[df<0],na.rm = TRUE)
#[1] -342.047
sum(df[df>0],na.rm = TRUE)
#[1] 1328.735
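The two sums above are totals over the whole data frame. For per-column sums without apply, one vectorized option (a swapped-in technique, not part of this answer) is to clamp values at zero with pmin()/pmax() and then use colSums(). A sketch on hypothetical toy data; note that it works on dense numeric copies of the data, and 60000 × 2000 doubles take about 0.96 GB per copy:

```r
# Hypothetical toy data; replace with the real data frame
df <- data.frame(V1 = c(-2, 3, NA, 5), V2 = c(1, -4, NA, -1))

m <- as.matrix(df)

# pmin(m, 0) keeps negatives and turns positives into 0 (dim is copied from m);
# pmax(m, 0) does the opposite. NAs stay NA and are dropped by na.rm = TRUE.
negative <- colSums(pmin(m, 0), na.rm = TRUE)
positive <- colSums(pmax(m, 0), na.rm = TRUE)

data.frame(positive, negative)
```

Summing either vector again, e.g. sum(negative), recovers the whole-frame totals from this answer.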
Answer 1 (score: 1)
Assuming you want column-wise sums (since you used apply(.., 2, ...)), I would suggest the following (using fake data):
set.seed(123)
frame <- matrix(sample(-100:100, 100, replace = TRUE), ncol = 10)
ind_pos <- which(frame>0, arr.ind = TRUE)
ind_neg <- which(frame<=0, arr.ind = TRUE)
data.frame(positive = tapply(frame[ind_pos], ind_pos[,2], sum), negative = tapply(frame[ind_neg], ind_neg[,2], sum))
positive negative
1 319 -161
2 314 -267
3 345 -113
4 323 -248
5 72 -383
6 202 -338
7 280 -171
8 142 -293
9 346 -227
10 122 -293
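One caveat with the tapply approach: tapply only creates groups for column indices that actually occur in ind_pos or ind_neg, so a column with no positive (or no non-positive) values, including an all-NA column, drops out of that result, and data.frame() then errors or silently recycles because the two vectors differ in length. A sketch of a guard, assuming the same which(..., arr.ind = TRUE) setup on a hypothetical toy matrix, is to make the column index a factor with one level per column; tapply then returns NA for empty groups, which can be set to 0:

```r
# Toy matrix: column 2 has no positive values, column 3 is entirely NA
frame <- cbind(c(5, -1, 2), c(-3, -2, NA), c(NA, NA, NA))

ind_pos <- which(frame > 0, arr.ind = TRUE)   # which() skips NA comparisons
ind_neg <- which(frame <= 0, arr.ind = TRUE)

# One factor level per column, so empty columns are kept as (NA) groups
col_pos <- factor(ind_pos[, 2], levels = seq_len(ncol(frame)))
col_neg <- factor(ind_neg[, 2], levels = seq_len(ncol(frame)))

positive <- tapply(frame[ind_pos], col_pos, sum)
negative <- tapply(frame[ind_neg], col_neg, sum)

# Empty groups come back as NA; treat them as a sum of 0
positive[is.na(positive)] <- 0
negative[is.na(negative)] <- 0

data.frame(positive, negative)
```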
Answer 2 (score: 1)
If we need column-wise sums, any function from the apply family will produce the output. Here we use vapply, which is a bit faster. (Using the data from @zx8754's post.)
vapply(df, function(x) c(sum(x[x>0], na.rm=TRUE),
sum(x[x<=0], na.rm=TRUE)), double(2L))
# V1 V2 V3 V4 V5
#[1,] 318.15498 278.2655 262.40018 242.29725 227.61685
#[2,] -62.49594 -67.7965 -71.41673 -73.34542 -66.99238
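The rows of that result are only labelled [1,] and [2,]. If FUN.VALUE is a named vector, vapply uses its names as row names, so the output says which row is which. A sketch on hypothetical toy data:

```r
# Hypothetical toy data; replace with the real data frame
df <- data.frame(V1 = c(-2, 3, NA, 5), V2 = c(1, -4, NA, -1))

# Named FUN.VALUE: its names become the row names of the result matrix
res <- vapply(df,
              function(x) c(positive = sum(x[x > 0], na.rm = TRUE),
                            negative = sum(x[x <= 0], na.rm = TRUE)),
              FUN.VALUE = c(positive = 0, negative = 0))
res
```

t(res) gives one row per original column, the same shape as the data frame in Answer 1.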
Using dplyr:
library(dplyr)
library(tidyr)
summarise_each(df, funs(positive=sum(.[.>0], na.rm=TRUE),
negative=sum(.[.<=0], na.rm=TRUE))) %>%
gather(Var, Val) %>%
separate(Var, c('Var1', 'Var2')) %>%
spread(Var1, Val)
# Var2 V1 V2 V3 V4 V5
#1 negative -62.49594 -67.7965 -71.41673 -73.34542 -66.99238
#2 positive 318.15498 278.2655 262.40018 242.29725 227.61685
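summarise_each() and funs() have since been deprecated in dplyr. A sketch of the same reshaping with across() and tidyr's pivot_longer()/pivot_wider(), assuming dplyr >= 1.0 and tidyr >= 1.0, hypothetical toy data, and column names that contain no underscore (names_sep = "_" splits on it):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; replace with the real data frame
df <- data.frame(V1 = c(-2, 3, NA, 5), V2 = c(1, -4, NA, -1))

res <- df %>%
  # produces columns V1_positive, V1_negative, V2_positive, V2_negative
  summarise(across(everything(),
                   list(positive = ~ sum(.x[.x > 0], na.rm = TRUE),
                        negative = ~ sum(.x[.x <= 0], na.rm = TRUE)))) %>%
  # split "V1_positive" into Var = "V1" and sign = "positive"
  pivot_longer(everything(), names_to = c("Var", "sign"), names_sep = "_") %>%
  # one column per original variable, one row per sign
  pivot_wider(names_from = Var, values_from = value)
res
```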
Instead of a single summarise_each call, a shorter option is to call summarise_each twice and combine the results with bind_rows:
bind_rows(summarise_each(df, funs(positive=sum(.[.>0], na.rm=TRUE))),
          summarise_each(df, funs(negative=sum(.[.<=0], na.rm=TRUE))))
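With current dplyr, the bind_rows variant can be sketched with across() (assuming dplyr >= 1.0 and hypothetical toy data). With a single unnamed function, across() keeps the original column names, and naming the arguments of bind_rows() together with .id adds a column recording which sum each row holds:

```r
library(dplyr)

# Hypothetical toy data; replace with the real data frame
df <- data.frame(V1 = c(-2, 3, NA, 5), V2 = c(1, -4, NA, -1))

res <- bind_rows(
  positive = summarise(df, across(everything(), ~ sum(.x[.x > 0], na.rm = TRUE))),
  negative = summarise(df, across(everything(), ~ sum(.x[.x <= 0], na.rm = TRUE))),
  .id = "sign"  # the argument names become the values of this column
)
res
```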