Summing negative and positive numbers separately in the columns of a data frame

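Time: 2015-02-21 22:10:48

Tags: r

I have a data frame like the one below, with values ranging from -100 to +100 and roughly 2000 columns and 60000 rows. I want to sum the negative values and the positive values separately, but the results I get are wrong. My data frame looks like this: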

-40.11814993    -42.32948849    -43.60532899    -44.5204376 -41.63980543
-22.37778647    -25.46700883    -27.81140156    -28.82498654    -25.35257089
7.686002395 5.269545374 2.654357646 1.929572443 4.904498013
17.73773603 15.68456051 14.07506837 13.69786317 14.47200364
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
56.72239048 53.39504497 52.53564149 47.32158272 43.41294833
59.792111   54.45067217 52.77495286 48.27796907 44.49740268
63.99660216 56.19211308 53.75103818 50.25612484 46.40220142
59.43969877 50.4747657  47.5165962  44.37596015 40.85330564
52.78043922 42.79880307 39.09252048 36.4381745  33.07448607

I am using the following code and want the sums column by column. Corrections would be appreciated:

trialdata=read.table('trial.csv',header=TRUE,sep=',')
frame=data.frame(trialdata[3:20])
negativesum=apply(frame<0, MARGIN=2, FUN = sum, na.rm = TRUE)
positivesum=apply(frame>0, MARGIN=2, FUN = sum, na.rm = TRUE)

3 answers:

Answer 0: (score: 1)

There is no need to use apply. Try the following:

#dummy dataframe
df <- 
read.table(text="-40.11814993    -42.32948849    -43.60532899    -44.5204376 -41.63980543
-22.37778647    -25.46700883    -27.81140156    -28.82498654    -25.35257089
7.686002395 5.269545374 2.654357646 1.929572443 4.904498013
17.73773603 15.68456051 14.07506837 13.69786317 14.47200364
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
NA  NA  NA  NA  NA
56.72239048 53.39504497 52.53564149 47.32158272 43.41294833
59.792111   54.45067217 52.77495286 48.27796907 44.49740268
63.99660216 56.19211308 53.75103818 50.25612484 46.40220142
59.43969877 50.4747657  47.5165962  44.37596015 40.85330564
52.78043922 42.79880307 39.09252048 36.4381745  33.07448607")

#result
sum(df[df<0],na.rm = TRUE)
#[1] -342.047
sum(df[df>0],na.rm = TRUE)
#[1] 1328.735
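
As for why the code in the question gives unexpected values: frame<0 is a logical matrix, so summing it counts the negative entries in each column instead of adding them up. Here is a minimal sketch of the difference, using a small made-up data frame (the values and the column names a and b are assumptions for illustration only):

```r
# Toy data frame with hypothetical values, including missing entries
frame <- data.frame(a = c(-1.5, 2, NA, -3), b = c(4, -2, 5, NA))

# Summing the logical matrix counts the negative entries per column
neg_count <- apply(frame < 0, MARGIN = 2, FUN = sum, na.rm = TRUE)

# Subsetting each column first sums the values themselves
negativesum <- sapply(frame, function(x) sum(x[x < 0], na.rm = TRUE))
positivesum <- sapply(frame, function(x) sum(x[x > 0], na.rm = TRUE))
```

Here neg_count holds 2 and 1 (counts), while negativesum holds -4.5 and -2 and positivesum holds 2 and 9.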

Answer 1: (score: 1)

Assuming you want column-wise sums (since you use apply(.., 2, ...)), I suggest the following (using fake data):

set.seed(123)
frame <- matrix(sample(-100:100, 100, replace = TRUE), ncol = 10)
ind_pos <- which(frame>0, arr.ind = TRUE)
ind_neg <- which(frame<=0, arr.ind = TRUE)

data.frame(positive = tapply(frame[ind_pos], ind_pos[,2], sum), negative = tapply(frame[ind_neg], ind_neg[,2], sum))
   positive negative
1       319     -161
2       314     -267
3       345     -113
4       323     -248
5        72     -383
6       202     -338
7       280     -171
8       142     -293
9       346     -227
10      122     -293
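
For data of the size mentioned in the question, a fully vectorized variant might also be worth trying. The sketch below uses a different technique from the which/tapply approach above: it clamps the values with pmax and pmin so that entries of the other sign become zero, then takes colSums. It assumes a numeric matrix (a data frame would need as.matrix first), and it has not been benchmarked here.

```r
set.seed(123)
frame <- matrix(sample(-100:100, 100, replace = TRUE), ncol = 10)

# Negative entries are clamped to 0, so only positive values contribute
positive <- colSums(pmax(frame, 0), na.rm = TRUE)

# Positive entries are clamped to 0, so only negative values contribute
negative <- colSums(pmin(frame, 0), na.rm = TRUE)

data.frame(positive, negative)
```

NA entries stay NA after clamping and are then dropped by na.rm = TRUE, so missing values are handled the same way as in the other answers.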

Answer 2: (score: 1)

If we need column-wise sums, we can use any member of the apply family to get the output. Here we use vapply, which is a bit faster. (Using the data from @zx8754's post.)

 vapply(df, function(x)  c(sum(x[x>0], na.rm=TRUE), 
               sum(x[x<=0], na.rm=TRUE)), double(2L))
 #          V1       V2        V3        V4        V5
 #[1,] 318.15498 278.2655 262.40018 242.29725 227.61685
 #[2,] -62.49594 -67.7965 -71.41673 -73.34542 -66.99238

An option using dplyr:

 library(dplyr)
 library(tidyr) 
 summarise_each(df, funs(positive=sum(.[.>0], na.rm=TRUE), 
                       negative=sum(.[.<=0], na.rm=TRUE))) %>%
              gather(Var, Val) %>%
              separate(Var, c('Var1', 'Var2')) %>% 
              spread(Var1, Val)
 #     Var2        V1       V2        V3        V4        V5
 #1 negative -62.49594 -67.7965 -71.41673 -73.34542 -66.99238
 #2 positive 318.15498 278.2655 262.40018 242.29725 227.61685

Alternatively, instead of a single call to summarise_each, a shorter option is to call it once per sign and combine the results with bind_rows:

 bind_rows(summarise_each(df, funs(positive=sum(.[.>0], na.rm=TRUE))),
           summarise_each(df, funs(negative=sum(.[.<=0], na.rm=TRUE))))
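
In recent dplyr releases summarise_each() and funs() are deprecated, so the calls above may produce warnings. A rough equivalent using across() could look like the sketch below. It assumes dplyr 1.0.0 or later, and the small data frame and the "sign" column name are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
df <- data.frame(V1 = c(-1, 2, NA), V2 = c(3, -4, 5))

res <- bind_rows(
  positive = summarise(df, across(everything(), ~ sum(.x[.x > 0], na.rm = TRUE))),
  negative = summarise(df, across(everything(), ~ sum(.x[.x <= 0], na.rm = TRUE))),
  .id = "sign"  # labels each row with the name of the argument it came from
)
res
```

The result has one row per sign and one column per original variable, which matches the layout of the gather/separate/spread pipeline without the reshaping steps.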