Question

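I have a dataset and need to find the average number of times the value 1 appears, the value 0 appears, and the value -1 appears. This is not a conventional average, though, so let me explain.

Here is part of my dataset: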

position
 1
 1
 1
 0
 0
-1
 0
-1
-1
-1
-1
-1
 1
 1

So, if I lay out each run of consecutive occurrences of a value as a vector, I get:

position '1'   position '-1'  position '0'
  X1  X2         X1  X2         X1  X2
  1   1          -1  -1          0   0
  1   1              -1          0
  1                  -1
                     -1
                     -1

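That way, the average for 1 is (X1 + X2) / 2, where X1 and X2 stand for the lengths of the two vectors (3 and 2 here, so 2.5) and 2 is the number of vectors. The number of vectors varies, and each length can be any number, given by how many consecutive times the value appears.

This is a bit confusing, but I hope you get my point. I have been thinking about how to do this and cannot find a way.

Thanks a lot!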

Answer 1

As @KonradRudolph mentioned, rle is the way to go. You can then use split to get the right format:

with(rle(position), split(lengths, values))
# $`-1`
# [1] 1 5
#
# $`0`
# [1] 2 1
#
# $`1`
# [1] 3 2

And to compute the averages, tapply works fine.
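The tapply call itself is not shown in this answer, so here is a minimal sketch of what it could look like. It assumes position is the numeric vector from the question; the name avg_run is only illustrative.

```r
position <- c(1, 1, 1, 0, 0, -1, 0, -1, -1, -1, -1, -1, 1, 1)

# rle() returns the length and the value of each run;
# tapply() then averages the run lengths within each value.
avg_run <- with(rle(position), tapply(lengths, values, mean))
avg_run
#  -1   0   1
# 3.0 1.5 2.5
```

The result is a named vector, so avg_run["1"] picks out the average run length for the value 1.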

Answer 2

You can also use diff together with cumsum:

library(dplyr)
data %>%
  mutate(group = c(0, cumsum(diff(position) != 0))) %>%
  group_by(position) %>%
  summarise(mean = n() / length(unique(group)))

# Source: local data frame [3 x 2]
#
#   position  mean
#      (int) (dbl)
# 1       -1   3.0
# 2        0   1.5
# 3        1   2.5
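To see why this works, it may help to look at the group column on its own. The sketch below uses base R only and assumes position is the vector from the question; run_mean is an illustrative name, not part of the answer.

```r
position <- c(1, 1, 1, 0, 0, -1, 0, -1, -1, -1, -1, -1, 1, 1)

# diff(position) != 0 is TRUE wherever the value changes between rows;
# cumsum() turns each change into a new run id, and the leading 0
# labels the first run.
group <- c(0, cumsum(diff(position) != 0))
group
# [1] 0 0 0 1 1 2 3 4 4 4 4 4 5 5

# Rows per value divided by the number of distinct runs of that value.
run_mean <- tapply(position, position, length) /
  tapply(group, position, function(g) length(unique(g)))
```

For the value -1 this is 6 rows spread over 2 runs, which gives the 3.0 shown above.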

Answer 3

It's a bit verbose, but this shows how it all fits together:

library(dplyr)
position <- c(1, 1, 1, 0, 0, -1, 0, -1, -1, -1, -1, -1, 1, 1)
rle_pos <- rle(position)

df <- data_frame(position_code = rle_pos$values,
                 length = rle_pos$lengths)

df
# Source: local data frame [6 x 2]
# 
#   position_code length
#           (dbl)  (int)
# 1             1      3
# 2             0      2
# 3            -1      1
# 4             0      1
# 5            -1      5
# 6             1      2

df %>%
  group_by(position_code) %>%
  summarise(count = n(),
            sum_lengths = sum(length)) %>%
  mutate(average = sum_lengths / count)

# Source: local data frame [3 x 4]
# 
#   position_code count sum_lengths average
#           (dbl) (int)       (int)   (dbl)
# 1            -1     2           6     3.0
# 2             0     2           3     1.5
# 3             1     2           5     2.5
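If dplyr is not available, the same grouped summary can be sketched in base R with aggregate. This is an alternative to the pipeline above rather than what the answer itself uses; the names runs and avg are illustrative.

```r
position <- c(1, 1, 1, 0, 0, -1, 0, -1, -1, -1, -1, -1, 1, 1)
rle_pos <- rle(position)

# One row per run: the value of the run and how long it lasted.
runs <- data.frame(position_code = rle_pos$values,
                   length = rle_pos$lengths)

# One row per position_code, holding the mean run length.
avg <- aggregate(length ~ position_code, data = runs, FUN = mean)
avg
#   position_code length
# 1            -1    3.0
# 2             0    1.5
# 3             1    2.5
```

Taking the mean of the run lengths directly skips the intermediate count and sum_lengths columns, which are mainly there to show the arithmetic.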

How to divide my dataset by the number of times a value occurs in R

3 answers: