Question

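Ladies and gentlemen, I ran into a problem while summarizing my data sample: I would also like to see the "zero counts" for the approach I tried. My data looks like this: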

library(dplyr)
set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
              Velocity = c(sample(c(-36, 36), 100, replace = T)),
              Response = c(sample(c("H", "M", "W"), 50, replace=T),
                           sample(c("M", "W"), 50, replace = T)))

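The data consists of 100 rows with start positions ranging from 1 to 10 (each generated 10 times here; in practice the count is random, so StartPos 3 could, for example, occur 20 times). Each start position also has a Response, which can be "H" (hit), "M" (miss) or "W" (wrong). Some start positions may have no "H" at all. There is also a column called Velocity, whose values -36 and 36 describe the direction of the stimulus that started at a particular StartPos (-36 for the right side, 36 for the left side).

The only things I really care about are StartPos and Velocity for the hits, which I need for a subsequent percentage calculation.

To count the number of runs per side, I created the following filter/counter: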

numbofrunsperside <- sampledata %>%
  mutate(Direction = case_when( # add direction
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>%
  group_by(StartPos, Direction) %>% # for each combination
  count(Velocity, .drop=FALSE) # count
numbofrunsperside

And for the hit counts with their respective start position and direction (left/right):

sampledata_hit_counts <- sampledata %>%
  mutate(Direction = case_when( # add direction 
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>% 
  filter(Response == "H") %>% 
  group_by(StartPos, Direction, .drop=FALSE) %>% # for each combination 
  count(StartPos, .drop=FALSE) # count
sampledata_hit_counts

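This is where the problem occurs: the numbofrunsperside data frame has 20 rows, while sampledata_hit_counts has only 12 rows.

When I try to calculate the hit percentage as follows, I get this error message: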

sampledata_hit_counts$PTest = sampledata_hit_counts$n / 
numbofrunsperside$n

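Error in `$<-.data.frame`(`*tmp*`, PTest, value = c(0.2, 0.2, 0.25, 0.166666666666667, : replacement has 20 rows, data has 12
In addition: Warning message:
In sampledata_hit_counts$n/numbofrunsperside$n : longer object length is not a multiple of shorter object length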

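One way to solve this would be to include "zero counts" in sampledata_hit_counts for the different directions and start positions, so that both data frames have the same number of rows. Unfortunately, I don't know how to do that... Any help is greatly appreciated!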

Answer 1

You can do a left join:

library(dplyr)

numbofrunsperside %>%
    left_join(
        sampledata_hit_counts, 
        by = c("StartPos", "Direction"), 
        suffix = c("_runs", "_hits")
    ) %>% 
    mutate(
        p_test = ifelse(is.na(n_hits), 0, n_hits) / n_runs
    ) %>% 
    pull(p_test)
#[1] 0.2000000 0.0000000 0.0000000 0.1666667 0.0000000 0.0000000 0.3333333 0.1428571 0.0000000 0.1250000 0.1666667 0.5000000 0.2000000
#[14] 0.4000000 0.1666667 0.0000000 0.0000000 0.3333333 0.5000000 0.0000000
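
As an alternative sketch (not part of the answer above), the filter step can be skipped entirely: counting hits with sum(Response == "H") inside summarise() yields 0 for groups without any hit, so runs and hits always end up in the same rows. The column names n_runs, n_hits and p_test are only illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample data as in the question
set.seed(529)
sampledata <- data.frame(
  StartPos = rep(1:10, times = 10),
  Velocity = sample(c(-36, 36), 100, replace = TRUE),
  Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
               sample(c("M", "W"), 50, replace = TRUE))
)

hit_rates <- sampledata %>%
  mutate(Direction = case_when(   # add direction, as in the question
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>%
  group_by(StartPos, Direction) %>%
  summarise(
    n_runs = n(),                  # all runs for this combination
    n_hits = sum(Response == "H"), # 0 when the combination has no hit
    p_test = n_hits / n_runs,      # hit percentage per combination
    .groups = "drop"               # assumption: dplyr >= 1.0.0
  )
hit_rates
```

Note that this only covers combinations that occur at least once in the data; a StartPos/Direction pair with no runs at all would still be missing from the result.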
