Redistributing frequency counts from one category to another in R

Time: 2018-12-12 16:49:53

Tags: r dataframe

I have a data frame:

emp.data <- data.frame(
  emp_id = c(1:32), 
  dealer_code = c("A1","A2","A3","A4","A5","A3","A8","A4","A6","A6","A7","A1","A8","A9","A1","A2","A7","A8","A1","A1","A2","A2","A5","A4","A4","A10","A10","A10","A10","A3","A3","A11"),
  region = c("UK","US","OZ","IN","US","OZ","UK","IN","PAK","PAK","IN","UK","UK","OZ","UK","US","IN","UK","UK","UK","US","US","US","IN","IN","PAK","PAK","PAK","PAK","OZ","OZ","UK"))

If I build a frequency table of the dealer codes:

library(dplyr)  # provides %>%, group_by() and count()
# Number of rows per dealer code
df <- emp.data %>%
  group_by(dealer_code) %>%
  count()

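I want to reassign rows from dealer codes whose count is greater than 3 to dealer codes whose count is less than 3, but only when the donor and the recipient are in the same region (there are many other conditions as well).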
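One way to picture the rule described above is a greedy, per-region rebalancing. The following is only a minimal sketch in base R, not a confirmed solution: `rebalance_region` and its `threshold` argument are hypothetical names, the choice of which row moves (the donor's last row) is an assumption, and the "many other conditions" are not modelled at all.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the codes as character.
emp.data <- data.frame(
  emp_id = 1:32,
  dealer_code = c("A1","A2","A3","A4","A5","A3","A8","A4","A6","A6","A7","A1","A8","A9","A1","A2","A7","A8","A1","A1","A2","A2","A5","A4","A4","A10","A10","A10","A10","A3","A3","A11"),
  region = c("UK","US","OZ","IN","US","OZ","UK","IN","PAK","PAK","IN","UK","UK","OZ","UK","US","IN","UK","UK","UK","US","US","US","IN","IN","PAK","PAK","PAK","PAK","OZ","OZ","UK"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given the dealer codes of ONE region, move rows one at a
# time from codes with more than `threshold` rows to codes with fewer.
rebalance_region <- function(codes, threshold = 3) {
  # Named integer vector: rows per dealer code in this region.
  counts <- vapply(split(codes, codes), length, integer(1))
  repeat {
    donors    <- names(counts)[counts > threshold]
    receivers <- names(counts)[counts < threshold]
    if (length(donors) == 0 || length(receivers) == 0) break
    donor    <- donors[which.max(counts[donors])]        # largest surplus first
    receiver <- receivers[which.min(counts[receivers])]  # largest deficit first
    # Assumption: the donor's last row in the region is the one that moves.
    idx <- max(which(codes == donor))
    codes[idx] <- receiver
    counts[donor]    <- counts[donor] - 1L
    counts[receiver] <- counts[receiver] + 1L
  }
  codes
}

# ave() applies the helper separately within each region and keeps row order.
emp.data$changed_code <- ave(emp.data$dealer_code, emp.data$region,
                             FUN = rebalance_region)

table(emp.data$changed_code)
```

Because the helper only ever sees the rows of a single region, a row can never be handed to a dealer code from another region. Which specific rows move is arbitrary under this sketch, so it may pick different `emp_id`s than a hand-built expected result even when the per-code counts agree.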

The expected output is:

emp.op <- data.frame(
  emp_id = c(1:32), 
  dealer_code = c("A1","A2","A3","A4","A5","A3","A8","A4","A6","A6","A7","A1","A8","A9","A1","A2","A7","A8","A1","A1","A2","A2","A5","A4","A4","A10","A10","A10","A10","A3","A3","A11"),
  region = c("UK","US","OZ","IN","US","OZ","UK","IN","PAK","PAK","IN","UK","UK","OZ","UK","US","IN","UK","UK","UK","US","US","US","IN","IN","PAK","PAK","PAK","PAK","OZ","OZ","UK"),
  changed_code = c("A1","A2","A3","A4","A5","A3","A8","A4","A6","A6","A7","A1","A8","A9","A1","A2","A7","A8","A11","A11","A2","A5","A5","A4","A7","A10","A6","A10","A10","A3","A9","A11"))

In the changed_code column, some dealer codes have been reassigned, so that the counts per code are evened out within each region.

df_2 <- emp.op %>%
  group_by(changed_code) %>%
  count() 
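To confirm that a reassignment like this respects the same-region constraint, the counts before and after can be compared, and each new code can be checked against its home region. This is a rough sketch in base R; `comparison`, `home_region` and `same_region` are illustrative names, and it assumes every dealer code belongs to exactly one region, which holds for the sample data.

```r
# Expected output from the question; stringsAsFactors = FALSE keeps codes as character.
emp.op <- data.frame(
  emp_id = 1:32,
  dealer_code = c("A1","A2","A3","A4","A5","A3","A8","A4","A6","A6","A7","A1","A8","A9","A1","A2","A7","A8","A1","A1","A2","A2","A5","A4","A4","A10","A10","A10","A10","A3","A3","A11"),
  region = c("UK","US","OZ","IN","US","OZ","UK","IN","PAK","PAK","IN","UK","UK","OZ","UK","US","IN","UK","UK","UK","US","US","US","IN","IN","PAK","PAK","PAK","PAK","OZ","OZ","UK"),
  changed_code = c("A1","A2","A3","A4","A5","A3","A8","A4","A6","A6","A7","A1","A8","A9","A1","A2","A7","A8","A11","A11","A2","A5","A5","A4","A7","A10","A6","A10","A10","A3","A9","A11"),
  stringsAsFactors = FALSE
)

# Counts per code before and after, aligned on the original dealer codes.
before <- table(emp.op$dealer_code)
after  <- table(emp.op$changed_code)
comparison <- data.frame(
  dealer_code = names(before),
  before = as.vector(before),
  after  = as.vector(after[names(before)])
)
print(comparison)

# Home region of each dealer code, taken from the original assignment
# (assumes one region per code).
pairs <- unique(emp.op[, c("dealer_code", "region")])
home_region <- setNames(pairs$region, pairs$dealer_code)

# TRUE for every row whose new code belongs to the row's own region.
same_region <- unname(home_region[emp.op$changed_code]) == emp.op$region
all(same_region)
```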

0 answers:

No answers yet