I have the following dataframe yy:
fundId Year Qtr StockCurrentQtr StockNextQtr
1 2015 1 1,2,3,4,5 2,3,4,51
1 2015 2 2,3,4,51 7,8,9,4,2
1 2015 3 7,8,9,4,2 NA
2 2015 1 10,11,14 14,16,19
2 2015 2 14,16,19 20,21,45
2 2015 3 20,21,45 NA
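I want to find the difference between StockNextQtr and StockCurrentQtr for each row, with group_by on fundId, or alternatively to compute it from the column 'StockCurrentQtr' alone, again with group_by on fundId.
I get the following error:
Column StockDiff must be length 3 (the group size) or 1, not 5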
Answer 0 (score: 2)
You don't have to use apply here. Just use rowwise, i.e.
library(dplyr)
df %>%
mutate_at(vars(4:5), funs(strsplit(., ','))) %>%
rowwise() %>%
mutate(new = toString(setdiff(StocCurrentQtr, StockNextQtr)))
which gives,
Source: local data frame [6 x 6]
Groups: <by row>

# A tibble: 6 x 6
  fundId  Year   Qtr StocCurrentQtr StockNextQtr           new
   <int> <int> <int>         <list>       <list>         <chr>
1      1  2015     1      <chr [5]>    <chr [4]>          1, 5
2      1  2015     2      <chr [4]>    <chr [5]>         3, 51
3      1  2015     3      <chr [5]>    <chr [1]> 7, 8, 9, 4, 2
4      2  2015     1      <chr [3]>    <chr [3]>        10, 11
5      2  2015     2      <chr [3]>    <chr [3]>    14, 16, 19
6      2  2015     3      <chr [3]>    <chr [1]>    20, 21, 45
The base R equivalent,
mapply(function(x, y)toString(setdiff(x, y)), strsplit(df$StocCurrentQtr, ','),
strsplit(df$StockNextQtr, ','))
#[1] "1, 5" "3, 51" "7, 8, 9, 4, 2" "10, 11" "14, 16, 19" "20, 21, 45"
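For anyone who wants to run the base R approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data by hand using the column spelling from the question's table (`StockCurrentQtr`), and `stock_diff` is a hypothetical helper name introduced only for illustration, not something from the answer above.

```r
# Sample data rebuilt from the question's table (assumption: columns are plain strings).
yy <- data.frame(
  fundId = c(1, 1, 1, 2, 2, 2),
  Year = rep(2015, 6),
  Qtr = c(1, 2, 3, 1, 2, 3),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  StockNextQtr = c("2,3,4,51", "7,8,9,4,2", NA,
                   "14,16,19", "20,21,45", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stocks held in `current` that are absent from `nxt`.
# strsplit() on NA returns NA, so a missing next quarter keeps every current stock.
stock_diff <- function(current, nxt) {
  toString(setdiff(strsplit(current, ",")[[1]], strsplit(nxt, ",")[[1]]))
}

# Apply the helper pairwise over the two columns, one result string per row.
yy$new <- mapply(stock_diff, yy$StockCurrentQtr, yy$StockNextQtr,
                 USE.NAMES = FALSE)
print(yy$new)
```

Because `toString()` collapses each set difference into one string, the result has exactly one element per row, which is what lets it be stored as an ordinary column.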
If StockNextQtr is missing, we can create it first and then proceed in the same way as before, i.e.
df %>%
group_by(fundId) %>%
mutate(StockNextQtr = lead(StocCurrentQtr)) %>%
mutate_at(vars(4:5), funs(strsplit(., ','))) %>%
rowwise() %>%
mutate(new = toString(setdiff(StocCurrentQtr, StockNextQtr)))
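As a rough, self-contained variant of the same idea, the sketch below starts from a data frame that has only `StockCurrentQtr`, derives `StockNextQtr` with `lead()` inside each `fundId` group, and then takes the row-by-row set difference. It assumes the rows are already sorted by quarter within each fund and uses the column spelling from the question's table; it uses `mapply()` in place of `rowwise()` purely to keep the example short, which is a substitution on my part rather than the method shown above.

```r
library(dplyr)

# Sample data with only the current-quarter holdings (assumption: sorted by Qtr within fundId).
yy <- data.frame(
  fundId = c(1, 1, 1, 2, 2, 2),
  Year = rep(2015, 6),
  Qtr = c(1, 2, 3, 1, 2, 3),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  stringsAsFactors = FALSE
)

res <- yy %>%
  group_by(fundId) %>%
  # Next quarter's holdings are the following row within the same fund;
  # the last row of each fund gets NA.
  mutate(StockNextQtr = lead(StockCurrentQtr)) %>%
  ungroup() %>%
  # Split both columns on commas and collapse each set difference to one string.
  mutate(new = mapply(
    function(cur, nxt) toString(setdiff(cur, nxt)),
    strsplit(StockCurrentQtr, ","),
    strsplit(StockNextQtr, ","),
    USE.NAMES = FALSE
  ))

print(res$new)
```

Grouping before `lead()` matters here: without it, the last quarter of fund 1 would be compared against the first quarter of fund 2.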
Answer 1 (score: 0)
I found another way:
yy <- yy %>%
  group_by(fundId, Year, Qtr) %>%
  mutate(new = paste(
    setdiff(
      unlist(strsplit(StockCurrentQtr, split = ",")),
      unlist(strsplit(StockNextQtr, split = ","))
    ),
    collapse = ","
  ))
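A small base R illustration of why the collapse step is needed. This is only my reading of the error quoted in the question, since the failing code itself is not shown: `setdiff()` returns one element per stock, so its length generally matches neither 1 nor the group size, whereas `paste(..., collapse = ",")` always returns a single string.

```r
# Third quarter of fund 1: five current stocks and no next quarter.
cur <- strsplit("7,8,9,4,2", ",")[[1]]
nxt <- strsplit(NA_character_, ",")[[1]]  # strsplit() on NA gives NA

d <- setdiff(cur, nxt)
length(d)  # one element per remaining stock

# Collapsing yields exactly one value, which fits in a single cell of a column.
collapsed <- paste(d, collapse = ",")
length(collapsed)
print(collapsed)
```

The same reasoning applies to `toString()`, which also collapses to a single string but uses ", " as the separator.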