Difference between two comma-separated strings

Time: 2018-12-24 07:14:00

Tags: r dataframe dplyr tidyverse

I have the following data frame, yy:

    fundId  Year    Qtr   StockCurrentQtr    StockNextQtr
    1       2015    1     1,2,3,4,5         2,3,4,51
    1       2015    2     2,3,4,51          7,8,9,4,2
    1       2015    3     7,8,9,4,2         NA
    2       2015    1     10,11,14          14,16,19
    2       2015    2     14,16,19          20,21,45
    2       2015    3     20,21,45          NA

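For each row, I want the difference between StockCurrentQtr and StockNextQtr, with group_by on fundId. Alternatively, I would like to compute it from the column 'StockCurrentQtr' alone, again with group_by on fundId.

My attempt fails with the following error:

    Column `StockDiff` must be length 3 (the group size) or one, not 5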

2 answers:

Answer 0: (score: 2)

You don't need apply here; rowwise is enough, i.e.

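    library(dplyr)

    df %>%
      # split both comma-separated columns (positions 4 and 5) into list columns
      mutate_at(vars(4:5), funs(strsplit(., ','))) %>%
      # evaluate the next step one row at a time
      rowwise() %>%
      # keep the elements of StocCurrentQtr that are absent from StockNextQtr
      mutate(new = toString(setdiff(StocCurrentQtr, StockNextQtr)))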

which gives,

    Source: local data frame [6 x 6]
    Groups: <by row>

    # A tibble: 6 x 6
      fundId  Year   Qtr StocCurrentQtr StockNextQtr new
       <int> <int> <int> <list>         <list>       <chr>
    1      1  2015     1 <chr [5]>      <chr [4]>    1, 5
    2      1  2015     2 <chr [4]>      <chr [5]>    3, 51
    3      1  2015     3 <chr [5]>      <chr [1]>    7, 8, 9, 4, 2
    4      2  2015     1 <chr [3]>      <chr [3]>    10, 11
    5      2  2015     2 <chr [3]>      <chr [3]>    14, 16, 19
    6      2  2015     3 <chr [3]>      <chr [1]>    20, 21, 45
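
funs() has been deprecated since dplyr 0.8.0, so on a recent dplyr the same idea can be written with across(). The following is a minimal sketch, assuming dplyr 1.0.0 or later. The data frame df is rebuilt here from the question's table, and its column names (StockCurrentQtr, StockNextQtr) follow that table rather than the spelling used in the answer's code.

```r
library(dplyr)

# Sample data rebuilt from the question's table (column types are an assumption).
df <- data.frame(
  fundId = c(1L, 1L, 1L, 2L, 2L, 2L),
  Year = rep(2015L, 6),
  Qtr = rep(1:3, 2),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  StockNextQtr = c("2,3,4,51", "7,8,9,4,2", NA,
                   "14,16,19", "20,21,45", NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Split both comma-separated columns into list columns of character vectors.
  mutate(across(c(StockCurrentQtr, StockNextQtr), ~ strsplit(.x, ","))) %>%
  rowwise() %>%
  # In a rowwise frame, a list column refers to that row's single element.
  mutate(new = toString(setdiff(StockCurrentQtr, StockNextQtr))) %>%
  ungroup()

res$new
```

Selecting the two columns by name instead of by position (4:5) is a design choice made for this sketch, so it keeps working if the column order changes.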

The base R equivalent,

    mapply(function(x, y) toString(setdiff(x, y)),
           strsplit(df$StocCurrentQtr, ','),
           strsplit(df$StockNextQtr, ','))

    #[1] "1, 5"          "3, 51"         "7, 8, 9, 4, 2" "10, 11"        "14, 16, 19"    "20, 21, 45"
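
The rows where StockNextQtr is NA return the whole current set because strsplit() turns an NA string into a single NA element, which removes nothing in setdiff(). Here is a small self-contained sketch of that behaviour, using plain vectors copied from the question's table. The names current, nxt and diffs are hypothetical and used only for illustration.

```r
# Values copied from the question's table; NA marks a missing next quarter.
current <- c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
             "10,11,14", "14,16,19", "20,21,45")
nxt <- c("2,3,4,51", "7,8,9,4,2", NA,
         "14,16,19", "20,21,45", NA)

# An NA string splits into a single NA element rather than raising an error.
na_split <- strsplit(NA_character_, ",")[[1]]

# Pair up the split vectors row by row and collapse each difference to a string.
diffs <- mapply(function(x, y) toString(setdiff(x, y)),
                strsplit(current, ","),
                strsplit(nxt, ","))

diffs
```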

If StockNextQtr is missing, we can create it first and then proceed in the same way as before, i.e.

    df %>%
      group_by(fundId) %>%
      # build StockNextQtr from the following row within each fund
      mutate(StockNextQtr = lead(StocCurrentQtr)) %>%
      mutate_at(vars(4:5), funs(strsplit(., ','))) %>%
      rowwise() %>%
      mutate(new = toString(setdiff(StocCurrentQtr, StockNextQtr)))
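
To see what the lead() step contributes on its own, the sketch below rebuilds only the columns it needs from the question's table and reconstructs the next-quarter column. It assumes the rows are already ordered by Year and Qtr within each fund, since lead() works on row order. The name rebuilt is hypothetical, and the column names follow the question's table.

```r
library(dplyr)

# Only the columns needed for this step, copied from the question's table.
df <- data.frame(
  fundId = rep(1:2, each = 3),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  stringsAsFactors = FALSE
)

rebuilt <- df %>%
  group_by(fundId) %>%
  # lead() shifts values up by one row within each fund; the last row gets NA.
  mutate(StockNextQtr = lead(StockCurrentQtr)) %>%
  ungroup()

rebuilt$StockNextQtr
```

The result should line up with the StockNextQtr column shown in the question, including the NA in the last quarter of each fund.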

Answer 1: (score: 0)

I found another way:

    yy <- yy %>%
      group_by(fundId, Year, Qtr) %>%
      mutate(new = paste(setdiff(unlist(strsplit(StockCurrentQtr, split = ",")),
                                 unlist(strsplit(StockNextQtr, split = ","))),
                         collapse = ","))
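
A likely reason this works where the original attempt failed, offered as an interpretation rather than something the answer states: setdiff() returns one element per differing stock, while paste(..., collapse = ",") always returns a single string, which fits the one-value-per-row shape that mutate() expects. Grouping by fundId, Year and Qtr also makes every group a single row in this data. Below is a base R sketch of the length difference, using the first row of the table. The names current, nxt, raw_diff and collapsed are hypothetical.

```r
# First row of the question's table, split into character vectors.
current <- unlist(strsplit("1,2,3,4,5", split = ","))
nxt <- unlist(strsplit("2,3,4,51", split = ","))

raw_diff <- setdiff(current, nxt)             # one element per differing stock
collapsed <- paste(raw_diff, collapse = ",")  # always a single string

length(raw_diff)   # 2
length(collapsed)  # 1
collapsed          # "1,5"
```

Note that collapse = "," produces "1,5", whereas toString() in the other answer separates the values with a comma and a space.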