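Convert a column to comma-separated values in R

Time: 2018-08-21 20:58:02

Tags: r dataframe

I have two columns, A and B, holding a large amount of data in Excel. Taking columns A and B together, I am trying to produce column C as the output. Right now I am doing everything in Excel, so I thought there might be a way to solve this in R, but I don't actually know how to do it. Any help is appreciated. Thanks. I have: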

 ColumnA    ColumnB    ColumnC (output column)
    A1         10           A2
    A2         10           A1
    B1         3         B2,B3,B4
    B2         3         B1,B3,B4
    B3         3         B1,B2,B4
    B4         3         B1,B2,B3
    C1         6          C2,C3
    C2         6          C1,C3
    C3         6          C1,C2

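5 answers:

Answer 0 (score: 3)

We can group by ColumnB and then, for each row, take the set difference between all ColumnA values in that group and the row's own ColumnA value: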

library(tidyverse)
df %>%
  group_by(ColumnB) %>%
  mutate(ColumnC=map_chr(ColumnA, ~toString(setdiff(ColumnA, .x))))

# A tibble: 9 x 3
# Groups:   ColumnB [3]
  ColumnA ColumnB ColumnC   
  <fct>     <int> <chr>     
1 A1           10 A2        
2 A2           10 A1        
3 B1            3 B2, B3, B4
4 B2            3 B1, B3, B4
5 B3            3 B1, B2, B4
6 B4            3 B1, B2, B3
7 C1            6 C2, C3    
8 C2            6 C1, C3    
9 C3            6 C1, C2    
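For readers without the tidyverse installed, the same group-wise set difference can be sketched in base R. This is an illustrative alternative rather than part of the answer above: the helper name others_in_group is made up for this example, and the data frame is rebuilt by hand from the question's table.

```r
# Rebuild the question's two input columns (values copied from the table above).
df <- data.frame(
  ColumnA = c("A1", "A2", "B1", "B2", "B3", "B4", "C1", "C2", "C3"),
  ColumnB = c(10L, 10L, 3L, 3L, 3L, 3L, 6L, 6L, 6L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each value in a group, list the other values.
others_in_group <- function(v) {
  vapply(v, function(x) toString(setdiff(v, x)), character(1), USE.NAMES = FALSE)
}

# ave() applies the helper within each ColumnB group and keeps the row order.
df$ColumnC <- ave(df$ColumnA, df$ColumnB, FUN = others_in_group)
```

Because toString() joins with ", ", the result has a space after each comma, just like the tidyverse output above; paste(..., collapse = ",") would give the exact format shown in the question.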

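Answer 1 (score: 2)

I don't think the question is worded very clearly, but my reading of the desired result is that ColumnC should hold all the ColumnA values in each ColumnB group, leaving out the row's own ColumnA value. It works like this:

  1. nest ColumnA and join it back onto the original data frame
  2. flatten, so that each row now holds a vector of ColumnA values
  3. use setdiff to get the values other than the row's own ColumnA value
  4. use str_c to collapse them into a comma-separated string

You can see that the desired ColumnC has been reproduced.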

library(tidyverse)
tbl <- structure(list(ColumnA = c("A1", "A2", "B1", "B2", "B3", "B4", "C1", "C2", "C3"), ColumnB = c(10L, 10L, 3L, 3L, 3L, 3L, 6L, 6L, 6L), ColumnC = c("A2", "A1", "B2,B3,B4", "B1,B3,B4", "B1,B2,B4", "B1,B2,B3", "C2,C3", "C1,C3", "C1,C2")), problems = structure(list(row = 9L, col = "ColumnC", expected = "", actual = "embedded null", file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ColumnA = structure(list(), class = c("collector_character", "collector")), ColumnB = structure(list(), class = c("collector_integer", "collector")), ColumnC = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

tbl %>%
  left_join(
    tbl %>% select(-ColumnC) %>% nest(ColumnA)
  ) %>%
  mutate(
    data = flatten(data),
    output = map2(data, ColumnA, ~ setdiff(.x, .y)),
    output = map_chr(output, ~ str_c(., collapse = ","))
    )
#> Joining, by = "ColumnB"
#> # A tibble: 9 x 5
#>   ColumnA ColumnB ColumnC  data      output  
#>   <chr>     <int> <chr>    <list>    <chr>   
#> 1 A1           10 A2       <chr [2]> A2      
#> 2 A2           10 A1       <chr [2]> A1      
#> 3 B1            3 B2,B3,B4 <chr [4]> B2,B3,B4
#> 4 B2            3 B1,B3,B4 <chr [4]> B1,B3,B4
#> 5 B3            3 B1,B2,B4 <chr [4]> B1,B2,B4
#> 6 B4            3 B1,B2,B3 <chr [4]> B1,B2,B3
#> 7 C1            6 C2,C3    <chr [3]> C2,C3   
#> 8 C2            6 C1,C3    <chr [3]> C1,C3   
#> 9 C3            6 C1,C2    <chr [3]> C1,C2

Created on 2018-08-21 by the reprex package (v0.2.0).
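To make steps 3 and 4 of the list above concrete, here is a minimal base-R trace for a single row. It assumes the group vector has already been collected (steps 1 and 2); the variable names are chosen only for this illustration.

```r
# All ColumnA values in the ColumnB == 3 group (what steps 1-2 produce per row).
group_values <- c("B1", "B2", "B3", "B4")

# The current row's own ColumnA value.
current <- "B2"

# Step 3: keep everything in the group except the current value.
remaining <- setdiff(group_values, current)

# Step 4: collapse into one comma-separated string.
output <- paste(remaining, collapse = ",")
```

Here output matches the B2 row of the desired ColumnC, "B1,B3,B4".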

Answer 2 (score: 0)

df = read.table(text = "
ColumnA   ColumnB   
A1         10          
A2         10          
B1         3        
B2         3        
B3         3        
B4         3        
C1         6        
C2         6        
C3         6        
", header=T, stringsAsFactors=F)

library(tidyverse)

df %>%
  group_by(ColumnB) %>%                                         # for each ColumnB value
  mutate(vals = list(ColumnA),                                  # create a list of all Column A values for each row
         vals = map2(vals, ColumnA, ~.x[.x != .y]),             # exclude the value in Column A from that list
         vals = map_chr(vals, ~paste0(.x, collapse = ","))) %>% # combine remaining values in the list                                        
  ungroup()                                                     # forget the grouping

# # A tibble: 9 x 3
#   ColumnA ColumnB vals    
#   <chr>     <int> <chr>   
# 1 A1           10 A2      
# 2 A2           10 A1      
# 3 B1            3 B2,B3,B4
# 4 B2            3 B1,B3,B4
# 5 B3            3 B1,B2,B4
# 6 B4            3 B1,B2,B3
# 7 C1            6 C2,C3   
# 8 C2            6 C1,C3   
# 9 C3            6 C1,C2
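One difference between this logical-index filter and the setdiff-based answers only shows up when a group contains repeated values, which the question's sample data does not. The sketch below uses a made-up vector with a duplicate to show it.

```r
# Made-up group with a repeated value, for illustration only.
x <- c("B1", "B2", "B2", "B3")

# setdiff() treats its input as a set, so the duplicate "B2" appears once.
via_setdiff <- setdiff(x, "B1")

# Logical indexing keeps every remaining element, duplicates included.
via_index <- x[x != "B1"]
```

Which behaviour is wanted depends on the data; with unique ColumnA values, as in the question, both give the same result.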

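Answer 3 (score: 0)

My understanding is that you want to find all other entries of column A that share the current value of column B.

Grouping by B and collecting all the A values associated with that value solves the problem (followed by some cleanup to remove the current A entry from the resulting column C).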

library(tidyverse)

a <- c("a1", "a2", "b1", "b2", "b3", "b4", "c1", "c2", "c3", "d1")
b <- c(10, 10, 3, 3, 3, 3, 6, 6, 6, 5)
dta <- data.frame(a, b, stringsAsFactors = FALSE)

dta <- dta %>%
  group_by(b) %>%
  mutate(c = paste0(a, collapse = ",")) %>%                                   # all a values in the group
  ungroup() %>%
  mutate(c = str_replace(c, pattern = paste0(",", a), replacement = "")) %>%  # drop ",<own value>"
  mutate(c = str_replace(c, pattern = paste0(a, ","), replacement = "")) %>%  # drop "<own value>,"
  mutate(c = ifelse(c == a, NA, c))                                           # single-member groups become NA
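A caveat with the string-replacement cleanup: it can remove the wrong characters when one value is a prefix of another in the same group (for example a1 and a10). That does not occur in the sample data, so the sketch below uses a made-up group string. It uses base R's sub() with fixed = TRUE as a stand-in for str_replace, on the assumption that both replace the first literal match for plain patterns like these.

```r
# Made-up group in which "a1" is a prefix of "a10".
group_string <- "a2,a10,a1"
current <- "a1"

# Replacement approach: ",a1" first matches inside ",a10", corrupting the result.
naive <- sub(paste0(",", current), "", group_string, fixed = TRUE)
naive <- sub(paste0(current, ","), "", naive, fixed = TRUE)

# Splitting into whole values and using setdiff() avoids partial matches.
parts <- strsplit(group_string, ",", fixed = TRUE)[[1]]
safe <- paste(setdiff(parts, current), collapse = ",")
```

Here naive ends up as "a20,a1" while safe is "a2,a10".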

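Answer 4 (score: 0)

Another version of the tidyverse solution. The separate function is very useful for splitting an existing column into new columns. By doing so, we can create a Group column to make sure all operations happen within each group. The map2 and map_chr functions are well suited to vectorized operations. dat2 is the final output.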

library(tidyverse)

dat2 <- dat %>%
  separate(ColumnA, into = c("Group", "Number"), remove = FALSE, convert = TRUE, sep = 1) %>%
  group_by(Group) %>%
  mutate(List = list(ColumnA)) %>%
  mutate(List = map2(List, ColumnA, ~.x[!(.x %in% .y)])) %>%
  mutate(ColumnC = map_chr(List, ~str_c(.x, collapse = ","))) %>%
  ungroup() %>%
  select(starts_with("Column"))
dat2
# # A tibble: 9 x 3
#   ColumnA ColumnB ColumnC 
#   <chr>     <int> <chr>   
# 1 A1           10 A2      
# 2 A2           10 A1      
# 3 B1            3 B2,B3,B4
# 4 B2            3 B1,B3,B4
# 5 B3            3 B1,B2,B4
# 6 B4            3 B1,B2,B3
# 7 C1            6 C2,C3   
# 8 C2            6 C1,C3   
# 9 C3            6 C1,C2 

Data

dat <- read.table(text = "ColumnA   ColumnB
    A1         10 
                  A2         10 
                  B1         3
                  B2         3
                  B3         3
                  B4         3
                  C1         6
                  C2         6
                  C3         6",
                  stringsAsFactors = FALSE, header = TRUE)
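
In the answer above, separate(..., sep = 1, convert = TRUE) splits each ColumnA value after its first character and converts the remainder to a number. A rough base-R sketch of that split, using made-up values, looks like this:

```r
# Made-up values for illustration; "B12" shows a multi-digit remainder.
ColumnA <- c("A1", "B12", "C3")

# First character becomes the group label.
Group <- substr(ColumnA, 1, 1)

# Everything after the first character, converted to integer.
Number <- as.integer(substring(ColumnA, 2))
```

Note that this groups rows by the leading letter of ColumnA rather than by ColumnB. The two groupings coincide in the question's sample data, but that is an assumption about the data and may not hold in general.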