Question

My data frame has 1.31 million rows.

It has 2 columns. Column 1 is a number. Column 2 is a list of values.

Something like this:

Col1 | Col2  
1    | a, b, c  
2    | d, e, f  
3    | a, e, f

I need each value in Col2 on its own row, with the matching Col1 value repeated.

It has to be fast, because there are about 1.3 million rows.

Answer 1

If Col2 is a string, split it into a list column first and then use unnest:

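library(tidyr)
library(dplyr)

# Split each string on commas (plus any following spaces) into a list column,
# then unnest that list column into one row per value
dt %>% 
    mutate(Col2 = strsplit(Col2, ",\\s*")) %>% 
    unnest(Col2)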
# A tibble: 9 x 2
   Col1  Col2
  <dbl> <chr>
1     1     a
2     1     b
3     1     c
4     2     d
5     2     e
6     2     f
7     3     a
8     3     e
9     3     f

Data input:

dt <- tibble(Col1 = c(1, 2, 3), Col2 = c('a, b, c', 'd, e, f', 'a, e, f'))
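For the string case, tidyr's separate_rows() may be a shorter route to the same result, since it splits and expands in one call. This is a minimal sketch on the same toy data; the name `long` is only illustrative, and it has not been benchmarked on 1.3 million rows.

```r
library(tidyr)
library(tibble)

# Toy input: Col2 holds comma-separated strings
dt <- tibble(Col1 = c(1, 2, 3), Col2 = c("a, b, c", "d, e, f", "a, e, f"))

# Split Col2 on a comma followed by optional whitespace,
# producing one row per piece and repeating Col1 as needed
long <- separate_rows(dt, Col2, sep = ",\\s*")
long
```

Including the optional whitespace in `sep` keeps the values from carrying a leading space.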

Since you mention that Col2 is a list of values (that is, a list column), you only need:

dt %>% unnest(Col2)
# A tibble: 9 x 2
   Col1  Col2
  <dbl> <chr>
1     1     a
2     1     b
3     1     c
4     2     d
5     2     e
6     2     f
7     3     a
8     3     e
9     3     f

Data input:

dt
# A tibble: 3 x 2
   Col1      Col2
  <dbl>    <list>
1     1 <chr [3]>
2     2 <chr [3]>
3     3 <chr [3]>
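Because the question stresses speed, it may be worth comparing unnest against a plain vectorised expansion. The sketch below is an assumption-laden alternative, not part of the original answer: it rebuilds a list-column input like the one printed above (the construction of `dt` is a guess at how it was created) and flattens it with rep(), lengths() and unlist(). Whether it is actually faster on 1.3 million rows would need to be measured.

```r
library(tibble)

# Hypothetical reconstruction of the list-column input shown above
dt <- tibble(
  Col1 = c(1, 2, 3),
  Col2 = list(c("a", "b", "c"), c("d", "e", "f"), c("a", "e", "f"))
)

# Repeat each Col1 value once per element of its list entry,
# then flatten the list column into a single character vector
long <- tibble(
  Col1 = rep(dt$Col1, lengths(dt$Col2)),
  Col2 = unlist(dt$Col2)
)
long
```

For the string case, the same pattern applies after running strsplit() on Col2 to obtain the list.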
