My data frame has 1.31 million rows.
The data.frame has 2 columns.
Column 1 is a number.
Column 2 is a list of values.
I need something like this:
Col1 | Col2
1 | a, b, c
2 | d, e, f
3 | a, e, f
It has to be fast, because there are 1.3 million rows.
Answer 0 (score: 0)
If Col2 is a string, split it first and then use unnest:
library(tidyr)
library(dplyr)
dt %>%
  mutate(Col2 = strsplit(Col2, ", ")) %>%  # split on ", " so no leading spaces are left in the values
  unnest(Col2)
# A tibble: 9 x 2
   Col1 Col2
  <dbl> <chr>
1     1 a
2     1 b
3     1 c
4     2 d
5     2 e
6     2 f
7     3 a
8     3 e
9     3 f
Data input:
# tibble() replaces the deprecated data_frame()
dt <- tibble(Col1 = c(1, 2, 3), Col2 = c('a, b, c', 'd, e, f', 'a, e, f'))
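For the string case, tidyr's separate_rows() can do the split and the expansion in one call. This is only a sketch and not part of the original answer: it assumes Col2 holds one comma-separated string per row, and the result name `long` is made up for illustration.

```r
library(tibble)  # tibble()
library(tidyr)   # separate_rows()

# Same toy data as above: Col2 is one comma-separated string per row.
dt <- tibble(Col1 = c(1, 2, 3),
             Col2 = c("a, b, c", "d, e, f", "a, e, f"))

# Split Col2 on a comma followed by optional whitespace,
# and give every piece its own row (Col1 is repeated as needed).
long <- separate_rows(dt, Col2, sep = ",\\s*")
long
```

Whether this is fast enough for 1.3 million rows is not something the thread measures, so it is worth timing on a sample of the real data first.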
If, as you mention, Col2 is already a list of values, you only need:
dt %>% unnest(Col2)
# A tibble: 9 x 2
   Col1 Col2
  <dbl> <chr>
1     1 a
2     1 b
3     1 c
4     2 d
5     2 e
6     2 f
7     3 a
8     3 e
9     3 f
Data input:
dt
# A tibble: 3 x 2
   Col1 Col2
  <dbl> <list>
1     1 <chr [3]>
2     2 <chr [3]>
3     3 <chr [3]>
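Since the question asks for speed, the list-column case can also be expanded in base R without unnest. This is a swapped-in technique, not the answerer's method. It assumes every element of Col2 is a plain character vector, and the names `df` and `long` are illustrative only.

```r
# Toy data with Col2 as a list column (base R only, no packages).
df <- data.frame(Col1 = c(1, 2, 3))
df$Col2 <- list(c("a", "b", "c"), c("d", "e", "f"), c("a", "e", "f"))

# Repeat each Col1 value once per element of its list entry,
# then flatten the list column into a single vector.
long <- data.frame(
  Col1 = rep(df$Col1, lengths(df$Col2)),
  Col2 = unlist(df$Col2),
  stringsAsFactors = FALSE
)
long
```

rep(), lengths() and unlist() are all vectorised, so this may hold up well on large inputs, but that is an expectation to benchmark rather than a result reported in the thread.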