Suppose I have a data frame with a character vector var2.
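The data looks like this:
var1 var2
1    abcdefghi
2    abcdefghijklmnop
3    abc
4    abcdefghijklmnopqrst
What is the most efficient way to split var2 every n characters into new columns, up to the end of each string? For example, with n = 4 the output would look like this:
var1 var2                 new_var1 new_var2 new_var3 new_var4 new_var5
1    abcdefghi            abcd     efgh     i
2    abcdefghijklmnop     abcd     efgh     ijkl     mnop
3    abc                  abc
4    abcdefghijklmnopqrst abcd     efgh     ijkl     mnop     qrst
Should I use the stringr package, e.g. str_split_fixed, or a regular expression?
The solution needs to create as many new_var_n columns as the length of var2 requires; the strings could be, for example, 10000 characters long.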
Answer 0 (score: 4)
Here is an option using data.table and a helper function, fixed_split, which I took from this answer and modified slightly (it uses tstrsplit instead of strsplit):
library(data.table)
fixed_split <- function(text, n) {
data.table::tstrsplit(text, paste0("(?<=.{",n,"})"), perl=TRUE)
}
First define n (the number of characters per piece) and new_vars (the number of columns to add):
n <- 4
new_vars <- ceiling(max(nchar(df$var2)) / n)
setDT(df)[, paste0("new_var", seq_len(new_vars)) := fixed_split(var2, n = n)][]
#    var1                 var2 new_var1 new_var2 new_var3 new_var4 new_var5
# 1:    1            abcdefghi     abcd     efgh        i     <NA>     <NA>
# 2:    2     abcdefghijklmnop     abcd     efgh     ijkl     mnop     <NA>
# 3:    3                  abc      abc     <NA>     <NA>     <NA>     <NA>
# 4:    4 abcdefghijklmnopqrst     abcd     efgh     ijkl     mnop     qrst
Answer 1 (score: 3)
Alternatively, you can try read.fwf in base R. No special package is required:
tmp <- read.fwf(
textConnection(dtf$var2),
widths = rep(4, ceiling(max(nchar(dtf$var2) / 4))),
stringsAsFactors = FALSE)
cbind(dtf, tmp)
#   var1                 var2   V1   V2   V3   V4   V5
# 1    1            abcdefghi abcd efgh    i <NA> <NA>
# 2    2     abcdefghijklmnop abcd efgh ijkl mnop <NA>
# 3    3                  abc  abc <NA> <NA> <NA> <NA>
# 4    4 abcdefghijklmnopqrst abcd efgh ijkl mnop qrst
Answer 2 (score: 2)
Here is an alternative that uses strsplit, with the result coerced to a matrix.
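The code for this answer did not survive in the text above, so what follows is only a minimal sketch of the idea and not the answerer's original code. It assumes the question's sample data and reuses the lookbehind pattern from Answer 0; the names n, pieces, k, mat and res are illustrative.

```r
# Sample data from the question, rebuilt here so the sketch is self-contained.
df <- data.frame(
  var1 = 1:4,
  var2 = c("abcdefghi", "abcdefghijklmnop", "abc", "abcdefghijklmnopqrst"),
  stringsAsFactors = FALSE
)

n <- 4  # characters per piece

# Split after every n characters; each list element holds one string's pieces.
pieces <- strsplit(df$var2, paste0("(?<=.{", n, "})"), perl = TRUE)

# Pad every element to the longest length (indexing past the end yields NA),
# then stack the padded vectors row by row into a character matrix.
k <- max(lengths(pieces))
mat <- do.call(rbind, lapply(pieces, `[`, seq_len(k)))
colnames(mat) <- paste0("new_var", seq_len(k))

# Bind the matrix columns onto the original data frame.
res <- cbind(df, mat, stringsAsFactors = FALSE)
res
```

Using do.call(rbind, ...) rather than sapply keeps the result a matrix even when every string fits in a single piece.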
Answer 3 (score: 0)
Use successive substr calls on the same variable:
library(data.table)
dff <- fread("var1 var2
1 abcdefghi
2 abcdefghijklmnop
3 abc
4 abcdefghijklmnopqrst")
var2 <- dff[["var2"]]
for (j in 1:5) {
set(dff, j = paste0("new_var", j), value = substr(var2, 4*j - 3, 4*j))
}
dff
#>    var1                 var2 new_var1 new_var2 new_var3 new_var4 new_var5
#> 1:    1            abcdefghi     abcd     efgh        i
#> 2:    2     abcdefghijklmnop     abcd     efgh     ijkl     mnop
#> 3:    3                  abc      abc
#> 4:    4 abcdefghijklmnopqrst     abcd     efgh     ijkl     mnop     qrst
Created on 2018-08-05 by the reprex package (v0.2.0).
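The loop in this answer hard-codes five columns of width four. As a hedged sketch, the same substr idea can derive the column count from the data, which matters when strings may run to thousands of characters. This version uses base R only; n and k are illustrative names, and the sample data is rebuilt as a plain data frame.

```r
# Sample data from the question as a plain data frame.
df <- data.frame(
  var1 = 1:4,
  var2 = c("abcdefghi", "abcdefghijklmnop", "abc", "abcdefghijklmnopqrst"),
  stringsAsFactors = FALSE
)

n <- 4                                  # characters per piece
k <- ceiling(max(nchar(df$var2)) / n)   # columns needed for the longest string

# Piece j covers characters n*(j-1)+1 through n*j; substr returns ""
# when the requested range lies past the end of a string.
for (j in seq_len(k)) {
  df[[paste0("new_var", j)]] <- substr(df$var2, n * (j - 1) + 1, n * j)
}
df
```

Note that this variant fills short rows with empty strings, whereas the strsplit and read.fwf approaches fill them with NA.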
Answer 4 (score: 0)
You can use tidyr::separate:
library(tidyr)
n <- ((max(nchar(df$var2)) - 1) %/% 4) + 1
df %>% separate(var2, into=paste0("new_var", seq(n)), sep=seq(n-1)*4, remove = FALSE)
#   var1                 var2 new_var1 new_var2 new_var3 new_var4 new_var5
# 1    1            abcdefghi     abcd     efgh        i
# 2    2     abcdefghijklmnop     abcd     efgh     ijkl     mnop
# 3    3                  abc      abc
# 4    4 abcdefghijklmnopqrst     abcd     efgh     ijkl     mnop     qrst
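We first compute the number of groups we will have using integer division, then define the new names dynamically and split at the relevant positions using the numeric values in the sep argument.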
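To make the arithmetic concrete, here is a small base R sketch of just the group count and the cut positions, assuming the string lengths of the sample data (9, 16, 3 and 20). The names lens, width and cut_points are illustrative and not part of the answer.

```r
lens  <- c(9, 16, 3, 20)  # nchar() of each var2 value in the sample data
width <- 4                # characters per piece

# Integer division gives the number of pieces needed for the longest string:
# (20 - 1) %/% 4 + 1 = 5, the same as ceiling(20 / 4).
n <- ((max(lens) - 1) %/% width) + 1

# A split is made after every `width` characters, so there is one cut
# fewer than there are pieces: positions 4, 8, 12 and 16.
cut_points <- seq_len(n - 1) * width

n
cut_points
```

seq_len is used here instead of seq so that a single-piece case (n equal to 1) yields no cut points rather than the sequence 1, 0.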
Data
df <- read.table(text="var1 var2
1 abcdefghi
2 abcdefghijklmnop
3 abc
4 abcdefghijklmnopqrst", stringsAsFactors = FALSE, header = TRUE)