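I have a string of length 410 (i.e. 410 characters). I want to split it into substrings, each shorter than 260 characters, without cutting any word in half. For example, given
this is a test string
there must not be a substring like this is a test st; it should be cut as this is a test instead.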
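To make the "don't break words" rule concrete, here is a minimal sketch on the toy string. It assumes words are separated by single spaces, and `truncate_at_word()` is a hypothetical helper, not something from the question or the answer. It cuts at the character limit and, if that lands inside a word, backs up to the last space.

```r
# Hypothetical helper: shorten `x` to at most `max_chars` characters without
# splitting a word, by backing up to the last space before the cut.
truncate_at_word <- function(x, max_chars) {
  if (nchar(x) <= max_chars) return(x)
  prefix <- substr(x, 1L, max_chars)
  # If the character right after the cut is a space, the cut already
  # falls on a word boundary.
  if (substr(x, max_chars + 1L, max_chars + 1L) == " ") return(prefix)
  # Otherwise drop the partial word: keep everything before the last space.
  last_space <- max(gregexpr(" ", prefix, fixed = TRUE)[[1]])
  if (last_space < 1L) return(prefix)  # no space at all: one over-long word
  substr(prefix, 1L, last_space - 1L)
}

substr("this is a test string", 1, 17)    # naive cut
#> [1] "this is a test st"
truncate_at_word("this is a test string", 17)
#> [1] "this is a test"
```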
Reproducible data:
ex_str = "This has an advantage of avoiding name conflicts i.e. what if you have a function named `DataFrame()` in your global environment. Using `pandas.DataFrame()` ensures that right function is called. To build on it further, python also provides an option of importing a function with your name of choice i.e. `import pandas as pd`. Now to call out `pandas` internal functions you can use `pd` like `pd.DataFrame()`"
nchar(ex_str)
#> [1] 410
Created on 2021-01-29 by the reprex package (v0.3.0)
Expected output:
s1 = "This has an advantage of avoiding name conflicts i.e. what if you have a function named `DataFrame()` in your global environment. Using `pandas.DataFrame()` ensures that right function is called."
s2 = "To build on it further, python also provides an option of importing a function with your name of choice i.e. `import pandas as pd`. Now to call out `pandas` internal functions you can use `pd` like `pd.DataFrame()`"
nchar(s1) #nchar() should be less than 260
#> [1] 195
nchar(s2)
#> [1] 214
Created on 2021-01-29 by the reprex package (v0.3.0)
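The expected output splits at sentence boundaries rather than at the last word that fits. A minimal sketch of that behaviour, assuming a sentence ends at a period followed by whitespace and a capital letter (a heuristic that happens to keep "i.e. what" intact here but would misfire on text such as "Dr. Smith"), packs whole sentences greedily while the chunk stays under 260 characters. `split_sentences()` and `pack_units()` are hypothetical names, not from the question or the answer.

```r
ex_str <- "This has an advantage of avoiding name conflicts i.e. what if you have a function named `DataFrame()` in your global environment. Using `pandas.DataFrame()` ensures that right function is called. To build on it further, python also provides an option of importing a function with your name of choice i.e. `import pandas as pd`. Now to call out `pandas` internal functions you can use `pd` like `pd.DataFrame()`"

# Heuristic sentence splitter (assumption): split on whitespace that follows
# a period and precedes an uppercase letter.
split_sentences <- function(x) {
  strsplit(x, "(?<=\\.)\\s+(?=[A-Z])", perl = TRUE)[[1]]
}

# Greedily join units with single spaces while the chunk stays shorter than
# `max_chars`; a unit that is too long on its own becomes its own chunk.
pack_units <- function(units, max_chars = 260L) {
  chunks <- character(0)
  current <- ""
  for (u in units) {
    candidate <- if (nzchar(current)) paste(current, u) else u
    if (nchar(candidate) < max_chars || !nzchar(current)) {
      current <- candidate
    } else {
      chunks <- c(chunks, current)  # flush the full chunk
      current <- u                  # start a new one with this unit
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

chunks <- pack_units(split_sentences(ex_str), max_chars = 260L)
nchar(chunks)
#> [1] 195 214
```

On this input the two chunks are exactly `s1` and `s2` from the expected output.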
This problem seems too hard for me to even know where to start; any help would be greatly appreciated.
Answer (score: 1)
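# Split the text into words on single spaces
spl <- strsplit(ex_str, " ")[[1]]
out <- c()
while (length(spl) > 0) {
  # cumsum(nchar(spl)) + seq_along(spl) is the length of the first i words
  # joined by spaces, plus 1, so every chunk ends up shorter than 260;
  # find the first word that pushes it past 260
  ind <- which((cumsum(nchar(spl)) + seq_along(spl)) > 260)[1]
  # No such word: everything left fits into one chunk
  if (is.na(ind)) ind <- length(spl) + 1L
  if (ind == 1L) {
    # A single word is already too long; emit it on its own
    warning("first word is too long, adding anyway", call. = FALSE)
    out <- c(out, spl[1])
    spl <- spl[-1]
  } else {
    # Join the words that fit, then drop them from the queue
    out <- c(out, paste(spl[seq_len(ind-1)], collapse = " "))
    spl <- spl[-seq_len(ind-1)]
  }
}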
nchar(out)
# [1] 253 156
out
# [1] "This has an advantage of avoiding name conflicts i.e. what if you have a function named `DataFrame()` in your global environment. Using `pandas.DataFrame()` ensures that right function is called. To build on it further, python also provides an option of"
# [2] "importing a function with your name of choice i.e. `import pandas as pd`. Now to call out `pandas` internal functions you can use `pd` like `pd.DataFrame()`"
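The loop above packs words greedily. The same greedy packing can be sketched with a single regular expression instead; this is a swapped-in alternative, not the answer's method, and `wrap_regex()` is a hypothetical name. At each position the pattern takes the longest run of at most 259 characters that is followed by whitespace or the end of the string.

```r
ex_str <- "This has an advantage of avoiding name conflicts i.e. what if you have a function named `DataFrame()` in your global environment. Using `pandas.DataFrame()` ensures that right function is called. To build on it further, python also provides an option of importing a function with your name of choice i.e. `import pandas as pd`. Now to call out `pandas` internal functions you can use `pd` like `pd.DataFrame()`"

# Greedy regex wrap: up to `max_chars` characters, then whitespace or end.
wrap_regex <- function(x, max_chars = 259L) {
  pattern <- sprintf("(.{1,%d})(\\s|$)", max_chars)
  chunks <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  trimws(chunks)  # drop the whitespace consumed after each chunk
}

out_regex <- wrap_regex(ex_str)
nchar(out_regex)
#> [1] 253 156
```

On this input it gives the same two chunks as the loop. One caveat: a single word longer than `max_chars` is not kept whole as in the loop; the regex silently returns only its trailing part, so the loop with its warning is the safer choice when such words can occur.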