How to concatenate strings with a different separator at every nth element

Date: 2018-03-01 08:53:02

Tags: r regex

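I want to concatenate words (strings) using a different separator at every 10th element: each word is followed by a comma, and every 10th word is followed by a comma and a newline. The end goal is to print a list of words neatly into a table.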

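I could write a loop, but I was hoping for a more elegant solution using gsub and a regular expression, like the ones proposed in these related questions: here and here. Those questions involve inserting or replacing a string after every nth character, but in my case the words have variable length (in characters).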

Edit: I'm looking for a solution that can be applied to a vector with any number of words.
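For comparison, here is a minimal base-R sketch that meets the "any number of words" requirement without a regex or an explicit loop: it assigns each word a group number, joins each group with ", ", and joins the groups with ",\n". The helper name `join_in_groups` is hypothetical, and this is a swapped-in technique (split and paste), not the regex approach asked for.

```r
# Hypothetical helper: join words with ", " and start a new line after every n words.
# Works for any vector length; the last line simply holds the leftover words.
join_in_groups <- function(words, n = 10) {
  # Group ids: 1 for words 1..n, 2 for words n+1..2n, and so on
  groups <- split(words, ceiling(seq_along(words) / n))
  # Join the words inside each group with ", "
  lines <- vapply(groups, paste, character(1), collapse = ", ")
  # Join the groups with a comma plus newline (no trailing separator)
  paste(lines, collapse = ",\n")
}

join_in_groups(c("a", "b", "c", "d", "e"), n = 2)
```

With the `word_vector` defined below, `cat(join_in_groups(word_vector))` should print four lines of ten words each.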

For reproducible data, I generate a vector of 40 random words using code from this source:
MHmakeRandomString <- function(n, length) {
  randomString <- c(1:n)
  for (i in 1:n) {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS), length, replace=TRUE),
                             collapse="")}
  return(randomString)
}
set.seed(4)
word_vector <- MHmakeRandomString(n=40, length=5)
word_vector
# [1] "A0ihO" "gIUW4" "Kh6Xp" "sYAXL" "IZvuE" "PtQvw" "zeSEt" "YsCo0" "WfzbU" "5TTIz"
# [11] "oKTOO" "qaaTK" "y4QUd" "C4vNY" "lDplP" "Gjrg8" "UHzUT" "32ZcV" "c7xgl" "5Lr2H"
# [21] "fDgxt" "zFdYO" "hohuK" "vrNU4" "8oRg5" "IYcyl" "pblbO" "SHhq0" "yFjWa" "rzYLr"
# [31] "m2AXf" "QdhtM" "TWpkh" "4499K" "5Bcv8" "0DeqI" "6BdTy" "fJgKX" "tUZeh" "HPso5"

Normally I would paste(x, collapse = ", ") and then print the result to a table with gridExtra:
word_sep <- paste(word_vector, collapse=", ")
word_sep
# [1] "A0ihO, gIUW4, Kh6Xp, sYAXL, IZvuE, PtQvw, zeSEt, YsCo0, WfzbU, 5TTIz, 
# oKTOO, qaaTK, y4QUd, C4vNY, lDplP, Gjrg8, UHzUT, 32ZcV, c7xgl, 5Lr2H, 
# fDgxt, zFdYO, hohuK, vrNU4, 8oRg5, IYcyl, pblbO, SHhq0, yFjWa, rzYLr, 
# m2AXf, QdhtM, TWpkh, 4499K, 5Bcv8, 0DeqI, 6BdTy, fJgKX, tUZeh, HPso5"

library(gridExtra)  # provides tableGrob()
library(cowplot)    # provides plot_grid()
plot_grid(tableGrob(word_sep))

Current table output: I have a very long list of words and a fixed table width, so I need line breaks.

[Image: Current table output]

The output I want looks like this hacked-together version:

word_sep2 <- paste(c(paste(MHmakeRandomString(n=10, length=5), collapse=", "), ",\n",
               paste(MHmakeRandomString(n=10, length=5), collapse=", "), ",\n",
               paste(MHmakeRandomString(n=10, length=5), collapse=", "), ",\n",
               paste(MHmakeRandomString(n=10, length=5), collapse=", ")), collapse="")
word_sep2
# [1] "0ahiL, 2pA5c, dKWuR, 79sw5, MeL1I, KpB1w, UNLSo, LlDlN, jNOcI, tv8R5,
# \norf60, avKFo, jZFxE, U7RQW, SSmxD, czlMt, 75zEB, 2jLwG, 08dmN, H3sVW,
# \nCZwQt, ggumo, wHUpj, Z7WGR, BHYLE, eWksX, Lbt3D, P1Brf, OpEvk, 1WFVa,
# \nEeFd4, afX7B, nyBzF, vbNLz, U7MU0, H4rx4, AKgv8, Kbzri, KKajp, Yg6EW"

plot_grid(tableGrob(word_sep2))

Desired table output:

[Image: Desired table output]

2 answers:

Answer 0 (score: 3)

You can use

gsub("((?:[^,]*,){10}) ", "\\1\n", word_sep)

See the online regex demo.

Details

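  • ((?:[^,]*,){10}) - Group 1 (referred to as \1 in the replacement pattern), matching 10 consecutive occurrences of:
    • [^,]* - zero or more characters other than ,
    • , - a comma
  • (a single space) - the space after the 10th comma, which is consumed by the match and replaced with \n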
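The pattern above hard-codes 10 in the quantifier. A minimal sketch that builds the same pattern for any group size n with sprintf() follows; `wrap_every` is a hypothetical name, it assumes the words themselves contain no commas, and `perl = TRUE` is my own choice (the answer uses the default engine) so that the non-capturing group is handled by PCRE.

```r
# Hypothetical helper: generalizes the answer's regex from 10 to any n.
# Assumes the input is a ", "-separated string whose words contain no commas.
wrap_every <- function(s, n = 10) {
  # ((?:[^,]*,){n}) captures n comma-terminated chunks; the space after them
  # is consumed and replaced by "\n", so the comma stays at the end of the line.
  pattern <- sprintf("((?:[^,]*,){%d}) ", n)
  gsub(pattern, "\\1\n", s, perl = TRUE)
}

wrap_every("a, b, c, d, e", n = 2)
```

Because the last word has no trailing comma, an incomplete final group is never matched, so the result has no trailing newline regardless of the word count.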

See the R demo:

MHmakeRandomString <- function(n, length) {
   randomString <- c(1:n)
   for (i in 1:n) {
     randomString[i] <- paste(sample(c(0:9, letters, LETTERS), length, replace=TRUE),
                              collapse="")}
   return(randomString)
}
set.seed(4)
word_vector <- MHmakeRandomString(n=40, length=5)
word_sep <- paste(word_vector, collapse=", ")
f <- gsub("((?:[^,]*,){10}) ", "\\1\n", word_sep)
cat(f, "\n", sep = "")  # cat() has no collapse argument; print f followed by a newline

Answer 1 (score: 1)

I guess you can do this with paste:
paste(word_vector, rep(c(", ", ",\n"), c(9,1)), collapse = "", sep = "")
[1] "A0ihO, gIUW4, Kh6Xp, sYAXL, IZvuE, PtQvw, zeSEt, YsCo0, WfzbU, 5TTIz,\noKTOO, qaaTK, y4QUd, C4vNY, lDplP, Gjrg8, UHzUT, 32ZcV, c7xgl, 5Lr2H,\nfDgxt, zFdYO, hohuK, vrNU4, 8oRg5, IYcyl, pblbO, SHhq0, yFjWa, rzYLr,\nm2AXf, QdhtM, TWpkh, 4499K, 5Bcv8, 0DeqI, 6BdTy, fJgKX, tUZeh, HPso5,\n"

This is what it looks like when printed with cat:

res <- paste(word_vector, rep(c(", ", ",\n"), c(9,1)), collapse = "", sep = "")
cat(res)
# A0ihO, gIUW4, Kh6Xp, sYAXL, IZvuE, PtQvw, zeSEt, YsCo0, WfzbU, 5TTIz,
# oKTOO, qaaTK, y4QUd, C4vNY, lDplP, Gjrg8, UHzUT, 32ZcV, c7xgl, 5Lr2H,
# fDgxt, zFdYO, hohuK, vrNU4, 8oRg5, IYcyl, pblbO, SHhq0, yFjWa, rzYLr,
# m2AXf, QdhtM, TWpkh, 4499K, 5Bcv8, 0DeqI, 6BdTy, fJgKX, tUZeh, HPso5,
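The paste answer recycles a fixed length-10 separator vector, so the last word always gets a separator too (the output above ends with ",\n"). A minimal sketch that builds the separator vector from word positions instead and drops the separator after the final word; `sep_every` is a hypothetical name, not part of either answer.

```r
# Hypothetical helper: per-word separators computed from positions,
# so any number of words works and nothing trails the last word.
sep_every <- function(words, n = 10) {
  idx <- seq_along(words)
  # ",\n" after every n-th word, ", " after all others
  seps <- ifelse(idx %% n == 0, ",\n", ", ")
  # No separator after the final word
  seps[length(words)] <- ""
  paste0(words, seps, collapse = "")
}

sep_every(c("a", "b", "c", "d", "e"), n = 2)
```

With the question's data, `cat(sep_every(word_vector))` should print the same four lines as above, minus the trailing comma and newline.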