Split a string and add a separator before every number

Time: 2016-09-02 09:28:34

Tags: r

I have a vector containing a bibliography:

bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions. Journal
of the American Statistical Association, 50, 884–893.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.
Biometrika, 75, 11–20.  3.Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75")

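I want to split the string before every number that marks an entry (i.e. 1., 2. and 3.) and add a new line or symbol there. So if I had, say, 50 references, I would like to split all the strings in the vector automatically and add a separator before each entry number.

So far I have tried this (it is not the best option, because the third reference gets dropped):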

    bibliography <- unlist(strsplit(bibliography, "  "))
    # Note: the chained assignment below leaves only the first two elements in `bibliography`
    bibliography <- bibliography[-length(bibliography)] <- paste0(bibliography[-length(bibliography)], ' \\\\ ')

This is the output (and it is the format I want):

    [1] "1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions. Journal\nof the American Statistical Association, 50, 884–893. \\\\ "
    [2] "2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.\nBiometrika, 75, 11–20. \\\\ "

But this is time-consuming, because I have to manually add a double space before each number (i.e. 1. and 2.) for this code to work.

I have also looked at these questions:

Add new line before every number in a string

Inserting Newline character before every number occurring in a string?

2 Answers:

Answer 0: (score: 2)

This should get you where you want to be:

library(stringr)
library(dplyr)

# The gsub() call inserts a "~" marker right before each entry number
str_split(gsub("([1-9]\\.[ ]*[A-Z])", "~\\1", bibliography), "~") %>%
  unlist() %>%
  str_trim(side = "both") # trim whitespace from both ends of each piece
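
One thing to watch: `[1-9]\\.` only looks at a single digit, so with a longer list numbers such as 10. or 12. will probably not be split correctly. A possible base R variation is sketched below. It assumes that each entry number is preceded by whitespace and followed by a dot and a capital letter, and `refs` is just a made-up shortened example, not the real data. Something like "vol. 5. Title" inside a reference would still fool it.

```r
# Hypothetical shortened example; `refs` is an illustrative name, not from the question.
refs <- "1. Cohen, A. C. (1955). Title one. Journal A, 50, 884-893.  2.Breslow, N. E. (1988). Title two. Biometrika, 75, 11-20.  10. Smith, J. (2000). Title three."

# Split on whitespace that is immediately followed by "<digits>." and a capital letter.
# The lookahead (?=...) needs perl = TRUE; the number itself is not consumed by the split.
parts <- strsplit(refs, "\\s+(?=[0-9]+\\.\\s*[A-Z])", perl = TRUE)[[1]]

# Remove any leftover whitespace at either end of each entry.
parts <- trimws(parts)
```

Because the split consumes only the whitespace, there is no empty first element to drop afterwards.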

Answer 1: (score: 1)

I tried a regex-based approach:

bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions. Journal of the American Statistical Association, 50, 884–893.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.
                  Biometrika, 75, 11–20.  3.Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75")

out <- gsub("([^0-9][0-9]{1}\\.|^[0-9]{1}\\.)", "\t\\1",bibliography)
out <- unlist(strsplit(out, "\t"))
out <- gsub("^\\s+|\\s+$", "", out)
out <- out[-1]

You might give it a try.
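
If the ` \\\\ ` separator from the question is still wanted after splitting, it can be appended to every element except the last. In the question's attempt the chained assignment `bibliography <- bibliography[-length(bibliography)] <- paste0(...)` ends up storing only the pasted values in `bibliography`, which is why the third entry disappeared. A minimal sketch of the in-place version, where `out` holds placeholder entries rather than the real references:

```r
# Stand-in for the already split vector; the entries are placeholders.
out <- c("1. First entry.", "2.Second entry.", "3.Third entry")

n <- length(out)
# Append the separator to every element except the last one.
# A single assignment modifies `out` in place and keeps the last element.
if (n > 1) {
  out[-n] <- paste0(out[-n], " \\\\ ")
}
```

The `if (n > 1)` guard is only there so that a one-element vector is left untouched.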