R: split text on empty lines

Asked: 2016-08-15 16:01:12

Tags: r

I have a very long file that looks like this:

"Ach! Hans, Run!" 
2RRGG
Enchantment
At the beginning of your upkeep, you may say "Ach! Hans, run! It's the . . ." and name a creature card. If you do, search your library for the named card, put it into play, then shuffle your library. That creature has haste. Remove it from the game at end of turn.
UNH-R

A Display of My Dark Power
Scheme
When you set this scheme in motion, until your next turn, whenever a player taps a land for mana, that player adds one mana to his or her mana pool of any type that land produced.
ARC-C

AErathi Berserker
2RRR
Creature -- Human Berserker
2/4
Rampage 3 (Whenever this creature becomes blocked, it gets +3/+3 until end of turn for each creature blocking it beyond the first.)
LE-U

AEther Adept
1UU
Creature -- Human Wizard
2/2
When AEther Adept enters the battlefield, return target creature to its owner's hand.
M11-C, M12-C, DDM-C

...

I want to load this file into a data.frame or vector "oracle", split at each blank line (actually a space followed by a newline), so that

oracle[1]

gives output like
"Ach! Hans, Run!" 2RRGG Enchantment At the beginning of your upkeep, you may say "Ach! Hans, run! It's the . . ." and name a creature card. If you do, search your library for the named card, put it into play, then shuffle your library. That creature has haste. Remove it from the game at end of turn. UNH-R

I have tried code like

oracle <- read.table(file = "All Sets.txt", quote = "", sep="\n")

as well as scan(), but

oracle[1]

gives a very long, unwanted output.

Thanks!

3 answers:

Answer 0 (score: 3)

Try this, based on your edited question:

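oracle <- readLines("BenYoung2.txt")
nvec <- length(oracle)
# Indices of the empty lines that separate the records
breaks <- which(!nzchar(oracle))
nbreaks <- length(breaks)
# If the file does not end with an empty line, add a sentinel break after the
# last line so the final record is kept (this also covers no empty lines at all)
if (nbreaks == 0L || breaks[nbreaks] < nvec) {
  breaks <- c(breaks, nvec + 1L)
  nbreaks <- nbreaks + 1L
}
# Paste the lines between consecutive breaks into one string per record
oracle <- mapply(function(a, b) paste(oracle[a:b], collapse = " "),
                 c(1L, 1L + breaks[-nbreaks]),
                 breaks - 1L)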


oracle[1]
# [1] "\"Ach! Hans, Run!\"  2RRGG Enchantment At the beginning of your upkeep, you may say \"Ach! Hans, run! It's the . . .\" and name a creature card. If you do, search your library for the named card, put it into play, then shuffle your library. That creature has haste. Remove it from the game at end of turn. UNH-R"

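Edit: this works fine as long as the breaks are always truly empty lines. To also treat lines that contain only whitespace as breaks, you can use this line instead: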

breaks <- which(grepl("^[[:space:]]*$", oracle))

When the lines are truly empty, this gives the same result.

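Answer 1 (score: 2)

I think the simplest approach is to build a new variable that says which group each line belongs to, then group by it and call paste. In base R: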

lines <- readLines(textConnection(txt))

i <- cumsum(lines == '')

by(lines, i, paste, collapse='\n')
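
A minimal, self-contained sketch of the same grouping idea, using split() and vapply() from base R and dropping the separator lines before pasting. The sample txt string and the names is_blank, group and records are made up for illustration. Any whitespace-only line is treated as a separator, and lines are joined with a space to match the output the question asks for.

```r
# Hypothetical sample input; the second separator contains a single space
txt <- "card one\nline a\n\ncard two\nline b\n \ncard three"

lines <- readLines(textConnection(txt))

# TRUE for lines that are empty or contain only whitespace
is_blank <- grepl("^[[:space:]]*$", lines)

# Each separator line bumps the group number for the lines that follow it
group <- cumsum(is_blank)

# Drop the separators, then paste each group's lines into one string
records <- vapply(split(lines[!is_blank], group[!is_blank]),
                  paste, character(1), collapse = " ",
                  USE.NAMES = FALSE)
```

Here records is a plain character vector with one element per card, so records[1] is the first card on a single line.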

Answer 2 (score: 0)

The most direct way is to split on newlines first (i.e. \n), then throw away the empty lines.

text = "line1

line2
line3
"

split1 = unlist(strsplit(text, "\n"))
filter = split1[split1 != ""]
# [1] "line1" "line2" "line3"
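
Note that the snippet above returns one element per line, not one per block. If the goal is one element per blank-line-separated block, as in the question, a possible variation is to split on the blank line itself. This is only a sketch: the sample text and the name blocks are made up for illustration, and it assumes the whole input is held in a single string.

```r
# Hypothetical sample input; the second separator line contains a single space
text <- "line1\n\nline2\nline3\n \nline4\n"

# Split wherever a newline is followed by a whitespace-only line
blocks <- strsplit(text, "\n[[:space:]]*\n")[[1]]

# Join the lines inside each block with a space and trim stray whitespace
blocks <- trimws(gsub("\n", " ", blocks, fixed = TRUE))
```

For a file, the single string could be built with paste(readLines(path), collapse = "\n"), where path is a placeholder for the file name.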