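I have a text file containing textual data (that is, not tables or numbers, but actual English sentences). Every few sentences there is a triple asterisk (***) that separates one group of sentences from the next. I need each group as an element of a list. I have tried readLines, readChar and strsplit, but could not get any of them to work.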
Here is an example:
Hello Everyone.
My name is James.
***
Hello James!
My name is Amy.
Nice to meet you.
***
Hi Amy!
My name is Sue.
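So I need a list with three elements, each of which is a vector holding the sentences of one group. Note that the sentences within a group are separated by line breaks.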
Answer 0 (score: 2)
Assuming data.txt holds your text entries, this does what you want (in base R):
data <- readLines("data.txt")
# Optionally remove empty lines
data <- data[data != ""]
# Split based on triple asterisk entries
lst <- split(data, cumsum(data == "***"))
# Remove triple asterisk entries
lst <- lapply(lst, function(x) x[x != "***"])
print(lst)
$`0`
[1] "Hello Everyone."   "My name is James."
$`1`
[1] "Hello James!"      "My name is Amy."   "Nice to meet you."
$`2`
[1] "Hi Amy!"         "My name is Sue."
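To see why the cumsum trick works, it can help to look at the intermediate vectors. The following is only a sketch: the in-memory data vector stands in for the file, and the names is_sep and group are made up for illustration. Dropping the separator lines before splitting also removes the need for the lapply step, and unname gives a plain unnamed list.

```r
# Hypothetical in-memory copy of the example, so no file is needed
data <- c("Hello Everyone.", "My name is James.", "***",
          "Hello James!", "My name is Amy.", "Nice to meet you.", "***",
          "Hi Amy!", "My name is Sue.")

is_sep <- data == "***"   # TRUE on separator lines, FALSE elsewhere
group  <- cumsum(is_sep)  # group id that increases by one at each separator

# Drop the separators first, then split the remaining lines by group id
lst <- unname(split(data[!is_sep], group[!is_sep]))
```

Because every line after a separator gets a larger running count than the lines before it, the count acts as a group label, which is all that split needs.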
Answer 1 (score: 1)
Try this. If your text is in a file, replace textConnection(Lines) with "myfile.txt".
Lines <- "Hello Everyone.
My name is James.
***
Hello James!
My name is Amy.
Nice to meet you.
***
Hi Amy!
My name is Sue."
# L <- paste(readLines("myfile.txt"), collapse = "\n")
L <- paste(readLines(textConnection(Lines)), collapse = "\n")
# Split on the separator line; use "\n\n***\n\n" instead if blank lines surround each ***
v <- strsplit(L, "\n***\n", fixed = TRUE)[[1]]
This gives the following character vector of length 3:
> v
[1] "Hello Everyone.\nMy name is James."
[2] "Hello James!\nMy name is Amy.\nNice to meet you."
[3] "Hi Amy!\nMy name is Sue."
If you want a list of character vectors of individual lines rather than a single character vector, apply strsplit again:
strsplit(v, "\n")
Or, if you just want to coerce v to a list:
as.list(v)
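If it is not certain whether blank lines surround the separator, one option is to split on a regular expression instead of a fixed string. This is a rough sketch under that assumption: the txt string below is a made-up variant of the example with blank lines around the first *** only, and the names txt and groups are illustrative.

```r
# Hypothetical input: blank lines around the first separator, none around the second
txt <- paste0("Hello Everyone.\nMy name is James.\n\n***\n\n",
              "Hello James!\nMy name is Amy.\nNice to meet you.\n***\n",
              "Hi Amy!\nMy name is Sue.")

# One or more newlines, three literal asterisks, one or more newlines
v <- strsplit(txt, "\n+\\*{3}\n+")[[1]]

# Split each group into its lines; "\n+" also skips blank lines inside a group
groups <- strsplit(v, "\n+")
```

The asterisks have to be escaped because fixed = TRUE is no longer used, so * would otherwise be read as a quantifier.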