Question


1

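An artist impression of a star system is responsible for a nova. The team from university of VYU focus on a class of compounds. The young people was seen enjoying the football match.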

2

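Scientists have made a breakthrough and solved a decades-old mystery by revealing how a powerful. Heart attacks more due to nurture than nature. SA footballer Senzo Meyiwa shot dead to save girlfriend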

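Expected output

1 An artist impression of a star system is responsible for a nova.

1 The team from university of VYU focus on a class of compounds.

1 The young people was seen enjoying the football match.

2 Scientists have made a breakthrough and solved a decades-old mystery by revealing how a powerful.

2 Heart attacks more due to nurture than nature.

2 SA footballer Senzo Meyiwa shot dead to save girlfriend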

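The data is in CSV format with about 1000 data points. The number is in column (1) and the sentences are in column (2). I need to split the string into sentences and keep the row number for each sentence. I need help building the R code.

Note: the number and the sentences are in two different columns.

I have tried the code below for splitting the string, but I also need code that keeps the row index.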

x$qwerty <- as.character(x$qwerty)

sa<-list(strsplit(x$qwerty,".",fixed=TRUE))[[1]]

s<-unlist(sa)

write.csv(s,"C:\\Users\\Suhas\\Desktop\\out23.csv")

Answer 1

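One inconvenience of vectorized functions in R is that they work from "inside" the vector. That is, they operate on the elements themselves rather than on the elements in the context of the vector. As a result, the user loses the innate ability to track the index, i.e. where the element being operated on sits in the original object.

The workaround is to generate the index separately. This is easily done with seq_along, which is an optimized version of 1:length(qwerty). Then you can just paste the index and the result together. In your case, you will obviously want to do the paste before you unlist.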
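The approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the data frame `x` below is a hypothetical stand-in for the asker's CSV, the `id` column name is invented, and `trimws` is added to drop the space left after each period.

```r
# Hypothetical stand-in for the asker's data frame `x` (values are made up)
x <- data.frame(
  id = c(1, 2),
  qwerty = c("A one. A two. A three.", "B one. B two"),
  stringsAsFactors = FALSE
)

# Split each row on literal periods; the result is a list with one element per row
parts <- strsplit(x$qwerty, ".", fixed = TRUE)
# Remove the leading space left behind after each period
parts <- lapply(parts, trimws)

# Generate the row index separately, then paste it onto every piece
# of its row *before* unlisting, so the index is not lost
idx <- seq_along(parts)
labelled <- Map(function(i, p) paste(i, p), idx, parts)
res <- unlist(labelled)

print(res)
```

If the numbers in the first column are not simply 1, 2, 3, ..., passing `x$id` to `Map` in place of `idx` should carry the original values through instead.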

Answer 2

If your dataset looks like the one shown above, this may help. You can read it from a file with readLines("file.txt").

lines <- readLines(n=7)
1

An artist impression of a star system is responsible for a nova. The team from university of VYU focus on a class of compounds. The young people was seen enjoying the football match.

2

Scientists have made a breakthrough and solved a decades-old mystery by revealing how a powerful. Heart attacks more due to nurture than nature. SA footballer Senzo Meyiwa shot dead to save girlfriend



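# Drop the empty lines
lines1 <- lines[lines!='']
# Split each line into sentences at the space that follows a period
lines2 <- unlist(strsplit(lines1, '(?<=\\.)(\\b| )', perl=TRUE))
# TRUE for the elements that consist only of a row number
indx <- grepl("^\\d+$", lines2)
# Group each number with the sentences after it and prefix them with that number
res <- unlist(lapply(split(lines2,cumsum(indx)), 
     function(x) paste(x[1], x[-1])), use.names=FALSE)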

 res
 #[1] "1 An artist impression of a star system is responsible for a nova."                                 
 #[2] "1 The team from university of VYU focus on a class of compounds."                                   
 #[3] "1 The young people was seen enjoying the football match."                                           
 #[4] "2 Scientists have made a breakthrough and solved a decades-old mystery by revealing how a powerful."
 #[5] "2 Heart attacks more due to nurture than nature."                                                   
 #[6] "2 SA footballer Senzo Meyiwa shot dead to save girlfriend"

If you want it as a 2-column data.frame:

dat <- data.frame(id=rep(lines2[indx],diff(c(which(indx),
          length(indx)+1))-1), Col1=lines2[!indx], stringsAsFactors=FALSE)

head(dat,2)
#  id                                                            Col1
#1  1 An artist impression of a star system is responsible for a nova.
#2  1   The team from university of VYU focus on a class of compounds.
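Since the question says the number and the sentences sit in two separate columns, the same result could also be sketched directly on a data frame, without going through readLines. This is an alternative illustration, not the method above: the data frame `x`, its column names `num` and `qwerty`, and the sample sentences are all assumptions.

```r
# Hypothetical data frame standing in for the CSV; column names are assumptions
x <- data.frame(
  num = c(1, 2),
  qwerty = c("First sentence. Second sentence.", "Third sentence. Fourth one"),
  stringsAsFactors = FALSE
)

# Split at whitespace that follows a period; the lookbehind keeps
# the period attached to its sentence
parts <- strsplit(x$qwerty, "(?<=\\.)\\s+", perl = TRUE)

# Repeat each row's number once per sentence that row produced
dat <- data.frame(
  id = rep(x$num, lengths(parts)),
  Col1 = unlist(parts),
  stringsAsFactors = FALSE
)

print(dat)
```

Here `lengths(parts)` gives the number of sentences per row, so `rep` lines the numbers up with the unlisted sentences.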
