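Applying multiple arguments to each row using R

Date: 2015-09-07 04:02:23

Tags: r

I need help extracting the sentences that contain a given keyword from a paragraph, and removing unnecessary information.

Below is a sample of the file I have.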

Heading     Years    Text
Head1       2015     <rrrt> I am a boy and I <rrr2> like a girl <t44> from my class. She is pretty. /rr /r    /r I am cute.
Head2       2015     She is cute. She is beautiful.
Head3       2014     Hi, I am Jane. I play guitar. May is my friend.

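I want to extract the sentences that contain a given keyword (am), that is, the sentences containing "am". For each extracted sentence I also want to keep its Heading and Years. In addition, I want to get rid of unnecessary information such as <***> and /r.

Below is the output I want to achieve with R:
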
Heading     Years    Text
Head1       2015     I am a boy and I like a girl from my class.
Head1       2015     I am cute.
Head3       2014     Hi, I am Jane.

Thanks in advance.

Update:

Heading     Text
Apple       "Jane is pretty." Good afternoon
Orange      Tom said she is pretty. Also she is kind hearted. Tom listened in class.
Pear        Added Lim, He is a great guy...and clever. Mary turned her head away.

The output I would like to get is:

Heading     Text
Apple       "Jane is pretty."
Orange      Tom said she is pretty. Also she is kind hearted.
Pear        Added Lim, He is a great guy...and clever.

I want to capture what people said. Thanks.

2 Answers:

Answer 0: (score: 2)

We can split the 'Text' column at the end of each sentence into a list, use grep to extract the sentences containing am, convert the list to a data.frame using stack, and then merge the result with the original dataset.

df2 <- stack(setNames(lapply(strsplit(df1$Text, '(?<=[.])(?=\\s*)\\s+',
    perl=TRUE), grep, pattern='\\bam\\b', value=TRUE), df1$Heading))[2:1]
colnames(df2) <- colnames(df1)[c(1,3)]
res <- merge(df1[1:2], df2)
res
#  Heading Years                                        Text
#1   Head1  2015 I am a boy and I like a girl from my class.
#2   Head1  2015                                  I am cute.
#3   Head3  2014                              Hi, I am Jane.

Note: If the 'Text' column is a factor, use as.character(df1$Text) in strsplit.
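To make the splitting and filtering step easier to follow, here is a minimal sketch on a single string. The variable names (txt, sentences, hits) are made up for illustration, and the text is the first row of the question's sample with the markup already removed.

```r
# One paragraph from the question's sample (markup already removed).
# Wrapped in factor() to mimic a data frame column read as a factor.
txt <- factor("I am a boy and I like a girl from my class. She is pretty. I am cute.")

# strsplit() needs a character vector, hence as.character().
# The pattern splits on whitespace that directly follows a full stop,
# so each sentence keeps its own full stop.
sentences <- strsplit(as.character(txt), "(?<=[.])(?=\\s*)\\s+", perl = TRUE)[[1]]

# Keep only the sentences containing "am" as a whole word
# (\\b marks a word boundary, so "Jane" or "game" would not match).
hits <- grep("\\bam\\b", sentences, value = TRUE)
print(hits)
```

The lookbehind (?<=[.]) is what keeps the full stop attached to the sentence; splitting on "[.]" alone would drop it.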

Update

For the new dataset, we can use gsub to remove the characters between < and > as well as /r, and then proceed as before.

v1 <- gsub('\\<[^>]+\\>\\s*|/r+\\s*', '', df1N$Text, perl=TRUE)
df2N <- stack(setNames(lapply(strsplit(v1, '(?<=[.])(?=\\s*)\\s+', 
    perl=TRUE), grep, pattern='\\bam\\b', value=TRUE), df1N$Heading))[2:1]
colnames(df2N) <- colnames(df1N)[c(1,3)]
res1 <- merge(df1N[1:2], df2N)
res1
#  Heading Years                                        Text
#1   Head1  2015 I am a boy and I like a girl from my class.
#2   Head1  2015                                  I am cute.
#3   Head3  2014                              Hi, I am Jane.
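The whole approach can be tried end to end with the sketch below. It assumes the sample is typed in by hand from the question's table as df1N (the exact object the answer used is not shown), and the intermediate names clean, sentences and kept are only illustrative.

```r
# Sample data typed in from the question's table (an assumption about
# how df1N looks; stringsAsFactors = FALSE keeps Text as character).
df1N <- data.frame(
  Heading = c("Head1", "Head2", "Head3"),
  Years = c(2015, 2015, 2014),
  Text = c(
    "<rrrt> I am a boy and I <rrr2> like a girl <t44> from my class. She is pretty. /rr /r    /r I am cute.",
    "She is cute. She is beautiful.",
    "Hi, I am Jane. I play guitar. May is my friend."
  ),
  stringsAsFactors = FALSE
)

# 1. Remove <...> tags and /r, /rr markers, plus the whitespace after them.
clean <- gsub("<[^>]+>\\s*|/r+\\s*", "", df1N$Text, perl = TRUE)

# 2. Split each text into sentences (one list element per row).
sentences <- strsplit(clean, "(?<=[.])\\s+", perl = TRUE)

# 3. Within each row, keep the sentences containing the word "am".
kept <- lapply(sentences, grep, pattern = "\\bam\\b", value = TRUE)

# 4. Name the list by Heading, stack it into two columns, and merge the
#    Years back in. Rows without a matching sentence (Head2) drop out.
df2N <- stack(setNames(kept, df1N$Heading))[2:1]
colnames(df2N) <- c("Heading", "Text")
res1 <- merge(df1N[c("Heading", "Years")], df2N)
print(res1)
```

Naming the list by Heading before calling stack is what carries the heading along with each sentence, so merge only has to add the Years column.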

Answer 1: (score: 2)

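head <- c("Head1", "Head2", "Head3")
years <- c(2015, 2015, 2014)
Text <- c("I am a boy and I like a girl from my class. She is pretty. I am cute.","She is cute. She is beautiful.", "Hi, I am Jane. I play guitar. May is my friend.")
df <- data.frame(head, years, Text)
# As strsplit doesn't work on factors, convert Text to character
# (this has to happen after df has been created)
df$Text <- as.character(df$Text)

# Split every text into sentences at the full stops
words <- unlist(strsplit(df$Text, "[.]"))
# Keep the sentences that contain "am" as a whole word
test <- words[grep("\\bam\\b", words)]
# For each kept sentence, find the row it came from;
# fixed = TRUE matches the sentence literally instead of as a regex
a <- integer(length(test))
for (i in seq_along(test)) {
  a[i] <- grep(test[i], df$Text, fixed = TRUE)[1]
}
newdf <- data.frame(df[a, 1:2], test)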

newdf
#head years                                       test
#1   Head1  2015 I am a boy and I like a girl from my class
#1.1 Head1  2015                                  I am cute
#3   Head3  2014                              Hi, I am Jane
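As a possible variation on the loop above, the row index can be carried along while splitting, so the sentences never have to be searched for in the original text again. This is only a sketch under the same sample data; parts, row_id and long are hypothetical names, and lengths() assumes R 3.2.0 or later.

```r
# Same sample data as in the answer, with Text kept as character.
df <- data.frame(
  head = c("Head1", "Head2", "Head3"),
  years = c(2015, 2015, 2014),
  Text = c("I am a boy and I like a girl from my class. She is pretty. I am cute.",
           "She is cute. She is beautiful.",
           "Hi, I am Jane. I play guitar. May is my friend."),
  stringsAsFactors = FALSE
)

# Split each row's text into sentences; the list has one element per row.
parts <- strsplit(df$Text, "(?<=[.])\\s+", perl = TRUE)

# Repeat each row number once per sentence it produced.
row_id <- rep(seq_len(nrow(df)), lengths(parts))

# One row per sentence, with head and years copied from the source row.
long <- data.frame(df[row_id, c("head", "years")], Text = unlist(parts),
                   stringsAsFactors = FALSE)
rownames(long) <- NULL

# Keep the sentences containing "am" as a whole word.
newdf <- long[grepl("\\bam\\b", long$Text), ]
print(newdf)
```

Because each sentence keeps the index of its own row, two rows that happen to share the same sentence cannot be confused with each other.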