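I want to use R to match some specific strings and keep only the line directly above each match. Here is some sample data; the real file contains hundreds of similar cases: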
first_case<- data.frame(line =
c("#John Wayne: Su, 11.01.2013 08:24:42#
He is present / I guess, Does great job
--------------------------------------------------
#Michal Thorn: Fr, 12.09.2015 17:23:01#
Works quite frequently with people
--------------------------------------------------
#Sandra Nunes: Mo, 20.05.2011 09:00:29#
She has some new clients"))
second_case<- data.frame(line =
c("#Boris Jonson: Mo, 30.09.2017 09:20:42#
He is present
--------------------------------------------------
#Jacky Fine: Th, 02.02.2013 18:23:01#
Does great job
--------------------------------------------------
#Michael Bissping: Mo, 25.03.2012 10:00:29#
Hard to count on"))
third_case<- data.frame(line =
c("#Isabelle Warren: Sa, 02.12.2013 02:24:42#
Not around / anymore
--------------------------------------------------
#Tobias Maker: Mo, 02.03.2013 10:23:01#
Works quite frequently with people
--------------------------------------------------
#Toe Michael : Mo, 20.05.2011 09:00:29#
She has some new clients & Does great job"))
all_cases <- rbind(first_case,second_case,third_case)
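Here I tried to keep the line that sits one line above "Does great job". I checked whether "Does great job" is directly preceded by a newline and then took the line before it: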
dplyr::filter(all_cases, grepl("((.*\n){1})Does great job",line))
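This approach cannot produce the expected output even when the pattern matches. grepl() returns one TRUE/FALSE per element, so dplyr::filter() can only keep or drop whole rows. It never returns part of a row. The minimal base R illustration below uses a made-up vector x, assembled from pieces of the sample data above for demonstration only:

```r
# Two multi-line strings built from pieces of the sample data (illustration only)
x <- c("#Jacky Fine: Th, 02.02.2013 18:23:01#\nDoes great job",
       "#Michael Bissping: Mo, 25.03.2012 10:00:29#\nHard to count on")

# grepl() gives one logical per element: does the element contain the phrase?
keep <- grepl("Does great job", x, fixed = TRUE)
print(keep)

# Subsetting with it returns the entire multi-line element, not the line above
print(x[keep])
```

Extracting only the line itself requires a function that returns the matched text, such as regmatches() in base R or str_extract() in stringr.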
Expected result:
first_case<- data.frame(line =
c("#John Wayne: Su, 11.01.2013 08:24:42#"))
second_case<- data.frame(line =
c("#Jacky Fine: Th, 02.02.2013 18:23:01#"))
third_case<- data.frame(line =
c("#Toe Michael : Mo, 20.05.2011 09:00:29#"))
expected_result <- rbind(first_case,second_case,third_case)
1 #John Wayne: Su, 11.01.2013 08:24:42#
2 #Jacky Fine: Th, 02.02.2013 18:23:01#
3 #Toe Michael : Mo, 20.05.2011 09:00:29#
Unfortunately, this returns zero rows. Any insight is appreciated!
Answer 0 (score: 3)
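Here is a base R approach using strsplit(). We can split the text into a vector of lines and then use grep() to find the index of the line that matches "Does great job". After that, we simply return the line immediately before it.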
line <- "#Boris Jonson: Mo, 30.09.2017 09:20:42#
He is present
--------------------------------------------------
#Jacky Fine: Th, 02.02.2013 18:23:01#
Does great job
--------------------------------------------------
#Michael Bissping: Mo, 25.03.2012 10:00:29#
Hard to count on"
terms <- unlist(strsplit(line, "\n"))      # one element per line
terms[grep("Does great job", terms) - 1]   # the line just above each match
[1] "#Jacky Fine: Th, 02.02.2013 18:23:01#"
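My answer does not cover many edge cases, the first being the matching logic itself. What if the search term matches more than once, or not at all? Also, how specific should the pattern used in grep() be?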
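A small helper is one possible way to handle those edge cases. The name line_above() is hypothetical and does not come from any package. This is a sketch under the following assumptions:

- the search term is matched literally (fixed = TRUE) rather than as a regex;
- every match returns the line above it;
- a match on the very first line is skipped, because nothing sits above it;
- NA is returned when there is no usable match.

```r
# Hypothetical helper: the line(s) directly above each line containing `term`
line_above <- function(text, term) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]  # one element per line
  hits <- grep(term, lines, fixed = TRUE)           # literal match, not a regex
  hits <- hits[hits > 1]                            # line 1 has nothing above it
  if (length(hits) == 0) return(NA_character_)      # no usable match
  lines[hits - 1]
}

# Two matches in one string (lines taken from the sample data)
txt <- paste("#Jacky Fine: Th, 02.02.2013 18:23:01#",
             "Does great job",
             "--------------------------------------------------",
             "#Toe Michael : Mo, 20.05.2011 09:00:29#",
             "She has some new clients & Does great job",
             sep = "\n")
print(line_above(txt, "Does great job"))
print(line_above("He is present", "Does great job"))  # no match, so NA
```

To run it over every row, something like unlist(lapply(as.character(all_cases$line), line_above, term = "Does great job")) should work. The as.character() call matters when line is a factor, which was the data.frame() default before R 4.0.0.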
Answer 1 (score: 3)
You could try:
library(stringr)
library(dplyr)
all_cases %>% transmute(x=str_extract(line,".*(?=\n.*?Does great job)"))
# x
#1 #John Wayne: Su, 11.01.2013 08:24:42#
#2 #Jacky Fine: Th, 02.02.2013 18:23:01#
#3 #Toe Michael : Mo, 20.05.2011 09:00:29#
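The same lookahead idea also works in base R without stringr. The sketch below assumes perl = TRUE so that the lookahead (?=...) is supported. It replaces ".*" with "[^\n]*" so the pattern cannot run across line breaks regardless of the regex engine. Like str_extract(), it returns NA for rows without a match. The vector cases is made-up input built from the sample rows.

```r
cases <- c(
  "#Boris Jonson: Mo, 30.09.2017 09:20:42#\nHe is present\n---\n#Jacky Fine: Th, 02.02.2013 18:23:01#\nDoes great job",
  "#Sandra Nunes: Mo, 20.05.2011 09:00:29#\nShe has some new clients"
)
# A run of non-newline characters, followed by a line containing the phrase
pattern <- "[^\n]*(?=\n[^\n]*Does great job)"
m <- regexpr(pattern, cases, perl = TRUE)   # first match per element, -1 if none
above <- rep(NA_character_, length(cases))
above[m != -1] <- regmatches(cases, m)      # regmatches() drops non-matches
print(above)
```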
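An improved solution that handles each of the three person entries in a row independently. str_extract() returns only the first match per string, so each entry is first split into its own row: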
library(tidyr)  # separate() and gather() come from tidyr
all_cases %>% separate(line, c("a", "b", "c"), sep = "-{3,}") %>%
  gather(k, v, a, b, c) %>%
  transmute(x = str_extract(v, ".*(?=\n.*?Does great job)")) %>%
  filter(!is.na(x))
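If a row can hold more than one "Does great job" entry, gregexpr() is a base R alternative to the separate()/gather() reshaping. It finds every match in each string rather than only the first. This sketch again assumes perl = TRUE for the lookahead. It uses "[^\n]+" (one or more characters) so that no empty matches come back. The input vector is made up from the sample rows.

```r
cases <- c(
  paste("#John Wayne: Su, 11.01.2013 08:24:42#",
        "He is present / I guess, Does great job",
        "--------------------------------------------------",
        "#Jacky Fine: Th, 02.02.2013 18:23:01#",
        "Does great job", sep = "\n"),
  "#Sandra Nunes: Mo, 20.05.2011 09:00:29#\nShe has some new clients"
)
pattern <- "[^\n]+(?=\n[^\n]*Does great job)"
# gregexpr() finds all non-overlapping matches per element;
# regmatches() then returns a list with one character vector per element
per_row <- regmatches(cases, gregexpr(pattern, cases, perl = TRUE))
above <- unlist(per_row)   # rows without a match contribute character(0)
print(above)
```

unlist() discards which row each result came from. If you need to map results back to rows, lengths(per_row) gives the number of matches per row.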
Answer 2 (score: 1)