I have a CSV file containing many entries like the one below (a single sample is shown):
Customer 1 car purchase
08/22/2016 08:10:00 Agent 1 (Agt1)
Customer 1 car purchase and service purchase.\n
Service indicates tires needed\n
possible oil change as well.\n
Tire quote provided.\n
*Name: Service advisor \n
*Phone: 123-456-7890 \n
Customer 1 called back to schedule appt.\n
I am trying to write R code that produces the following output (for each entry):
Customer 1 car purchase and service purchase.
Service indicates tires needed and possible oil change as well.
Tire quote provided.
Customer 1 called back to schedule appt.
I want to remove the first two lines and any lines containing *Name or *Phone.
One thing I tried was assigning each entry to a temporary variable and then calling stri_split_lines(temp) from the stringi package:
x=stri_split_lines(temp)
y=x[[1]][3:length(x[[1]])]
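This drops the first two lines. However, I don't know how to remove the lines with *Name and *Phone, since they can appear anywhere in the text. I also suspect there is a better way :) Any ideas on how to achieve this? The lines all end with \n, so I was hoping to split with a regular expression, but I could not get it to work. Thanks!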
Answer 0 (score: 0)
You can use readLines or strsplit to read each entry (with lapply if necessary), and then use grep to build the index of lines to drop:
x <- readLines(textConnection('Customer 1 car purchase
08/22/2016 08:10:00 Agent 1 (Agt1)
Customer 1 car purchase and service purchase.
Service indicates tires needed
possible oil change as well.
Tire quote provided.
*Name: Service advisor
*Phone: 123-456-7890
Customer 1 called back to schedule appt.'))
x <- trimws(x) # clean up extra white space
x[c(-1, -2, -grep('\\*Name|\\*Phone', x))]
## [1] "Customer 1 car purchase and service purchase."
## [2] "Service indicates tires needed"
## [3] "possible oil change as well."
## [4] "Tire quote provided."
## [5] "Customer 1 called back to schedule appt."
If you like, paste the result back into a single block.
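For entries that are already held as single strings (for example, one CSV cell per entry), the same steps can be sketched with strsplit instead of readLines. This is only a sketch under assumptions: clean_entry is a hypothetical helper name, and the split pattern assumes a line may end with a real newline, a literal backslash followed by n, or both, as the sample in the question suggests.

```r
# Hypothetical helper: clean one entry stored as a single string.
clean_entry <- function(entry) {
  # Split on a literal "\n" (backslash + n) or on a real newline character
  lines <- strsplit(entry, "\\\\n|\n")[[1]]
  lines <- trimws(lines)                  # clean up extra white space
  lines <- lines[nzchar(lines)]           # drop empty pieces left by the split
  lines <- lines[-(1:2)]                  # drop the title and timestamp lines
  # Drop *Name / *Phone lines wherever they appear; grepl is used so that
  # an entry with no such lines is left untouched
  lines <- lines[!grepl("\\*Name|\\*Phone", lines)]
  paste(lines, collapse = "\n")           # paste back into one block
}

# Usage on a character vector of entries (assumed to come from the CSV column)
entries <- c(
  "Customer 1 car purchase\n08/22/2016 08:10:00 Agent 1 (Agt1)\nTire quote provided.\\n\n*Phone: 123-456-7890 \\n\nCustomer 1 called back to schedule appt.\\n"
)
cleaned <- vapply(entries, clean_entry, character(1), USE.NAMES = FALSE)
cat(cleaned, sep = "\n")
```

Using a logical grepl index rather than negative grep positions is a design choice here: it stays correct even when it is the only subsetting step and nothing matches.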