Using R

Posted: 2018-02-14 23:27:13

Tags: r string url substring

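I have a data frame with a column of URLs, and I want to remove everything after the first question mark in each URL. Some URLs have no question mark, and I want those left unchanged. In short, I want to strip all the tracking parameters. Here is an example URL:
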
https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/?utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy

This is the result I'm looking for:

https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/

2 answers:

Answer 0 (score: 3)

Assuming your data frame is called df and has a column named url:

# Drop the first '?' and everything after it; URLs without a '?' are returned unchanged
df$url <- sub('\\?.*', '', df$url)
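A minimal sketch of the sub() call on a small data frame. The sample URLs below are hypothetical; the column name url follows the answer's own assumption. It checks the two requirements from the question: URLs with no question mark are left alone, and only the first ? marks the cut point even if more than one appears.

```r
# Hypothetical sample data; stringsAsFactors = FALSE keeps url a character column
df <- data.frame(
  url = c(
    "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/?utm_source=exacttarget&utm_medium=newsletter",
    "https://www.dummy.com/no-tracking/",
    "https://www.dummy.com/page?a=1?b=2"
  ),
  stringsAsFactors = FALSE
)

# '\\?' is a literal question mark and '.*' consumes the rest of the string.
# The leftmost match starts at the first '?', so everything from there on is removed;
# strings with no '?' have no match and come back unchanged.
df$url <- sub("\\?.*", "", df$url)
print(df$url)
```

Because sub() is vectorized, this handles the whole column in one call with no explicit loop.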

Answer 1 (score: 2)

Using strsplit:

url <- "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/?utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy"

result <- strsplit(url, "\\?")[[1]][1]

Output:

> result
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"

Here is an example of using it on a vector rather than a single string:

strings <- c("here?string", "another?string", "stringnoquestion", "one?more")

> sapply(strsplit(strings, "\\?"), function(x) x[1])
[1] "here"             "another"          "stringnoquestion" "one"

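strsplit returns a list because it is designed to work on vectors as well as single strings: each input string becomes one element of the list. So in the first example, [[1]] extracts the first element of the list (a character vector), and [1] then takes the first element of that vector, i.e. the part of the URL before the ?.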

Here is the first example broken into steps:

# Returns a list of length one
> strsplit(url, "\\?")
[[1]]
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"                                                                    
[2] "utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy"

# Each element of the list is a vector
> strsplit(url, "\\?")[[1]]
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"                                                                    
[2] "utm_source=exacttarget&utm_medium=newsletter&utm_term=dummydotcom-dummycomnewsletter&utm_content=na-readblog-blogpost&utm_campaign=dummy"

# The first element of that vector
> strsplit(url, "\\?")[[1]][1]
[1] "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/"
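A sketch of applying the strsplit approach to a whole vector of URLs, such as a data frame column. The sample URLs are hypothetical. vapply() is swapped in for sapply() here (a design choice, not part of the original answer) so the result is guaranteed to be a character vector with exactly one entry per URL.

```r
# Hypothetical sample URLs, including one with no query string
urls <- c(
  "https://www.dummy.com/2017/11/29/four-questions-we-have-about-stuff/?utm_source=exacttarget&utm_campaign=dummy",
  "https://www.dummy.com/no-tracking/",
  "https://www.dummy.com/page?a=1?b=2"
)

# strsplit() returns one list element per URL; keep the first piece of each.
# FUN.VALUE = character(1) makes vapply() error out if any piece is not a single string.
first_part <- vapply(strsplit(urls, "\\?"), function(x) x[1], character(1))
print(first_part)

# For these inputs the result is identical to the sub() approach from the other answer
print(identical(first_part, sub("\\?.*", "", urls)))
```

To write the result back to a data frame, the same call can be used as df$url <- vapply(strsplit(df$url, "\\?"), function(x) x[1], character(1)).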