Reshaping columns in a data frame by "combining" them

Time: 2014-03-18 04:32:34

Tags: r

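I have a data frame of URL strings, and I am using the stringr package in R to generate new columns, each holding a boolean that indicates whether the string contains a given element.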

library(stringr)
library(plyr)  # provides join(), used below

url = data.frame(u=c("http://www.subaru.com/vehicles/impreza/index.html",
        "http://www.subaru.com/index.html?s_kwcid=subaru&k_clickid=214495e6-dbe0-6668-9222-00003d7cd876&prid=87&k_affcode=76602",
        "http://www.subaru.com/customer-support.html",
        "http://www.subaru.com/",
        "http://www.subaru.com/vehicles/forester/index.html"))
url

cs = c("customer-support")
f = c("forester")
one_match <- str_c(cs, collapse = "|")
two_match <- str_c(f, collapse = "|")

main <- function(df) {
  df$customer_support <- as.numeric(str_detect(df$u, one_match))
  df
}
d1 = main(url)

main <- function(df) {
  df$forester <- as.numeric(str_detect(df$u, two_match))
  df
}
d2 = main(url)

mydt = join(d1, d2)
mydt

The code above produces the following result.

  

mydt

    u
1                                                                      http://www.subaru.com/vehicles/impreza/index.html
2 http://www.subaru.com/index.html?s_kwcid=subaru&k_clickid=214495e6-dbe0-6668-9222-00003d7cd876&prid=87&k_affcode=76602
3                                                                            http://www.subaru.com/customer-support.html
4                                                                                                 http://www.subaru.com/
5                                                                     http://www.subaru.com/vehicles/forester/index.html
  customer_support forester
1                0        0
2                0        0
3                1        0
4                0        0
5                0        1

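What I want to do is reshape the data frame so that columns 2 and 3 are combined into a single column that holds the matched name instead of a boolean.

It should look like: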

page
0
0
customer_support
0
forester

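I have tried many different things, including variations of reshape, transform, dcast, and so on, but nothing seems to do the job. Can anyone help me get the desired output?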

1 answer:

Answer 0: (score: 2)

You don't need to write such complicated functions. You can simply use the grepl and ifelse functions, as shown below.

urldata = data.frame(u = c("http://www.subaru.com/vehicles/impreza/index.html", "http://www.subaru.com/index.html?s_kwcid=subaru&k_clickid=214495e6-dbe0-6668-9222-00003d7cd876&prid=87&k_affcode=76602", 
    "http://www.subaru.com/customer-support.html", "http://www.subaru.com/", "http://www.subaru.com/vehicles/forester/index.html"))

cs = c("customer-support")
f = c("forester")

urldata
##                                                                                                                        u
## 1                                                                      http://www.subaru.com/vehicles/impreza/index.html
## 2 http://www.subaru.com/index.html?s_kwcid=subaru&k_clickid=214495e6-dbe0-6668-9222-00003d7cd876&prid=87&k_affcode=76602
## 3                                                                            http://www.subaru.com/customer-support.html
## 4                                                                                                 http://www.subaru.com/
## 5                                                                     http://www.subaru.com/vehicles/forester/index.html

urldata$page <- ifelse(grepl(cs, urldata$u), cs, ifelse(grepl(f, urldata$u), f, 0))
urldata
##                                                                                                                        u
## 1                                                                      http://www.subaru.com/vehicles/impreza/index.html
## 2 http://www.subaru.com/index.html?s_kwcid=subaru&k_clickid=214495e6-dbe0-6668-9222-00003d7cd876&prid=87&k_affcode=76602
## 3                                                                            http://www.subaru.com/customer-support.html
## 4                                                                                                 http://www.subaru.com/
## 5                                                                     http://www.subaru.com/vehicles/forester/index.html
##               page
## 1                0
## 2                0
## 3 customer-support
## 4                0
## 5         forester
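
Nested ifelse calls get hard to read once there are more than two patterns. Here is a minimal sketch of one way to generalize the same idea, assuming the patterns are stored in a named character vector. The vector name `patterns` and its labels are my own illustration and are not part of the question or the answer above. Each label is written to the rows that match its pattern and have not been labeled yet, so the first matching pattern wins, just as in the nested ifelse.

```r
# Sample data copied from the question.
urldata <- data.frame(
  u = c("http://www.subaru.com/vehicles/impreza/index.html",
        "http://www.subaru.com/index.html?s_kwcid=subaru&k_clickid=214495e6-dbe0-6668-9222-00003d7cd876&prid=87&k_affcode=76602",
        "http://www.subaru.com/customer-support.html",
        "http://www.subaru.com/",
        "http://www.subaru.com/vehicles/forester/index.html"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the labels to store,
# values are the literal substrings to search for.
patterns <- c(customer_support = "customer-support", forester = "forester")

# Start with "0" everywhere, then fill in the label of the first pattern
# that matches each URL. fixed = TRUE treats the pattern as a literal string.
urldata$page <- "0"
for (label in names(patterns)) {
  hit <- grepl(patterns[[label]], urldata$u, fixed = TRUE) & urldata$page == "0"
  urldata$page[hit] <- label
}

urldata$page
## [1] "0"                "0"                "customer_support" "0"                "forester"
```

Note that the page column is character, so the fallback value is the string "0" rather than the number 0. The same is true of the ifelse version above, because ifelse coerces the 0 to character when it is combined with the string labels.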