One column in my data frame is a list of character vectors. This is the categories column:
str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 3 variables:
$ categories:List of 4
..$ : chr "Tex-Mex" "Mexican" "Fast Food" "Restaurants"
..$ : chr "Hawaiian" "Restaurants" "Barbeque"
..$ : chr "Restaurants" "Italian" "Seafood"
..$ : chr "Restaurants" "Mexican" "American (Traditional)"
$ name : chr "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
$ type : chr "business" "business" "business" "business"
Here is the dput of the first four rows:
structure(list(categories = list(c("Tex-Mex", "Mexican", "Fast Food",
"Restaurants"), c("Hawaiian", "Restaurants", "Barbeque"), c("Restaurants",
"Italian", "Seafood"), c("Restaurants", "Mexican", "American (Traditional)"
)), name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill",
"Don Tequila"), type = c("business", "business", "business",
"business")), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"), .Names = c("categories", "name", "type"))
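I want to extract certain values from that list, so that they are the only values left in each vector.
For example, I want to filter out every value that is neither "Mexican" nor "Restaurants", so that the only remaining values are "Mexican" and "Restaurants". To do that, I tried this solution: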
# unnest() and nest() come from tidyr
df_test <- df %>% unnest(categories) %>%
  filter(str_detect(categories, "Mexican") |
         str_detect(categories, "Restaurants")) %>%
  nest(categories)
But the result is the following:
str(df_test)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 3 variables:
$ name: chr "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
$ type: chr "business" "business" "business" "business"
$ data:List of 4
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
.. ..$ categories: chr "Mexican" "Restaurants"
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 1 variable:
.. ..$ categories: chr "Restaurants"
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 1 variable:
.. ..$ categories: chr "Restaurants"
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
.. ..$ categories: chr "Restaurants" "Mexican"
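The problem is that afterwards the column is not a character vector like the type column; each element is a nested data frame.
Is it possible to filter out these values so that, after the process, the column is an ordinary character vector like the name and type columns?
I don't want to replace the values/rows that are removed by this process. So if a row contains neither "Mexican" nor "Restaurants", that row should be dropped.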
Packages used:
dplyr
stringr
Answer 0 (score: 1)
Use lapply to subset the list:
lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])
[[1]]
[1] "Mexican" "Restaurants"
[[2]]
[1] "Restaurants"
[[3]]
[1] "Restaurants"
[[4]]
[1] "Restaurants" "Mexican"
Add a row with no matching categories, then filter the rows:
df1 <- rbind(df1, c(list("Nothing to match"), "drop me", "business"))
df1$categories <- lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])
df1[sapply(df1$categories, length) > 0, ]
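For anyone who wants to run the row-dropping step in isolation, here is a minimal base R sketch. It assumes a plain data.frame rebuilt by hand from the question's data (the original is a tibble), with a fifth non-matching row added directly instead of through rbind. The names `keep` and `result` are made up for this illustration, and `lengths()` is swapped in for `sapply(..., length)`.

```r
# Example data rebuilt from the question, plus one row that matches nothing
df1 <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill",
           "Don Tequila", "drop me"),
  type = "business",
  stringsAsFactors = FALSE
)
df1$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  c("Restaurants", "Italian", "Seafood"),
  c("Restaurants", "Mexican", "American (Traditional)"),
  "Nothing to match"
)

keep <- c("Mexican", "Restaurants")  # hypothetical name for the wanted values

# Keep only exact matches inside each list element
df1$categories <- lapply(df1$categories, function(x) x[x %in% keep])

# lengths() gives the length of each list element, so rows whose
# element became character(0) are dropped here
result <- df1[lengths(df1$categories) > 0, ]
```

Note that `%in%` compares whole strings, whereas the `str_detect` call in the question matches a pattern anywhere inside a string, so the two can differ on values that merely contain "Mexican" or "Restaurants".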
Collapse the list into a string:
df1$categories <- sapply(df1$categories, function(x) paste(sort(x[x %in% c("Mexican", "Restaurants")]), collapse=" "))
df1[nchar(df1$categories) > 0, ]
# A tibble: 4 x 3
  categories          name                     type
  <chr>               <chr>                    <chr>
1 Mexican Restaurants Taco Bell                business
2 Restaurants         Ohana Hawaiian BBQ       business
3 Restaurants         Carrabba's Italian Grill business
4 Mexican Restaurants Don Tequila              business
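The collapsing step can also be sketched on a bare list, which makes it easier to see what happens to an element with no match. This is only an illustration under assumptions: the three-element `categories` list and the names `keep`, `collapsed` and `has_match` are invented here, and `vapply` is used in place of the answer's `sapply` so the result is guaranteed to be a character vector.

```r
# A small stand-in for the list column; the last element matches nothing
categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  "Nothing to match"
)
keep <- c("Mexican", "Restaurants")  # hypothetical name for the wanted values

# Filter each element, sort the survivors, and join them with a space.
# character(1) tells vapply that every result must be a single string.
collapsed <- vapply(
  categories,
  function(x) paste(sort(x[x %in% keep]), collapse = " "),
  character(1)
)

# An element with no match collapses to "", so nzchar() flags the keepers
has_match <- nzchar(collapsed)
```

Indexing the data frame with `has_match` would then play the same role as the `nchar(...) > 0` filter above.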