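I want to use a list of string patterns to populate a new field.
In my data frame, Bisaccategory1 contains the full description of each book's category. Certain strings that make up part of those values can be used to define a new field called "Genre". For example, one genre is "nonfiction", which maps to 25 unique full descriptions. I can identify those full descriptions by specifying patterns they contain: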
nonfiction <- c("BIOGRAPHY & AUTOBIOGRAPHY", "BODY, MIND & SPIRIT", "BUSINESS & ECONOMICS",
                "COMICS & GRAPHIC NOVELS", "COMPUTERS", "COOKING", "FAMILY & RELATIONSHIPS",
                "HEALTH & FITNESS", "HISTORY", "HOUSE & HOME", "HUMOR", "LITERARY CRITICISM",
                "NATURE", "PERFORMING ARTS", "PETS", "PHOTOGRAPHY", "POETRY", "POLITICAL SCIENCE",
                "RELIGION", "SCIENCE", "SELF-HELP", "SOCIAL SCIENCE", "SPORTS & RECREATION",
                "TRANSPORTATION", "TRUE CRIME")
I can then match these strings against the full Bisaccategory1 values like this:
# Unique full descriptions that contain at least one of the patterns
matches <- unique(grep(paste(nonfiction, collapse = "|"),
                       detail$Bisaccategory1, value = TRUE))
But I'm not clear on how to use these "matches" to assign the value "nonfiction" to my new Genre field.
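A minimal sketch of one way to use `matches` directly: `%in%` turns the vector of matched descriptions back into a TRUE/FALSE index over the rows, which can then be used for assignment. The shortened `nonfiction` vector and the four-row `detail` stand-in below are illustrative assumptions, with values taken from the sample data.

```r
# Shortened pattern list, for illustration only
nonfiction <- c("BIOGRAPHY & AUTOBIOGRAPHY", "HEALTH & FITNESS", "HISTORY",
                "BUSINESS & ECONOMICS")

# Hypothetical stand-in for the real data frame
detail <- data.frame(
  Bisaccategory1 = c("FICTION / Thrillers / General",
                     "BIOGRAPHY & AUTOBIOGRAPHY / Political",
                     "HISTORY / Asia / China",
                     "FICTION / Fantasy / Historical"),
  stringsAsFactors = FALSE
)

# Unique full descriptions that contain any of the patterns
matches <- unique(grep(paste(nonfiction, collapse = "|"),
                       detail$Bisaccategory1, value = TRUE))

# %in% gives one TRUE/FALSE per row: is this row's description among the matches?
detail$Genre <- NA_character_
detail$Genre[detail$Bisaccategory1 %in% matches] <- "nonfiction"
print(detail)
```

Because `%in%` compares whole strings, this selects exactly the rows that `grepl()` on the same column would flag; the `unique()` step only shortens the lookup vector.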
Here is some sample data (in this sample the category column is named Bisac):
structure(list(Author = c("James Swallow", "Billy Crystal", "Mark Divine",
"Charles Cumming", "Victoria Schwab", "Louise Penny", "Elizabeth Warren",
"Linda Castillo", "Paul Fischer", "Sandy Hall", "Louise Penny",
"Louise Penny", "Lisa Scottoline", "Linda Castillo", "Evan Osnos",
"Porter Erisman"), Title = c("24: Deadline", "700 Sundays - Still Foolin' 'Em",
"8 Weeks to Sealfit", "A Colder War", "A Dark Shade of Magic",
"A Fatal Grace", "A Fighting Chance", "A Hidden Secret", "A Kim Jong-Il Production",
"A Little Something Different", "A Rule Against Murder", "A Trick of the Light",
"Accused", "After the Storm", "Age of Ambition", "Alibaba's World"
), Bisac = c("FICTION / Thrillers / General", "BIOGRAPHY & AUTOBIOGRAPHY / Entertainment & Performing Arts",
"HEALTH & FITNESS / Exercise", "FICTION / Thrillers / Espionage",
"FICTION / Fantasy / Historical", "FICTION / Mystery & Detective / Traditional",
"BIOGRAPHY & AUTOBIOGRAPHY / Political", "FICTION / Mystery & Detective / Police Procedural",
"HISTORY / Asia / Korea", "JUVENILE FICTION / Love & Romance",
"FICTION / Mystery & Detective / Traditional", "FICTION / Mystery & Detective / Traditional",
"FICTION / Thrillers / Legal", "FICTION / Mystery & Detective / Police Procedural",
"HISTORY / Asia / China", "BUSINESS & ECONOMICS / E-Commerce / General"
)), .Names = c("Author", "Title", "Bisac"), class = "data.frame", row.names = c(NA,
-16L))
I know I could do something like:
df$Genre[df$Bisaccategory1 == "BODY, MIND & SPIRIT / Inspiration & Personal Growth"] <- "nonfiction"
But I have hundreds of categories, so this doesn't really scale. I'd appreciate any suggestions.
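With hundreds of categories, one option is to keep the mapping as data rather than as one assignment per category. The sketch below assumes each genre is determined by the top-level heading (the text before the first " / "), which holds for every pattern in the `nonfiction` list above; `genre_map`, its shortened entries, and the "juvenile" genre are hypothetical placeholders.

```r
# Hypothetical lookup: genre name -> top-level BISAC headings that map to it.
# Only a few headings are listed; extend each vector with the real categories.
genre_map <- list(
  nonfiction = c("BIOGRAPHY & AUTOBIOGRAPHY", "BUSINESS & ECONOMICS",
                 "HEALTH & FITNESS", "HISTORY"),
  fiction    = c("FICTION"),
  juvenile   = c("JUVENILE FICTION")
)

# Flatten to a two-column table: one row per (heading, genre) pair
lookup <- data.frame(
  heading = unlist(genre_map, use.names = FALSE),
  Genre   = rep(names(genre_map), lengths(genre_map)),
  stringsAsFactors = FALSE
)

bisac <- c("FICTION / Thrillers / General",
           "BIOGRAPHY & AUTOBIOGRAPHY / Political",
           "HISTORY / Asia / Korea",
           "JUVENILE FICTION / Love & Romance",
           "BUSINESS & ECONOMICS / E-Commerce / General")

# Top-level heading = everything before the first " / "
heading <- sub(" / .*$", "", bisac)

# Exact lookup; headings missing from the table come back as NA
genre <- lookup$Genre[match(heading, lookup$heading)]
print(data.frame(bisac, genre))
```

Matching the extracted heading exactly avoids substring hits inside longer names, and `is.na(genre)` lists any categories not yet mapped.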
Answer 0 (score: 2)
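The function grepl returns a logical index of matches (one TRUE/FALSE per element), unlike grep, which returns the positions of the matches or, with value = TRUE, the matching strings. You can use that logical index to subset the Genre column. I assigned everything that isn't "non-fiction" to "fiction", but you can label them however you like.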
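# Logical vector: TRUE where Bisac contains any non-fiction pattern
matches <- grepl(paste(nonfiction, collapse = "|"), detail$Bisac)
detail$Genre <- "fiction"                # default for every row
detail$Genre[matches] <- "non-fiction"   # overwrite the matching rows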
# Bisac Genre
# 1 FICTION / Thrillers / General fiction
# 2 BIOGRAPHY & AUTOBIOGRAPHY / Entertainment & Performing Arts non-fiction
# 3 HEALTH & FITNESS / Exercise non-fiction
# 4 FICTION / Thrillers / Espionage fiction
# 5 FICTION / Fantasy / Historical fiction
# 6 FICTION / Mystery & Detective / Traditional fiction
# 7 BIOGRAPHY & AUTOBIOGRAPHY / Political non-fiction
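To illustrate the grep/grepl difference this answer relies on, here is a small sketch with a made-up three-element vector built from sample Bisac values. The `ifelse()` line is an optional one-step alternative to the two assignments above, not part of the original answer.

```r
x <- c("FICTION / Thrillers / General",
       "HISTORY / Asia / Korea",
       "HISTORY / Asia / China")

grep("HISTORY", x)                # integer positions: 2 3
grep("HISTORY", x, value = TRUE)  # the matching strings themselves
grepl("HISTORY", x)               # FALSE TRUE TRUE, same length as x

# A full-length logical vector can drive the label assignment in one step
genre <- ifelse(grepl("HISTORY", x), "non-fiction", "fiction")
```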
Answer 1 (score: 0)
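library(dplyr)
library(tidyr)
library(stringi)

# Keeps only the non-fiction books: split each Bisac string into its
# " / " components, upper-case them, and keep rows where a component
# exactly equals one of the headings in `nonfiction`
non_fiction_books <-
  detail %>%
  mutate(Bisac = stri_split_fixed(Bisac, " / ")) %>%
  unnest(Bisac) %>%
  mutate(Bisac = stri_trans_toupper(Bisac)) %>%
  inner_join(data.frame(Bisac = nonfiction, stringsAsFactors = FALSE), by = "Bisac") %>%
  select(-Bisac) %>%
  distinct()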
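For comparison, a base-R sketch of the same component-matching idea that keeps every row and adds a Genre column instead of dropping the fiction rows. It assumes a character Bisac column and uses a shortened `nonfiction` vector and a three-row stand-in for `detail`; it is an illustration, not the answerer's code.

```r
nonfiction <- c("BIOGRAPHY & AUTOBIOGRAPHY", "HEALTH & FITNESS", "HISTORY")

# Hypothetical three-row stand-in built from the sample data
detail <- data.frame(
  Title = c("24: Deadline", "8 Weeks to Sealfit", "Age of Ambition"),
  Bisac = c("FICTION / Thrillers / General",
            "HEALTH & FITNESS / Exercise",
            "HISTORY / Asia / China"),
  stringsAsFactors = FALSE
)

# Split each description into its " / "-separated components
parts <- strsplit(detail$Bisac, " / ", fixed = TRUE)

# A row counts as non-fiction if any upper-cased component equals a listed heading
is_nf <- vapply(parts, function(p) any(toupper(p) %in% nonfiction), logical(1))

detail$Genre <- ifelse(is_nf, "non-fiction", "fiction")
print(detail)
```

As in the dplyr version, upper-casing means a lower-case subcategory whose upper-cased text equals a listed heading would also be counted as non-fiction.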