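I have searched Google, checked the book's current errata, and searched Stack Overflow for this error, but found no answer. I am working through pages 4-10 of the book.
This part runs fine: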
original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>%
  ungroup()
original_books
tidy_books <- original_books %>%
  unnest_tokens(word, text)
tidy_books
data(stop_words)
tidy_books <- tidy_books %>%
  anti_join(stop_words)
tidy_books %>%
  count(word, sort = TRUE)
tidy_books %>%
  count(word, sort = TRUE) %>%
  filter(n > 600) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()
hgwells <- gutenberg_download(c(35, 36, 5230, 159))
tidy_hgwells <- hgwells %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
tidy_hgwells %>%
  count(word, sort = TRUE)
bronte <- gutenberg_download(c(1260, 768, 969, 9182, 767))
tidy_bronte <- bronte %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
tidy_bronte %>%
  count(word, sort = TRUE)
frequency <- bind_rows(mutate(tidy_bronte, author = "Bronte Sisters"),
                       mutate(tidy_hgwells, author = "H.G. Wells"),
                       mutate(tidy_books, author = "Jane Austen")) %>%
  mutate(word = str_extract(word, "[a-z']+")) %>%
  count(author, word) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>%
  select(-n) %>%
  spread(author, proportion) %>%
  gather(author, proportion, 'Bronte Sisters':'H.G. Wells')
frequency
But when I run this code:
ggplot(frequency, aes(x = proportion, y = 'Jane Austen',
                      color = abs('Jane Austen' - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001),
                       low = "darkslategray4", high = "gray75") +
  facet_wrap(~author, ncol = 2) +
  theme(legend.position = "none") +
  labs(y = "Jane Austen", x = NULL)
I get this error: Error in "Jane Austen" - proportion : non-numeric argument to binary operator
Here is the structure of frequency:
> str(frequency)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 57818 obs. of 4 variables:
$ word : chr "a" "a'most" "a'n't" "aback" ...
$ Jane Austen: num 9.19e-06 NA 4.60e-06 NA NA ...
$ author : chr "Bronte Sisters" "Bronte Sisters" "Bronte Sisters" "Bronte Sisters" ...
$ proportion : num 3.19e-05 1.59e-05 NA 3.98e-06 3.98e-06 ...
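Both proportion and Jane Austen hold numeric values, but they also contain NAs. I tried removing the NAs, but that did not help, and I would expect the book to have mentioned it if that were a potential problem.
These are the libraries I am using. When I load them, I see no conflicts that might be masking functions: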
library(dplyr)
library(tidytext)
library(janeaustenr)
library(stringr)
library(tidyr)
library(ggplot2)
library(gutenbergr)
library(scales)
I am using RStudio version 1.1.442 on Windows 10, with R 3.4.4.
Any ideas about what might be going wrong?
Answer 0 (score: 3)
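Your problem is an easy one to overlook. You need backticks (`), not quotes, around Jane Austen. In this context Jane Austen is not a character string but the name of a column in frequency, and column names that contain spaces must be wrapped in backticks. With single quotes, R treats 'Jane Austen' as a string, so 'Jane Austen' - proportion tries to subtract a number from text, which is what raises the "non-numeric argument to binary operator" error.
It should be: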
ggplot(frequency, aes(x = proportion, y = `Jane Austen`, color = abs(`Jane Austen` - proportion))) +
.....
not:
ggplot(frequency, aes(x=proportion, y='Jane Austen', color = abs('Jane Austen' - proportion))) +
.....
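The difference can be reproduced without any of the packages above. The following is only a minimal base R sketch; the data frame toy and its values are made up for illustration and are not from the book. It should show that a backticked name is looked up as a column, while a quoted one is treated as a plain string:

```r
# Hypothetical toy data with a column whose name contains a space.
toy <- data.frame(proportion = c(0.10, 0.20))
toy[["Jane Austen"]] <- c(0.15, 0.25)

# Backticks: `Jane Austen` is resolved as a column, so the subtraction is numeric.
ok <- with(toy, `Jane Austen` - proportion)

# Quotes: "Jane Austen" is a character string, so the subtraction fails.
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(
  with(toy, "Jane Austen" - proportion),
  error = function(e) conditionMessage(e)
)

print(ok)
print(err)
```

The same lookup rule applies inside aes(), which is why swapping the quotes for backticks in the ggplot() call is expected to resolve the error.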