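I'm doing some data cleaning. I use mutate in dplyr a lot, because it builds new columns step by step and I can easily see how each one is produced.
Here are two examples where I run into this error: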
Error: incompatible size (%d), expecting %d (the group size) or 1
Example 1: getting town names from zip codes. The data looks like this:
Zip
1 02345
2 02201
I noticed that it does not work when the data contains NA.
Without NA it works:
library(dplyr)
library(zipcode)
data(zipcode)
test = data.frame(Zip=c('02345','02201'),stringsAsFactors=FALSE)
test %>%
rowwise() %>%
mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
which gives:
Source: local data frame [2 x 2]
Groups: <by row>
Zip Town1
1 02345 Manomet
2 02201 Boston
With NA it does not work:
library(dplyr)
library(zipcode)
data(zipcode)
test = data.frame(Zip=c('02345','02201',NA),stringsAsFactors=FALSE)
test %>%
rowwise() %>%
mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
which gives:
Error: incompatible size (%d), expecting %d (the group size) or 1
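A likely cause can be sketched in base R. The `lookup` data frame below is a hypothetical two-row stand-in for the `zipcode` dataset. `na.omit()` applied to a lone NA returns a zero-length vector, so the subset returns no city at all for that row, while `mutate()` needs exactly one value per row. Indexing through `match()` is one possible way to keep an NA in place instead; treat this as a sketch, not the only fix.

```r
# Hypothetical stand-in for the zipcode dataset (two rows only).
lookup <- data.frame(zip = c("02345", "02201"),
                     city = c("Manomet", "Boston"),
                     stringsAsFactors = FALSE)

# na.omit() on a lone NA leaves nothing behind.
key <- na.omit(NA_character_)
length(key)            # 0

# Comparing against a zero-length key selects zero rows,
# so there is no value left to assign to that row.
empty_result <- lookup[lookup$zip == key, "city"]
length(empty_result)   # 0

# match() returns NA for an NA key, so the lookup stays one value per input.
zips <- c("02345", "02201", NA)
towns <- lookup$city[match(zips, lookup$zip)]
towns
```

Because `match()` is vectorised, this form would not need `rowwise()` at all.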
Example 2: I want to get rid of the redundant state names that appear in the Town column of the following data.
Town State
1 BOSTON MA MA
2 NORTH AMAMS MA
3 CHICAGO IL IL
This is how I do it: (1) split the string in Town into words, e.g. 'BOSTON' and 'MA' for row 1; (2) check whether any of these words matches the State of that row; (3) remove the matched word.
library(dplyr)
test = data.frame(Town=c('BOSTON MA','NORTH AMAMS','CHICAGO IL'), State=c('MA','MA','IL'), stringsAsFactors=FALSE)
test %>%
mutate(Town.word = strsplit(Town, split=' ')) %>%
rowwise() %>% # rowwise ensures every calculation only considers the current row
mutate(is.state = match(State,Town.word ) ) %>%
mutate(Town1 = Town.word[-is.state])
This results in:
Town State Town.word is.state Town1
1 BOSTON MA MA <chr[2]> 2 BOSTON
2 NORTH AMAMS MA <chr[2]> NA NA
3 CHICAGO IL IL <chr[2]> 2 CHICAGO
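Meaning: row 1, for example, shows is.state == 2, which means the second word in Town is the state name. After removing that word, Town1 is the correct town name.
Now I want to fix the NA in row 2, but adding na.omit causes an error: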
test %>%
mutate(Town.word = strsplit(Town, split=' ')) %>%
rowwise() %>% # rowwise ensures every calculation only considers the current row
mutate(is.state = match(State,Town.word ) ) %>%
mutate(Town1 = Town.word[-na.omit(is.state)])
Result:
Error: incompatible size (%d), expecting %d (the group size) or 1
I checked the data type and size:
test %>%
mutate(Town.word = strsplit(Town, split=' ')) %>%
rowwise() %>% # rowwise ensures every calculation only considers the current row
mutate(is.state = match(State,Town.word ) ) %>%
mutate(length(is.state) ) %>%
mutate(class(na.omit(is.state)))
Result:
Town State Town.word is.state length(is.state) class(na.omit(is.state))
1 BOSTON MA MA <chr[2]> 2 1 integer
2 NORTH AMAMS MA <chr[2]> NA 1 integer
3 CHICAGO IL IL <chr[2]> 2 1 integer
So the length is 1 in every row (that is, %d == 1). Can anybody see where this goes wrong? Thanks.
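Note that the check above measures length(is.state), not the length after na.omit(). A base R sketch of row 2 (the variable names are illustrative, not from the code above) suggests the difference: once the NA is omitted the index is empty, and a negated empty index selects nothing rather than everything. Comparing the words against the state directly is one way to avoid the index altogether.

```r
words <- c("NORTH", "AMAMS")   # row 2 of Town, split into words
state <- "MA"

idx <- match(state, words)     # NA: no word equals the state
length(idx)                    # 1, as in the table above
length(na.omit(idx))           # 0 once the NA is dropped

# A negated empty index is still empty, so this selects no words at all.
dropped <- words[-na.omit(idx)]
length(dropped)                # 0

# Keeping every word that differs from the state needs no index.
town1 <- paste(words[words != state], collapse = " ")
town1                          # "NORTH AMAMS"
```

Unlike match(), which finds only the first matching word, the comparison drops every word equal to the state.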
Answer 0 (score: 3)
Could you just sub it out?
test %>%
rowwise() %>%
mutate(Town=sub(sprintf('[, ]*%s$', State), '', Town))
## Source: local data frame [3 x 2]
## Groups: <by row>
##
## Town State
## 1 BOSTON MA
## 2 NORTH AMAMS MA
## 3 CHICAGO IL
(This way also catches a comma after the town, if there happens to be one.)
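For comparison, the same replacement can be sketched in base R without dplyr. sub() uses a single pattern per call, which is why the pattern has to be built row by row; mapply() plays that role here. The helper name strip_state is hypothetical.

```r
test <- data.frame(Town = c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL"),
                   State = c("MA", "MA", "IL"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: strip a trailing state, plus any separators
# (spaces or commas) in front of it, from a single town string.
strip_state <- function(town, state) {
  sub(sprintf("[, ]*%s$", state), "", town)
}

# mapply() pairs each Town with its own State.
test$Town <- mapply(strip_state, test$Town, test$State, USE.NAMES = FALSE)
test
```

Because `[, ]*` also matches zero separators, a town that merely ends in its state's letters would be trimmed as well; `[, ]+` would require at least one space or comma before the state.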
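Note: if you use ungroup() on a rowwise_df (which is what this is), it also wipes out the tbl_df class and outputs a plain data.frame. That is fine for your data, but it will blow up your screen if you aren't careful and are looking at a lot of data (as I have done countless times). (GitHub refs #936 and #553.)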