I have a data frame in R that looks like this:
Word Base Number Type
- - - -
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural
I want to reshape it so that it looks like this:
Word_Sg Word_Pl Base Num_Singular Num_Plural
-- -- -- -- --
shoe shoes shoe 4834 49955
toy toys toy 75465 23556
key keys key 39485 6546
NA jazz jazz NA 58765
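So instead of having two rows for the singular and plural values, I want two columns: one for the singular count and one for the plural count.
I've tried a few things with dplyr::summarize, but without any success so far. This is the code I've come up with: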
dataframe1 <- dataframe %>%
mutate(Num_Singular = case_when(Type == "singular" ~ Number)) %>%
mutate(Num_Plural = case_when(Type == "plural" ~ Number)) %>%
dplyr::select(Word, Base, Num_Singular, Num_Plural) %>%
group_by(Base) %>%
dplyr::summarize(Num_Singular = paste(na.omit(Num_Singular)),
Num_Plural = paste(na.omit(Num_Plural)))
However, it gives me this error:
Error in summarise_impl(.data, dots) :
Column `Num_Singular` must be length 1 (a summary value), not 2)
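I think the problem might be that some words don't have both a singular and a plural form but only one of them (like "jazz"), although most do.
So how can I do this in R or dplyr?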
Answer 0 (score: 4)
Start by looking at just the first few columns:
select(dat, Base, Word, Type)[1:2,]
# Base Word Type
# 1 shoe shoe singular
# 2 shoe shoes plural
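From here, think of spreading this into singular/plural columns, effectively going from "tall" to "wide". (This would be more apparent if there were more than two categories in Type.)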
select(dat, Base, Word, Type) %>%
spread(Type, Word) %>%
rename(Word_Pl=plural, Word_Sg=singular)
# Base Word_Pl Word_Sg
# 1 jazz jazz <NA>
# 2 key keys key
# 3 shoe shoes shoe
# 4 toy toys toy
You can just as easily repeat this for Number. From there, it's just a matter of merging/joining the two results on the key column Base:
full_join(
select(dat, Base, Word, Type) %>%
spread(Type, Word) %>%
rename(Word_Pl=plural, Word_Sg=singular),
select(dat, Base, Number, Type) %>%
spread(Type, Number) %>%
rename(Num_Pl=plural, Num_Sg=singular),
by = "Base"
)
# Base Word_Pl Word_Sg Num_Pl Num_Sg
# 1 jazz jazz <NA> 58765 NA
# 2 key keys key 6546 39485
# 3 shoe shoes shoe 49955 4834
# 4 toy toys toy 23556 75465
Reproducible data:
library(dplyr)
library(tidyr)
dat <- read.table(text='Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural', header=TRUE, stringsAsFactors=FALSE)
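Answer 1 (score: 0)
The core idea is to identify each data point by its type and by whether it is a word or a number; after that it spreads easily into the format you want. (I'm not going to rename the variables or name them specifically to match your expected output, since that is easy to do and not part of the question.)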
library(dplyr)
library(tidyr)
dat <- read.table(header = T, stringsAsFactors = F, text='
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')
dat %>%
gather(variable, value, Word, Number) %>%
unite(Type, variable, Type) %>%
spread(Type, value, convert = T) %>%
as_tibble()
# # A tibble: 4 x 5
# Base Number_plural Number_singular Word_plural Word_singular
# <chr> <int> <int> <chr> <chr>
# 1 jazz 58765 NA jazz NA
# 2 key 6546 39485 keys key
# 3 shoe 49955 4834 shoes shoe
# 4 toy 23556 75465 toys toy
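For completeness, here is a minimal sketch of the renaming step that this answer leaves out. It assumes the target names from the question (Word_Sg, Word_Pl, Num_Singular, Num_Plural); the intermediate object `wide` and the result `renamed` are names introduced only for illustration.

```r
library(dplyr)
library(tidyr)

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')

# Same reshaping as above; `wide` is a hypothetical name for the result
wide <- dat %>%
  gather(variable, value, Word, Number) %>%
  unite(Type, variable, Type) %>%
  spread(Type, value, convert = TRUE)

# select() can rename and reorder in one step (new_name = old_name)
renamed <- wide %>%
  select(Word_Sg = Word_singular,
         Word_Pl = Word_plural,
         Base,
         Num_Singular = Number_singular,
         Num_Plural = Number_plural)

renamed
```

Using select() rather than rename() here also puts the columns in the order shown in the question.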
Answer 2 (score: 0)
You can join the plural and singular subsets of the data by Base, then drop the Type columns and reorder the remaining columns...
full_join(filter(dat, Type == "plural"),
filter(dat, Type == "singular"),
by = "Base",
suffix = c("_Pl", "_Sg")) %>%
select(Word_Sg, Word_Pl, Base, Number_Sg, Number_Pl)
# Word_Sg Word_Pl Base Number_Sg Number_Pl
# 1 shoe shoes shoe 4834 49955
# 2 toy toys toy 75465 23556
# 3 key keys key 39485 6546
# 4 <NA> jazz jazz NA 58765
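One caveat with the join approach: it gives one row per Base only if each Base occurs at most once per Type, because a join repeats rows for every matching pair. A quick check along the following lines can confirm that before joining; this is a sketch, and `dupes` is a name introduced only for illustration.

```r
library(dplyr)

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')

# Count rows per Base/Type pair and keep any pair that occurs more than once
dupes <- dat %>%
  count(Base, Type) %>%
  filter(n > 1)

# Zero rows means the join will return exactly one row per Base
nrow(dupes)
```

For the sample data this returns 0, so the join above is safe.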
Answer 3 (score: 0)
The new pivot_wider() function in tidyr makes this simple...
library(dplyr)
library(tidyr)
dat <- read.table(header = T, stringsAsFactors = F, text='
Word Base Number Type
shoe shoe 4834 singular
shoes shoe 49955 plural
toy toy 75465 singular
toys toy 23556 plural
key key 39485 singular
keys key 6546 plural
jazz jazz 58765 plural')
dat %>%
pivot_wider(id_cols = Base, names_from = Type, values_from = c(Word, Number))
# # A tibble: 4 x 5
# Base Word_singular Word_plural Number_singular Number_plural
# <chr> <chr> <chr> <int> <int>
# 1 shoe shoe shoes 4834 49955
# 2 toy toy toys 75465 23556
# 3 key key keys 39485 6546
# 4 jazz NA jazz NA 58765