I have the following data frame, which was read from an Excel file:
df1 <- data.frame( Colour = c("Green","Red","Blue"),
Code = c("N","U", "U"),
User1 = c("John","Brad","Peter"),
User2 = c("Meg","Meg","John"),
User3= c("", "Lucy", ""))
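I need to rearrange it into a data frame in which every name is listed exactly once in the first column, with the colours (and their corresponding codes) in the columns that follow, like this: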
df2 <- data.frame(User=c("John","Brad","Peter","Meg","Lucy"),
Color1 = c("Green","Red","Blue","Green","Red"),
Code1 = c("N","U","U","N","U"),
Color2=c("Blue","","","Red",""),
Code2=c("U","","","U",""))
Any help would be appreciated. Many thanks.
Answer 0 (score: 4)
I hesitated to post this because it is conceptually similar to @akrun's answer, but you can also do this with merged.stack from my "splitstackshape" package, together with reshape from base R.
library(splitstackshape)
reshape(
getanID(
merged.stack(df1, var.stubs = "User", sep = "var.stubs")[User != ""],
"User"),
direction = "wide", idvar = "User", timevar = ".id", drop = ".time_1")
# User Colour.1 Code.1 Colour.2 Code.2
# 1: Peter Blue U NA NA
# 2: John Blue U Green N
# 3: Meg Green N Red U
# 4: Brad Red U NA NA
# 5: Lucy Red U NA NA
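merged.stack makes the data long, getanID creates an ID variable to be used when converting to the wide format, and reshape does the actual transformation from this semi-wide form to the wide form.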
This is the best I could come up with for "dplyr" + "tidyr" users. It looks rather verbose, but it shouldn't be too hard to follow:
library(dplyr)
library(tidyr)
df1 %>%
  gather(var, User, User1:User3) %>% # Get the data into a long form
  filter(User != "") %>% # Drop empty rows
  select(-var) %>% # Drop the source column name so each user ends up on one row
  group_by(User) %>% # Group by User
  mutate(Id = sequence(n())) %>% # Create a new id variable
  gather(var2, value, Colour, Code) %>% # Go long a second time
  unite(Key, var2, Id) %>% # Combine values to create a key
  spread(Key, value, fill = "") # Convert back to a wide form
#    User Code_1 Code_2 Colour_1 Colour_2
# 1  Brad      U             Red
# 2  John      N      U    Green     Blue
# 3  Lucy      U             Red
# 4   Meg      N      U    Green      Red
# 5 Peter      U            Blue
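For readers on newer tidyr releases, where gather() and spread() are superseded by pivot_longer() and pivot_wider(), the same idea could be sketched roughly as follows. This is not part of the original answer. It assumes tidyr 1.0.0 or later and a reasonably recent dplyr, and the intermediate names `var`, `Id`, and `wide` are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; strings kept as character on any R version
df1 <- data.frame(Colour = c("Green", "Red", "Blue"),
                  Code   = c("N", "U", "U"),
                  User1  = c("John", "Brad", "Peter"),
                  User2  = c("Meg", "Meg", "John"),
                  User3  = c("", "Lucy", ""),
                  stringsAsFactors = FALSE)

wide <- df1 %>%
  pivot_longer(starts_with("User"),
               names_to = "var", values_to = "User") %>% # Stack the User columns
  filter(User != "") %>%                                 # Drop empty names
  select(-var) %>%                                       # The source column is not needed
  group_by(User) %>%
  mutate(Id = row_number()) %>%                          # Per-user counter: 1, 2, ...
  ungroup() %>%
  pivot_wider(names_from = Id,
              values_from = c(Colour, Code),
              values_fill = "")                          # One row per user

wide
```

With the default naming, the wide columns should come out as Colour_1, Colour_2, Code_1, and Code_2, with one row per user.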
Answer 1 (score: 4)
It's not pretty, but here is another solution in pure base R that uses a couple of calls to reshape():
reshape(
  transform(
    subset(
      reshape(df1, varying=grep('^User',names(df1)), direction='long', v.names='User'),
      User!=''),
    id=NULL, time=ave(c(User),User,FUN=seq_along), User=factor(User)),
  direction='wide', idvar='User', sep='')
## User Colour1 Code1 Colour2 Code2
## 1.1 John Green N Blue U
## 2.1 Brad Red U <NA> <NA>
## 3.1 Peter Blue U <NA> <NA>
## 1.2 Meg Green N Red U
## 2.3 Lucy Red U <NA> <NA>
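The nested call can be easier to follow when unpacked one step at a time. The following is a sketch of the same approach rather than the answerer's exact code: the intermediate names `long` and `wide` are introduced here for illustration, and the counter is built with seq_along() on row positions instead of c(User).

```r
# Same data as in the question; strings kept as character on any R version
df1 <- data.frame(Colour = c("Green", "Red", "Blue"),
                  Code   = c("N", "U", "U"),
                  User1  = c("John", "Brad", "Peter"),
                  User2  = c("Meg", "Meg", "John"),
                  User3  = c("", "Lucy", ""),
                  stringsAsFactors = FALSE)

# Step 1: stack User1..User3 into a single "User" column
long <- reshape(df1, varying = grep("^User", names(df1)),
                direction = "long", v.names = "User")

# Step 2: drop the rows that came from empty cells
long <- subset(long, User != "")

# Step 3: replace the time variable with a per-user counter (1, 2, ...)
long$time <- ave(seq_along(long$User), long$User, FUN = seq_along)
long$id <- NULL  # the row id added by reshape() is no longer needed

# Step 4: spread Colour and Code out again, one row per user
wide <- reshape(long, direction = "wide", idvar = "User",
                timevar = "time", sep = "")

wide
```

Users with only one colour get NA in Colour2 and Code2, matching the output shown above.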
Answer 2 (score: 3)
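We can use dcast from the development version of data.table, i.e. v1.9.5+, which can take multiple value.var columns. We convert the data.frame to a data.table (setDT(df1)), melt the data with 'Colour' and 'Code' as the id columns, drop the rows where 'User' is empty ([User!='']), create a sequence column grouped by 'User', and then dcast. Instructions to install are here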
library(data.table)#v1.9.5+
dcast(melt(setDT(df1), id.var=c('Colour', 'Code'),
value.name='User')[User!=''][,
N:=1:.N, User], User~N, value.var=c('Colour', 'Code'))
# User 1_Colour 2_Colour 1_Code 2_Code
#1: Brad Red NA U NA
#2: John Green Blue N U
#3: Lucy Red NA U NA
#4: Meg Green Red N U
#5: Peter Blue NA U NA
Or, as @Arun mentioned in the comments, we can use the subset argument of dcast instead of [User!='']
dcast(melt(setDT(df1), id.var=c('Colour', 'Code'),
value.name='User')[,N:= 1:.N, User],
subset=.(User !=''), User~N, value.var=c('Colour', 'Code'))
# User 1_Colour 2_Colour 1_Code 2_Code
#1: Brad Red NA U NA
#2: John Green Blue N U
#3: Lucy Red NA U NA
#4: Meg Green Red N U
#5: Peter Blue NA U NA
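On current data.table releases the per-user counter can also be generated inside the formula with rowid(). The following is a hedged sketch, not part of the original answer: it assumes a data.table version that provides rowid() (1.9.8 or later), and the names `long` and `wide` are illustrative. Column names in recent versions are expected to take the form Colour_1 rather than 1_Colour.

```r
library(data.table)

# Same data as in the question; strings kept as character on any R version
df1 <- data.frame(Colour = c("Green", "Red", "Blue"),
                  Code   = c("N", "U", "U"),
                  User1  = c("John", "Brad", "Peter"),
                  User2  = c("Meg", "Meg", "John"),
                  User3  = c("", "Lucy", ""),
                  stringsAsFactors = FALSE)

# Go long on the User columns and drop the empty names
long <- melt(as.data.table(df1), id.vars = c("Colour", "Code"),
             value.name = "User")[User != ""]

# rowid(User) numbers each user's rows 1, 2, ... and becomes the column suffix
wide <- dcast(long, User ~ rowid(User), value.var = c("Colour", "Code"))

wide
```

Using as.data.table() instead of setDT() leaves the original df1 unmodified, which may be preferable when the data frame is reused elsewhere.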