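I am working in R.
Assume all of the data are strings.
Also keep in mind that my actual dataset is huge.
Column X contains duplicates.
Whenever a value of X appears more than once, I want to create all possible distinct pairs of Y (for that constant X), while also keeping column Z.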
X Y Z
1 a RED
1 b BLUE
1 c PINK
1 d YELLOW
2 a PURPLE
3 a ORANGE
3 b GREEN
4 a BLACK
4 b WHITE
4 c BROWN
So the result I want to achieve is:
X Y1 Y2 Z1 Z2
1 a b RED BLUE
1 a c RED PINK
1 a d RED YELLOW
1 b c BLUE PINK
1 b d BLUE YELLOW
1 c d PINK YELLOW
2 a NA PURPLE NA
3 a b ORANGE GREEN
4 a b BLACK WHITE
4 a c BLACK BROWN
4 b c WHITE BROWN
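I think the question of how to code "find and create all distinct pairs of a column" in R is solved by <Expand data frame into combinations of row pairs>.
So my question is how to code the following in R:
"For each X, whenever it is duplicated, combine its rows to find and create all possible pairs of Y and Z (for that particular X)."
I hope my question is clear!
Please help! :)
A small part of my actual dataset (there will be more columns) (X = classification; Y = host species; Z = everything else):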
parspecies |pargenus |hostspecies |hostgenus
----------------------------------------------------------------------
Blattophagus beci |Blatophagus |Platyzostreia castanea |Platyzostreia
Blissoxenos esakii |Blissoxenos |Dimorphopterus japonicus |Dimorphopterus
Blissoxenos esakii |Blissoxenos |Iphicrates spinicaput |Iphicrates
Blissoxenos esakii |Blissoxenos |Macropes obnubilus |Macropes
Caenocholax fenesi |Caenocholax |Camponotus atriaps |Camponotus
Caenocholax fenesi |Caenocholax |Camponotus planatus |Camponotus
Answer 0 (score: 0)
Here is one way to do this in base R:
#set up a list of matrices with the df$Y pairs (include NAs up to length 2)
combs <- tapply(df$Y,df$X,function(x) {length(x) <- max(2,length(x));return(t(combn(x,2)))})
#convert to a data.frame
df2 <- as.data.frame(do.call(rbind,combs),stringsAsFactors = FALSE)
names(df2) <- c("Y1","Y2")
#recreate values of df$X that are lost by previous steps
df2$X <- rep(as.numeric(names(combs)),times=sapply(combs,nrow))
#merge in the colours in df$Z
df2 <- merge(df2,df,by.x=c("X","Y2"),by.y=c("X","Y"),all.x=TRUE)
df2 <- merge(df2,df,by.x=c("X","Y1"),by.y=c("X","Y"),all.x=TRUE,suffixes=c("1","2"))
#get correct column order after merge
df2[,4:5] <- df2[,5:4]
df2
X Y1 Y2 Z1 Z2
1 1 a b RED BLUE
2 1 a c RED PINK
3 1 a d RED YELLOW
4 1 b c BLUE PINK
5 1 b d BLUE YELLOW
6 1 c d PINK YELLOW
7 2 a <NA> PURPLE <NA>
8 3 a b ORANGE GREEN
9 4 a b BLACK WHITE
10 4 a c BLACK BROWN
11 4 b c WHITE BROWN
Data:
df <- data.frame(X = c(1L, 1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L, 4L),
Y = c("a", "b", "c", "d", "a", "a", "b", "a", "b", "c"),
Z = c("RED", "BLUE", "PINK", "YELLOW", "PURPLE", "ORANGE", "GREEN", "BLACK", "WHITE", "BROWN"),
stringsAsFactors = FALSE)
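The question says that every column is a string and that the real data has more columns than X, Y and Z. The code above converts the group names with as.numeric, which would presumably give NA for non-numeric keys. One possible variant, sketched below, pairs row indices instead of Y values, so every non-key column is carried along. This is a sketch under those assumptions, not the answerer's method: pair_rows is a hypothetical helper name, and X is stored as character here to mirror the question.

```r
# Toy data from the question; every column is a string, including X (assumption).
df <- data.frame(
  X = c("1", "1", "1", "1", "2", "3", "3", "4", "4", "4"),
  Y = c("a", "b", "c", "d", "a", "a", "b", "a", "b", "c"),
  Z = c("RED", "BLUE", "PINK", "YELLOW", "PURPLE",
        "ORANGE", "GREEN", "BLACK", "WHITE", "BROWN"),
  stringsAsFactors = FALSE
)

# pair_rows() is a hypothetical helper, not part of base R.
# It pairs row indices within each group of `key`, then looks up
# all remaining columns for both members of every pair.
pair_rows <- function(data, key) {
  # Row indices per group, keeping groups in order of first appearance
  groups <- split(seq_len(nrow(data)),
                  factor(data[[key]], levels = unique(data[[key]])))

  idx <- lapply(groups, function(rows) {
    if (length(rows) < 2) {
      # Single-row group: keep the row, with NA as the missing partner
      matrix(c(rows, NA_integer_), ncol = 2)
    } else {
      # combn() returns a 2 x choose(n, 2) matrix; transpose to one pair per row
      t(combn(rows, 2))
    }
  })
  idx <- do.call(rbind, unname(idx))

  others <- setdiff(names(data), key)
  first  <- data[idx[, 1], others, drop = FALSE]
  second <- data[idx[, 2], others, drop = FALSE]  # NA index gives a row of NAs
  names(first)  <- paste0(others, "1")
  names(second) <- paste0(others, "2")

  out <- cbind(data[idx[, 1], key, drop = FALSE], first, second)
  rownames(out) <- NULL

  # Interleave the columns as Y1, Y2, Z1, Z2, ...
  out[c(key, as.vector(rbind(names(first), names(second))))]
}

result <- pair_rows(df, "X")
print(result)
```

On the toy data this should give the same 11 rows as the desired output in the question. Because the pairing works on row indices, further columns would simply appear as additional name1/name2 pairs.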
Answer 1 (score: 0)
Consider by over each X group, merging each group with itself:
dfList = by(df, df$X, function(i){
tmp <- merge(i, i, by="X", suffixes=c("1", "2"))
if (nrow(tmp) > 1) {
tmp <- subset(tmp, Y1 < Y2)[c("X","Y1","Y2","Z1","Z2")]
} else {
tmp[c("Y2","Z2")] <- NA
}
return(tmp)
})
newdf <- do.call(rbind, dfList)
rownames(newdf) <- NULL
newdf
# X Y1 Y2 Z1 Z2
# 1 1 a b RED BLUE
# 2 1 a c RED PINK
# 3 1 a d RED YELLOW
# 4 1 b c BLUE PINK
# 5 1 b d BLUE YELLOW
# 6 1 c d PINK YELLOW
# 7 2 a <NA> PURPLE <NA>
# 8 3 a b ORANGE GREEN
# 9 4 a b BLACK WHITE
# 10 4 a c BLACK BROWN
# 11 4 b c WHITE BROWN
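Since the real dataset is described as huge, it may help to estimate the size of the result before building it. A group with n rows contributes choose(n, 2) pairs, and a single-row group still contributes one NA-padded row. The sketch below applies that arithmetic to the toy X column; the variable names are illustrative only.

```r
# Group keys from the question's toy data (column X)
x <- c(1L, 1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L, 4L)

# Number of rows in each group
n <- as.vector(table(x))

# choose(n, 2) pairs per group; single-row groups are kept as one row
rows_per_group <- pmax(choose(n, 2), 1)

expected_rows <- sum(rows_per_group)
print(expected_rows)  # 6 + 1 + 1 + 3 = 11
```

The count grows quadratically with group size, so a few very large groups can dominate the output.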
If it were not for the special PURPLE case (a one-row group), a one-liner would do:
dfList = by(df, df$X, function(i){
subset(merge(i, i, by="X", suffixes=c("1", "2")), Y1 < Y2)[c("X","Y1","Y2","Z1","Z2")]
})