Data frame 1
Gene_1 ID_1
Gdi2 G3GR73
Pitrm1 G3GR85
G3GRA0
Tmem43 G3GS14
Tmf1 G3GS63
Ddx3x G3GSH5
Bdh1 G3GSJ7
Pak2 G3GSK4
Tfrc G3GSM5
Umps G3GSP0
Gart G3GT56
Pgm3 G3GTC9
Cpt2 G3GTN3
Vps26b G3GTV9
Mthfd1l G3GU10
Rbm19 G3GU41
G3GU60
Prkab1 G3GU67
Tigar G3GUK0
Data frame 2
Gene_2 ID_2
Bak1 A1E3K4
Pitrm1 G3GR85
Gtpbp4 G3GR93
Lbr G3GRA0
Tmem43 G3GS14
Tmf1 G3GS63
Ddx3x G3GSH5
Bdh1 G3GSJ7
Tfrc G3GSM5
Umps G3GSP0
Gart G3GT56
Pgm3 G3GTC9
Grb2 G3GTE4
Cpt2 G3GTN3
Vps26b G3GTV9
Mthfd1l G3GU10
Rbm19 G3GU41
G3GU60
Prkab1 G3GU67
Original data
ID_3
A1E3K4
G3GR73
G3GR85
G3GR93
G3GRA0
G3GRB1
G3GRB9
G3GRD8
G3GRE1
G3GRM2
G3GRT0
G3GRW3
G3GRX2
G3GS14
G3GS63
G3GS70
G3GS82
G3GSH2
G3GSH5
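I tried the cbind and match_order functions, but they don't quite do what I need. I also tried building a data frame from two of the datasets, but that fails because they have different numbers of rows: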
joint <- data.frame(ori$ID_3, df_1$ID_1, df_1$Gene_1)
Error in data.frame(ori$ID_3, df_1$ID_1, df_1$Gene_1) : arguments imply differing number of rows: 1255, 544
The goal is to end up with something like this for all ~1300 entries in dataset 3:
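Both data.frame() and cbind() line vectors up by position, so they can only combine inputs of equal length (or where one length is an exact multiple of the other, in which case the shorter input is silently recycled, which is rarely what you want here). Lining up IDs needs a join on the ID values instead. A minimal base R sketch using merge(), with a few rows of the sample data standing in for ori and df_1:

```r
# A few rows of the sample data, standing in for ori and df_1
ori  <- data.frame(ID_3 = c("A1E3K4", "G3GR73", "G3GR85", "G3GR93", "G3GRA0"))
df_1 <- data.frame(Gene_1 = c("Gdi2", "Pitrm1"),
                   ID_1   = c("G3GR73", "G3GR85"))

# Positional combination fails: 5 rows vs 2 rows, and 5 is not a multiple of 2
res <- tryCatch(data.frame(ori$ID_3, df_1$ID_1, df_1$Gene_1),
                error = function(e) e)
print(conditionMessage(res))

# Key-based left join: every ID_3 is kept, Gene_1 is NA where df_1 has no match
joined <- merge(ori, df_1, by.x = "ID_3", by.y = "ID_1", all.x = TRUE)
print(joined)
```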
Gene_1  ID_1    Gene_2  ID_2    ID_3
-       -       Bak1    A1E3K4  A1E3K4
Gdi2    G3GR73  -       -       G3GR73
Pitrm1  G3GR85  Pitrm1  G3GR85  G3GR85
-       -       Gtpbp4  G3GR93  G3GR93
-       G3GRA0  Lbr     G3GRA0  G3GRA0
-       -       -       -       G3GRB1
-       -       -       -       G3GRB9
-       -       -       -       G3GRD8
-       -       -       -       G3GRE1
-       -       -       -       G3GRM2
-       -       -       -       G3GRT0
-       -       -       -       G3GRW3
-       -       -       -       G3GRX2
Tmem43  G3GS14  Tmem43  G3GS14  G3GS14
Tmf1    G3GS63  Tmf1    G3GS63  G3GS63
-       -       -       -       G3GS70
-       -       -       -       G3GS82
-       -       -       -       G3GSH2
Ddx3x   G3GSH5  Ddx3x   G3GSH5  G3GSH5
-       -       -       -       G3GSJ5
Bdh1    G3GSJ7  Bdh1    G3GSJ7  G3GSJ7
Pak2    G3GSK4  -       -       G3GSK4
-       -       -       -       G3GSL6
Tfrc    G3GSM5  Tfrc    G3GSM5  G3GSM5
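To get this five-column layout, with each table's own ID column kept next to its gene column, one option is to copy each ID into a shared helper column and left-join everything onto the original data, so every ID_3 entry yields one row. A base R sketch; the helper column name key and the small subsets of the sample tables are illustrative only:

```r
# Small subsets of the question's tables (the real files are much larger)
df_1 <- data.frame(Gene_1 = c("Gdi2", "Pitrm1", "", "Pak2"),
                   ID_1   = c("G3GR73", "G3GR85", "G3GRA0", "G3GSK4"))
df_2 <- data.frame(Gene_2 = c("Bak1", "Pitrm1", "Lbr"),
                   ID_2   = c("A1E3K4", "G3GR85", "G3GRA0"))
ori  <- data.frame(ID_3 = c("A1E3K4", "G3GR73", "G3GR85",
                            "G3GRA0", "G3GRB1", "G3GSK4"))

# Copy each ID into a shared helper column ("key") so the original ID columns survive the merge
df_1$key <- df_1$ID_1
df_2$key <- df_2$ID_2
ori$key  <- ori$ID_3

# Left joins anchored on ori: one row per ID_3, NA where a table has no match
out <- merge(ori, df_1, by = "key", all.x = TRUE)
out <- merge(out, df_2, by = "key", all.x = TRUE)

# Drop the helper column and use the column order from the desired output
out <- out[order(out$key), c("Gene_1", "ID_1", "Gene_2", "ID_2", "ID_3")]
rownames(out) <- NULL
print(out)
```

NA appears where the desired layout has an empty cell. If an ID occurs more than once in df_1 or df_2, merge() returns one row per matching pair, so checking anyDuplicated(df_1$ID_1) beforehand is worthwhile. Replacing all.x = TRUE with all = TRUE would also keep IDs that are absent from the original data.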
This is my first time doing this type of analysis in R, so any suggestions or code examples would be appreciated.
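Update 1
Based on Stewart's code and suggestions, I arrived at the following. File 1 has 2 columns and 544 observations, file 2 has 2 columns and 419 observations, and file 3 has 1254 observations. The files are joined correctly, but the final file has only 33 observations instead of 1254.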
getwd()
file1 <- read.csv("cigr_db.csv", sep=",", header=T)
file2 <- read.csv("picr_db.csv", sep=",", header=T)
file3 <- read.csv("progen_data.csv", sep=",", header=T)
# Change the ID column name to be the same in each dataframe, so we can match on it
colnames(file1)[2] <- 'ID'
colnames(file2)[2] <- 'ID'
colnames(file3)[1] <- 'ID'
v <- plyr::join(df1, df2, type='full')
v <- plyr::join(v, df3, type='full')
v
write.csv(v, file = "all_condt.csv")
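Judging from the code as posted, the 33 rows probably come from the two join calls using df1, df2 and df3, the small example tables built in the answer below, rather than file1, file2 and file3. If the answer's example was run earlier in the same R session, those objects still exist and the join quietly reproduces its 33-row result. A sketch of the corrected step, wrapped in a hypothetical helper join_files() so it can be tried on toy data; it swaps plyr::join(type = 'full') for base R's merge(all = TRUE), which performs the same full outer join without an extra package (rows come back sorted by ID):

```r
# Hypothetical helper: full join of the three tables read in the question
join_files <- function(file1, file2, file3) {
  # Same renaming as in the question, so all three tables share an "ID" column
  colnames(file1)[2] <- "ID"
  colnames(file2)[2] <- "ID"
  colnames(file3)[1] <- "ID"
  # Join the data read from disk (file1..file3), not the answer's df1..df3;
  # all = TRUE makes each merge a full outer join
  v <- merge(file1, file2, by = "ID", all = TRUE)
  v <- merge(v, file3, by = "ID", all = TRUE)
  # A full join keeps every row of file3, so it cannot be shorter than file3
  stopifnot(nrow(v) >= nrow(file3))
  v
}

# Usage with the question's files:
# file1 <- read.csv("cigr_db.csv")
# file2 <- read.csv("picr_db.csv")
# file3 <- read.csv("progen_data.csv")
# write.csv(join_files(file1, file2, file3), file = "all_condt.csv", row.names = FALSE)
```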
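Answer
Assuming all three ID columns contain the same kind of values, do you need to keep duplicate ID_X columns? If not, you can use the join function from the plyr package: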
library(plyr)
file1 <- 'Gene_1,ID_1
Gdi2,G3GR73
Pitrm1,G3GR85
,G3GRA0
Tmem43,G3GS14
Tmf1,G3GS63
Ddx3x,G3GSH5
Bdh1,G3GSJ7
Pak2,G3GSK4
Tfrc,G3GSM5
Umps,G3GSP0
Gart,G3GT56
Pgm3,G3GTC9
Cpt2,G3GTN3
Vps26b,G3GTV9
Mthfd1l,G3GU10
Rbm19,G3GU41
,G3GU60
Prkab1,G3GU67
Tigar,G3GUK0'
df1 <- read.table(textConnection(file1), sep=",", header=TRUE)
file2 <- 'Gene_2,ID_2
Bak1,A1E3K4
Pitrm1,G3GR85
Gtpbp4,G3GR93
Lbr,G3GRA0
Tmem43,G3GS14
Tmf1,G3GS63
Ddx3x,G3GSH5
Bdh1,G3GSJ7
Tfrc,G3GSM5
Umps,G3GSP0
Gart,G3GT56
Pgm3,G3GTC9
Grb2,G3GTE4
Cpt2,G3GTN3
Vps26b,G3GTV9
Mthfd1l,G3GU10
Rbm19,G3GU41
,G3GU60
Prkab1,G3GU67'
df2 <- read.table(textConnection(file2), sep=",", header=TRUE)
file3 <- 'ID_3
A1E3K4
G3GR73
G3GR85
G3GR93
G3GRA0
G3GRB1
G3GRB9
G3GRD8
G3GRE1
G3GRM2
G3GRT0
G3GRW3
G3GRX2
G3GS14
G3GS63
G3GS70
G3GS82
G3GSH2
G3GSH5'
df3 <- read.table(textConnection(file3), sep=",", header=TRUE)
# Change the ID column name to be the same in each dataframe, so we can match on it
colnames(df1)[2] <- 'ID'
colnames(df2)[2] <- 'ID'
colnames(df3)[1] <- 'ID'
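# Join explicitly on the shared ID column; type='full' keeps IDs from both sides
v <- plyr::join(df1, df2, by='ID', type='full')
v <- plyr::join(v, df3, by='ID', type='full')
Which gives: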
> v
    Gene_1     ID  Gene_2
1     Gdi2 G3GR73    <NA>
2   Pitrm1 G3GR85  Pitrm1
3          G3GRA0     Lbr
4   Tmem43 G3GS14  Tmem43
5     Tmf1 G3GS63    Tmf1
6    Ddx3x G3GSH5   Ddx3x
7     Bdh1 G3GSJ7    Bdh1
8     Pak2 G3GSK4    <NA>
9     Tfrc G3GSM5    Tfrc
10    Umps G3GSP0    Umps
11    Gart G3GT56    Gart
12    Pgm3 G3GTC9    Pgm3
13    Cpt2 G3GTN3    Cpt2
14  Vps26b G3GTV9  Vps26b
15 Mthfd1l G3GU10 Mthfd1l
16   Rbm19 G3GU41   Rbm19
17         G3GU60
18  Prkab1 G3GU67  Prkab1
19   Tigar G3GUK0    <NA>
20    <NA> A1E3K4    Bak1
21    <NA> G3GR93  Gtpbp4
22    <NA> G3GTE4    Grb2
23    <NA> G3GRB1    <NA>
24    <NA> G3GRB9    <NA>
25    <NA> G3GRD8    <NA>
26    <NA> G3GRE1    <NA>
27    <NA> G3GRM2    <NA>
28    <NA> G3GRT0    <NA>
29    <NA> G3GRW3    <NA>
30    <NA> G3GRX2    <NA>
31    <NA> G3GS70    <NA>
32    <NA> G3GS82    <NA>
33    <NA> G3GSH2    <NA>
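In this output, <NA> marks an ID with no row in that table, while the empty cells in rows 3 and 17 are gene names that were already blank in the input. With roughly 1300 IDs, a coverage summary may be easier to check than the joined table itself. A base R sketch; the vectors ids_1, ids_2 and ids_3 are illustrative subsets of the example IDs, and in practice they would be the ID columns of the three files:

```r
# Subsets of the IDs in the example tables above (illustrative only)
ids_1 <- c("G3GR73", "G3GR85", "G3GRA0", "G3GS14")
ids_2 <- c("A1E3K4", "G3GR85", "G3GR93", "G3GRA0", "G3GS14")
ids_3 <- c("A1E3K4", "G3GR73", "G3GR85", "G3GR93", "G3GRA0", "G3GRB1", "G3GS14")

# Duplicate IDs would multiply rows in a join, so check for them first
stopifnot(!anyDuplicated(ids_1), !anyDuplicated(ids_2), !anyDuplicated(ids_3))

# For every ID in the original data, record which annotation table contains it
coverage <- data.frame(ID   = ids_3,
                       in_1 = ids_3 %in% ids_1,
                       in_2 = ids_3 %in% ids_2)

# Counts of IDs found in both tables, in only one, or in neither
print(table(in_1 = coverage$in_1, in_2 = coverage$in_2))

# IDs with no annotation in either table (both gene columns would be <NA>)
unmatched <- coverage$ID[!coverage$in_1 & !coverage$in_2]
print(unmatched)
```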