I have two separate databases (Database_1 and Database_2), and I want to add the height profile from Database_2 to Database_1 as a new column.
Database_1:
Horse_type
Stallion
Race_horse
Work_horse
Work_horse
Database_2:
Horse_type Height_profile
Stallion Large
Race_horse Medium
Work_horse Small
Pure_breed Huge
So far I have only tried to do this with a for loop.
for (row in 1:nrow(Database_1)) {
if(Database_1$Horse_type == Database_2$Horse_type) {
Database_1$New_Column <- Database_2$height_profile
}
}
I expected this output:
Database_1:
Horse_type Height_profile
Stallion Large
Race_horse Medium
Work_horse Small
Work_horse Small
But the actual output was:
"There were 50 or more warnings (use warnings() to see the first 50)"
Answer 0: (score: 1)
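A loop seems like a complicated way to do this. You can simply merge the two data frames on their common column, Horse_type, and the values will be added as a new column:
Database_1 <- merge(Database_1, Database_2, by = "Horse_type")
Note that merge() sorts the result by the key column by default and keeps only the rows that have a match in both data frames.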
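If Database_1 could contain a Horse_type that is missing from Database_2, a left join may be closer to what the question asks for. The following is a minimal sketch under that assumption, using the sample data from the question; the name `merged` is made up for illustration.

```r
# Sample data copied from the question
Database_1 <- data.frame(
  Horse_type = c("Stallion", "Race_horse", "Work_horse", "Work_horse"),
  stringsAsFactors = FALSE
)
Database_2 <- data.frame(
  Horse_type = c("Stallion", "Race_horse", "Work_horse", "Pure_breed"),
  Height_profile = c("Large", "Medium", "Small", "Huge"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of Database_1 and fills Height_profile
# with NA where Database_2 has no matching Horse_type
merged <- merge(Database_1, Database_2, by = "Horse_type", all.x = TRUE)
print(merged)
```

Pure_breed does not appear in the result because it has no row in Database_1. The rows come back sorted by Horse_type rather than in their original order.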
Answer 1: (score: 0)
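There are several problems with your loop. The one that jumps out at me is that you create row to represent each element you are looping over, but never use it inside the loop. That might be worth looking into next time... Anyway, this works: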
# create data frames
df1 <- data.frame(
  Horse_type = c("Stallion", "Race_horse", "Work_horse", "Work_horse"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Horse_type = c("Stallion", "Race_horse", "Work_horse", "Pure_breed"),
  Height_profile = c("Large", "Medium", "Small", "Huge"),
  stringsAsFactors = FALSE
)
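# preallocate a vector to collect the output of the loop
New_column <- character(nrow(df1))
for (i in seq_len(nrow(df1))) {
  # look up the row of df2 whose Horse_type matches row i of df1
  # (assumes each Horse_type in df1 appears exactly once in df2)
  New_column[i] <- df2$Height_profile[
    which(df1$Horse_type[i] == df2$Horse_type)
  ]
}
# attach the output of the loop to df1 as a new column
df1$height <- New_column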
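The same lookup can probably be written without an explicit loop. This is a sketch using base R's match(), which returns, for each element of its first argument, the position of the first match in the second. It is an alternative to the loop above, and the name `idx` is made up for illustration.

```r
df1 <- data.frame(
  Horse_type = c("Stallion", "Race_horse", "Work_horse", "Work_horse"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Horse_type = c("Stallion", "Race_horse", "Work_horse", "Pure_breed"),
  Height_profile = c("Large", "Medium", "Small", "Huge"),
  stringsAsFactors = FALSE
)

# position in df2 of each Horse_type in df1 (NA where there is no match)
idx <- match(df1$Horse_type, df2$Horse_type)

# index Height_profile by those positions; the row order of df1 is preserved
df1$height <- df2$Height_profile[idx]
print(df1)
```

Unlike merge(), this keeps df1 in its original row order, and a Horse_type with no match in df2 gets NA instead of stopping with an error.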
Answer 2: (score: 0)
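You can use the data.table package. Convert both data frames with setDT() before setting keys, because setkey() only works on a data.table:
> library(data.table)
> setDT(database1)
> setDT(database2)
> setkey(database1, Horse_type)
> setkey(database2, Horse_type)
> database2[database1]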
Horse_type Height_profile
1: Race_horse Medium
2: Stallion Large
3: Work_horse Small
4: Work_horse Small
Or, in base R:
> merge(database1,database2)
Horse_type Height_profile
1 Race_horse Medium
2 Stallion Large
3 Work_horse Small
4 Work_horse Small