I have two data frames:
employee <- c("John Doe","Peter Gynn","Jolie Hope")
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
data1 <- data.frame(employee, salary, startdate)
employee <- c("John Doe", "Rob", "Peter Gynn", "Ellen A")
city <- c("city1", "city2", "city3", "city1")
age <- c( 1,3,4,2)
data2 <- data.frame(employee, city, age)
I am trying to combine them, and I ran into the following problem: factors get converted to integers.
data1$city <- NA
data1$age <- NA
data1[1:3, c("city", "age")] <- data2[1:3, c("city", "age")]
Result:
> data1
    employee salary  startdate city age
1   John Doe  21000 2010-11-01    1   1
2 Peter Gynn  23400 2008-03-25    2   3
3 Jolie Hope  26800 2007-03-14    3   4
> class(data1[,4])
[1] "integer"
Can someone explain why the factor becomes an integer, and why the following works?
data1[, c("city", "age")] <- data2[1:3, c("city", "age")]
> data1
    employee salary  startdate  city age
1   John Doe  21000 2010-11-01 city1   1
2 Peter Gynn  23400 2008-03-25 city2   3
3 Jolie Hope  26800 2007-03-14 city3   4
Is there a way to avoid this? I would like to avoid using the bind functions.
Answer 0 (score: 1)
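If we need to create two variables in 'data1' from the columns of 'data2', we can create them directly, instead of first creating new NA variables and then replacing their values with the corresponding columns of 'data2'. (I did not specify 1:3 for the row index, because the nrow of data1 is 3.)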
data1[c('city', 'age')] <- data2[1:3, c("city", "age")]
data1
#    employee salary  startdate  city age
#1   John Doe  21000 2010-11-01 city1   1
#2 Peter Gynn  23400 2008-03-25 city2   3
#3 Jolie Hope  26800 2007-03-14 city3   4
data1$city
#[1] city1 city2 city3
#Levels: city1 city2 city3
However, if we first create the two variables as NA
data1$city <- NA
data1$age <- NA
the class of 'city' is not the same in the two datasets:
class(data1$city)
#[1] "logical"
class(data2$city)
#[1] "factor"
So, because the storage mode of a factor is numeric, assigning it into the logical column can coerce the factor to its numeric codes.
mode(data2$city)
#[1] "numeric"
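The coercion can be reproduced without any data frame. The sketch below is my own illustration rather than part of the original answer, and the names `cities` and `target` are made up for it: assigning a factor into elements of a plain logical vector keeps the target's attributes (it has none), so only the integer codes survive, whereas replacing the whole object keeps the factor.

```r
# Hypothetical names, for illustration only.
cities <- factor(c("city1", "city2", "city3"))
target <- c(NA, NA, NA)   # logical, like data1$city after `<- NA`

# Element-wise assignment keeps the target's (missing) attributes,
# so the factor's class and levels are dropped.
target[1:3] <- cities
class(target)
#[1] "integer"

# The codes come from the factor's underlying storage type.
typeof(cities)
#[1] "integer"

# Replacing the whole object keeps the factor intact.
target <- cities
class(target)
#[1] "factor"
```

This seems to match the two cases in the question: with a row index the existing logical column is filled element by element, while without one the column is replaced wholesale.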
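If we want to create the variables in 'data1' first and then replace them, one option is to create the variable 'city' as a factor, with its levels taken from the unique values of city: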
data1$city <- factor(NA, levels=unique(city))
data1[, c("city", "age")] <- data2[1:3, c("city", "age")]
data1$city
#[1] city1 city2 city3
#Levels: city1 city2 city3
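Once 'city' is already a factor with the right levels, the row-indexed form from the question should also keep the class, because the assignment then goes through the factor method. A minimal sketch, using trimmed copies of the data above and assuming `stringsAsFactors = TRUE` is passed explicitly (since R 4.0.0 the default is FALSE, so `city` would otherwise be a character column and the problem would not show up at all):

```r
# Trimmed copies of the question's data; stringsAsFactors = TRUE is set
# explicitly because R >= 4.0.0 no longer converts strings by default.
data1 <- data.frame(employee = c("John Doe", "Peter Gynn", "Jolie Hope"),
                    salary = c(21000, 23400, 26800))
data2 <- data.frame(employee = c("John Doe", "Rob", "Peter Gynn", "Ellen A"),
                    city = c("city1", "city2", "city3", "city1"),
                    age = c(1, 3, 4, 2),
                    stringsAsFactors = TRUE)

# Pre-create 'city' as a factor with the same levels and 'age' as numeric NA.
data1$city <- factor(NA, levels = levels(data2$city))
data1$age <- NA_real_

# Row-indexed assignment into an existing factor column keeps the class.
data1[1:3, c("city", "age")] <- data2[1:3, c("city", "age")]
class(data1$city)
#[1] "factor"
```

Using `NA_real_` for 'age' is just a tidy way to give the placeholder its final type up front; a plain `NA` would be promoted to numeric by the assignment anyway.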