Convert a vectorized matrix to a numeric data frame with factors preserved

Asked: 2015-04-18 22:43:10

Tags: r matrix vector dataframe double

I have a matrix given in the following form:

m <- as.matrix(rbind(c("State", "Murder", "Assault", "UrbanPop", "Rape", "Group"),
c("Alabama", 13.2, 236, 58, 21.2, "A"),
c("Alaska", 10.0, 263, 48, 44.5, "A"),
c("Arizona", 8.1, 294, 80, 31.0, "A"),
c("Arkansas", 8.8, 190, 50, 19.5, "A"),
c("California", 9.0, 276, 91, 40.6, "A"),
c("Colorado", 7.9, 204, 78, 38.7, "A"),
c("Connecticut", 3.3, 110, 77, 11.1, "A"),
c("Delaware", 5.9, 238, 72, 15.8, "A"),
c("Florida", 15.4, 335, 80, 31.9, "A"),
c("Georgia", 17.4, 211, 60, 25.8, "A"),
c("Hawaii", 5.3, 46, 83, 20.2, "A"),
c("Idaho", 2.6, 120, 54, 14.2, "A"),
c("Illinois", 10.4, 249, 83, 24.0, "A"),
c("Indiana", 7.2, 113, 65, 21.0, "A"),
c("Iowa", 2.2, 56, 57, 11.3, "A"),
c("Kansas", 6.0, 115, 66, 18.0, "A"),
c("Kentucky", 9.7, 109, 52, 16.3, "A"),
c("Louisiana", 15.4, 249, 66, 22.2, "A"),
c("Maine", 2.1, 83, 51, 7.8, "B"),
c("Maryland", 11.3, 300, 67, 27.8, "B"),
c("Massachusetts", 4.4, 149, 85, 16.3, "B"),
c("Michigan", 12.1, 255, 74, 35.1, "B"),
c("Minnesota", 2.7, 72, 66, 14.9, "B"),
c("Mississippi", 16.1, 259, 44, 17.1, "B"),
c("Missouri", 9.0, 178, 70, 28.2, "B"),
c("Montana", 6.0, 109, 53, 16.4, "B"),
c("Nebraska", 4.3, 102, 62, 16.5, "C"),
c("Nevada", 12.2, 252, 81, 46.0, "C"),
c("New_Hampshire", 2.1, 57, 56, 9.5, "C"),
c("New_Jersey", 7.4, 159, 89, 18.8, "C"),
c("New_Mexico", 11.4, 285, 70, 32.1, "C"),
c("New_York", 11.1, 254, 86, 26.1, "C"),
c("North_Carolina", 13.0, 337, 45, 16.1, "C"),
c("North_Dakota", 0.8, 45, 44, 7.3, "C"),
c("Ohio", 7.3, 120, 75, 21.4, "D"),
c("Oklahoma", 6.6, 151, 68, 20.0, "D"),
c("Oregon", 4.9, 159, 67, 29.3, "D"),
c("Pennsylvania", 6.3, 106, 72, 14.9, "D"),
c("Rhode_Island", 3.4, 174, 87, 8.3, "D"),
c("South_Carolina", 14.4, 279, 48, 22.5, "D"),
c("South_Dakota", 3.8, 86, 45, 12.8, "D"),
c("Tennessee", 13.2, 188, 59, 26.9, "D"),
c("Texas", 12.7, 201, 80, 25.5, "D"),
c("Utah", 3.2, 120, 80, 22.9, "D"),
c("Vermont", 2.2, 48, 32, 11.2, "D"),
c("Virginia", 8.5, 156, 63, 20.7, "D"),
c("Washington", 4.0, 145, 73, 26.2, "D"),
c("West_Virginia", 5.7, 81, 39, 9.3, "D"),
c("Wisconsin", 2.6, 53, 66, 10.8, "D"),
c("Wyoming", 6.8, 161, 60, 15.6, "D")))

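I need to convert it to a data.frame (or table), preserving the column names and rownames, keeping numbers numeric, and converting everything else (in this example the column 'Group') to a factor. (The data always comes in this format, so the code has to be generic.)

(An optional step is to remove a column by a given name, which is the reason for using a data.frame, since that makes it easy.)

The resulting data.frame (or table, or matrix) is then passed to the 'scale' function.

My solution consists of several steps: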

data <- m[-1,-1]
colnames(data) <- m[1,-1]
rownames(data) <- m[-1,1][m[-1,1]!='']
data <- as.data.frame(data)

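Now I have a data.frame, but it cannot be passed to the scale() function ("Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"). If I use data.matrix(data), the factors are converted nicely, but all the doubles are converted to integers as well. I have been stuck on this for hours.

Thanks in advance.

3 Answers:

Answer 0: (score: 3)

I am moving this to an answer because it does not seem to work in a comment. You can do the following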

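# as.is = FALSE keeps non-numeric columns as factors; since R 4.0.0, omitting it warns and leaves them as character
data <- data.frame(lapply(data.frame(m[-1,-1], stringsAsFactors = FALSE), type.convert, as.is = FALSE))

to convert all the columns of the matrix to the correct type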

str(data)
# 'data.frame':  50 obs. of  5 variables:
# $ X1: num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
# $ X2: int  236 263 294 190 276 204 110 238 335 211 ...
# $ X3: int  58 48 80 50 91 78 77 72 80 60 ...
# $ X4: num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
# $ X5: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...

Then you can set the column/row names as needed

colnames(data) <- m[1,-1]
rownames(data) <- m[-1,1][m[-1,1]!='']

To scale, you can do

scale(data[-5])
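The question also mentions dropping a column by name rather than by position. A minimal sketch of that, using a small made-up data frame (`df`, `drop`, and `kept` are illustrative names, not part of the original answer):

```r
# Small illustrative data frame with the same kinds of columns as in the question
df <- data.frame(Murder  = c(13.2, 10.0, 8.1),
                 Assault = c(236L, 263L, 294L),
                 Group   = factor(c("A", "A", "B")))

drop <- "Group"                        # hypothetical name of the column to remove
kept <- df[setdiff(names(df), drop)]   # keep every column except the named one

scaled <- scale(kept)                  # only numeric columns remain, so scale() works
```

Selecting by name with `setdiff` avoids relying on the factor column always sitting in position 5.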

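Edit, following the OP's comments.

As I have said several times already, using data.matrix on factors is completely wrong; it will completely mess up your data. Consider the following example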

data.matrix(data.frame(A = factor(c("A", "B")),
                       B = factor(10:11),
                       C = factor(c("22-11-2014", "23-11-2014"))))
#      A B C
# [1,] 1 1 1
# [2,] 2 2 2

data.matrix returned identical results for these completely different values.
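A short sketch of why this happens, using made-up values: a factor stores integer level codes, and a direct numeric conversion returns those codes rather than the labels. Converting through `as.character` first is the usual way to recover the numbers the labels represent.

```r
# A factor whose labels happen to look like numbers
f <- factor(c("10.5", "3.2", "10.5"))

codes  <- as.numeric(f)                # internal level codes, not the labelled values
values <- as.numeric(as.character(f))  # the numbers the labels actually spell out
```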

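Now back to your real data: if you want to avoid running scale on factors and you do not know which columns are factors, you can simply create an index that identifies the numeric columns and then run scale only on those, for example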

indx <- sapply(data, is.numeric)
scale(data[indx])
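If the factor columns should stay next to the scaled values, one possible variation is to write the scaled columns back into a copy of the data frame. This is only a sketch on a small made-up data frame; `df` and `scaled` are illustrative names.

```r
# Small illustrative data frame: two numeric columns and one factor
df <- data.frame(Murder  = c(13.2, 10.0, 8.1),
                 Assault = c(236L, 263L, 294L),
                 Group   = factor(c("A", "A", "B")))

indx <- sapply(df, is.numeric)   # TRUE for numeric columns, FALSE for factors

scaled <- df
# Scale each numeric column separately and drop the matrix attributes scale() adds
scaled[indx] <- lapply(df[indx], function(col) as.numeric(scale(col)))
```

The factor column is left untouched, so the result can still be grouped or split by `Group` afterwards.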

Answer 1: (score: 0)

Read it in as a data.frame and do the rest afterwards

m = data.frame(rbind.... your data here as above)

rownames(m) = m$X1 
colnames(m) = c(t(m[1,]))
req.df  = m[-1,-1]

Answer 2: (score: 0)

Here is a quick attempt that preserves both numeric and factor types.

# convert into data frame
df <- as.data.frame(m[2:nrow(m), 2:ncol(m)], stringsAsFactors = FALSE)
# set names
names(df) <- m[1, 2:ncol(m)]    
rownames(df) <- m[2:nrow(m), 1]
# convert types into numeric or factor
df[] <- lapply(df, function(x) if(is.na(as.numeric(x[1]))) as.factor(x) else  as.numeric(x))

str(df)
'data.frame':   50 obs. of  5 variables:
 $ Murder  : num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
 $ Assault : num  236 263 294 190 276 204 110 238 335 211 ...
 $ UrbanPop: num  58 48 80 50 91 78 77 72 80 60 ...
 $ Rape    : num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
 $ Group   : Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
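
Since the question asks for generic code, the steps from the answers above could be wrapped in one function. The following is only a sketch under the assumption that the first row holds the column names and the first column holds the row names; `matrix_to_df` and its `drop` argument are hypothetical names, not something from the original answers.

```r
# Hypothetical helper: character matrix with a header row and a name column -> typed data frame
matrix_to_df <- function(m, drop = NULL) {
  body <- m[-1, -1, drop = FALSE]                   # strip header row and name column
  df <- as.data.frame(body, stringsAsFactors = FALSE)
  names(df) <- m[1, -1]                             # column names from the first row
  rownames(df) <- m[-1, 1]                          # row names from the first column
  # Numbers become integer/double; anything else becomes a factor
  df[] <- lapply(df, type.convert, as.is = FALSE)
  df[setdiff(names(df), drop)]                      # optionally drop columns by name
}

# Usage on a tiny matrix shaped like the one in the question
small <- rbind(c("State",   "Murder", "Assault", "Group"),
               c("Alabama", "13.2",   "236",     "A"),
               c("Alaska",  "10.0",   "263",     "B"))
res <- matrix_to_df(small)
res_no_group <- matrix_to_df(small, drop = "Group")
```

Unlike checking only the first element of each column, `type.convert` inspects the whole column before deciding on a type.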