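I am new to R and am trying to use it to analyse some data. The data happens to be in Excel format, and I am struggling to find a way to convert it into an R-friendly format.
The problem is that the column headings contain merged cells, so the header actually spans two rows. I would like to convert it into a normal set of one-dimensional vectors, adding one extra column and one extra row. Let me explain with an example:
Currently the Excel layout looks like this: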
  |  H  |  J  |
Y |M |F |M |F |
== == == == ==
Y1|V1|V2|V3|V4|
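H and J are merged column headings, each spanning one M column and one F column.
The = signs indicate that the rows above them are header rows.
Given that H and J are both values of a variable R, I would like to convert this into a columnar format with a normal single-row header and two data rows, as follows: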
Y |R |M |F |
== == == ==
Y1|H |V1|V2|
Y1|J |V3|V4|
Does anyone know how to do this?
Answer 0 (score: 1)
First, some assumptions:
Second, your data.
temp = c(",\"H\",,\"J\",",
         "\"Y\",\"M\",\"F\",\"M\",\"F\"",
         "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\"")
Third, a slightly modified version of this answer.
# check.names is set to FALSE to allow variable names to be repeated
ONE = read.csv(textConnection(temp), skip=1, check.names=FALSE,
               stringsAsFactors=FALSE)
GROUPS = read.csv(textConnection(temp), header=FALSE,
                  nrows=1, stringsAsFactors=FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]
# This can be shortened, but I've written it this way to show how
# it can be generalized. For instance, if 3 columns were repeated
# instead of 2, the rep statement could be changed to reflect that
names(ONE)[-1] = paste0(names(ONE)[-1], ".",
                        rep(GROUPS, each=(length(names(ONE)[-1])/2)))
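The hard-coded 2 in the renaming step can be derived from the data instead. The sketch below assumes that every merged heading spans the same number of sub-columns. It uses a hypothetical third group K (values V5 and V6), which is not part of the original question, purely to show the step with more than two groups.

```r
# Hypothetical data: same layout as the question, plus an invented group "K"
temp3 = c(",\"H\",,\"J\",,\"K\",",
          "\"Y\",\"M\",\"F\",\"M\",\"F\",\"M\",\"F\"",
          "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\",\"V5\",\"V6\"")

# Second header row plus data; check.names=FALSE keeps the repeated M/F names
ONE3 = read.csv(textConnection(temp3), skip=1, check.names=FALSE,
                stringsAsFactors=FALSE)

# First header row only; the blanks left by the merged cells are read as NA
GROUPS3 = read.csv(textConnection(temp3), header=FALSE,
                   nrows=1, stringsAsFactors=FALSE)
GROUPS3 = GROUPS3[!is.na(GROUPS3)]

# Number of sub-columns under each merged heading, derived from the data
measure_names = names(ONE3)[-1]
per_group = length(measure_names) / length(GROUPS3)
stopifnot(per_group == round(per_group))  # assumes equally sized groups

names(ONE3)[-1] = paste0(measure_names, ".", rep(GROUPS3, each=per_group))
names(ONE3)
# [1] "Y"   "M.H" "F.H" "M.J" "F.J" "M.K" "F.K"
```

If the groups were of unequal width, this division would not give a whole number and the check would stop the script rather than mislabel columns.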
Fourth, actually reshaping the data.
TWO = reshape(ONE, direction="long", ids=1, varying=2:ncol(ONE))
# And, here's the output.
TWO
#      Y time  M  F id
# 1.H Y1    H V1 V2  1
# 1.J Y1    J V3 V4  1
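The reshaped result carries a time column and an id column, whereas the question asked for the columns Y, R, M and F. One possible tidy-up, as a minimal sketch that repeats the steps above so it runs on its own, is to rename time to R and drop id and the generated row names. The name FINAL is introduced here only for illustration.

```r
temp = c(",\"H\",,\"J\",",
         "\"Y\",\"M\",\"F\",\"M\",\"F\"",
         "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\"")

# Same steps as in the answer above
ONE = read.csv(textConnection(temp), skip=1, check.names=FALSE,
               stringsAsFactors=FALSE)
GROUPS = read.csv(textConnection(temp), header=FALSE,
                  nrows=1, stringsAsFactors=FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]
names(ONE)[-1] = paste0(names(ONE)[-1], ".",
                        rep(GROUPS, each=(length(names(ONE)[-1])/2)))
TWO = reshape(ONE, direction="long", ids=1, varying=2:ncol(ONE))

# Tidy-up (not part of the original answer)
FINAL = TWO
names(FINAL)[names(FINAL) == "time"] = "R"  # the question calls this column R
FINAL$id = NULL                             # drop the helper id column
rownames(FINAL) = NULL                      # replace "1.H", "1.J" with 1, 2
FINAL
#    Y R  M  F
# 1 Y1 H V1 V2
# 2 Y1 J V3 V4
```

reshape also takes a timevar argument (default "time"), so the rename could probably be done inside the reshape call itself; the explicit rename is used here to leave the answer's call unchanged.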