I receive a dataset that is much larger than, but similar to, the following:
cost <- data.frame(Date=rep('1970-01-01', 3), Atr=runif(3),Atrb=runif(3),a=runif(3), b=runif(3), a=runif(3), b=runif(3),a=runif(3), b=runif(3))
names(cost) <- c('Date', 'Atr','Atrb','a','b','a','b','a','b')
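Besides a and b, many more columns have duplicated names, and there are more attribute columns as well.
Ideally, I would like to be able to do any exclusion in the -c(Date:Atrb) style.
I want to be able to create two kinds of tall data: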
Date Atr Atrb Group a b
1 1970-01-01 0.988181929 0.123 A 0.3836335 0.7793414
...
1 1970-01-01 0.988181929 0.456 B 0.6464 0.98687
...
1 1970-01-01 0.988181929 0.789 C 0.123 0.3456
And the other is:
Date Atr Atrb Group Metric Value
1 1970-01-01 0.988181929 0.123 A a 0.3836335
1 1970-01-01 0.988181929 0.456 A b 0.7793414
...
1 1970-01-01 0.988181929 0.678 B a 0.6464
1 1970-01-01 0.988181929 0.345 B b 0.98687
...
1 1970-01-01 0.988181929 0.789 C a 0.123
1 1970-01-01 0.988181929 0.456 C b 0.3456
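Any tips or pointers are appreciated.
Answer 0 (score: 1)
data.table's extension of melt is useful for this, because you can melt several repeated columns at once: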
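library(data.table)
setDT(cost)  # convert to a data.table in place
# each pattern collects one set of repeated columns into its own value column
melt(cost, id.vars=c("Date","Atr"), measure.vars = patterns("^a$","^b$"), value.name=c("a","b"))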
Date Atr variable a b
1: 1970-01-01 0.09643571 1 0.38316876 0.69935636
2: 1970-01-01 0.10714089 1 0.12154920 0.42598159
3: 1970-01-01 0.91581813 1 0.03301164 0.21327371
4: 1970-01-01 0.09643571 2 0.06866915 0.05604199
5: 1970-01-01 0.10714089 2 0.74418388 0.17013278
6: 1970-01-01 0.91581813 2 0.33784588 0.33794886
7: 1970-01-01 0.09643571 3 0.19680638 0.51427164
8: 1970-01-01 0.10714089 3 0.71372700 0.71134925
9: 1970-01-01 0.91581813 3 0.34700614 0.84975838
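To get to the second form, you can simply use the standard implementation of melt on that result, e.g. melt(cost2, id.vars=c("Date","Atr", "variable")), where cost2 is assumed to hold the output of the call above.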
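Putting both steps together on the sample data from the question, here is a minimal sketch rather than a definitive implementation. It assumes that Atrb should be kept as an id column, that the groups should be labelled A, B, C, and it uses a fixed seed purely for reproducibility. The names wide_ab and long_ab are made up for illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
cost <- data.frame(Date = rep("1970-01-01", 3), Atr = runif(3), Atrb = runif(3),
                   a = runif(3), b = runif(3), a = runif(3), b = runif(3),
                   a = runif(3), b = runif(3))
names(cost) <- c("Date", "Atr", "Atrb", "a", "b", "a", "b", "a", "b")
setDT(cost)

# First form: one row per original row and group, with a and b side by side
wide_ab <- melt(cost,
                id.vars = c("Date", "Atr", "Atrb"),
                measure.vars = patterns("^a$", "^b$"),
                value.name = c("a", "b"),
                variable.name = "Group")

# melt numbers the groups 1, 2, 3; relabel them as A, B, C (assumed labels)
wide_ab[, Group := LETTERS[as.integer(Group)]]

# Second form: melt once more so that a and b become rows
long_ab <- melt(wide_ab,
                id.vars = c("Date", "Atr", "Atrb", "Group"),
                measure.vars = c("a", "b"),
                variable.name = "Metric",
                value.name = "Value")
```

Here wide_ab corresponds to the first layout asked for in the question and long_ab to the second.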
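Answer 1 (score: 1)
Here is a dplyr solution, but it requires deduplicating the repeated column names first. There may be a way to do this inside the dplyr pipe with rename_at or something similar, but nothing comes to mind at the moment: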
library(tidyverse)
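# Deduplicate column names (assumes the repeated columns come in consecutive a, b blocks)
idx = grep("^[ab]$", names(cost))
n.metrics = length(unique(names(cost)[idx]))   # 2 here: a and b
# each= keeps every consecutive a/b pair in the same group (a.A, b.A, a.B, ...)
names(cost)[idx] = paste0(names(cost)[idx], ".", rep(LETTERS[1:(length(idx)/n.metrics)], each=n.metrics))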
cost.long = cost %>%
gather(key, value, -Date, -Atr) %>%
separate(key, into=c("Metric", "Group"))
cost.long
         Date       Atr Metric Group      value
1  1970-01-01 0.5567203      a     A 0.43008996
2  1970-01-01 0.9421835      a     A 0.94488436
3  1970-01-01 0.4672264      a     A 0.70847981
...
16 1970-01-01 0.5567203      b     C 0.53797611
17 1970-01-01 0.9421835      b     C 0.02623668
18 1970-01-01 0.4672264      b     C 0.72440841
cost.long %>% spread(Metric, value)
        Date       Atr Group         a          b
1 1970-01-01 0.4672264     A 0.7084798 0.34580760
2 1970-01-01 0.4672264     B 0.6537164 0.69052451
3 1970-01-01 0.4672264     C 0.6811653 0.72440841
4 1970-01-01 0.5567203     A 0.4300900 0.35140699
5 1970-01-01 0.5567203     B 0.7600863 0.76989417
6 1970-01-01 0.5567203     C 0.7536469 0.53797611
7 1970-01-01 0.9421835     A 0.9448844 0.61829407
8 1970-01-01 0.9421835     B 0.8929478 0.95985575
9 1970-01-01 0.9421835     C 0.9765727 0.02623668
Update: to address your comment and the updated sample data: if grep is not an option, there are other ways. For example:
idx = which(names(cost) %in% c("a","b"))
n.metrics = length(unique(names(cost)[idx]))   # 2 here: a and b
# each= keeps every consecutive a/b pair in the same group (a.A, b.A, a.B, ...)
names(cost)[idx] = paste0(names(cost)[idx], ".", rep(LETTERS[1:(length(idx)/n.metrics)], each=n.metrics))
head(cost)
        Date       Atr      Atrb       a.A       b.A       a.B       b.B       a.C       b.C
1 1970-01-01 0.9437837 0.6600156 0.6084679 0.7855074 0.5800305 0.5174202 0.8040882 0.5707653
2 1970-01-01 0.7802974 0.5401236 0.4929647 0.5453363 0.5287559 0.3684388 0.5405421 0.2421765
3 1970-01-01 0.5336272 0.9807934 0.4530894 0.6434532 0.8517817 0.1269552 0.6914684 0.3035965
I get no warnings with the following code:
cost %>%
  gather(key, value, -c(Date:Atrb)) %>%
  separate(key, into=c("Metric", "Group"))
         Date       Atr      Atrb Metric Group     value
1  1970-01-01 0.9437837 0.6600156      a     A 0.6084679
2  1970-01-01 0.7802974 0.5401236      a     A 0.4929647
3  1970-01-01 0.5336272 0.9807934      a     A 0.4530894
4  1970-01-01 0.9437837 0.6600156      b     A 0.7855074
5  1970-01-01 0.7802974 0.5401236      b     A 0.5453363
6  1970-01-01 0.5336272 0.9807934      b     A 0.6434532
7  1970-01-01 0.9437837 0.6600156      a     B 0.5800305
8  1970-01-01 0.7802974 0.5401236      a     B 0.5287559
9  1970-01-01 0.5336272 0.9807934      a     B 0.8517817
10 1970-01-01 0.9437837 0.6600156      b     B 0.5174202
11 1970-01-01 0.7802974 0.5401236      b     B 0.3684388
12 1970-01-01 0.5336272 0.9807934      b     B 0.1269552
13 1970-01-01 0.9437837 0.6600156      a     C 0.8040882
14 1970-01-01 0.7802974 0.5401236      a     C 0.5405421
15 1970-01-01 0.5336272 0.9807934      a     C 0.6914684
16 1970-01-01 0.9437837 0.6600156      b     C 0.5707653
17 1970-01-01 0.7802974 0.5401236      b     C 0.2421765
18 1970-01-01 0.5336272 0.9807934      b     C 0.3035965
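In current tidyr, gather and spread have been superseded by pivot_longer and pivot_wider, so the same idea could be sketched as follows. This is an illustration under a few assumptions rather than a definitive implementation: the repeats of each name are numbered in order of appearance with ave (so the first a and the first b both land in group A), the group labels A, B, C are assumed, and the seed is fixed only for reproducibility. The names is_ab, ab_names, occurrence, long and wide are made up for this example.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
cost <- data.frame(Date = rep("1970-01-01", 3), Atr = runif(3), Atrb = runif(3),
                   a = runif(3), b = runif(3), a = runif(3), b = runif(3),
                   a = runif(3), b = runif(3))
names(cost) <- c("Date", "Atr", "Atrb", "a", "b", "a", "b", "a", "b")

# Deduplicate: number each repeat of a name in order of appearance,
# so the k-th "a" and the k-th "b" get the same group letter
is_ab <- names(cost) %in% c("a", "b")
ab_names <- names(cost)[is_ab]
occurrence <- ave(seq_along(ab_names), ab_names, FUN = seq_along)
names(cost)[is_ab] <- paste0(ab_names, ".", LETTERS[occurrence])

# Second form: one row per original row, group and metric
long <- pivot_longer(cost,
                     cols = -c(Date:Atrb),
                     names_to = c("Metric", "Group"),
                     names_sep = "\\.",
                     values_to = "Value")

# First form: spread the metrics back out into columns a and b
wide <- pivot_wider(long, names_from = Metric, values_from = Value)
```

The cols = -c(Date:Atrb) argument is the exclusion style asked for in the question, and numbering the repeats per name avoids hard-coding how many metrics there are.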