Converting rows to columns in R

Time: 2015-12-01 12:07:15

Tags: r transform multiple-columns melt

My data frame:

> head(scotland_weather)
    JAN Year.1   FEB Year.2   MAR Year.3   APR Year.4   MAY Year.5   JUN Year.6   JUL Year.7   AUG Year.8   SEP Year.9   OCT Year.10
1 293.8   1993 278.1   1993 238.5   1993 191.1   1947 191.4   2011 155.0   1938 185.6   1940 216.5   1985 267.6   1950 258.1    1935
2 292.2   1928 258.8   1997 233.4   1990 149.0   1910 168.7   1986 137.9   2002 181.4   1988 211.9   1992 221.2   1981 254.0    1954
3 275.6   2008 244.7   2002 201.3   1992 146.8   1934 155.9   1925 137.8   1948 170.1   1939 202.3   2009 193.9   1982 248.8    2014
4 252.3   2015 227.9   1989 200.2   1967 142.1   1949 149.5   2015 137.7   1931 165.8   2010 191.4   1962 189.7   2011 247.7    1938
5 246.2   1974 224.9   2014 180.2   1979 133.5   1950 137.4   2003 135.0   1966 162.9   1956 190.3   2014 189.7   1927 242.3    1983
6 245.0   1975 195.6   1995 180.0   1989 132.9   1932 129.7   2007 131.7   2004 159.9   1985 189.1   2004 189.6   1985 240.9    2001
    NOV Year.11   DEC Year.12   WIN Year.13   SPR Year.14   SUM Year.15   AUT Year.16    ANN Year.17
1 262.0    2009 300.7    2013 743.6    2014 409.5    1986 455.6    1985 661.2    1981 1886.4    2011
2 244.8    1938 268.5    1986 649.5    1995 401.3    2015 435.6    1948 633.8    1954 1828.1    1990
3 242.2    2006 267.2    1929 645.4    2000 393.7    1994 427.8    2009 615.8    1938 1756.8    2014
4 231.3    1917 265.4    2011 638.3    2007 393.2    1967 422.6    1956 594.5    1935 1735.8    1938
5 229.9    1981 264.0    2006 608.9    1990 391.7    1992 397.0    2004 590.6    1982 1720.0    2008
6 224.9    1951 261.0    1912 592.8    2015 389.1    1913 390.1    1938 589.2    2006 1716.5    1954

The Year.X columns are not ordered. I would like to convert the data to the following format:

month    year      rainfall_mm
Jan      1993       293.8
Feb      1993       278.1
Mar      1993       238.5
...
Nov      2015       230.0

I tried t(), but it separates the year columns from the values they belong to.

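I also tried reshape2's recast(data, formula, ..., id.var, measure.var), but I am missing something, because the month columns and the Year.X columns are all numeric or int, so there is no obvious id variable: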

> str(scotland_weather)
'data.frame':   106 obs. of  34 variables:
 $ JAN    : num  294 292 276 252 246 ...
 $ Year.1 : int  1993 1928 2008 2015 1974 1975 2005 2007 1990 1983 ...
 $ FEB    : num  278 259 245 228 225 ...
 $ Year.2 : int  1990 1997 2002 1989 2014 1995 1998 2000 1920 1918 ...
 $ MAR    : num  238 233 201 200 180 ...
 $ Year.3 : int  1994 1990 1992 1967 1979 1989 1921 1913 2015 1978 ...
 $ APR    : num  191 149 147 142 134 ...

2 answers:

Answer 0: (score: 2)

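This relies on the alternating column pattern in 'scotland_weather': the rainfall columns sit at odd positions and the 'Year.X' columns at even positions. One way to pick alternate columns is to let a logical index recycle: c(TRUE, FALSE) selects the same columns as seq(1, ncol(scotland_weather), by = 2), i.e. the month columns, and c(FALSE, TRUE) selects the same columns as seq(2, ncol(scotland_weather), by = 2), i.e. the 'Year.X' columns. Each subset is transposed (t) and concatenated (c) into a vector, so the values come out row by row. The next step is to extract the column names that are not 'Year' columns, for which grepl can be used. Finally, data.frame binds the vectors together, recycling the month names to the length of the other two columns.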

# Month names are recycled; c(t(...)) reads the values row by row, so they line up
res <- data.frame(
  month       = names(scotland_weather)[!grepl('Year', names(scotland_weather))],
  year        = c(t(scotland_weather[c(FALSE, TRUE)])),
  rainfall_mm = c(t(scotland_weather[c(TRUE, FALSE)])))

head(res,4)
#  month year rainfall_mm
#1   JAN 1993       293.8
#2   FEB 1993       278.1
#3   MAR 1993       238.5
#4   APR 1947       191.1
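
To see why the recycled index and the transpose line the values up, here is a minimal sketch on a toy data frame. The name toy is hypothetical; it holds only the first two rows of the JAN and FEB pairs from the question, and it assumes the same alternating month/year layout.

```r
# Hypothetical two-row stand-in for scotland_weather (values copied from the question)
toy <- data.frame(JAN = c(293.8, 292.2), Year.1 = c(1993L, 1928L),
                  FEB = c(278.1, 258.8), Year.2 = c(1993L, 1997L))

# A length-2 logical index is recycled across all four columns
odd  <- names(toy[c(TRUE, FALSE)])   # columns at positions 1, 3 (months)
even <- names(toy[c(FALSE, TRUE)])   # columns at positions 2, 4 (years)
same_as_seq <- identical(odd, names(toy)[seq(1, ncol(toy), by = 2)])

# t() turns rows into columns, so c() reads all of row 1 before row 2
rain <- c(t(toy[c(TRUE, FALSE)]))
yrs  <- c(t(toy[c(FALSE, TRUE)]))

# The two month names are recycled to match the four values
res_toy <- data.frame(month = odd, year = yrs, rainfall_mm = rain)
print(res_toy)
```

Without t(), c() would read the data column by column, and the recycled month names would no longer match the values.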

Answer 1: (score: 0)

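The problem is not only that you need to reshape your data: the year for the first column is in the second column, the year for the third column is in the fourth column, and so on. Here is a solution using tidyr.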

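library(tidyr)
library(dplyr)  # provides %>%, filter() and select()

cols  <- names(scotland_weather)
years <- grep("Year", cols)  # positions of the Year.X columns

# TRUE when yearname is the column immediately to the right of month
is_pair <- function(month, yearname) match(yearname, cols) - match(month, cols) == 1

scotland_weather %>%
  gather("month", "rainfall_mm", -all_of(years)) %>%
  gather("yearname", "year", -c(month, rainfall_mm)) %>%
  filter(is_pair(month, yearname)) %>%
  select(month, year, rainfall_mm)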
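
As an alternative sketch, not part of the original answer, the same pairing can be done in one step with pivot_longer(), assuming tidyr 1.0.0 or later. The toy data frame and the "rainfall_mm." / "year." renaming scheme are assumptions made for illustration; the idea is to give each month/year pair a shared suffix so that the ".value" sentinel can split them back into two columns.

```r
library(tidyr)

# Hypothetical two-row stand-in for scotland_weather (values copied from the question)
toy <- data.frame(JAN = c(293.8, 292.2), Year.1 = c(1993L, 1928L),
                  FEB = c(278.1, 258.8), Year.2 = c(1993L, 1997L))

# Month names sit at the odd positions
months <- names(toy)[c(TRUE, FALSE)]

# Rename each pair so both columns end in the same month label
names(toy)[c(TRUE, FALSE)] <- paste0("rainfall_mm.", months)
names(toy)[c(FALSE, TRUE)] <- paste0("year.", months)

# ".value" makes the part before the dot the output column name;
# the part after the dot becomes the month column
long <- pivot_longer(toy, cols = everything(),
                     names_to = c(".value", "month"),
                     names_sep = "\\.")
print(long)
```

This avoids the intermediate table that the double gather() builds and then filters, at the cost of renaming the columns first.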