Manipulating data in R from columns to rows

Time: 2014-05-01 15:53:28

Tags: r

My data is currently organized like this:

 X.1 State MN    X.2    WI    X.3
     NA    Price Pounds Price Pounds
Year NA    
1980 NA    56    23     56    96
1999 NA    41    63     56    65

I would like to convert it into something more like this:

Year State Price Pounds
1980 MN    56    23
1999 MN    41    63
1980 WI    56    96
1999 WI    56    65

Any suggestions for R code that would manipulate this data correctly? Thanks!

2 answers:

Answer 0: (score: 1)

This takes some manipulation to get the data into a format that can be reshaped.

df <- read.table(header=TRUE, text=" X.1 State MN    X.2    WI    X.3
NA     NA    Price Pounds Price Pounds
Year NA    NA    NA     NA    NA
1980 NA    56    23     56    96
1999 NA    41    63     56    65")

df <- df[-2]

# Auto-process names; you should look at intermediate step results to see
# what's going on.  This would probably be better addressed with something
# like `na.locf` from `zoo` but this is all in base.  Note you can do something
# a fair bit simpler if you know you have the same number of items for each
# state, but this should be robust to different numbers.

df.names <- names(df)
df.names <- ifelse(grepl("^X\\.[0-9]+$", df.names), NA, df.names)
df.names[[1]] <- "Year"
df.names.valid <- Filter(Negate(is.na), df.names)
df.names[is.na(df.names)] <- df.names.valid[cumsum(!is.na(df.names))[is.na(df.names)]]
names(df) <- df.names

# rename again by adding Price/Pounds

names(df)[-1] <- paste(
  # as.character() is needed because we're pulling values across different
  # (possibly factor) columns
  vapply(2:5, function(x) as.character(df[1, x]), ""),
  names(df)[-1],
  sep="."
)
df <- df[-(1:2),]   # Don't need rows 1:2 anymore
df

Produces:

  Year Price.MN Pounds.MN Price.WI Pounds.WI
3 1980       56        23       56        96
4 1999       41        63       56        65
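The header-filling step above (carrying each state name forward over the placeholder columns) may be easier to follow in isolation. This is a minimal base R sketch on a hand-written vector; `nms`, `valid` and `filled` are illustrative names, and it assumes the first element is never NA.

```r
# Column names after the X.n placeholders have been turned into NA
nms <- c("Year", "MN", NA, "WI", NA)

# The non-missing names, in order of appearance
valid <- nms[!is.na(nms)]

# cumsum(!is.na(nms)) counts how many valid names have been seen so far,
# so indexing `valid` with it repeats the most recent valid name
filled <- valid[cumsum(!is.na(nms))]

filled
# "Year" "MN" "MN" "WI" "WI"
```

The answer's own version only overwrites the NA positions, which gives the same result here.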

Then:

Using base reshape

reshape(df, direction="long", varying=2:5)

This basically gets you there:

     Year time Price Pounds id
1.MN 1980   MN    56     23  1
2.MN 1999   MN    41     63  2
1.WI 1980   WI    56     96  1
2.WI 1999   WI    56     65  2

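Obviously you will need to rename some columns and so on, but that is straightforward. The key point with reshape is that the column names matter, so we build them in a form reshape can split into a variable name and a group (here, Price.MN becomes Price for state MN).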
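To show how the column names drive reshape, here is a small self-contained sketch that rebuilds the wide table by hand with numeric columns and names the output columns directly. The `timevar` and `idvar` arguments are standard reshape parameters; using "State" and "Year" for them is my choice, not part of the answer above.

```r
# Wide data with names of the form <measure>.<state>
wide <- data.frame(
  Year      = c(1980, 1999),
  Price.MN  = c(56, 41),
  Pounds.MN = c(23, 63),
  Price.WI  = c(56, 56),
  Pounds.WI = c(96, 65)
)

# reshape() splits each varying name at sep: the part before "." becomes
# the value column, the part after it becomes the value of `timevar`
long <- reshape(
  wide,
  direction = "long",
  varying   = 2:5,
  sep       = ".",
  timevar   = "State",  # instead of the default "time"
  idvar     = "Year"    # reuse Year instead of creating an "id" column
)
rownames(long) <- NULL
long
```

Since Year already identifies the rows within each state, passing it as `idvar` avoids the extra id column seen in the output above.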

Using reshape2::melt/cast

library(reshape2)
df.mlt <- melt(df, id.vars="Year")
df.mlt <- transform(df.mlt, 
  metric=sub("\\..*", "", variable), 
  state=sub(".*\\.", "", variable)
)
dcast(df.mlt[-2], Year + state ~ metric)

Produces:

  Year state Pounds Price
1 1980    MN     23    56
2 1980    WI     96    56
3 1999    MN     63    41
4 1999    WI     65    56

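Be very careful: Price and Pounds are probably factors, because those columns originally held both character and numeric values (from R 4.0.0 on, read.table returns character columns by default instead). Either way you will need to convert them to numeric; for factors, use as.numeric(as.character(df$Price)).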
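The reason for the as.character() detour can be seen on a tiny example. This is a sketch with a made-up factor, `price`, standing in for the real column:

```r
# A factor whose labels happen to look like numbers
price <- factor(c("56", "41", "56"))

# Calling as.numeric() directly returns the internal level codes
codes <- as.numeric(price)
codes
# 2 1 2

# Going through as.character() first recovers the actual values
values <- as.numeric(as.character(price))
values
# 56 41 56
```

If the column is already character rather than factor, as.numeric() alone is enough, and the extra as.character() call is harmless.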

Answer 1: (score: 0)

That was a nice challenge. This has a lot of strsplit and grep calls, and it may not generalize to your whole data set. Or maybe it will, you never know.

> txt <- "X.1 State MN    X.2    WI    X.3
  NA    Price Pounds Price Pounds
  Year NA
  1980 NA    56    23     56    96
  1999 NA    41    63     56    65"
> 
> x <- textConnection(txt)
> y <- gsub("((X[.][0-9]{1})|NA)|\\s+", " ", readLines(x))
> z <- unlist(strsplit(y, "^\\s+"))
> a <- z[nzchar(z)]
> b <- unlist(strsplit(a, "\\s+"))
> nums <- as.numeric(grep("[0-9]", b[nchar(b) == 2], value = TRUE))
> Price = rev(nums[c(TRUE, FALSE)])
> pounds <- nums[-which(nums %in% Price)]
> data.frame(Year = rep(b[grepl("[0-9]{4}", b)], 2),
             State = unlist(lapply(b[grepl("[A-Z]{2}", b)], rep, 2)),
             Price = Price,
             Pounds = c(pounds[1], rev(pounds[2:3]), pounds[4]))
##   Year State Price Pounds
## 1 1980    MN    56     23
## 2 1999    MN    41     63
## 3 1980    WI    56     96
## 4 1999    WI    56     65
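For comparison, here is a sketch of a text-based variant that picks values by column position rather than by how many digits they have. It is not the code above, and it rests on assumptions I am making about the layout: the first line holds two-letter state codes, the second line holds the measure names, every state has the same measures in the same order, and data rows start with a four-digit year. All variable names are illustrative.

```r
txt <- "X.1 State MN    X.2    WI    X.3
     NA    Price Pounds Price Pounds
Year NA
1980 NA    56    23     56    96
1999 NA    41    63     56    65"

lines  <- strsplit(txt, "\n", fixed = TRUE)[[1]]
tokens <- strsplit(trimws(lines), "\\s+")

# Two-letter upper-case tokens on the first line are taken to be states
states <- grep("^[A-Z]{2}$", tokens[[1]], value = TRUE)

# Distinct non-NA tokens on the second line are taken to be measures
metrics <- setdiff(tokens[[2]], "NA")

# Rows starting with a four-digit year are the data rows
is_data <- grepl("^[0-9]{4}\\s", trimws(lines))
wide    <- read.table(text = lines[is_data], header = FALSE)
values  <- wide[, -(1:2)]   # drop the year and the all-NA column

# Stack one block of columns per state
k <- length(metrics)
long <- do.call(rbind, lapply(seq_along(states), function(i) {
  block <- values[, ((i - 1) * k + 1):(i * k), drop = FALSE]
  names(block) <- metrics
  data.frame(Year = wide[[1]], State = states[i], block)
}))
rownames(long) <- NULL
long
```

Because the numbers are read with read.table, Price and Pounds come out numeric and need no factor conversion.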