Question

My data is currently organized as follows:

 X.1 State MN    X.2    WI    X.3
     NA    Price Pounds Price Pounds
Year NA    
1980 NA    56    23     56    96
1999 NA    41    63     56    65

I would like to convert it into something more like this:

Year State Price Pounds
1980 MN    56    23
1999 MN    41    63
1980 WI    56    96
1999 WI    56    65

Does anyone have suggestions for R code that would reshape this data correctly? Thanks!

Answer 1

This takes some manipulation to get the data into a format that can be reshaped.

df <- read.table(header=TRUE, text="X.1 State MN    X.2    WI    X.3
NA     NA    Price Pounds Price Pounds
Year NA    NA    NA     NA    NA
1980 NA    56    23     56    96
1999 NA    41    63     56    65")

df <- df[-2]

# Auto-process names; you should look at intermediate step results to see
# what's going on.  This would probably be better addressed with something
# like `na.locf` from `zoo` but this is all in base.  Note you can do something
# a fair bit simpler if you know you have the same number of items for each
# state, but this should be robust to different numbers.

df.names <- names(df)
df.names <- ifelse(grepl("^X\\.[0-9]+$", df.names), NA, df.names)  # blank out the auto-generated X.n names
df.names[[1]] <- "Year"
df.names.valid <- Filter(Negate(is.na), df.names)
df.names[is.na(df.names)] <- df.names.valid[cumsum(!is.na(df.names))[is.na(df.names)]]
names(df) <- df.names

# rename again by adding Price/Pounds

names(df)[-1] <- paste(                                
  vapply(2:5, function(x) as.character(df[1, x]), ""), # need to do this because we're pulling across different factor columns
  names(df)[-1], 
  sep="."
)
df <- df[-(1:2),]   # Don't need rows 1:2 anymore
df

Produces:

  Year Price.MN Pounds.MN Price.WI Pounds.WI
3 1980       56        23       56        96
4 1999       41        63       56        65
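The name fill-forward step above is the least obvious part, so here is a minimal sketch of what it does on the header of this example. The names `nms`, `idx`, `valid`, `filled`, `states` and `simple` are illustrative and not part of the answer's code. The last lines show the simpler variant the comment alludes to, under the assumption that every state has exactly two columns (Price and Pounds).

```r
# Header names as they look after dropping the State column and blanking
# out the auto-generated X.n placeholders.
nms <- c("Year", "MN", NA, "WI", NA)

# Fill-forward: cumsum(!is.na(nms)) counts how many non-missing names have
# been seen so far, so it indexes the most recent valid name at each position.
idx    <- cumsum(!is.na(nms))   # 1 2 2 3 3
valid  <- nms[!is.na(nms)]      # "Year" "MN" "WI"
filled <- valid[idx]            # "Year" "MN" "MN" "WI" "WI"

# Simpler variant, ASSUMING each state contributes the same number of
# columns (two here), so the state names can just be repeated.
states <- c("MN", "WI")
simple <- c("Year", rep(states, each = 2))

filled
identical(filled, simple)
```

The `cumsum` version keeps working when states have different numbers of columns, whereas the `rep` version only works for a fixed count per state.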

Then:

Using base `reshape`:

reshape(df, direction="long", varying=2:5)

This gets you most of the way there:

     Year time Price Pounds id
1.MN 1980   MN    56     23  1
2.MN 1999   MN    41     63  2
1.WI 1980   WI    56     96  1
2.WI 1999   WI    56     65  2

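Obviously you will need to rename some columns and so on, but that is straightforward. The key point with `reshape` is that the column names matter, so we construct them in a form `reshape` can use.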
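The remaining cleanup could look something like the sketch below. To keep it runnable on its own, it types the wide data frame in directly (under the illustrative name `wide`) instead of reusing `df`, so the values are already numeric here. `long` is likewise just an illustrative name.

```r
# Stand-in for the cleaned-up wide data frame, entered by hand so the
# snippet is self-contained. Names follow the <metric>.<state> pattern.
wide <- data.frame(
  Year      = c(1980, 1999),
  Price.MN  = c(56, 41),
  Pounds.MN = c(23, 63),
  Price.WI  = c(56, 56),
  Pounds.WI = c(96, 65)
)

# reshape() splits each varying column name at "." (the default sep):
# the part before the dot is the measure, the part after it the "time".
long <- reshape(wide, direction = "long", varying = 2:5)

# Tidy up: `time` holds the state code, `id` is only a row counter.
names(long)[names(long) == "time"] <- "State"
long$id <- NULL
rownames(long) <- NULL
long <- long[, c("Year", "State", "Price", "Pounds")]
long
```

This leaves the four columns in the order requested in the question: Year, State, Price, Pounds.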

Using `reshape2::melt/cast`:

library(reshape2)
df.mlt <- melt(df, id.vars="Year")
df.mlt <- transform(df.mlt, 
  metric=sub("\\..*", "", variable), 
  state=sub(".*\\.", "", variable)
)
dcast(df.mlt[-2], Year + state ~ metric)

Produces:

  Year state Pounds Price
1 1980    MN     23    56
2 1980    WI     96    56
3 1999    MN     63    41
4 1999    WI     65    56

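Be very careful: Price and Pounds may be factors (or character vectors in R 4.0 and later, where `stringsAsFactors` defaults to `FALSE`), because those columns originally held both character and numeric values. You will need to convert them to numeric with as.numeric(as.character(df$Price)).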
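To see why the `as.character` step matters, here is a small sketch using a made-up factor `price` (an assumption for illustration, standing in for a column that was read as a factor). Calling `as.numeric` directly on a factor returns its internal level codes, not the numbers its labels spell out.

```r
# A factor whose labels look like numbers, as you would get when a column
# mixing text and numbers is read with stringsAsFactors = TRUE.
price <- factor(c("56", "41", "56", "56"))

# Direct conversion returns the internal level codes (levels sort as
# "41" < "56"), which is almost never what you want.
codes <- as.numeric(price)

# Going through character first recovers the actual values.
values <- as.numeric(as.character(price))

codes
values
```

If the column is already a character vector, the same `as.numeric(as.character(...))` call still works, so it is safe to apply in either case.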

Answer 2

That was a nice challenge. It uses a lot of `strsplit` and `grep` calls, and it may not generalize to your entire data set. Or maybe it will; you never know.

> txt <- "X.1 State MN    X.2    WI    X.3
  NA    Price Pounds Price Pounds
  Year NA
  1980 NA    56    23     56    96
  1999 NA    41    63     56    65"
> 
> x <- textConnection(txt)
> y <- gsub("((X[.][0-9]{1})|NA)|\\s+", " ", readLines(x))
> z <- unlist(strsplit(y, "^\\s+"))
> a <- z[nzchar(z)]
> b <- unlist(strsplit(a, "\\s+"))
> nums <- as.numeric(grep("[0-9]", b[nchar(b) == 2], value = TRUE))
> Price = rev(nums[c(TRUE, FALSE)])
> pounds <- nums[-which(nums %in% Price)]
> data.frame(Year = rep(b[grepl("[0-9]{4}", b)], 2),
             State = unlist(lapply(b[grepl("[A-Z]{2}", b)], rep, 2)),
             Price = Price,
             Pounds = c(pounds[1], rev(pounds[2:3]), pounds[4]))
##   Year State Price Pounds
## 1 1980    MN    56     23
## 2 1999    MN    41     63
## 3 1980    WI    56     96
## 4 1999    WI    56     65

Manipulating data in R from columns to rows
