Reading a dataset in R and changing its format

Time: 2013-11-17 03:24:35

Tags: r reshape

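I have a precipitation dataset for the United States. It contains latitude, longitude, and rainfall amounts, laid out as a grid in the following format: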

       lon -124   -125  -126 -127 -128
   lat 45  120   110    NA   230  145
       44  NA    130    205  240  195
       43  120   110    NA   235  185
       42  170   140    204   NA  155

Here is a link to the dataset:

https://www.dropbox.com/s/1xxy2ospr9xvy8n/Pmaxupscaled.csv

Using R, I would like to convert it to this format:
    precipitation  lat   lon       
    120            45    -124
    110            45    -125
    NA             45    -126

3 Answers:

Answer 0: (score: 0)

You may want to use the 'reshape2' package:

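library(reshape2)

# Recreate the sample data. With header = TRUE and the default
# check.names = TRUE, the numeric headers become X.124, X.125, ...
df <- read.table(textConnection('
lat -124   -125  -126 -127 -128
45  120   110    NA   230  145
44  NA    130    205  240  195
43  120   110    NA   235  185
42  170   140    204   NA  155'), header = TRUE)

# Go from wide to long: one row per (lat, longitude column) pair
df2 <- melt(df, id.vars = 'lat')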

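You will also need a workaround to recover the original column names. R does not like column names that are not syntactically valid, such as numbers, so by default read.table() renames "-124" to "X.124". In this specific case, the following may serve as a workaround: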

 # Strip the "X." prefix, then restore the minus sign that was lost with it
 df2$variable <- -as.numeric(gsub(x = df2$variable, pattern = "X\\.", replacement = ""))
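
To see where the "X." prefix comes from, here is a small sketch using make.names(), the function read.table() applies to headers when check.names = TRUE. The variable names mangled and lon are made up for illustration, and the sign flip assumes every longitude in the header is negative, as in the sample data.

```r
# make.names() replaces the invalid "-" with "." and then prepends "X"
mangled <- make.names(c("-124", "-125"))
print(mangled)

# Undo the mangling: drop the "X." prefix and put the minus sign back
# (assumes all original longitudes were negative)
lon <- -as.numeric(sub("^X\\.", "", mangled))
print(lon)
```

Reading the file with check.names = FALSE avoids the renaming altogether, so no cleanup is needed afterwards.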

Answer 1: (score: 0)

I'm not a heavy reshape2 user, but this worked for me.

library(reshape2)
a <- read.csv("~/Documents/Pmaxupscaled.csv")

# Convert to matrix
a <- as.matrix(a)

# Replace row names with the values from the first column
dimnames(a)[[1]] <- a[, 1]

# Drop the first column
a <- a[, -1]

# Melt the matrix into a data frame.
b <- melt(a, varnames = c("Lat", "Lon"))

# Get rid of "X."
b$Lon <- gsub("X\\.", "", b$Lon)

# Format longitude as negative number
b$Lon <- as.numeric(b$Lon)
b$Lon <- -1 * b$Lon

# Rename precipitation column
names(b)[3] <- "precipitation"

Answer 2: (score: 0)

Since the other answers have already covered 'reshape2', here is an option in base R:

a <- read.csv("path/to/Pmaxupscaled.csv", check.names = FALSE)
out <- cbind(lat = a[, 1], setNames(stack(a[-1]), c("precip", "lon")))
head(out)
#    lat precip  lon
# 1 45.0  77.63 -105
# 2 42.5  76.15 -105
# 3 40.0  72.18 -105
# 4 37.5  78.60 -105
# 5 35.0  80.93 -105
# 6 32.5  87.29 -105
tail(out)
#      lat precip lon
# 99  40.0 136.05 -75
# 100 37.5     NA -75
# 101 35.0     NA -75
# 102 32.5     NA -75
# 103 30.0     NA -75
# 104 27.5     NA -75
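
One detail that may matter downstream: stack() returns its second column as a factor, so lon in the result is a factor rather than a number. A minimal sketch of the conversion follows, using a cut-down copy of the sample data from the question instead of the real file; going through as.character() first is what keeps the original values rather than the factor's internal integer codes.

```r
# Small stand-in for the CSV: first column is latitude, headers are longitudes
a <- read.table(text = "
lat -124 -125 -126
45  120  110  NA
44  NA   130  205", header = TRUE, check.names = FALSE)

# Same reshaping step as above
out <- cbind(lat = a[, 1], setNames(stack(a[-1]), c("precip", "lon")))

# Convert the factor labels ("-124", ...) to numbers
out$lon <- as.numeric(as.character(out$lon))
str(out)
```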

For the record, this is how I would approach it with 'reshape2':

library(reshape2)
a <- read.csv("path/to/Pmaxupscaled.csv", check.names = FALSE)
names(a)[1] <- "lat"
out <- melt(a, id.vars="lat", value.name="precip", variable.name="lon")
head(out)
#    lat precip  lon
# 1 45.0  77.63 -105
# 2 42.5  76.15 -105
# 3 40.0  72.18 -105
# 4 37.5  78.60 -105
# 5 35.0  80.93 -105
# 6 32.5  87.29 -105