Reading data from a CSV file and reshaping it in R

Time: 2013-10-08 00:02:52

Tags: r csv reshape

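I have a monthly dataset covering 1980 to 2004 (part of it is shown below), but I don't know how to read it from CSV and turn it into an array of the form data[lat, lon, time], with time running from 1 to (2004 - 1980 + 1) * 12 = 300.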

[Image: excerpt of the monthly data]
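In the requested layout the time axis is simply a running month counter. Here is a minimal Python sketch of that mapping and its inverse. It assumes the series starts in January 1980, and the function names are hypothetical:

```python
START_YEAR = 1980  # first year of the series (assumed from the question)


def time_index(year, month):
    """Map (year, month) to a 1-based running month index: Jan 1980 -> 1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (year - START_YEAR) * 12 + month


def year_month(t):
    """Inverse of time_index: 1-based index back to (year, month)."""
    years, month0 = divmod(t - 1, 12)
    return START_YEAR + years, month0 + 1


print(time_index(1980, 1))   # 1
print(time_index(2004, 12))  # 300
print(year_month(13))        # (1981, 1)
```

Because 1980 and 2004 are both included, the counter runs over 25 years and ends at 300, not at 288.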

3 Answers:

Answer 0 (score: 2)

The data are already provided in an .rda file, so reading them in is easy. Starting from a clean workspace, do the following:

load("fedfire8004.rda")
ls()                  ## What objects were read in?
# [1] "fedfire8004"
str(fedfire8004)      ## What does that object look like?
# List of 10
# $ lon  : num [1:24] -124 -124 -122 -122 -120 ...
# $ lat  : num [1:18] 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5 40.5 ...
# $ x    : num [1:25] -125 -124 -123 -122 -121 -120 -119 -118 -117 -116 ...
# $ y    : num [1:19] 31 32 33 34 35 36 37 38 39 40 ...
# $ year : int [1:300] 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
# $ month: int [1:300] 1 2 3 4 5 6 7 8 9 10 ...
# $ acres: num [1:24, 1:18, 1:300] NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "dimnames")=List of 3
# .. ..$ lon  : chr [1:24] "-124.5" "-123.5" "-122.5" "-121.5" ...
# .. ..$ lat  : chr [1:18] "31.5" "32.5" "33.5" "34.5" ...
# .. ..$ month: chr [1:300] "1980.1" "1980.2" "1980.3" "1980.4" ...
# $ fires: num [1:24, 1:18, 1:300] NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "dimnames")=List of 3
# .. ..$ lon  : chr [1:24] "-124.5" "-123.5" "-122.5" "-121.5" ...
# .. ..$ lat  : chr [1:18] "31.5" "32.5" "33.5" "34.5" ...
# .. ..$ month: chr [1:300] "1980.1" "1980.2" "1980.3" "1980.4" ...
# $ meta : chr "USFS, NPS, BLM, BIA total fires and acres on 1 degree monthly grid 1980-2004"
# $ cite : chr "Westerling, A.L., T.J. Brown, A. Gershunov, D.R. Cayan and M.D. Dettinger, 2003: Climate and Wildfire in the Western United Sta"| __truncated__

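As you can see, the core data seem to be the acres and fires list items. It is probably more convenient to reshape them into a long dataset, and the most direct way to do that is probably melt from the "reshape2" package.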

library(reshape2)
Acres <- melt(fedfire8004$acres)
Fires <- melt(fedfire8004$fires)
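When melt is applied to an array, it returns one row per cell. Each dimension gets its own column, named after the dimnames, and there is one extra value column. The first dimension varies fastest, following R's column-major order, and that is why the head() output shows lon changing while lat stays at 31.5. The Python sketch below only illustrates that layout on a tiny made-up array. It is not reshape2's implementation, and melt_3d is a hypothetical name:

```python
from itertools import product


def melt_3d(arr, dimnames):
    """Flatten arr[i][j][k] into (name_i, name_j, name_k, value) rows.

    Rows are ordered with the first index varying fastest and the last
    slowest, mimicking R's column-major layout.
    """
    names_i, names_j, names_k = dimnames
    rows = []
    # product() varies its *last* argument fastest, so iterate k, j, i
    for k, j, i in product(range(len(names_k)), range(len(names_j)), range(len(names_i))):
        rows.append((names_i[i], names_j[j], names_k[k], arr[i][j][k]))
    return rows


# Toy 2 (lon) x 2 (lat) x 1 (month) array; the values are made up
acres = [[[1.0], [2.0]],
         [[3.0], [4.0]]]
for row in melt_3d(acres, (["-124.5", "-123.5"], ["31.5", "32.5"], ["1980.1"])):
    print(row)
```

On the real 24 x 18 x 300 arrays this gives 24 * 18 * 300 = 129600 rows, which matches the last row number reported by tail(Acres).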

Let's look at the first and last few rows of each new object.

head(Acres)
#      lon  lat  month value
# 1 -124.5 31.5 1980.1    NA
# 2 -123.5 31.5 1980.1    NA
# 3 -122.5 31.5 1980.1    NA
# 4 -121.5 31.5 1980.1    NA
# 5 -120.5 31.5 1980.1    NA
# 6 -119.5 31.5 1980.1    NA
tail(Acres)
#           lon  lat   month value
# 129595 -106.5 48.5 2004.12     0
# 129596 -105.5 48.5 2004.12     0
# 129597 -104.5 48.5 2004.12    71
# 129598 -103.5 48.5 2004.12    NA
# 129599 -102.5 48.5 2004.12    NA
# 129600 -101.5 48.5 2004.12    NA
head(Fires)
#      lon  lat  month value
# 1 -124.5 31.5 1980.1    NA
# 2 -123.5 31.5 1980.1    NA
# 3 -122.5 31.5 1980.1    NA
# 4 -121.5 31.5 1980.1    NA
# 5 -120.5 31.5 1980.1    NA
# 6 -119.5 31.5 1980.1    NA
tail(Fires)
#           lon  lat   month value
# 129595 -106.5 48.5 2004.12     0
# 129596 -105.5 48.5 2004.12     0
# 129597 -104.5 48.5 2004.12     2
# 129598 -103.5 48.5 2004.12    NA
# 129599 -102.5 48.5 2004.12    NA
# 129600 -101.5 48.5 2004.12    NA
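The month column needs some care. The labels are strings such as "1980.1" … "1980.12", and if they are read as numbers, "1980.10" and "1980.1" become the same value, so October collides with January. If I read the reshape2 docs correctly, melt converts array dimnames with type.convert unless as.is = TRUE is passed, so it is worth checking the column type. Splitting the label as text avoids the problem. Here is a Python sketch, where split_month_label is a hypothetical helper:

```python
def split_month_label(label):
    """Split a "YYYY.M" label (e.g. "2004.12") into integer (year, month).

    The label is split as text: converting it to a float first would make
    "1980.10" (October) equal to "1980.1" (January).
    """
    year, month = label.split(".")
    return int(year), int(month)


print(float("1980.10") == float("1980.1"))  # True: the collision
print(split_month_label("1980.1"))   # (1980, 1)
print(split_month_label("1980.10"))  # (1980, 10)
```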

Answer 1 (score: 0)

You should (always) try to reorganize your data so that each column holds one kind of information:

Year  Month  Lat  Lon  Value

A Python script is probably the easiest way to do this. Once your data are in this layout, they are easy to import and analyze in R.
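The conversion also works in the other direction: a table in this Year/Month/Lat/Lon/Value layout can be filled into the lat x lon x time structure from the question. Below is a minimal Python sketch using nested lists. It assumes the values are already parsed to numbers and that cells without a record should stay None. The function name to_cube is hypothetical, and the two sample records are taken from the 1980 output shown further down:

```python
def to_cube(records, lats, lons, start_year=1980, n_months=300):
    """Fill cube[lat][lon][t] from (year, month, lat, lon, value) records.

    t is 0-based here: t = (year - start_year) * 12 + (month - 1).
    Cells without a record stay None.
    """
    lat_pos = {la: i for i, la in enumerate(lats)}
    lon_pos = {lo: j for j, lo in enumerate(lons)}
    # A fresh inner list per (lat, lon) cell, so cells do not share storage
    cube = [[[None] * n_months for _ in lons] for _ in lats]
    for year, month, lat, lon, value in records:
        t = (year - start_year) * 12 + (month - 1)
        cube[lat_pos[lat]][lon_pos[lon]][t] = value
    return cube


records = [
    (1980, 4, 31.5, -110.5, 881.0),
    (1980, 5, 31.5, -110.5, 794.1),
]
cube = to_cube(records, lats=[31.5], lons=[-111.5, -110.5])
print(cube[0][1][3])  # 881.0 (April 1980 is t = 3)
```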

I wrote a script that reorganizes the data for you, but I'm not sure whether you can run it easily. What system are you using?

Here is the script; its output follows.

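#!/usr/bin/env python3
import csv

# Layout the script expects in originaldata.txt (tab-delimited):
#   row 1: two label cells, then the latitude of each grid column
#   row 2: two label cells, then the longitude of each grid column
#   rows 3+: year, month, then one value per grid column
lat, lon = [], []
year, month, data = [], [], []
with open('originaldata.txt', newline='') as file_obj:
    reader = csv.reader(file_obj, delimiter='\t')
    for line_no, items in enumerate(reader):
        if line_no == 0:
            lat = items[2:]
        elif line_no == 1:
            lon = items[2:]
        else:
            year.append(items[0])
            month.append(items[1])
            data.append(items[2:])

# Print a header, then one row per (grid cell, year, month)
print("Year\tMonth\tLat\tLon\tData")
for ind, (la, lo) in enumerate(zip(lat, lon)):
    for y, m, d in zip(year, month, data):
        print(f"{y}\t{m}\t{la}\t{lo}\t{d[ind]}")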
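The same logic can be moved into a function that takes already-parsed rows instead of opening originaldata.txt. That makes it easy to check on a few rows. This sketch relies on the script's layout assumption (two header rows holding lat and lon for each column). The name wide_to_long and the "lat"/"lon" label cells are hypothetical, and the numbers are copied from the 1980 output:

```python
import csv
import io


def wide_to_long(rows):
    """Turn the wide layout into (year, month, lat, lon, value) tuples.

    rows[0] and rows[1] hold lat and lon for each data column (after two
    label cells); every later row is year, month, then one value per column.
    Output is grouped by grid cell, then year/month, like the script.
    """
    lat, lon = rows[0][2:], rows[1][2:]
    out = []
    for ind, (la, lo) in enumerate(zip(lat, lon)):
        for items in rows[2:]:
            out.append((items[0], items[1], la, lo, items[2 + ind]))
    return out


sample = ("lat\t\t31.5\t31.5\n"
          "lon\t\t-111.5\t-110.5\n"
          "1980\t4\t0\t881\n"
          "1980\t5\t8.1\t794.1\n")
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
for rec in wide_to_long(rows):
    print("\t".join(rec))
```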

Script output:

Year  Month  Lat     Lon    Data
1980    1   31.5    -111.5  0
1980    2   31.5    -111.5  0
1980    3   31.5    -111.5  0
1980    4   31.5    -111.5  0
1980    5   31.5    -111.5  8.1
1980    6   31.5    -111.5  5.1
1980    7   31.5    -111.5  0
1980    8   31.5    -111.5  0
1980    9   31.5    -111.5  0
1980    10  31.5    -111.5  0
1980    11  31.5    -111.5  0
1980    12  31.5    -111.5  0
1981    1   31.5    -111.5  0
1981    2   31.5    -111.5  0
1981    3   31.5    -111.5  0
1981    4   31.5    -111.5  0
1981    5   31.5    -111.5  0
1981    6   31.5    -111.5  0
1981    7   31.5    -111.5  0
1981    8   31.5    -111.5  0
1981    9   31.5    -111.5  0
1981    10  31.5    -111.5  0
1981    11  31.5    -111.5  0
1981    12  31.5    -111.5  0
1980    1   31.5    -110.5  0
1980    2   31.5    -110.5  0
1980    3   31.5    -110.5  0
1980    4   31.5    -110.5  881
1980    5   31.5    -110.5  794.1
1980    6   31.5    -110.5  644.4
1980    7   31.5    -110.5  85.2
1980    8   31.5    -110.5  0.1
1980    9   31.5    -110.5  0
1980    10  31.5    -110.5  0
1980    11  31.5    -110.5  0
1980    12  31.5    -110.5  0
1981    1   31.5    -110.5  0
1981    2   31.5    -110.5  0
1981    3   31.5    -110.5  0
1981    4   31.5    -110.5  0
1981    5   31.5    -110.5  0
1981    6   31.5    -110.5  0
1981    7   31.5    -110.5  0
1981    8   31.5    -110.5  0
1981    9   31.5    -110.5  0
1981    10  31.5    -110.5  0

Answer 2 (score: 0)

Loading is easy:

meaningful.name <- read.csv(file.choose(new = FALSE))
# Add the running index while it is still a data frame: $<- does not work on a matrix
meaningful.name$time <- seq_len(nrow(meaningful.name))
meaningful.name <- as.matrix(meaningful.name)

Beyond that, I'm not sure what you are after. Could you clarify?