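I'm curious how others would approach this challenge.
Background
The data is used for vegetation monitoring. It includes basic per-plot information and identifies which species are present and their percent cover.
There are several rows of plot-specific information (date, location, distance), followed by the species rows. Within a species row, each value is the % cover of that species in the plot represented by that column.
A simplified view would be a grid like this: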
plot      1         4          5
date      5/3/2016  6/20/2016  6/22/2016
location  A         F          K
sp1                 15         30
sp2       5                    100
sp3       T         3          5
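What I'd like to end up with is a grid like this, which makes it easier to import the CSV into the database (the species % cover needs to reference the plot information in the RMDB). The leftmost column holds the table field names.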
plot      1         1         4          4          5          5          5
date      5/3/2016  5/3/2016  6/20/2016  6/20/2016  6/22/2016  6/22/2016  6/22/2016
location  A         A         F          F          K          K          K
species   sp2       sp3       sp1        sp3        sp1        sp2        sp3
cover %   5         T         15         3          30         100        5
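The database can easily "digest" this wide format and correctly populate the two tables (Plot & CoverPercent).
Approach?
I've thought of a few approaches, but I suspect there is a better way that I'm missing.
Here is what I've come up with so far: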
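Transform the data from long to wide
Add species and cover rows
Count the number of species for a given plot
Repeat the plot column according to the species count
Fill in the "species" and "cover" rows for the plot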
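A rough base R sketch of what these steps might look like, offered as one possible reading rather than a settled solution. The `grid` matrix is a hand-typed stand-in for the simplified grid shown earlier, and the names `grid`, `plot_fields`, `pieces` and `long` are made up for illustration. It assumes an empty string means the species was not recorded in that plot.

```r
# Hypothetical stand-in for the field grid: one row per field, one column per plot.
# Empty strings mark species that were not recorded in a plot (an assumption).
grid <- rbind(
  plot     = c("1",        "4",         "5"),
  date     = c("5/3/2016", "6/20/2016", "6/22/2016"),
  location = c("A",        "F",         "K"),
  sp1      = c("",         "15",        "30"),
  sp2      = c("5",        "",          "100"),
  sp3      = c("T",        "3",         "5")
)

plot_fields <- c("plot", "date", "location")
species     <- setdiff(rownames(grid), plot_fields)

# For each plot column: count the recorded species, repeat the plot fields
# that many times, and fill in the species and cover values.
pieces <- lapply(seq_len(ncol(grid)), function(j) {
  cover    <- unname(grid[species, j])
  recorded <- cover != ""
  n        <- sum(recorded)
  data.frame(
    plot     = rep(grid[["plot", j]], n),
    date     = rep(grid[["date", j]], n),
    location = rep(grid[["location", j]], n),
    species  = species[recorded],
    cover    = cover[recorded],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
})

# Stack the per-plot pieces into one table with one row per plot/species pair.
long <- do.call(rbind, pieces)
rownames(long) <- NULL
print(long)
```

With the sample grid this gives seven rows, in the same plot and species order as the target grid (plot 1 with sp2 and sp3, plot 4 with sp1 and sp3, plot 5 with all three). The result has fields as columns; `t(long)` would turn it into the fields-as-rows layout.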
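Initially I thought I could do this in VBA, but R looks like the better/faster/cleaner approach. The question is: how?
I recently did some R work with a table package, but over the past year I've spent most of my time on VBA/SQL projects.
I'm curious how others would approach this transformation. Any ideas?
Answer 0 (score: 1)
I would use an OO approach. Define a simple class that holds the plot and date information and has a dictionary of species and cover percentages: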
'Plot.cls
'Requires a reference to Microsoft Scripting Runtime (for Scripting.Dictionary).
Option Explicit

Private Type PlotMembers
    PlotId As Long
    DataDate As Date
    Location As String
End Type

Private this As PlotMembers
Private mCover As Scripting.Dictionary  'species name -> cover value

Private Sub Class_Initialize()
    Set mCover = New Scripting.Dictionary
End Sub

Public Property Get PlotId() As Long
    PlotId = this.PlotId
End Property
Public Property Let PlotId(ByVal inValue As Long)
    this.PlotId = inValue
End Property

Public Property Get DataDate() As Date
    DataDate = this.DataDate
End Property
Public Property Let DataDate(ByVal inValue As Date)
    this.DataDate = inValue
End Property

Public Property Get Location() As String
    Location = this.Location
End Property
Public Property Let Location(ByVal inValue As String)
    this.Location = inValue
End Property

'Cover is kept as a String so non-numeric entries such as "T" survive.
Public Sub AddSpeciesCover(ByVal species As String, ByVal cover As String)
    mCover.Add species, cover
End Sub
Then give it a property that exposes the list of CSV data rows:
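'Also in Plot.cls
'Returns one CSV line per species: PlotId,DataDate,Location,species,cover
Public Property Get CsvRows() As String
    'No species recorded: return an empty string instead of ReDim-ing to -1.
    If mCover.Count = 0 Then Exit Property

    Dim key As Variant
    Dim output() As String
    ReDim output(mCover.Count - 1)
    Dim i As Long
    For Each key In mCover.Keys
        Dim temp(4) As String
        temp(0) = CStr(this.PlotId)
        temp(1) = CStr(this.DataDate)
        temp(2) = this.Location
        temp(3) = key
        temp(4) = mCover(key)
        output(i) = Join(temp, ",")
        i = i + 1
    Next key
    CsvRows = Join(output, vbCrLf)
End Property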
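Then all you need to do is populate them from your input data. Note that the sample usage here assumes the top grid in your question sits on the active worksheet with its top-left corner at A1. It should be fairly easy to change this to match however you need to collect the data: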
Public Sub SampleUsage()
    Dim plots As New Collection
    With ActiveSheet
        Dim col As Long
        'Columns B:D hold one plot each; column A holds the field/species names.
        For col = 2 To 4
            Dim current As Plot
            Set current = New Plot
            current.PlotId = .Cells(1, col).Value
            current.DataDate = .Cells(2, col).Value
            current.Location = .Cells(3, col).Value
            Dim r As Long
            'Rows 4 to 6 are the species rows; skip empty cover cells.
            For r = 4 To 6
                Dim cover As String
                cover = .Cells(r, col).Value
                If cover <> vbNullString Then
                    current.AddSpeciesCover .Cells(r, 1).Value, cover
                End If
            Next
            plots.Add current
        Next
    End With
    'Print the CSV rows for each plot to the Immediate window.
    For Each current In plots
        Debug.Print current.CsvRows
    Next
End Sub
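Note that this is just a skeleton to demonstrate the gist of the approach. It would need error handling, more robust formatting, and so on to be production ready.
Answer 1 (score: 1)
Simply reshape the data frame in R using the melt() method of the reshape2 package. The following assumes that the transposed view of the data you posted is the actual format, as you mentioned in the comments: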
library(reshape2)

# Sample data, tab-separated (\t); an empty field is read as NA
data <- 'plot\tdate\tlocation\tsp1\tsp2\tsp3
1\t5/3/2016\tA\t\t5\tT
4\t6/20/2016\tF\t15\t\t3
5\t6/22/2016\tK\t30\t100\t5'

df <- read.table(text = data, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
df
#   plot      date location sp1 sp2 sp3
# 1    1  5/3/2016        A  NA   5   T
# 2    4 6/20/2016        F  15  NA   3
# 3    5 6/22/2016        K  30 100   5

# One row per plot/species pair; NA (unrecorded) cover values are dropped.
# melt() warns that the species columns have mixed types and stores cover as text.
mdf <- melt(df, id.vars = c("plot", "date", "location"),
            variable.name = "species", na.rm = TRUE, value.name = "cover %")

# Order by date, parsed as a real date rather than compared as text
mdf <- mdf[order(as.Date(mdf$date, format = "%m/%d/%Y")), ]
rownames(mdf) <- seq_len(nrow(mdf))  # reset row names
mdf
#   plot      date location species cover %
# 1    1  5/3/2016        A     sp2       5
# 2    1  5/3/2016        A     sp3       T
# 3    4 6/20/2016        F     sp1      15
# 4    4 6/20/2016        F     sp3       3
# 5    5 6/22/2016        K     sp1      30
# 6    5 6/22/2016        K     sp2     100
# 7    5 6/22/2016        K     sp3       5
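If the import really needs the fields-as-rows layout from the question, a long table like this can be transposed before it is written out. The following is a minimal base R sketch under that assumption. The data frame is retyped by hand with the same values as the result above so the block runs on its own, and `out_file` is a hypothetical temporary path, not something from the original post.

```r
# Hypothetical copy of the long result (same values as above),
# rebuilt by hand so this sketch is self-contained.
mdf <- data.frame(
  plot      = c(1, 1, 4, 4, 5, 5, 5),
  date      = c("5/3/2016", "5/3/2016", "6/20/2016", "6/20/2016",
                "6/22/2016", "6/22/2016", "6/22/2016"),
  location  = c("A", "A", "F", "F", "K", "K", "K"),
  species   = c("sp2", "sp3", "sp1", "sp3", "sp1", "sp2", "sp3"),
  `cover %` = c("5", "T", "15", "3", "30", "100", "5"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Convert column by column to character first. Calling t() directly on a mixed
# data frame goes through as.matrix(), which can pad numbers with spaces.
wide <- t(sapply(mdf, as.character))

# Write one line per field, with the field name in the leftmost column.
out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.table(wide, out_file, sep = ",", quote = FALSE, col.names = FALSE)
csv_lines <- readLines(out_file)
print(csv_lines)
```

Whether the database import prefers this transposed layout or the long table as-is depends on the import tool, so this step may not be needed at all.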