Question

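I've searched in a lot of places (stackoverflow, r-bloggers, etc.) but haven't found a good way to do this in R. Hopefully someone has some ideas.

I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).

Here is a subset of the relevant fields. This is what I'm starting with...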

visit_date   region    location     media      component     result
1990-08-20   LAKE      555723       water       Mg            *Nondetect
1999-07-01   HILL      432422       water       Ca            3.2
2010-09-12   LAKE      555723       water       pH            6.8
2010-09-12   LAKE      555723       water       Mg            2.1
2010-09-12   HILL      432423       water       pH            7.2
2010-09-12   HILL      432423       water       N             0.8
2010-09-12   HILL      432423       water       NH4          112

What I'd like to end up with is a table/data frame like this:

visit_date   region    location     media      component     result        pH
1990-08-20   LAKE      555723       water       Mg            *Nondetect  *Not recorded
1999-07-01   HILL      432422       water       Ca            3.2         *Not recorded
2010-09-12   LAKE      555723       water       pH            6.8         6.8
2010-09-12   LAKE      555723       water       Mg            2.1         6.8
2010-09-12   HILL      432423       water       pH            7.2         7.2
2010-09-12   HILL      432423       water       N             0.8         7.2
2010-09-12   HILL      432423       water       NH4          112          7.2

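I tried the approach from here - R finding rows of a data frame where certain columns match those of another - but unfortunately didn't get the result I wanted. Instead, the pH column contained only the value I had pre-filled it with (-999) or NA, rather than the pH value for that particular visit date (if one was collected). Since the result data set is about 500k records, I used unique(tResult$pH) to check which values ended up in the pH column.

Here is the attempt. res is the original results data.frame, and component is the pH results subset (the pH samples taken from the main results table).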

keys <- c("region", "location", "visit_date", "media")

tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)

tResults[tComponent, pH>0]

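I tried using match, merge, and within on the original data frame, without success. Since then I have generated a subset for the component (pH in this example) and copied its result column into a new "pH" column, thinking I could match on the keys and update the new "pH" column in the main result set.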

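Since not all result values are numeric (some have values like *Not recorded), I tried substituting a number such as -888, or some other placeholder, so that I could at least force the result and pH columns to be numeric. Apart from the dates, which are POSIXct values, the remaining columns are character columns. The original data frame was created with stringsAsFactors=FALSE.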
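As an illustration of the coercion described above, here is a minimal sketch in base R. It assumes the non-numeric entries are text flags such as *Nondetect, and it uses NA rather than a sentinel number like -888; the vector and the names result_num and flagged are made up for the example.

```r
# Hypothetical result values: a mix of numbers stored as text and text flags.
result <- c("*Nondetect", "3.2", "6.8", "112", "*Not recorded")

# as.numeric() gives NA (with a warning) for strings that are not numbers,
# so the flags become NA instead of needing a sentinel such as -888.
result_num <- suppressWarnings(as.numeric(result))

# Remember which entries were flags rather than measurements.
flagged <- is.na(result_num) & !is.na(result)

print(data.frame(result, result_num, flagged))
```

Keeping the original character column alongside the numeric one means the flags are not lost, and NA is skipped by functions like mean(..., na.rm = TRUE), which a sentinel number would not be.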

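Once I can do this, I'll be able to generate similar columns for other components, which can then be used to populate and calculate other values for a given sample. At least that's my goal.

So I'm stumped on this one. It seems like it should be easy, but I'm definitely not seeing it!

Your help and ideas are certainly welcome and appreciated!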

Answer 1

# df1 is your first data set, as a data.frame
# phtem holds the result on pH rows and NA everywhere else
df1$phtem <- with(df1, ifelse(component == "pH", result, NA))

library(data.table)
library(zoo) # locf function

setDT(df1)[,pH:=na.locf(phtem,na.rm = FALSE)]
    visit_date region location media component     result phtem  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect    NA  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2    NA  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8   6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1    NA 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2   7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8    NA 7.2
7: 2010-09-12   HILL   432423 water       NH4        112    NA 7.2

# You can remove phtem if you don't need it.
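One thing to keep in mind with the fill above: a last-observation-carried-forward fill follows row order only, so a sample with no pH of its own can inherit the pH of whatever sample precedes it. A minimal sketch of the difference, assuming data.table's nafill (which needs a numeric column) and a small made-up table dt rather than your real data:

```r
library(data.table)

# Made-up data: location 432422 has no pH row of its own.
dt <- data.table(
  location  = c("555723", "555723", "432422"),
  component = c("pH", "Mg", "Ca"),
  result    = c("6.8", "2.1", "3.2")
)

# Numeric pH on pH rows, NA elsewhere.
dt[, phtem := ifelse(component == "pH",
                     suppressWarnings(as.numeric(result)),
                     NA_real_)]

# Ungrouped fill: the 432422 row picks up the pH of the previous sample.
dt[, pH_ungrouped := nafill(phtem, type = "locf")]

# Grouped fill: the carry-forward stops at the group boundary.
dt[, pH_grouped := nafill(phtem, type = "locf"), by = location]

print(dt)
```

Even when grouped, the fill only goes forward, so the pH row has to come first within each group.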

Edit:

library(data.table)
setDT(df1)[,pH:=result[component=="pH"],by="region,location,visit_date,media"]
df1

   visit_date region location media component     result  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8 7.2
7: 2010-09-12   HILL   432423 water       NH4        112 7.2
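Since the question already has a separate pH subset and a set of key columns, another option is a data.table update join. The following is only a sketch under a few assumptions: the sample data is rebuilt by hand from the question, there is at most one pH row per key combination, and the subset is called ph here for illustration.

```r
library(data.table)

# Sample data rebuilt from the question (dates as POSIXct, the rest character).
res <- data.table(
  visit_date = as.POSIXct(c("1990-08-20", "1999-07-01", "2010-09-12",
                            "2010-09-12", "2010-09-12", "2010-09-12",
                            "2010-09-12"), tz = "UTC"),
  region    = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location  = c("555723", "432422", "555723", "555723",
                "432423", "432423", "432423"),
  media     = "water",
  component = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result    = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112")
)
keys <- c("region", "location", "visit_date", "media")

# pH rows only, keeping the key columns and the result.
ph <- res[component == "pH", c(keys, "result"), with = FALSE]

# Update join: rows of res that match ph on the keys get ph's result as pH.
res[ph, pH := i.result, on = keys]

# Rows without a matching pH sample stay NA; label them as in the question.
res[is.na(pH), pH := "*Not recorded"]

print(res)
```

The i. prefix refers to a column of the table inside the brackets (ph here), which is what lets its result be copied into the new pH column without overwriting result in res.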

How to copy specific values from one data column to another while matching other columns in R?
