Question

I have a data frame that looks like this:

*VarName1* - *VarValue1*
*VarName2* - *VarValue2*
*Etc.*

In practice, it looks something like this:

nmlVar     - noFloat

Date-Batch - 2011020147
Weight     - 10
Length     - 5 
Height     - 8
Date-Batch - 2011020148
Weight     - 10.3
Length     - 6 
Height     - 8
Date-Batch - 2011020147
Weight     - 10
Length     - 5 
Height     - 8

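I want to organize the data so that I can use it for analysis. I have already found out how to transpose rows into columns in this post: Transposing rows into columns, then split them

I use this code for the transposition: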

library(dplyr)
library(tidyr)
DFP %>% 
  mutate(sample = cumsum(nmlVar == 'Batch')) %>% 
  spread(nmlVar, noFloat)

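I want to do the same thing, but using the "Date-Batch" variable as the key variable in the function above. This is necessary because it is the key used in another data frame that I want to merge with this one.

The problem is that the Date-Batch variable does not always have unique values (compare the first and third occurrences). I am looking for a function that removes every second occurrence of the same Date-Batch value.

I tried to describe it in 'programming words':

FOR Date-Batch IN nmlVar IF duplicate DELETE second occurrence

I don't know whether this is the best approach, or whether you can suggest another way to arrange this.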

Answer 1

It depends on what you call a duplicate:

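library(dplyr)
library(tidyr)

# Option 1: a duplicate is a row that is identical in every column
DFP %>% 
  mutate(sample = cumsum(nmlVar == 'Date-Batch')) %>%  # new group id at each Date-Batch row
  spread(nmlVar, noFloat) %>%                          # one row per group, one column per variable
  select(-sample) %>%
  filter(!duplicated(.))

# Option 2: a duplicate is a row whose Date-Batch value has already appeared
DFP %>% 
  mutate(sample = cumsum(nmlVar == 'Date-Batch')) %>% 
  spread(nmlVar, noFloat) %>%
  select(-sample) %>%
  filter(!duplicated(`Date-Batch`))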

In this case, both give the same output:

#   Date-Batch Height Length Weight
# 1 2011020147      8      5   10.0
# 2 2011020148      8      6   10.3
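
As a rough alternative sketch, the same two notions of "duplicate" can also be expressed with dplyr's distinct() and tidyr's pivot_wider(), which tidyr documents as the successor to spread(). This is not part of the original answer. It assumes tidyr 1.0.0 or later, rebuilds the sample data with data.frame() so the snippet is self-contained, and the names wide, dedup_all and dedup_key are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt inline (assumption: same values)
DFP <- data.frame(
  nmlVar  = rep(c("Date-Batch", "Weight", "Length", "Height"), times = 3),
  noFloat = c(2011020147, 10,   5, 8,
              2011020148, 10.3, 6, 8,
              2011020147, 10,   5, 8)
)

# Reshape to one row per record; `sample` increments at every Date-Batch row
wide <- DFP %>%
  mutate(sample = cumsum(nmlVar == "Date-Batch")) %>%
  pivot_wider(names_from = nmlVar, values_from = noFloat) %>%
  select(-sample)

# Drop rows that are identical in every column
dedup_all <- wide %>% distinct()

# Keep only the first row for each Date-Batch value
dedup_key <- wide %>% distinct(`Date-Batch`, .keep_all = TRUE)
```

Unlike spread(), pivot_wider() keeps the columns in order of first appearance instead of sorting them alphabetically, so the column order may differ from the output shown above. If Date-Batch has to be a unique key for a later merge, the second form is probably the safer choice, since it guarantees one row per key even when the other columns differ.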

Data

DFP <- read.table(text="nmlVar noFloat
Date-Batch 2011020147
Weight 10
Length 5
Height 8
Date-Batch 2011020148
Weight 10.3
Length 6
Height 8
Date-Batch 2011020147
Weight 10
Length 5
Height 8", header=TRUE)
