Question

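My data frame RawDat has two columns, ID and data. I want to extract my data by ID, for example with grep() and lapply(), to generate a new data frame in which the data is sorted into columns by ID. My df looks like this, except that I have more than 80000 rows and 75 IDs: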

ID   data
abl  564
dlh  78
vho  354
mez  15
abl  662
dlh  69
vho  333
mez  9
.
.
.

I can extract the data for one ID manually with grep():

ExtRawDat = as.data.frame(RawDat[grep("abl",RawDat$ID),])

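However, I do not want to do this 75 times and cbind() the results. Instead, I would like to automate it with lapply(). I have tried several variations of the following code, but none of them produces the desired output.

I have a vector, ProLisV, containing the 75 IDs to loop over as the argument: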

ExtRawDat = as.data.frame(lapply(ProLisV[1:75],function(x){     
Temp1 = RawDat[grep(x,RawDat$ID),]      # The issue is here, the pattern is not properly defined with the X input (is it detrimental that some of the names in the list having spaces etc.?)
Values = as.data.frame(Temp1$data)
list(Values$data)
}))

The desired output looks like this:

abl  dlh  vho  mez  ...
564  78   354  15
662  69   333  9
.
.
.

How can I adjust the function so that it gives the desired output? Thanks.

Answer 1

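It looks like what you are trying to do is convert your data from long format to wide format. One easy way to do this is the spread function from the tidyr package. To use it, we need a column that disambiguates the duplicate identifiers, so we first add a grouping variable: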

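n.ids <- 4 # With your full data this should be 75
# Number consecutive blocks of n.ids rows 1, 2, 3, ... (assumes each block holds every ID once)
df$group <- rep(seq_len(ceiling(nrow(df) / n.ids)), each = n.ids, length.out = nrow(df))
tidyr::spread(df, ID, data)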

#   group abl dlh mez vho
# 1     1 564  78  15 354
# 2     2 662  69   9 333

If you do not want the group column at the end, remove it with df$group <- NULL.
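The grouping variable above relies on the IDs repeating in fixed-size blocks. If that cannot be guaranteed, one possible alternative is to count the occurrences within each ID with ave() and then reshape in base R. This is only a sketch: the data frame below is a shuffled copy of the example data, and the name wide is made up for illustration.

```r
# Example data with the second block in a different order (illustrative only)
df <- data.frame(
  ID   = c("abl", "dlh", "vho", "mez", "abl", "dlh", "mez", "vho"),
  data = c(564, 78, 354, 15, 662, 69, 9, 333),
  stringsAsFactors = FALSE
)

# Occurrence counter within each ID: 1 for the first "abl", 2 for the second, ...
df$group <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# Base R long-to-wide conversion: one row per group, one column per ID
wide <- reshape(df, idvar = "group", timevar = "ID", direction = "wide")

# reshape() prefixes the value columns with "data."; strip that prefix
names(wide) <- sub("^data\\.", "", names(wide))
print(wide)
```

Because the counter is computed per ID, the row order inside a block does not matter, but IDs that occur an unequal number of times will produce NA in the wide result.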

Data

df <- read.table(text = "
ID data
abl 564
dlh 78
vho 354
mez 15
abl 662
dlh 69
vho 333
mez 9
", header = TRUE)
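For completeness, the lapply() approach from the question can also be made to work. grep() treats each ID as a regular expression and matches substrings, so IDs containing spaces or special characters, or IDs that are part of another ID, can select the wrong rows. The sketch below uses an exact comparison instead. RawDat, ProLisV and ExtRawDat are the names from the question, the data is the small example, and the code assumes every ID occurs equally often.

```r
# Small example data using the names from the question
RawDat <- data.frame(
  ID   = rep(c("abl", "dlh", "vho", "mez"), times = 2),
  data = c(564, 78, 354, 15, 662, 69, 333, 9),
  stringsAsFactors = FALSE
)
ProLisV <- unique(RawDat$ID)  # with the full data this would hold the 75 IDs

# Exact comparison instead of grep(): no regex interpretation, no partial matches
cols <- lapply(ProLisV, function(x) RawDat$data[RawDat$ID == x])
names(cols) <- ProLisV

# Assumption: all IDs occur equally often, so the columns have equal length
stopifnot(length(unique(lengths(cols))) == 1)

# check.names = FALSE keeps IDs with spaces etc. unchanged as column names
ExtRawDat <- data.frame(cols, check.names = FALSE)
print(ExtRawDat)
#   abl dlh vho mez
# 1 564  78 354  15
# 2 662  69 333   9
```

If the IDs do not all occur equally often, the stopifnot() check fails, and a long-to-wide reshape that fills the gaps with NA is likely the better fit.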
