Performing an operation on a data frame column inside a list (R)

Date: 2021-05-17 07:32:32

Tags: r database list dataframe

I am trying to perform an operation on a column of a data frame that is stored inside a list.

Here is the data frame in my list:


> dput(wtr_complete[[1]])
structure(list(date = c("2010-03-02T00:00:00", "2010-03-03T00:00:00", 
"2010-03-04T00:00:00", "2010-03-05T00:00:00", "2010-03-06T00:00:00", 
"2010-03-07T00:00:00", "2010-03-08T00:00:00", "2010-03-09T00:00:00", 
"2010-03-10T00:00:00", "2010-03-11T00:00:00", "2010-03-12T00:00:00", 
"2010-03-13T00:00:00", "2010-03-14T00:00:00", "2010-03-15T00:00:00", 
"2010-03-16T00:00:00", "2010-03-17T00:00:00", "2010-03-18T00:00:00", 
"2010-03-19T00:00:00", "2010-03-20T00:00:00", "2010-03-21T00:00:00", 
"2010-03-22T00:00:00", "2010-03-23T00:00:00", "2010-03-24T00:00:00", 
"2010-03-25T00:00:00", "2010-03-26T00:00:00", "2011-01-01T00:00:00", 
"2011-01-02T00:00:00", "2011-01-03T00:00:00", "2011-01-04T00:00:00", 
"2011-01-05T00:00:00", "2011-01-06T00:00:00", "2011-01-07T00:00:00", 
"2011-01-08T00:00:00", "2011-01-09T00:00:00", "2011-01-10T00:00:00", 
"2011-01-11T00:00:00", "2011-01-12T00:00:00", "2011-01-13T00:00:00", 
"2011-01-14T00:00:00", "2011-01-15T00:00:00", "2011-01-16T00:00:00", 
"2011-01-17T00:00:00", "2011-01-18T00:00:00", "2011-01-19T00:00:00", 
"2011-01-20T00:00:00", "2011-01-21T00:00:00", "2011-01-22T00:00:00", 
"2011-01-23T00:00:00", "2011-01-24T00:00:00", "2011-01-25T00:00:00", 
"2012-01-01T00:00:00", "2012-01-02T00:00:00", "2012-01-03T00:00:00", 
"2012-01-04T00:00:00", "2012-01-05T00:00:00", "2012-01-06T00:00:00", 
"2012-01-07T00:00:00", "2012-01-08T00:00:00", "2012-01-09T00:00:00", 
"2012-01-10T00:00:00", "2012-01-11T00:00:00", "2012-01-12T00:00:00", 
"2012-01-13T00:00:00", "2012-01-14T00:00:00", "2012-01-15T00:00:00", 
"2012-01-16T00:00:00", "2012-01-17T00:00:00", "2012-01-18T00:00:00", 
"2012-01-19T00:00:00", "2012-01-20T00:00:00", "2012-01-21T00:00:00", 
"2012-01-22T00:00:00", "2012-01-23T00:00:00", "2012-01-24T00:00:00", 
"2012-01-25T00:00:00", "2013-01-01T00:00:00", "2013-01-02T00:00:00", 
"2013-01-03T00:00:00", "2013-01-04T00:00:00", "2013-01-05T00:00:00", 
"2013-01-06T00:00:00", "2013-01-07T00:00:00", "2013-01-08T00:00:00", 
"2013-01-09T00:00:00", "2013-01-10T00:00:00", "2013-01-11T00:00:00", 
"2013-01-12T00:00:00", "2013-01-13T00:00:00", "2013-01-14T00:00:00", 
"2013-01-15T00:00:00", "2013-01-16T00:00:00", "2013-01-17T00:00:00", 
"2013-01-18T00:00:00", "2013-01-19T00:00:00", "2013-01-20T00:00:00", 
"2013-01-21T00:00:00", "2013-01-22T00:00:00", "2013-01-23T00:00:00", 
"2013-01-24T00:00:00", "2013-01-25T00:00:00", "2014-01-01T00:00:00", 
"2014-01-02T00:00:00", "2014-01-03T00:00:00", "2014-01-04T00:00:00", 
"2014-01-05T00:00:00", "2014-01-06T00:00:00", "2014-01-07T00:00:00", 
"2014-01-08T00:00:00", "2014-01-09T00:00:00", "2014-01-10T00:00:00", 
"2014-01-11T00:00:00", "2014-01-12T00:00:00", "2014-01-13T00:00:00", 
"2014-01-14T00:00:00", "2014-01-15T00:00:00", "2014-01-16T00:00:00", 
"2014-01-17T00:00:00", "2014-01-18T00:00:00", "2014-01-19T00:00:00", 
"2014-01-20T00:00:00", "2014-01-21T00:00:00", "2014-01-22T00:00:00", 
"2014-01-23T00:00:00", "2014-01-24T00:00:00", "2014-01-25T00:00:00"
), station = c("GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156"), value = c(0L, 0L, 
0L, 0L, 0L, 64L, 26L, 21L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 161L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 8L, 8L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 
-125L), class = c("tbl_df", "tbl", "data.frame"))

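The operation is to create a new column (number 4) that holds a substring of the names in the second column of the data frame. I am using the following code: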

wtr_complete[[1]][4] <-  substring(wtr_complete[[1]][2],7)

But when I look at column 4, the result is not what I expected.


Any idea how to perform an operation on a column of a data frame inside a list?

3 answers:

Answer 0: (score: 0)

This should work for you:

df <- wtr_complete[[1]]
df$newcol <-  substring(df$station,7)
wtr_complete[[1]] <- df

Or directly on the list element:

wtr_complete[[1]]$newcol <- substring(wtr_complete[[1]]$station,7)
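
If the same column is needed in every data frame of the list, not only the first, the assignment above can be wrapped in lapply. The following is a minimal sketch under assumptions: wtr_demo is a made-up stand-in for wtr_complete, the second station ID is invented for illustration, and every element is assumed to have a station column.

```r
# Toy stand-in for wtr_complete: a list of two small data frames.
# The station ID in the second element is invented for illustration.
wtr_demo <- list(
  data.frame(station = c("GHCND:USW00053156", "GHCND:USW00053156"),
             value = c(0L, 64L), stringsAsFactors = FALSE),
  data.frame(station = "GHCND:USW00099999",
             value = 8L, stringsAsFactors = FALSE)
)

# Add the same derived column to every data frame in the list.
wtr_demo <- lapply(wtr_demo, function(df) {
  df$newcol <- substring(df$station, 7)  # drop the 6-character "GHCND:" prefix
  df                                     # return the modified data frame
})

wtr_demo[[1]]$newcol
```

lapply returns a new list, so the result has to be assigned back, just as the single-element version assigns back to wtr_complete[[1]].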

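Answer 1: (score: 0)

Thank you for putting together a reproducible example. You were very close to the solution you wanted.

One thing to watch for when learning R is the difference between referring to the values you want to work with and referring to the thing that contains those values. This is one of those cases.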

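When you write wtr_complete[[1]][2], R returns a data.frame containing only the second column. When you use the slightly different syntax wtr_complete[[1]][,2] on a plain data.frame, R returns a character vector containing the actual values you want to work with. The only difference is that comma. The comma is R's syntax for subsetting a data.frame by rows and columns, and [,2] means "all the values in column 2 only". (A tibble, which is what the dput output in the question shows, keeps [,2] as a one-column tibble; [[2]] or $station returns the vector in either case.)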

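The strange output you got arises because you passed a whole data.frame to substring, rather than the character vector it expects. What substring does in turn is convert your data.frame to a character vector before performing the substring operation. Note that this conversion yields a character vector of length 1 with all the values mashed together.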

as.character(wtr_complete[[1]][2])
[1] "c(\"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\",

Pretty weird output, right?
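
To make the container-versus-values distinction concrete, here is a small sketch. It assumes a plain two-row data.frame named df (a made-up stand-in, not the tibble from the question) and uses only base R.

```r
# Small plain data.frame (not a tibble) standing in for wtr_complete[[1]].
df <- data.frame(
  date    = c("2010-03-02T00:00:00", "2010-03-03T00:00:00"),
  station = c("GHCND:USW00053156", "GHCND:USW00053156"),
  stringsAsFactors = FALSE
)

class(df[2])     # "data.frame": single bracket keeps the container
class(df[, 2])   # "character": on a plain data.frame the comma form drops to a vector
class(df[[2]])   # "character": double bracket extracts the column itself

# as.character() on the one-column data frame collapses it into one string,
# which is why substring() then returns a single value.
length(as.character(df[2]))     # 1
length(substring(df[[2]], 7))   # 2, one result per row
```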

What you want to do is call substring like this:

wtr_complete[[1]][4] <- substring(wtr_complete[[1]][,2], 7)

You should get a result like this:

                   date           station value          V4
1   2010-03-02T00:00:00 GHCND:USW00053156     0 USW00053156

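Note: as you can see, this names your new column "V4". A better overall approach is to give the new column a name and to refer to the second column by name, which is safer: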

wtr_complete[[1]]$mynewcol <- substring(wtr_complete[[1]]$station, 7)

Answer 2: (score: 0)

Neither wtr_complete[[1]][2] nor wtr_complete[[1]][,2] is a character vector.

You can use:

wtr_complete[[1]]$newcol <-  substring(wtr_complete[[1]][[2]],7)
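
The data in the question has class tbl_df, so the behaviour can be checked on a small tibble. This is a sketch under assumptions: tb is a made-up two-row example, and the check only runs if the tibble package happens to be installed.

```r
# Runs the comparison only if the tibble package is installed.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(
    date    = c("2010-03-02T00:00:00", "2010-03-03T00:00:00"),
    station = c("GHCND:USW00053156", "GHCND:USW00053156")
  )

  print(is.character(tb[, 2]))   # FALSE: still a one-column tibble
  print(is.character(tb[[2]]))   # TRUE: the column itself

  # [[ ]] extracts the vector, so substring() works element by element.
  tb$newcol <- substring(tb[[2]], 7)
  print(tb$newcol)
}
```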