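Given a list with multiple elements, the goal is to put them into a data frame. The map_df function from the purrr package is very useful for regular lists, but throws an error for irregular ones.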
For example, following this tutorial, the following works:
library(purrr)
library(repurrrsive) # The data comes from this package
map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))
# A tibble: 30 x 6
name culture gender id born alive
<chr> <chr> <chr> <int> <chr> <lgl>
1 Theon Greyjoy Ironborn Male 1022 In 278 AC or 279 AC, at Pyke TRUE
2 Tyrion Lannister "" Male 1052 In 273 AC, at Casterly Rock TRUE
3 Victarion Greyjoy Ironborn Male 1074 In 268 AC or before, at Pyke TRUE
4 Will "" Male 1109 "" FALSE
5 Areo Hotah Norvoshi Male 1166 In 257 AC or before, at Norvos TRUE
6 Chett "" Male 1267 At Hag's Mire FALSE
7 Cressen "" Male 1295 In 219 AC or 220 AC FALSE
8 Arianne Martell Dornish Female 130 In 276 AC, at Sunspear TRUE
9 Daenerys Targaryen Valyrian Female 1303 In 284 AC, at Dragonstone TRUE
10 Davos Seaworth Westeros Male 1319 In 260 AC or before, at King's Landing TRUE
# … with 20 more rows
However, if an element is removed from one of the list items, the function fails:
got_chars[[1]]["gender"] <- NULL
map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))
#Error: Argument 3 is a list, must contain atomic vectors
The desired output would have NA values for the missing elements. What is a good solution? I suspect it involves purrr::possibly(), but I haven't figured it out yet.
Answer 0 (score: 3)
The development version of tidyr has powerful new unnesting functions that can handle this kind of problematic data (option 1). An alternative is to work column by column, which lets you use the .default argument of purrr::map() to supply a value for missing elements (option 2).
Created on 2019-08-15 by the reprex package (v0.3.0.9000)
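Below is a rough sketch of both options. It is not the answerer's original code. It assumes tidyr >= 1.0.0, where unnest_wider() is available, and uses a hypothetical toy list `chars` in place of got_chars:

```r
library(purrr)
library(tibble)
library(tidyr)

# Hypothetical toy data shaped like got_chars; the first element lacks "gender"
chars <- list(
  list(name = "Theon Greyjoy", culture = "Ironborn", id = 1022L, alive = TRUE),
  list(name = "Tyrion Lannister", culture = "", gender = "Male", id = 1052L, alive = TRUE)
)

# Option 1: store the list in a list-column and widen it;
# fields absent from an element become NA
opt1 <- unnest_wider(tibble(char = chars), char)

# Option 2: build each column separately; .default is passed on to the
# pluck-style extractor and fills in missing fields
opt2 <- tibble(
  name    = map_chr(chars, "name", .default = NA_character_),
  culture = map_chr(chars, "culture", .default = NA_character_),
  gender  = map_chr(chars, "gender", .default = NA_character_),
  id      = map_int(chars, "id", .default = NA_integer_),
  alive   = map_lgl(chars, "alive", .default = NA)
)
```

Option 2 fixes both column order and column types, at the cost of one line per column. Option 1 adapts to whatever fields happen to be present.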
Answer 1 (score: 2)
One inherent problem is how [ (or its alias magrittr::extract) behaves when the element we want to extract is missing:
list(a = 1)["b"]
# $<NA>
# NULL
magrittr::extract(list(a = 1), "b")
# $<NA>
# NULL
We can define:
extract_if_present <- function(x, y) {
  # keep only the requested names that actually exist in x
  x[intersect(y, names(x))]
}
which behaves like this:
extract_if_present(list(a = 1), "b")
# named list()
Row-binding then "works" even with missing elements:
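got_chars_mutilated <- got_chars  # copy of the data with "gender" removed from the first character
got_chars_mutilated[[1]]["gender"] <- NULL
map_dfr(
  got_chars_mutilated,
  extract_if_present,
  c("name", "culture", "gender", "id", "born", "alive")
)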
# # A tibble: 30 x 6
# name culture id born alive gender
# <chr> <chr> <int> <chr> <lgl> <chr>
# 1 Theon Greyjoy Ironborn 1022 In 278 AC or 279 AC, at Pyke TRUE NA
# 2 Tyrion Lannister "" 1052 In 273 AC, at Casterly Rock TRUE Male
# 3 Victarion Greyjoy Ironborn 1074 In 268 AC or before, at Pyke TRUE Male
# 4 Will "" 1109 "" FALSE Male
# 5 Areo Hotah Norvoshi 1166 In 257 AC or before, at Norvos TRUE Male
# 6 Chett "" 1267 At Hag's Mire FALSE Male
# 7 Cressen "" 1295 In 219 AC or 220 AC FALSE Male
# 8 Arianne Martell Dornish 130 In 276 AC, at Sunspear TRUE Female
# 9 Daenerys Targaryen Valyrian 1303 In 284 AC, at Dragonstone TRUE Female
# 10 Davos Seaworth Westeros 1319 In 260 AC or before, at King's Landing TRUE Male
# # … with 20 more rows
The column order gets somewhat scrambled: it depends on the order of the rows and on which elements they are missing.
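If a fixed column order matters, one possible follow-up is to reorder the columns by name after binding. This is an addition, not part of the original answer. The minimal sketch below uses a hypothetical toy list `chars` and assumes dplyr is installed, since map_dfr() relies on it. Reordering by name only works if every requested name occurs in at least one row:

```r
library(purrr)

# Keep only the requested names that actually exist in x
extract_if_present <- function(x, y) {
  x[intersect(y, names(x))]
}

# Hypothetical toy data; the first element lacks "gender"
chars <- list(
  list(name = "Theon Greyjoy", id = 1022L),
  list(name = "Tyrion Lannister", gender = "Male", id = 1052L)
)
cols <- c("name", "gender", "id")

# Columns come out in order of first appearance across the rows
res <- map_dfr(chars, extract_if_present, cols)

# Restore the requested order by selecting columns by name
res <- res[cols]
```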
Answer 2 (score: 1)
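One approach is to use partial() to define a version of pluck() that extracts the name of interest and returns NA if that name is missing. Pass this modified pluck() to a nested map: the inner map iterates over the names to extract, and the outer map iterates over the got_chars list: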
v <- set_names(c("name", "culture", "gender", "id", "born", "alive"))
map_dfr( got_chars, ~map(v, partial(pluck, .x, .default=NA)) )
# # A tibble: 30 x 6
# name culture gender id born alive
# <chr> <chr> <chr> <int> <chr> <lgl>
# 1 Theon Greyjoy Ironborn NA 1022 In 278 AC or 279 AC, at Pyke TRUE
# 2 Tyrion Lannister "" Male 1052 In 273 AC, at Casterly Rock TRUE
# 3 Victarion Greyj… Ironborn Male 1074 In 268 AC or before, at Pyke TRUE
# 4 Will "" Male 1109 "" FALSE
# 5 Areo Hotah Norvoshi Male 1166 In 257 AC or before, at Norvos TRUE
# 6 Chett "" Male 1267 At Hag's Mire FALSE
# 7 Cressen "" Male 1295 In 219 AC or 220 AC FALSE
# 8 Arianne Martell Dornish Female 130 In 276 AC, at Sunspear TRUE
# 9 Daenerys Targar… Valyrian Female 1303 In 284 AC, at Dragonstone TRUE
# 10 Davos Seaworth Westeros Male 1319 In 260 AC or before, at King's … TRUE
# # … with 20 more rows
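The same nested map can be tried without repurrrsive. The minimal sketch below runs on a hypothetical two-element list `chars` and assumes dplyr is installed, as any map_dfr() call requires. Because every row is built from the same named vector `v`, the columns keep the requested order:

```r
library(purrr)

# Hypothetical toy data; the first element lacks "gender"
chars <- list(
  list(name = "Theon Greyjoy", culture = "Ironborn", id = 1022L),
  list(name = "Tyrion Lannister", culture = "", gender = "Male", id = 1052L)
)
v <- set_names(c("name", "culture", "gender", "id"))

# Outer map over elements (.x); inner map over names, NA when a name is missing
res <- map_dfr(chars, ~ map(v, partial(pluck, .x, .default = NA)))
```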
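To clarify, .x iterates over got_chars because it sits inside the lambda function specified with ~, so it corresponds to the outer map. The function for the inner map is specified by partial(), which fixes the got_chars element currently being processed (i.e., .x) as the first argument of pluck(). The modified pluck() then takes the name to extract as its (new) first argument, so it can be passed as-is to the inner map without wrapping it in another ~ lambda.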
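For a single element, the partial() call behaves like an explicit one-argument closure, and that closure can go straight into map(). The sketch below demonstrates this with a hypothetical element `x`:

```r
library(purrr)

x <- list(name = "Theon Greyjoy", id = 1022L)  # hypothetical element lacking "gender"

# Both functions take only the name to extract
f_partial  <- partial(pluck, x, .default = NA)
f_explicit <- function(nm) pluck(x, nm, .default = NA)

f_partial("name")    # "Theon Greyjoy"
f_partial("gender")  # NA

# Passed as-is to an inner map over the names
row <- map(set_names(c("name", "gender")), f_partial)
```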