Using `purrr`

Time: 2017-09-25 21:34:20

Tags: r data.table purrr

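This is a direct follow-up to a previous, similar question I asked about extracting a specific subset from a list of lists: Extracting data from a list of lists into its own `data.frame` with `purrr`

I will therefore use the same sample dataset: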

l <- list(structure(list(a = -1.54676469632688, b = "s", c = "T", 
                     d = structure(list(id = 5L, label = "Utah", link = "Asia/Anadyr",
                                        score = -0.21104594634643), .Names = c("id", "label", "link", "score")), e = 49.1279871269422), .Names = c("a", "b", "c", "d", "e")), structure(list(a = -0.934821052832427, b = "k", c = "T", d = list(structure(list(id = 8L, label = "South Carolina", link = "Pacific/Wallis", score = 0.526540892113734, externalId = -6.74354377676955), .Names = c("id", "label", "link", "score", "externalId")), structure(list(id = 9L, label = "Nebraska", link = "America/Scoresbysund", score = 0.250895465294041, externalId = 16.4257470807879), .Names = c("id", "label", "link", "score", "externalId"))), e = 52.3161400117052), .Names = c("a", "b", "c", "d", "e")), structure(list(a = -0.27261485993069, b = "f", c = "P", d = list(structure(list(id = 8L, label = "Georgia", link = "America/Nome", score = 0.526494135483816, externalId = 7.91583574935589), .Names = c("id", "label", "link", "score", "externalId")), structure(list(id = 2L, label = "Washington", link = "America/Shiprock", score = -0.555186440792989, externalId = 15.0686663219837), .Names = c("id", "label", "link", "score", "externalId")), structure(list(id = 6L, label = "North Dakota", link = "Universal", score = 1.03168296038975), .Names = c("id", "label", "link", "score")), structure(list(id = 1L, label = "New Hampshire", link = "America/Cordoba", score = 1.21582056168681, externalId = 9.7276418869132), .Names = c("id", "label", "link", "score", "externalId")), structure(list(id = 1L, label = "Alaska", link = "Asia/Istanbul", score = -0.23183264861979), .Names = c("id", "label", "link", "score")), structure(list(id = 4L, label = "Pennsylvania", link = "Africa/Dar_es_Salaam", score = 0.590245339334121), .Names = c("id", "label", "link", "score"))), e = 132.1153538536), .Names = c("a", "e")), structure(list(a = 0.202685974077313, b = "x", c = "O", d = structure(list(id = 3L, label = "Delaware", link = "Asia/Samarkand", score = 0.695577130634724, externalId = 15.2364820698193), .Names = 
c("id", "label", "link", "score", "externalId")), e = 97.9908914452971), .Names = c("a", "b", "c", "d", "e")), structure(list(a = -0.396243444741009, b = "z", c = "P", d = list(structure(list(id = 4L, label = "North Dakota", link = "America/Tortola", score = 1.03060272795705, externalId = -7.21666936522344), .Names = c("id", "label", "link", "score", "externalId")), structure(list(id = 9L, label = "Nebraska", link = "America/Ojinaga", score = -1.11397997280413, externalId = -8.45145052697411), .Names = c("id", "label", "link", "score", "externalId"))), e = 123.597945533926), .Names = c("a", "b", "c", "d", "e")))

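The general problem I am trying to solve is extracting the contents of nested lists that have different lengths and binding them to other content from the same list, which essentially serves as an ID for the nested content.

In the context of the sample dataset above, I am trying to extract the contents of the sub-list `d` into a data.table / data.frame, while also extracting element `a` and repeating it for each extracted row. That way I can tell which of the elements extracted from `d` belong to the same subset, since their lengths differ. An example of the desired data.table explains it best: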

a          id           label                        link       score  externalId
-1.5467647  5            Utah                 Asia/Anadyr  -0.2110459          NA
-0.9348211  8  South Carolina              Pacific/Wallis   0.5265409   -6.743544
-0.9348211  9        Nebraska        America/Scoresbysund   0.2508955    16.42575

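Note that the first column holds the content of `a` from each sub-list in `l`. The first row is the content of the first nested item in `d` (which has length 1), and the second and third rows are the content of the second item in `d` (which has length 2), so those two rows share the same value of `a`.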
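As a rough illustration of how the row counts arise, here is a minimal base R sketch. It assumes a small hypothetical list `toy` that mirrors the structure of `l` (it is not the real data), and it assumes that a `d` with names is a single record while a `d` without names is a list of records.

```r
# Hypothetical toy list mirroring the structure of `l`:
# `d` is either one record (a named list) or an unnamed list of records.
toy <- list(
  list(a = 1.5, d = list(id = 5L, label = "Utah")),
  list(a = 2.5, d = list(
    list(id = 8L, label = "South Carolina"),
    list(id = 9L, label = "Nebraska")
  ))
)

# Number of records held in `d` for each top-level element
n_records <- vapply(
  toy,
  function(item) if (is.null(names(item$d))) length(item$d) else 1L,
  integer(1)
)

# `a` repeated once per record, i.e. the first column of the desired table
a_repeated <- rep(vapply(toy, function(item) item$a, numeric(1)), times = n_records)
```

Here `n_records` gives the number of rows each element contributes, and `a_repeated` is the ID column that has to be aligned with the rows extracted from `d`.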

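My current solution for this is roundabout and error-prone. Given how closely this relates to the post referenced above, I wonder whether I am simply failing to see how to extend that approach to this related problem.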

1 answer:

Answer 0 (score: 4)

Every nested list tends to require a slightly different approach, but this covers some typical ones:

library(tidyverse)

# `l` is the sample list defined in the question above

l %>% 
    map(set_names, letters[1:5]) %>%       # add missing names
    map(modify_at, 'd', bind_rows) %>%     # coerce nested elements to data.frame
    # coerce each element to data.frame, and rbind them all together
    map_df(data.frame, stringsAsFactors = FALSE)
#>             a b c d.id        d.label               d.link    d.score         e d.externalId
#> 1  -1.5467647 s T    5           Utah          Asia/Anadyr -0.2110459  49.12799           NA
#> 2  -0.9348211 k T    8 South Carolina       Pacific/Wallis  0.5265409  52.31614    -6.743544
#> 3  -0.9348211 k T    9       Nebraska America/Scoresbysund  0.2508955  52.31614    16.425747
#> 4  -0.2726149 f P    8        Georgia         America/Nome  0.5264941 132.11535     7.915836
#> 5  -0.2726149 f P    2     Washington     America/Shiprock -0.5551864 132.11535    15.068666
#> 6  -0.2726149 f P    6   North Dakota            Universal  1.0316830 132.11535           NA
#> 7  -0.2726149 f P    1  New Hampshire      America/Cordoba  1.2158206 132.11535     9.727642
#> 8  -0.2726149 f P    1         Alaska        Asia/Istanbul -0.2318326 132.11535           NA
#> 9  -0.2726149 f P    4   Pennsylvania Africa/Dar_es_Salaam  0.5902453 132.11535           NA
#> 10  0.2026860 x O    3       Delaware       Asia/Samarkand  0.6955771  97.99089    15.236482
#> 11 -0.3962434 z P    4   North Dakota      America/Tortola  1.0306027 123.59795    -7.216669
#> 12 -0.3962434 z P    9       Nebraska      America/Ojinaga -1.1139800 123.59795    -8.451451

There are many more ways to do this, but the key is to first arrange the most deeply nested elements into the right data structure, then combine them with the remaining elements until you have a data.frame.

Note that using `data.frame` instead of a tidyverse equivalent is a bit hacky here, but `data.frame` is more forgiving about inserting data.frames and single values into one data.frame, recycling where necessary. A tidyverse version would require getting everything exactly right instead of relying on recycling.
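For the narrower table requested in the question (only `a` plus the fields of `d`), one possible variant is sketched below. This is an illustration under stated assumptions, not the answerer's method: `toy` is a small hypothetical list standing in for `l`, and every element is assumed to already carry correct names. With the real `l`, the missing names would first need to be restored, for example with the `set_names` step shown in the answer.

```r
library(purrr)
library(dplyr)

# Hypothetical toy list mirroring the structure of `l`:
# `d` is either one record (a named list) or an unnamed list of records.
toy <- list(
  list(a = 1.5, b = "s", d = list(id = 5L, label = "Utah")),
  list(a = 2.5, b = "k", d = list(
    list(id = 8L, label = "South Carolina", externalId = -6.7),
    list(id = 9L, label = "Nebraska", externalId = 16.4)
  ))
)

rows_per_item <- map(toy, function(item) {
  # One row for a single record, n rows for a list of n records
  d_rows <- bind_rows(item$d)
  # Attach `a` explicitly to every row and move it to the front
  d_rows %>%
    mutate(a = item$a) %>%
    select(a, everything())
})

# Stack the per-element tables; fields missing from a record become NA
result <- bind_rows(rows_per_item)
```

With this toy input, `result` has three rows: one for the first element and two for the second, both carrying the same `a`. The `externalId` column is `NA` for the record that lacks it.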