R: Combining vectors of different lengths

Time: 2016-07-29 17:19:45

Tags: r

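How can I combine vectors with different numbers of elements into a data frame in R? Below is an example. Each vector has 7 or 9 elements; sourceVersion and device are the two extra ones. I would like these columns included in the data frame and left blank or set to NA for the 7-element observations, as shown in the table below.

I would like the data in a data frame like this.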

type                                    sourceName              sourceVersion   device                                                                                                          unit    creationDate    startDate       endDate         value
HKQuantityTypeIdentifierFlightsClimbed  Ryan Praskievicz iPhone 9.3.2           <<HKDevice: 0x15a4af3f0>, name:iPhone, manufacturer:Apple, model:iPhone, hardware:iPhone8,1, software:9.3.2>    count   6/2/2016 12:27  6/2/2016 12:09  6/2/2016 12:09  1
HKQuantityTypeIdentifierStepCount       Ryan Praskievicz iPhone                                                                                                                                 count   10/2/2014 8:30  9/24/2014 15:07 9/24/2014 15:07 7

Here is what I tried.

library(XML)

xmlstr <- '<?xml version="1.0" encoding="UTF-8"?>
            <HealthData locale="en_US">
              <ExportDate value="2016-06-02 14:05:23 -0400"/>
              <Me HKCharacteristicTypeIdentifierDateOfBirth="" HKCharacteristicTypeIdentifierBiologicalSex="HKBiologicalSexNotSet" HKCharacteristicTypeIdentifierBloodType="HKBloodTypeNotSet" HKCharacteristicTypeIdentifierFitzpatrickSkinType="HKFitzpatrickSkinTypeNotSet"/>
              <Record type="HKQuantityTypeIdentifierStepCount" sourceName="Ryan Praskievicz iPhone" unit="count" creationDate="2014-10-02 08:30:17 -0400" startDate="2014-09-24 15:07:06 -0400" endDate="2014-09-24 15:07:11 -0400" value="7"/> <Record type="HKQuantityTypeIdentifierFlightsClimbed" sourceName="Ryan Praskievicz iPhone" sourceVersion="9.3.2" device="&lt;&lt;HKDevice: 0x15a4af3f0&gt;, name:iPhone, manufacturer:Apple, model:iPhone, hardware:iPhone8,1, software:9.3.2&gt;" unit="count" creationDate="2016-06-02 12:27:46 -0400" startDate="2016-06-02 12:09:29 -0400" endDate="2016-06-02 12:09:29 -0400" value="1"/> </HealthData>'

xml <- xmlParse(xmlstr)

recordAttribs <- xpathSApply(doc=xml, path="//HealthData/Record",  xmlAttrs)
df <- data.frame(t(recordAttribs))
df

This is the output I get in the R console:

      X1
            1 HKQuantityTypeIdentifierStepCount, Ryan Praskievicz iPhone, count, 2014-10-02 08:30:17 -0400, 2014-09-24 15:07:06 -0400, 2014-09-24 15:07:11 -0400, 7                                                                                                                                                                                                                                                                                 
    X2 
1 HKQuantityTypeIdentifierFlightsClimbed, Ryan Praskievicz iPhone, 9.3.2, <<HKDevice: 0x15a4af3f0>, name:iPhone, manufacturer:Apple, model:iPhone, hardware:iPhone8,1, software:9.3.2>, count, 2016-06-02 12:27:46 -0400, 2016-06-02 12:09:29 -0400, 2016-06-02 12:09:29 -0400, 1

2 answers:

Answer 0: (score: 2)

The dependency is a bit esoteric, but you can do this:

library(data.table)
rbindlist(lapply(recordAttribs, function(x) data.table(t(x))), fill=TRUE)

This returns a data.table, which inherits from data.frame:

                                     type              sourceName  unit
1:      HKQuantityTypeIdentifierStepCount Ryan Praskievicz iPhone count
2: HKQuantityTypeIdentifierFlightsClimbed Ryan Praskievicz iPhone count
                creationDate                 startDate                   endDate value
1: 2014-10-02 08:30:17 -0400 2014-09-24 15:07:06 -0400 2014-09-24 15:07:11 -0400     7
2: 2016-06-02 12:27:46 -0400 2016-06-02 12:09:29 -0400 2016-06-02 12:09:29 -0400     1
   sourceVersion
1:            NA
2:         9.3.2
                                                                                                         device
1:                                                                                                           NA
2: <<HKDevice: 0x15a4af3f0>, name:iPhone, manufacturer:Apple, model:iPhone, hardware:iPhone8,1, software:9.3.2>

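The reason I use data.table is that it has a smart rbind method whose fill=TRUE option allows rows of unequal length: columns are matched by name rather than by position (as with use.names=TRUE), and missing values are filled with NA.

A simple example of how rbind.data.table works: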

d1 = data.table(a="foo", b = "bar", c = "baz")
d2 = data.table(b="bar", a = "foo")
rbind(d1, d2) # throws helpful error:  "If instead you need to fill missing columns, use set argument 'fill' to TRUE."
rbind(d1, d2, fill=TRUE)
#      a   b   c
# 1: foo bar baz
# 2: foo bar  NA 
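For readers who want to see the name-matching idea without the data.table dependency, here is a minimal base R sketch. It is not how data.table implements rbind internally; it only illustrates matching on names and filling gaps with NA. The records rec1 and rec2 are hypothetical, shortened stand-ins for the named character vectors that xmlAttrs returns.

```r
# Hypothetical named character vectors of different lengths,
# standing in for the per-record output of xmlAttrs
rec1 <- c(type = "StepCount", unit = "count", value = "7")
rec2 <- c(type = "FlightsClimbed", sourceVersion = "9.3.2",
          unit = "count", value = "1")
records <- list(rec1, rec2)

# Union of all attribute names, in order of first appearance
all_names <- unique(unlist(lapply(records, names)))

# Index each record by the full name set; names it lacks come back as NA
rows <- lapply(records, function(x) {
  row <- x[all_names]
  names(row) <- all_names
  row
})

# Stack the aligned rows and convert to a data frame
filled <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
filled
```

Because the rows are aligned by name, the order of attributes within each record does not matter, which is the same property the fill=TRUE call relies on.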

Answer 1: (score: 1)

Here is a way to do this using lapply.

recordAttribs <- xpathSApply(doc=xml, path="//HealthData/Record", xmlAttrs)
recordAttribs <- t(recordAttribs)

Use sapply to get a vector of TRUE/FALSE values according to whether each element of the list has length 7.

short.condition <- sapply(recordAttribs, function(x) length(x)==7)

Use lapply on the subset of the list that meets this condition. This time you insert two NA values into each vector that meets the condition above.

# apply only to the short records so the replacement has the same length as the subset
recordAttribs[short.condition] <- lapply(recordAttribs[short.condition], function(x) c(x[1:2],NA,NA,x[3:7]))

To convert this into a data.frame in the format you want:

df <- matrix(unlist(recordAttribs),
            nrow=2,ncol=9, byrow=TRUE)

df <- data.frame(df, stringsAsFactors=FALSE)

names(df) <- c("type","sourceName","sourceVersion","device","unit","creationDate","startDate","endDate","value")

Which looks like this:

> str(df)
'data.frame':   2 obs. of  9 variables:
 $ type         : chr  "HKQuantityTypeIdentifierStepCount" "HKQuantityTypeIdentifierFlightsClimbed"
 $ sourceName   : chr  "Ryan Praskievicz iPhone" "Ryan Praskievicz iPhone"
 $ sourceVersion: chr  NA "9.3.2"
 $ device       : chr  NA "<<HKDevice: 0x15a4af3f0>, name:iPhone, manufacturer:Apple, model:iPhone, hardware:iPhone8,1, software:9.3.2>"
 $ unit         : chr  "count" "count"
 $ creationDate : chr  "2014-10-02 08:30:17 -0400" "2016-06-02 12:27:46 -0400"
 $ startDate    : chr  "2014-09-24 15:07:06 -0400" "2016-06-02 12:09:29 -0400"
 $ endDate      : chr  "2014-09-24 15:07:11 -0400" "2016-06-02 12:09:29 -0400"
 $ value        : chr  "7" "1"
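The position-based padding used in this answer can be tried on toy data without the XML package. The sketch below is only an illustration: the records list and its field values are hypothetical, and the row count is taken from the list length instead of being hard-coded, which is an assumption about how one might generalize the approach.

```r
# Hypothetical toy records: one short (3 fields) and one full (5 fields)
records <- list(
  c("StepCount", "iPhone", "7"),
  c("FlightsClimbed", "iPhone", "9.3.2", "dev", "1")
)

# Flag the records that lack the two optional fields
is_short <- sapply(records, function(x) length(x) == 3)

# Insert two NA values after the second field, for the short records only
records[is_short] <- lapply(records[is_short],
                            function(x) c(x[1:2], NA, NA, x[3]))

# All records now have equal length, so they fill a matrix row by row
m <- matrix(unlist(records), nrow = length(records), byrow = TRUE)
padded <- data.frame(m, stringsAsFactors = FALSE)
names(padded) <- c("type", "sourceName", "sourceVersion", "device", "value")
padded
```

This approach depends on the missing fields always sitting at the same positions; if the attribute order can vary between records, matching on names is the safer choice.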