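I am looking into using the XML package to create an XML tree from R objects.
One thing I would like to do is take the information in a data frame like this: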
frame2
rules support confidence lift
1 1 0.010230179 1.0000000 78.200000
2 2 0.010230179 0.8000000 78.200000
3 3 0.010230179 1.0000000 65.166667
4 4 0.010230179 0.6666667 65.166667
5 5 0.012787724 0.8333333 54.305556
6 6 0.012787724 0.8333333 54.305556
7 7 0.010230179 0.6666667 26.066667
8 8 0.010230179 0.4000000 26.066667
9 9 0.007672634 0.6000000 26.066667
10 10 0.007672634 0.3333333 26.066667
11 11 0.007672634 0.6000000 21.327273
12 12 0.007672634 0.2727273 21.327273
13 13 0.007672634 0.4285714 16.757143
14 14 0.007672634 0.3000000 16.757143
15 15 0.010230179 0.6666667 26.066667
16 16 0.010230179 0.4000000 26.066667
17 17 0.007672634 0.3333333 10.861111
18 18 0.007672634 0.2500000 10.861111
19 19 0.007672634 0.3750000 13.329545
20 20 0.007672634 0.2727273 13.329545
21 21 0.007672634 0.3750000 11.278846
22 22 0.007672634 0.2307692 11.278846
23 23 0.007672634 0.3750000 18.328125
24 24 0.007672634 0.3750000 18.328125
25 25 0.007672634 0.4285714 13.964286
26 26 0.007672634 0.2500000 13.964286
27 27 0.007672634 0.4285714 11.171429
28 28 0.007672634 0.2000000 11.171429
29 29 0.007672634 0.3000000 11.730000
30 30 0.007672634 0.3000000 11.730000
31 31 0.007672634 0.2727273 8.886364
32 32 0.007672634 0.2500000 8.886364
33 33 0.007672634 0.3333333 10.861111
34 34 0.007672634 0.2500000 10.861111
35 35 0.007672634 0.3000000 11.730000
36 36 0.007672634 0.3000000 11.730000
37 37 0.007672634 0.3000000 9.775000
38 38 0.007672634 0.2500000 9.775000
39 39 0.007672634 0.2727273 8.202797
40 40 0.007672634 0.2307692 8.202797
41 41 0.007672634 0.2307692 8.202797
42 42 0.007672634 0.2727273 8.202797
43 43 0.007672634 0.2307692 6.015385
44 44 0.007672634 0.2000000 6.015385
45 45 0.010230179 0.8000000 31.280000
46 46 0.010230179 1.0000000 65.166667
47 47 0.010230179 1.0000000 65.166667
and turn it into an XML tree like this:
<root>
<1>
support=0.010230179
confidence=1.0000000
lift=78.200000
</1>
<2>
support=0.010230179
confidence=0.8000000
lift=78.200000
</2>
<47>
support=0.010230179
confidence=1.0000000
lift=65.166667
</47>
</root>
So far I have been able to create the 47 child nodes with the following commands:
root<-newXMLNode("root")
sapply(frame2$rules,newXMLNode,parent=root)
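However, I have not been able to add the elements support, confidence and lift with their appropriate values.
This leads me to the following two questions:
How do I define the elements or attributes support, confidence and lift for each of the 47 child nodes?
How do I fill in their respective values from frame2?
Thank you very much.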
Answer 0 (score: 1)
The basics: apply is your friend here (df below stands in for your frame2):
> invisible(apply(df, MARGIN=1, print))
rules support confidence lift
1.00000000 0.01023018 1.00000000 78.20000000
rules support confidence lift
2.00000000 0.01023018 0.80000000 78.20000000
rules support confidence lift
3.00000000 0.01023018 1.00000000 65.16666700
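One detail worth knowing before passing rows on as attributes: apply first coerces the data frame to a matrix, so each row arrives in the function as a named vector of a single type. The sketch below uses only base R and a small two-row stand-in for df (the values are copied from the question, the object itself is an assumption) to show that the names survive and that the all-numeric columns yield numeric rows.

```r
# Two-row stand-in for the data frame in the question (assumed, for illustration).
df <- data.frame(
  rules      = 1:2,
  support    = c(0.010230179, 0.010230179),
  confidence = c(1.0, 0.8),
  lift       = c(78.2, 78.2)
)

# apply() converts df to a matrix, then hands each row to the function
# as a named vector; here we record the names and the class of each row.
row_names <- apply(df, MARGIN = 1, function(row) paste(names(row), collapse = ","))
row_class <- apply(df, MARGIN = 1, function(row) class(row))

print(row_names)
print(row_class)
```

If the data frame also held a character column, the same coercion would turn every value in the row into a character string, which is harmless for XML attributes but matters if the numbers are used for anything else.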
newXMLNode has an attrs argument, which takes a named vector and uses it to generate the attributes:
> newXMLNode(as.character(df[1,"rules"]), attrs=df[1,])
<1 rules="1" support="0.010230179" confidence="1" lift="78.2"/>
To answer the question exactly, we also have to drop your rules attribute by subsetting, although this does not make me very happy:
> newXMLNode(as.character(df[1,"rules"]), attrs=df[1,-1])
<1 support="0.010230179" confidence="1" lift="78.2"/>
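I would advise against an XML schema with unpredictable element names, since they are usually harder to parse and validate in the recipient's code (and a name that starts with a digit, such as 1, is not a valid XML element name anyway). I would code it like this instead: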
> newXMLNode("observation", attrs=df[1,])
<observation rules="1" support="0.010230179" confidence="1" lift="78.2"/>
Combining the two steps above and adding a root node:
> children <- apply(df, MARGIN=1, function(row) newXMLNode("observation", attrs=row))
> root <- newXMLNode("frame2", .children = children)
> root
<frame2>
<observation rules="1" support="0.010230179" confidence="1" lift="78.2"/>
<observation rules="2" support="0.010230179" confidence="0.8" lift="78.2"/>
<observation rules="3" support="0.010230179" confidence="1" lift="65.166667"/>
</frame2>
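The question also allowed for support, confidence and lift to be child elements rather than attributes. A minimal sketch of that variant follows, assuming the XML package is installed; the three-row frame2 is a stand-in built from the first rows of the question's data, and the element name observation is carried over from the suggestion above rather than required by the package.

```r
library(XML)

# Three-row stand-in for the question's frame2 (assumed, for illustration).
frame2 <- data.frame(
  rules      = 1:3,
  support    = c(0.010230179, 0.010230179, 0.010230179),
  confidence = c(1.0, 0.8, 1.0),
  lift       = c(78.2, 78.2, 65.166667)
)

root <- newXMLNode("frame2")

for (i in seq_len(nrow(frame2))) {
  # One <observation> per row; the rule number is kept as an attribute.
  obs <- newXMLNode("observation",
                    attrs  = c(rules = as.character(frame2$rules[i])),
                    parent = root)
  # One child element per measure, with the value as its text content.
  for (col in c("support", "confidence", "lift")) {
    newXMLNode(col, as.character(frame2[i, col]), parent = obs)
  }
}

# Serialize the tree to a string and print it.
cat(saveXML(root), "\n")
```

Child elements make the document longer than the attribute form, but they leave room to add nested detail per measure later; for flat numeric values like these, either layout is reasonable.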