R - Position of a word occurring between two words in a character vector

Date: 2015-08-18 15:42:42

Tags: r

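I have a sample XML input stored as a character vector, shown below. I'm not using XML parsing functions because the XML document is not fully valid: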

x1 <- readLines("test.xml")
x1
<important>
<hdtitle>Important:</hdtitle>
<stepgrp type="unordered-bullet">
<step>
Ensure the dark link is facing up.
</step>
<step>
If using the chain and sprockets again, align the darkened
link with the marked sprockets made during disassembly.
</step>
</stepgrp>
</important>
Install the drive chain to the drive sprocket and the driven
sprocket.
</step>

I want to find the position of </important> between a </step> and the next consecutive </step>, based on the following condition:
1) No <step> occurs between the position of </important> and the next </step>.

My desired output is:

<important>
<hdtitle>Important:</hdtitle>
<stepgrp type="unordered-bullet">
<step>
Ensure the dark link is facing up.
</step>
<step>
If using the chain and sprockets again, align the darkened
link with the marked sprockets made during disassembly.
</step>
</stepgrp>
</important><step>
Install the drive chain to the drive sprocket and the driven
sprocket.
</step>

If the above condition is satisfied, I would add <step> after </important>.
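The condition can also be checked directly, one </important> at a time. This is a minimal sketch, not code from the question or the answer: the helper name find_unopened_important and the small input vector x_demo are hypothetical, and it assumes every tag sits alone on its own line (hence the ^...$ anchors in the patterns).

```r
# Hypothetical demo input, one tag or text run per line.
# The first </important> (line 5) reaches </step> with no <step> in between;
# the second (line 9) is followed by an opening <step>, so it must be left alone.
x_demo <- c("<important>", "<step>", "text", "</step>",
            "</important>", "more text", "</step>",
            "<important>", "</important>", "<step>", "text", "</step>")

# Hypothetical helper: for each </important> line, find the next </step> line and
# keep the </important> only if no <step> line lies strictly between the two.
find_unopened_important <- function(lines) {
  imp_end    <- grep("^</important>$", lines)
  step_start <- grep("^<step>$", lines)
  step_end   <- grep("^</step>$", lines)
  keep <- vapply(imp_end, function(i) {
    next_close <- step_end[step_end > i][1]  # NA when no </step> follows
    if (is.na(next_close)) return(FALSE)
    !any(step_start > i & step_start < next_close)
  }, logical(1))
  imp_end[keep]
}

idx <- find_unopened_important(x_demo)
x_demo[idx] <- paste0(x_demo[idx], "<step>")  # append <step> after </important>
print(idx)          # [1] 5
print(x_demo[idx])  # [1] "</important><step>"
```

This looks only at the stretch between each </important> and the next </step>, which is the rule as stated. The cut-based answer below buckets lines by the whole interval between two consecutive </step> lines, so there a <step> that precedes the </important> inside the same interval also disqualifies it.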

1 answer:

Answer 0 (score: 0)

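imp_end <- grep("<\\/important>", x)   # lines containing </important>
step_start <- grep("<step>", x)        # lines containing <step>
step_end <- grep("<\\/step>", x)       # lines containing </step>
# bucket each position into an interval between consecutive </step> lines
imp_intervals <- cut(imp_end, step_end)
step_start_intervals <- na.omit(cut(step_start, step_end))
# keep the </important> intervals that contain no <step>
valid <- na.omit(imp_intervals[!imp_intervals %in% step_start_intervals])
# %in% rather than == so that several valid intervals match without recycling
indx <- imp_end[imp_intervals %in% valid]
x[indx] <- gsub("^(<\\/important>)$", "\\1<step>", x[indx])
x
# [1] "<important>"
# [2] "<hdtitle>Important:</hdtitle>"
# [3] "<stepgrp type=\"unordered-bullet\">"
# [4] "<step>"
# [5] "Ensure the dark link is facing up."
# [6] "</step>"
# [7] "<step>"
# [8] "If using the chain and sprockets again, align the darkened"
# [9] "link with the marked sprockets made during disassembly."
# [10] "</step>"
# [11] "</stepgrp>"
# [12] "</important><step>"
# [13] "Install the drive chain to the drive sprocket and the driven"
# [14] "sprocket."
# [15] "</step>"

I used grep to find the relevant lines and build intervals from them, assuming the data is laid out vertically (one tag or text run per line) as in your example. Then, with cut, I look for the </important> positions that fall between two </step> lines and whose interval contains no "<step>" string; those intervals are stored in valid. I tried to decipher the logic of your rules, so let me know whether this is what you're looking for.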
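To make the cut step concrete, here is a sketch that plugs in the line numbers from the question's 15-line example (assumed: <step> on lines 4 and 7, </step> on lines 6, 10 and 15, </important> on line 12) and prints the interval labels cut assigns. The name valid_lines is hypothetical and, unlike valid in the answer, holds line indices rather than factor levels.

```r
# Line indices from the 15-line example (assumption: same layout as the question)
step_end   <- c(6L, 10L, 15L)  # lines holding </step>
step_start <- c(4L, 7L)        # lines holding <step>
imp_end    <- 12L              # line holding </important>

# cut() labels each index with the (a,b] interval between consecutive </step> lines;
# an index at or before the first </step> falls outside every interval and becomes NA.
imp_intervals        <- cut(imp_end, step_end)
step_start_intervals <- cut(step_start, step_end)

print(as.character(imp_intervals))         # [1] "(10,15]"
print(as.character(step_start_intervals))  # [1] NA       "(6,10]"

# An </important> qualifies when its interval exists and holds no <step>.
valid_lines <- imp_end[!is.na(imp_intervals) &
                       !(imp_intervals %in% step_start_intervals)]
print(valid_lines)  # [1] 12
```

Line 12 lands in "(10,15]", while the only <step> inside an interval sits in "(6,10]", so line 12 is the one that gets <step> appended.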

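Data

x <- c("<important>", "<hdtitle>Important:</hdtitle>",
       "<stepgrp type=\"unordered-bullet\">", "<step>",
       "Ensure the dark link is facing up.", "</step>", "<step>",
       "If using the chain and sprockets again, align the darkened",
       "link with the marked sprockets made during disassembly.",
       "</step>", "</stepgrp>", "</important>",
       "Install the drive chain to the drive sprocket and the driven",
       "sprocket.", "</step>")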