R: How do I add characters to a string based on marker positions in another string?

Time: 2018-06-26 23:07:37

Tags: r regex string split

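I would like to add markers to a string based on the marker positions in another string. I have a SOURCE data frame with two columns: "ortho" and "syllabify". I want to create a TARGET column using the underscore markers: each "ortho" string should be split with underscores at the positions where underscores appear in the matching "syllabify" string.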

  

df <- data.frame(ortho = "agradeço", syllabify = "R_OOR_OR_OR")

SOURCE:  
   ortho    syllabify       
agradeço  R_OOR_OR_OR  
    bala        OR_OR        
 futebol    OR_OR_ORC    

TARGET:  
   ortho    syllabify       TARGET
agradeço  R_OOR_OR_OR  a_gra_de_ço    
    bala        OR_OR        ba_la
 futebol    OR_OR_ORC    fu_te_bol

Thanks, everyone!

2 answers:

Answer 0 (score: 0)

I don't know which language you are using (Gustavo, Melisso), but here is an answer in Java:

Initialization:

String sillabify = "OR_OR_ORC";
String ortho = "futebol";
String answer = returnTheTARGETColumnStringUsingTheUnderlineMarkers(ortho, sillabify);

Method:

public String returnTheTARGETColumnStringUsingTheUnderlineMarkers(String pOrtho, String pSillabify) {
    String target = "";

    // Each pass consumes one syllable: everything before the next "_" in the pattern.
    while (pSillabify.contains("_")) {
        int cut = pSillabify.indexOf("_");
        target = target + pOrtho.substring(0, cut) + "_";
        pOrtho = pOrtho.substring(cut);
        pSillabify = pSillabify.substring(cut + 1);
    }

    // Whatever remains is the last syllable.
    target = target + pOrtho;

    return target;
}

This returns "fu_te_bol".
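The same idea can also be sketched as a single pass over the pattern, which avoids repeated substring calls. This is only an illustrative variant, not part of the original answer: the class name SyllableMarker and the method insertMarkers are made up for the example, and it assumes every non-underscore character in the pattern stands for exactly one character of the word.

```java
public class SyllableMarker {

    // Hypothetical helper: walks the pattern once, copying one character of
    // ortho for each non-underscore symbol and emitting "_" for each underscore.
    static String insertMarkers(String ortho, String pattern) {
        StringBuilder target = new StringBuilder();
        int next = 0; // index of the next unread character in ortho
        for (int i = 0; i < pattern.length(); i++) {
            if (pattern.charAt(i) == '_') {
                target.append('_');
            } else if (next < ortho.length()) {
                target.append(ortho.charAt(next++));
            }
        }
        // Assumption: if ortho is longer than the pattern, keep the leftover characters.
        target.append(ortho.substring(next));
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertMarkers("futebol", "OR_OR_ORC")); // prints fu_te_bol
        System.out.println(insertMarkers("bala", "OR_OR"));        // prints ba_la
    }
}
```

Applied to "agradeço" with the pattern "R_OOR_OR_OR", the same walk gives "a_gra_de_ço", matching the TARGET column in the question.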

Answer 1 (score: 0)

Here is one solution:

df <- read.table(text = "
   ortho    syllabify
agradeço  R_OOR_OR_OR
    bala        OR_OR
 futebol    OR_OR_ORC", header = TRUE)

library(purrr)

df <- within(df, {
  ortho <- as.character(ortho)
  underscore_loc <- gregexpr("_", syllabify)
  target <- map2(ortho, underscore_loc, function(string, loc) {
    locs <- cbind(c(1, loc) - pmax(0, 1 + 0:length(loc) - 2),
                  c(loc, nchar(string)) - c(1:length(loc), 0))
    strings <- apply(locs, 1, function(x) substr(string, x[1], x[2]))
    paste(strings, collapse = "_")
  })
  rm(underscore_loc)
})
df
#>      ortho   syllabify      target
#> 1 agradeço R_OOR_OR_OR a_gra_de_ço
#> 2     bala       OR_OR       ba_la
#> 3  futebol   OR_OR_ORC   fu_te_bol

Created on 2018-06-26 by the reprex package (v0.2.0).

The purrr package is used to get the map2 function, which is similar to purrr's map but takes its input across 2 lists.