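I want to add markers to a string based on the marker positions in another string. I have a SOURCE data frame with two columns: "ortho" and "syllabify". I want to create a TARGET column using the underscore ("_") markers: the "ortho" string should be split with underscores at the positions that correspond to the underscores in "syllabify".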
df <- data.frame(ortho = "agradeço", syllabify = "R_OOR_OR_OR")
SOURCE:
ortho     syllabify
agradeço  R_OOR_OR_OR
bala      OR_OR
futebol   OR_OR_ORC
TARGET:
ortho     syllabify    TARGET
agradeço  R_OOR_OR_OR  a_gra_de_ço
bala      OR_OR        ba_la
futebol   OR_OR_ORC    fu_te_bol
Thanks, everyone!
Answer 0 (score: 0)
I don't know which language you are using, but here is an answer in Java:
Initializer:
String sillabify = "OR_OR_ORC";
String ortho = "futebol";
String answer = returnTheTARGETColumnStringUsingTheUnderlineMarkers(ortho, sillabify);
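Method:
public String returnTheTARGETColumnStringUsingTheUnderlineMarkers(String pOrtho, String pSillabify) {
    String target = "";
    // Consume one syllable per iteration, up to the next underscore in the pattern.
    while (pSillabify.contains("_")) {
        // Copy the characters before the underscore, then add the marker.
        target = target + pOrtho.substring(0, pSillabify.indexOf("_")) + "_";
        // Drop the copied characters from the word.
        pOrtho = pOrtho.substring(pSillabify.indexOf("_"), pOrtho.length());
        // Drop the consumed syllable and its underscore from the pattern.
        pSillabify = pSillabify.substring(pSillabify.indexOf("_") + 1, pSillabify.length());
    }
    // Whatever is left of the word is the last syllable.
    target = target + pOrtho;
    return target;
}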
This returns "fu_te_bol".
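For reference, the same idea can also be written as a single pass over the pattern, applied to all three sample rows from the question. This is only a sketch: the class name `SyllableMarker` and the method name `insertMarkers` are made up for illustration, and it assumes the pattern holds exactly one letter code per character of the word (any surplus characters of the word are simply appended at the end).

```java
final class SyllableMarker {

    // Hypothetical helper: copies the underscores of the pattern into the word.
    static String insertMarkers(String ortho, String syllabify) {
        StringBuilder target = new StringBuilder();
        int next = 0; // index of the next unused character of ortho
        for (int i = 0; i < syllabify.length(); i++) {
            if (syllabify.charAt(i) == '_') {
                // A marker in the pattern becomes a marker in the result.
                target.append('_');
            } else if (next < ortho.length()) {
                // Any other pattern character stands for one character of the word.
                target.append(ortho.charAt(next++));
            }
        }
        // Assumption: if the word is longer than the pattern, keep the rest as-is.
        target.append(ortho.substring(next));
        return target.toString();
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"agrade\u00e7o", "R_OOR_OR_OR"},
            {"bala", "OR_OR"},
            {"futebol", "OR_OR_ORC"}
        };
        for (String[] row : rows) {
            System.out.println(row[0] + " " + row[1] + " " + insertMarkers(row[0], row[1]));
        }
    }
}
```

Compared with the substring version, this avoids rebuilding the strings on every iteration, and it does not throw if the word turns out to be shorter than the pattern.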
Answer 1 (score: 0)
Here is one solution:
Created on 2018-06-26 by the reprex package (v0.2.0).
df <- read.table(text = "ortho syllabify
agradeço R_OOR_OR_OR
bala OR_OR
futebol OR_OR_ORC", header = TRUE)
library(purrr)
df <- within(df, {
  ortho <- as.character(ortho)
  # Positions of every "_" in each syllabify pattern
  underscore_loc <- gregexpr("_", syllabify)
  target <- map2(ortho, underscore_loc, function(string, loc) {
    # One row per syllable: start and end position within the ortho string
    locs <- cbind(c(1, loc) - pmax(0, 1 + 0:length(loc) - 2), c(loc, nchar(string)) - c(1:length(loc), 0))
    strings <- apply(locs, 1, function(x) substr(string, x[1], x[2]))
    paste(strings, collapse = "_")
  })
  rm(underscore_loc)
})
df
#>      ortho   syllabify      target
#> 1 agradeço R_OOR_OR_OR a_gra_de_ço
#> 2     bala       OR_OR       ba_la
#> 3  futebol   OR_OR_ORC   fu_te_bol
The purrr package is used for its map2 function, which is similar to lapply but takes its input from two lists in parallel.