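I have a RichText and I store the HTML source from a QTextEdit in a string. What I want to do is extract all the lines (I have 4-6 of them) one by one. The string looks like this: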
//html opening stuff
<p style = attributes...><span style = attributes...>My Text</span></p>
//more lines like this
//html closing stuff
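So I need each whole line, from the opening p tag to the closing p tag (including the p tags). I have checked and tried everything I found here and on other sites, but still got no result.
Here is my code ("htmlStyle" is the input string):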
QStringList list;
QRegExp rx("(<p[^>]*>.*?</p>)");
int pos = 0;
while ((pos = rx.indexIn(htmlStyle, pos)) != -1) {
list << rx.cap(1);
pos += rx.matchedLength();
}
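One likely reason the pattern above finds nothing useful: QRegExp does not support per-quantifier lazy matching, so `.*?` is not treated as "shortest match". According to the QRegExp documentation, non-greedy matching can only be switched on for the whole pattern with `rx.setMinimal(true)` (and the pattern then written with a plain `.*`); QRegularExpression, available since Qt 5, does understand `*?`. To illustrate the lazy pattern itself, here is a minimal sketch in standard C++ using `std::regex`, whose ECMAScript grammar supports `*?`. The function name `extractParagraphsByRegex` is hypothetical, and the usual caveats about matching HTML with regular expressions still apply (nested or malformed markup will break it).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every <p ...>...</p> paragraph in `html`,
// with the opening and closing tags included.
std::vector<std::string> extractParagraphsByRegex(const std::string &html) {
    // <p(?:\s[^>]*)?>  matches "<p>" or "<p attrs...>", but not <pre> or <param>
    // [\s\S]*?          lazily matches any characters, newlines included
    // </p>              stops at the first closing tag after the opening one
    static const std::regex paragraph(R"(<p(?:\s[^>]*)?>[\s\S]*?</p>)");
    std::vector<std::string> result;
    for (std::sregex_iterator it(html.begin(), html.end(), paragraph), end;
         it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

Because the quantifier is lazy, each match ends at the nearest `</p>`, so every paragraph comes back as a separate string instead of one match spanning from the first `<p` to the last `</p>`.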
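Or is there any other way to do this without regular expressions?
Answer 0 (score: 2)
HTML/XML is not a regular grammar. You cannot parse it with regular expressions; see e.g. this question. Parsing HTML is not trivial.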
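You can use QTextDocument, QTextBlock, QTextCursor, etc. to iterate over the paragraphs of a rich-text document. All the HTML parsing is done for you. It is exactly the subset of HTML that QTextEdit supports, because QTextEdit uses a QTextDocument as its internal representation. You can get it directly from the widget with QTextEdit::document(). E.g.: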
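#include <QDebug>
#include <QTextBlock>
#include <QTextDocument>
#include <QTextEdit>

void iterate(QTextEdit * edit) {
    auto const & doc = *edit->document();
    // Each paragraph (<p>...</p>) of the HTML becomes one QTextBlock.
    for (auto block = doc.begin(); block != doc.end(); block = block.next()) {
        // Do something with the text block, e.g. iterate over its fragments
        // (runs of text sharing one character format, such as a <span>).
        for (auto it = block.begin(); !it.atEnd(); ++it) {
            QTextFragment fragment = it.fragment();
            qDebug() << fragment.text();  // do something with the text fragment
        }
    }
}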
Instead of parsing the HTML by hand (and getting it wrong), you should explore the structure of the QTextDocument and use it as needed.
Answer 1 (score: 1)
Below is a pure Java way; hope this helps:
# Your matrix
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
.Dimnames = list(c("chr1:1163804", "chr1:1888193"),
c("00860.GT", "00861.GT")))
# Using a data table approach
library(data.table)
# Casting to data table - row.names will be converted to a column called 'rn'.
mymat = as.data.table(mymat, keep.rownames = T)
# Find "GT" columns
GTcols = grep("GT", colnames(mymat))
# Get number before ".GT"
selectedCols = gsub(".GT", "", colnames(mymat)[GTcols])
selectedCols
[1] "00860" "00861"
# Create ".DP" columns
mymat[, paste0(selectedCols, ".DP") := 50, with = F]
mymat
rn 00860.GT 00861.GT 00860.DP 00861.DP
1: chr1:1163804 0/1 0/0 50 50
2: chr1:1888193 1/1 0/0 50 50
# Create "GT" to "AD" mapping
GTToADMapping = c("50,0", "25/25", "0/50")
names(GTToADMapping) = c("0/0", "0/1", "1/1")
GTToADMapping
0/0 0/1 1/1
"50,0" "25/25" "0/50"
# This function will return the "AD" mapping given the values of "GT"
mapGTToAD <- function(x){
return (GTToADMapping[x])
}
# Here, we create the AD columns using the GT mapping
mymat[, (paste0(selectedCols, ".AD")) := lapply(.SD, mapGTToAD), with = F,
.SDcols = colnames(mymat)[GTcols]]
rn 00860.GT 00861.GT 00860.DP 00861.DP 00860.AD 00861.AD
1: chr1:1163804 0/1 0/0 50 50 25/25 50,0
2: chr1:1888193 1/1 0/0 50 50 0/50 50,0
# We can sort the data now as you have it
colOrder = as.vector(rbind(paste0(selectedCols, ".GT"),
paste0(selectedCols, ".AD"),
paste0(selectedCols, ".DP")))
mymat = mymat[, c("rn", colOrder), with = F]
mymat
rn 00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
1: chr1:1163804 0/1 25/25 50 0/0 50,0 50
2: chr1:1888193 1/1 0/50 50 0/0 50,0 50
# Put it back in the format you had
mymat2 = as.matrix(mymat[,-1, with = F])
rownames(mymat2) = mymat$rn
mymat2
00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
chr1:1163804 "0/1" "25/25" "50" "0/0" "50,0" "50"
chr1:1888193 "1/1" "0/50" "50" "0/0" "50,0" "50"
Answer 2 (score: 0)
For those who need a complete Qt solution, I worked it out based on @Aditya Poorna's answer. Thanks for the hint!
Here is the code:
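#include <QDebug>
#include <QString>

int startIndex = htmlStyle.indexOf("<p");
while (startIndex >= 0) {
    // Look for the closing tag only after the current opening tag.
    int endIndex = htmlStyle.indexOf("</p>", startIndex);
    if (endIndex < 0)
        break;                  // no closing tag left: stop
    endIndex = endIndex + 4;    // include "</p>" itself
    QStringRef subString(&htmlStyle, startIndex, endIndex-startIndex);
    qDebug() << subString;
    // Continue searching after the paragraph just extracted.
    startIndex = htmlStyle.indexOf("<p", endIndex);
}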
"QStringRef subString" refers into "htmlStyle" starting at "startIndex", with a length of "endIndex-startIndex".
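The same indexOf-style scan can be sketched without Qt, which makes it easy to try on a sample string. This is a minimal illustration using `std::string::find`, not the author's code; the name `extractParagraphsByScan` is hypothetical. Like the Qt version, it looks for the literal prefix `<p`, so it assumes the input contains no other tags starting with those characters (such as `<pre>`), which holds for typical QTextEdit::toHtml() output but not for arbitrary HTML.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects every "<p ... </p>" span from `html` by
// scanning for the opening prefix and then the next closing tag after it.
std::vector<std::string> extractParagraphsByScan(const std::string &html) {
    const std::string open = "<p";
    const std::string close = "</p>";
    std::vector<std::string> paragraphs;
    std::string::size_type start = html.find(open);
    while (start != std::string::npos) {
        // Search for the closing tag only after the current opening tag.
        std::string::size_type end = html.find(close, start);
        if (end == std::string::npos)
            break;                     // unbalanced input: nothing more to extract
        end += close.size();           // include "</p>" in the result
        paragraphs.push_back(html.substr(start, end - start));
        // Resume scanning right after the paragraph just extracted.
        start = html.find(open, end);
    }
    return paragraphs;
}
```

Searching for `</p>` from the current opening tag, rather than tracking the two indices independently, keeps each start paired with its own closing tag and stops cleanly if a closing tag is missing.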