Question

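I have data made up of many sentences. Taking the sentence below as an example, I want to split it into 2 sub-sentences:

Both whole plasma and the d < 1.006 g/ml density fraction of plasma from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) |T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows virtually no lipid staining at the beta-position. |T:**1SN3E3| |I:**1SN3E3| |L:**1SN3E3|

I want to split it into:

Both whole plasma and the d < 1.006 g/ml density fraction of plasma from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)

and

in contrast, 3/3 plasma shows virtually no lipid staining at the beta-position.

My code is: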

private static DataGridViewRow PrepareDataGridViewRow(DataGridView dgv, object[] cols)
{
    var result = new DataGridViewRow();
    result.CreateCells(dgv,cols);
    return result;
}

But I can't get the right result. Can anyone help? Thanks a lot.

Answer 1

import re
# s is the annotated sentence from the question
[i.strip() for i in re.sub(r'\|\w:\*\*\w*\|', '', re.sub(r' +', r' ', s.strip())).split(';')]

returns

['Both whole plasma and the d < 1.006 g/ml density fraction of plasma from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)', 'in contrast, 3/3 plasma shows virtually no lipid staining at the beta-position.']

But use it with care, because it only works if the rest of your text follows the same format as your example.
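To make that caveat concrete, here is a minimal sketch of the same idea wrapped in a function. The name `split_clauses` and the `TAG` pattern are illustrative, not part of the original answer, and the pattern assumes every annotation looks like `|X:**ID|` (one letter, a colon, two asterisks, an ID).

```python
import re

# Assumption: every annotation has the shape |X:**ID|, optionally with
# stray spaces inside the bars.
TAG = re.compile(r'\|\s*\w\s*:\s*\*\*\s*\w*\s*\|')

def split_clauses(s):
    """Remove |X:**ID| tags, split on ';', and drop empty pieces."""
    without_tags = TAG.sub('', s)
    # Collapse runs of whitespace left behind by the removed tags.
    parts = (re.sub(r'\s+', ' ', p).strip() for p in without_tags.split(';'))
    return [p for p in parts if p]

s = ("Both whole plasma and the d < 1.006 g/ml density fraction of plasma "
     "from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) "
     "|T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows "
     "virtually no lipid staining at the beta-position. "
     "|T:**1SN3E3| |I:**1SN3E3| |L:**1SN3E3|")

print(split_clauses(s))
```

Note that a semicolon inside the sentence text itself would also cause a split, so this only holds if `;` appears solely as the clause separator.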

Answer 2

import re
x="""Both whole plasma and the d < 1.006 g/ml density fraction of plasma from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) |T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows virtually no lipid staining at the beta-position. |T:**1SN3E3| |I:**1SN3E3| |L:**1SN3E3|"""
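# Split wherever a run of |X:...| tags (separated by spaces or ';') occurs,
# then drop the empty string left after the final tag run.
print([i for i in re.split(r"(?:\|[^:]*:.*?\|(?:[\s;]+|$))+", x) if i])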

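The output will look like this:

['Both whole plasma and the d < 1.006 g/ml density fraction of plasma from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) ', 'in contrast, 3/3 plasma shows virtually no lipid staining at the beta-position. ']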
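Splitting on the tag runs leaves the whitespace that sat between the text and the first tag, so each piece may end with a space. A small variant, sketched here with the hypothetical names `TAG_RUN` and `split_on_tag_runs`, strips each piece and also tolerates tags that are not followed by whitespace. It assumes the tag bodies never contain a `|` character.

```python
import re

# Assumption: a clause ends where a run of |X:...| tags begins; tags in a run
# may be separated by whitespace or ';'. Tag bodies contain no '|'.
TAG_RUN = re.compile(r"(?:\|[^:|]*:[^|]*\|[\s;]*)+")

def split_on_tag_runs(text):
    """Split text at each run of tags and return the stripped, non-empty pieces."""
    return [piece.strip() for piece in TAG_RUN.split(text) if piece.strip()]

x = ("Both whole plasma and the d < 1.006 g/ml density fraction of plasma "
     "from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) "
     "|T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows "
     "virtually no lipid staining at the beta-position. "
     "|T:**1SN3E3| |I:**1SN3E3| |L:**1SN3E3|")

for clause in split_on_tag_runs(x):
    print(repr(clause))
```

Because the split point is the tag run rather than the `;`, this variant does not depend on a semicolon being present between clauses.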

Answer 3

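import re

# string holds the annotated sentence; the asterisks must be escaped in the pattern
string = [part.strip() for part in re.sub(r'\|\w:\*\*\w+\|', '', string).split(';')]

The output will be: ['Both whole plasma and the d < 1.006 g/ml density fraction of plasma from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)', 'in contrast, 3/3 plasma shows virtually no lipid staining at the beta-position.']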

Answer 4

This pattern matches both parts: org.activiti.spring.SpringProcessEngineConfiguration
You can split the two parts in code; check it on regex101 here

Splitting a sentence into sub-sentences using the re package in Python

4 answers: