Python - Parse text with a line delimiter and add HTML tags for new lines and paragraphs

Asked: 2019-04-24 15:54:48

Tags: python html parsing text delimited-text

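I am inheriting a CSV file that uses @@@ as its line delimiter. I have written some Python code that splits the text at @@@ into separate rows. My problem is that the original text is formatted with paragraphs and line breaks, but my code runs everything together. The data is going into an MS Access form where people review medical records, so keeping the original formatting would be ideal. Since I assume that is not possible, I was hoping to add a "br" HTML tag for each line break and a "p" HTML tag for each paragraph. What I don't know is how to get Python to recognize a line break or a paragraph break.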

My output looks like this (not ideal):

ID |Reporttext

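1 |Diabetes mellitus (DM), commonly known as diabetes, is a group of metabolic disorders characterized by high blood sugar levels over a prolonged period. Diabetes is due to either the pancreas not producing enough insulin, or the cells of the body not responding properly to the insulin produced. There are three main types of diabetes mellitus: - Type 1 DM results from the pancreas' failure to produce enough insulin due to loss of beta cells. This form was previously referred to as "insulin-dependent diabetes mellitus" (IDDM) or "juvenile diabetes".The cause is unknown. - Type 2 DM begins with insulin resistance, a condition in which cells fail to respond to insulin properly. As the disease progresses, a lack of insulin may also develop.

2 |Gestational diabetes is the third main form, and occurs when pregnant women without a previous history of diabetes develop high blood sugar levels. Do not get diabetes and you will be okay!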

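(See the code below for the text these phrases refer to.) I would like to add HTML tags so that a rich-text MS Access form keeps the line breaks and paragraph formatting. For example, in the first paragraph I would like a "p" tag after "prolonged period". In the third paragraph, a "br" tag would go after "The cause is unknown". Thanks in advance!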

t = """@@@ Diabetes mellitus (DM), commonly known as diabetes, is a 
group of metabolic disorders characterized by high blood sugar levels 
over a prolonged period.  

Diabetes is due to either the pancreas not producing enough insulin,
or the cells of the body not responding properly to the insulin produced.
There are three main types of diabetes mellitus:

- Type 1 DM results from the pancreas' failure to produce enough insulin 
due to loss of beta cells. This form was previously referred to 
as "insulin-dependent diabetes mellitus" (IDDM) or "juvenile 
diabetes".The cause is unknown.
- Type 2 DM begins with insulin resistance, a condition in which cells 
fail to respond to insulin properly. As the disease progresses, a lack of 
insulin may also develop. 

@@@ Gestational diabetes is the third main form, and occurs when pregnant 
women without a previous history of diabetes develop high blood sugar 
levels. 

Do not get diabetes and you will be okay!"""

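import csv

# Split on the @@@ record delimiter, drop empty pieces, and number the records from 1
data = list(enumerate((x.strip() for x in t.split("@@@") if x.strip()), 1))

print(data)
print("")

# Write one pipe-delimited row per record
with open("t.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter='|')
    writer.writerow(('ID', 'Reporttext'))
    writer.writerows(data)

# Read the same file back to check the result
with open("t.csv") as f:
    print(f.read())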
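One possible way to get the "p" and "br" tags described above is sketched here. It is not a verified solution for MS Access. It assumes that a blank line marks a paragraph break and that every remaining newline inside a paragraph is a line break. The helper names `record_to_html` and `records_to_rows` and the short `sample` string are made up for illustration, and the rows are written to an in-memory buffer rather than to a real file.

```python
import csv
import html
import io
import re


def record_to_html(record: str) -> str:
    """Wrap blank-line-separated blocks in <p> and join their lines with <br>."""
    # Normalize Windows and old Mac line endings to "\n".
    text = record.replace("\r\n", "\n").replace("\r", "\n").strip()
    paragraphs = []
    # Assumption: a run of one or more blank lines is a paragraph break.
    for block in re.split(r"\n\s*\n", text):
        # Escape &, < and > so report text is not mistaken for markup.
        lines = [html.escape(line.strip(), quote=False)
                 for line in block.split("\n") if line.strip()]
        if lines:
            paragraphs.append("<p>" + "<br>".join(lines) + "</p>")
    return "".join(paragraphs)


def records_to_rows(raw: str, delimiter: str = "@@@"):
    """Split on the record delimiter and return (ID, html) rows numbered from 1."""
    records = (part.strip() for part in raw.split(delimiter))
    non_empty = (r for r in records if r)
    return [(i, record_to_html(r)) for i, r in enumerate(non_empty, 1)]


# Hypothetical miniature input in the same shape as the question's data.
sample = "@@@ First line\nsecond line\n\nNew paragraph\n@@@ Another record"
rows = records_to_rows(sample)

# In-memory buffer for the demo; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|")
writer.writerow(("ID", "Reporttext"))
writer.writerows(rows)
print(buffer.getvalue())
```

With hard-wrapped text like the sample in the question, this puts a "br" at the end of every wrapped line, not only after "The cause is unknown." If only list items should start a new line, the join step would need an extra rule, for example breaking only before lines that start with "- ". Whether the Access rich-text control renders these tags as intended has not been checked here.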

0 answers:

No answers yet