JavaScript Regex to get the title and subtitle from a text file

Time: 2015-11-27 14:47:36

Tags: javascript regex

I have the following text, which comes from a *.text file:

1)TEXTDATA.TXT

A 57-year-old female presents to the office with fatigue, jaundice and dyspnea. On physical exam you note her face is pale. Laboratory testing shows slightly elevated MCV, increased LDH, indirect bilirubin, and reticulocytes. Positive Direct Coombs test shows antibodies on RBCs and peripheral smear shows spherocytes. What is the most likely diagnosis?

A. Glucose-6-phosphate dehydrogenase (G6PD) deficiency
B. Vitamin B12 deficiency
C. Paroxysmal nocturnal hemoglobinuria (PNH)
D. Iron deficiency anemia
E. Autoimmune hemolytic anemia

The correct answer is (E) Autoimmune hemolytic anemia. This patient most likely has warm autoimmune hemolytic anemia as evidenced by her positive Direct Coombs test, elevated MCV, increased LDH, indirect bilirubin, and reticulocytes. Warm autoimmune hemolytic anemias are idiopathic or associated with autoimmune processes (SLE), drugs, lymphoproliferative disorders (CLL) and typically present with severe anemia (pallor, jaundice, fatigue, dyspnea). Peripheral smear can show spherocytes.

Choice A (Glucose-6-phosphate dehydrogenase (G6PD) deficiency) is incorrect. G6PD deficiency is an X-linked recessive disease, which is seen more commonly in males.

Choice B (Vitamin B12 deficiency) is incorrect. Pernicious anemia typically presents with peripheral neuropathy, fatigue, leg stiffness, ataxia, memory impairment, and depression.

Choice C (Paroxysmal nocturnal hemoglobinuria (PNH)) is incorrect. Paroxysmal nocturnal hemoglobinuria presents with intermittent dark colored urine in the morning.

Choice D (Iron deficiency anemia) is incorrect. Iron deficiency anemia is associated with decreased Hgb, hematocrit, serum Fe, ferritin, transferrin saturation, and MCV, increased TIBC and RDW.


AUTOIMMUNE HEMOLYTIC ANEMIA
Hemolytic anemia

Ax: 
Warm autoimmune hemolytic anemias are idiopathic or associated with autoimmune processes (SLE), drugs, lymphoproliferative disorders (CLL).

1) I have updated TEXTDATA.TXT. I am trying to get the text from the last "Choice X" up to "Ax:". Is there a simple trick for this? My code looks like this:

var string = string.toString().substring(fileContent.indexOf("Choice E") + 8, string.indexOf("Cx:") - 3); 

It does not quite work for the last choice, because in this file the last choice is D ("Choice D"), not E.

2) I only need Title = "AUTOIMMUNE HEMOLYTIC ANEMIA" and Subtitle = "Hemolytic anemia" from the TEXTDATA.TXT file. It would be perfect if I could get the content between the last "Choice X" and "Ax:".

Code:

var ifdtdata = string.toString().substring(string.indexOf("Choice E") + 8, string.indexOf("Cx:") - 3);

titleifdt = /(?:\r?\n){2}([A-Z].*)/.exec(ifdtdata);
subifdt = /(?:\r?\n){2}([A-Z].*)\r?\n(.*)/.exec(ifdtdata);

ifdtdata = ifdtdata.replace(/[^a-z0-9 ,.?!]/ig, '');
if(valUndefinedNull(subifdt) == false){
       subifdt = /([A-Z0-9 ]*[A-Z]{2,}?)([A-Z][a-z]+[^.]*)/.exec(ifdtdata);
}
if(valUndefinedNull(titleifdt) == false){
       titleifdt = /([A-Z0-9 ]*[A-Z]{2,}?)([A-Z][a-z]+[^.]*)/.exec(ifdtdata);
}
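A sketch that avoids hard-coding "Choice E": find every line that starts with "Choice" followed by a capital letter, keep the last one, and cut from the end of that line up to "Ax:". It assumes every choice line begins with "Choice <letter>" at the start of a line and that the marker is spelled exactly "Ax:"; the function name sectionAfterLastChoice is hypothetical.

```javascript
// Hypothetical helper: returns the text between the end of the last line that
// starts with "Choice <capital letter>" and the "Ax:" marker, or null if either
// marker is missing. Assumes "Ax:" appears after the last choice line.
function sectionAfterLastChoice(text) {
  // The m flag makes ^ match at the start of every line; g lets exec() iterate.
  const choiceRe = /^Choice [A-Z]\b/gm;
  let lastChoice = -1;
  let m;
  while ((m = choiceRe.exec(text)) !== null) {
    lastChoice = m.index; // keep overwriting: the last match wins (D, E, ...)
  }
  if (lastChoice === -1) return null;

  // Skip the rest of that "Choice X ..." line.
  const lineEnd = text.indexOf("\n", lastChoice);
  if (lineEnd === -1) return null;

  const axIndex = text.indexOf("Ax:", lineEnd);
  if (axIndex === -1) return null;

  return text.slice(lineEnd + 1, axIndex);
}

// Usage with a shortened version of TEXTDATA.TXT (the last choice is D, not E).
const textdata = [
  "Choice C (Paroxysmal nocturnal hemoglobinuria (PNH)) is incorrect.",
  "",
  "Choice D (Iron deficiency anemia) is incorrect.",
  "",
  "",
  "AUTOIMMUNE HEMOLYTIC ANEMIA",
  "Hemolytic anemia",
  "",
  "Ax: ",
  "Warm autoimmune hemolytic anemias are idiopathic.",
].join("\n");

const lines = sectionAfterLastChoice(textdata).trim().split(/\r?\n/);
console.log(lines[0]); // AUTOIMMUNE HEMOLYTIC ANEMIA
console.log(lines[1]); // Hemolytic anemia
```

Once the slice is trimmed, the title and subtitle are simply its first two lines, so no further regex is needed.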

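3 Answers:

Answer 0 (score: 3)

I assume you need the contents of the second "meaningful" line. You can split the content with a regex that matches any kind of line break and take only the second element. Since line breaks may contain \r characters, I suggest the following sample code: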



var s = "TITLE X (CD55 and CD59 markers) are positive in paroxysmal nocturnal hemoglobinuria (PNH).\n\nAUTOIMMUNE HEMOLYTIC ANEMIA\nHemolytic anemia\n\nTITLE Z: Warm autoimmune hemolytic anemias are idiopathic or associated with autoimmune processes (SLE)";
var arr = s.replace(/^\s*|\s*$/g, '').split(/[\r\n]+/);
document.write(arr[1]);




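With .replace(/^\s*|\s*$/g, '') you trim the input, and with .split(/[\r\n]+/) you split the content into separate lines, whether it comes from a Windows, Linux or macOS text file. Because the split pattern matches runs of line breaks, empty lines are dropped, so arr[1] is the second non-empty line.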
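A Node-friendly sketch of the same trim-and-split idea, using console.log instead of document.write. The hypothetical helper meaningfulLines uses String.prototype.trim(), which strips the same leading and trailing whitespace as the replace call. One caveat: a "blank" line that contains spaces is not swallowed by the split and survives as its own element.

```javascript
// Returns the lines of `text` without empty lines; works for \n, \r\n and \r endings.
function meaningfulLines(text) {
  // trim() removes leading/trailing whitespace, like .replace(/^\s*|\s*$/g, '').
  // /[\r\n]+/ treats any run of \r and \n characters as a single separator.
  return text.trim().split(/[\r\n]+/);
}

const unixText = "First line.\n\nAUTOIMMUNE HEMOLYTIC ANEMIA\nHemolytic anemia";
const windowsText = unixText.replace(/\n/g, "\r\n");

console.log(meaningfulLines(unixText)[1]);    // AUTOIMMUNE HEMOLYTIC ANEMIA
console.log(meaningfulLines(windowsText)[1]); // AUTOIMMUNE HEMOLYTIC ANEMIA
```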

If you need the first line that starts with an uppercase letter after the first double line break, use



var s = "TITLE X (CD55 and CD59 markers) are positive in paroxysmal nocturnal hemoglobinuria (PNH).\n\nAUTOIMMUNE HEMOLYTIC ANEMIA\nHemolytic anemia\n\nTITLE Z: Warm autoimmune hemolytic anemias are idiopathic or associated with autoimmune processes (SLE)";
var m = /(?:\r?\n){2}([A-Z].*)/.exec(s);
if (m !== null)
  document.write(m[1]);




Here, the regex matches:

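  • (?:\r?\n){2} - two consecutive line breaks (each optionally preceded by \r)
  • ([A-Z].*) - a line that starts with an uppercase letter [A-Z], followed by as many characters other than a line break as possible (greedy). This value ends up in m[1].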
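A small sketch that wraps this regex in a function and checks it against Windows-style line endings. \r?\n accepts both \r\n and \n, and because . never matches \r or \n in JavaScript, the captured title does not carry a stray \r. The function name titleAfterBlankLine is hypothetical.

```javascript
// Returns the first line that starts with A-Z right after two consecutive
// line breaks (i.e. after a blank line), or null when there is no such line.
function titleAfterBlankLine(text) {
  const m = /(?:\r?\n){2}([A-Z].*)/.exec(text);
  return m !== null ? m[1] : null;
}

const sampleCrlf =
  "TITLE X (CD55 and CD59 markers) are positive in PNH.\r\n\r\n" +
  "AUTOIMMUNE HEMOLYTIC ANEMIA\r\nHemolytic anemia";

console.log(titleAfterBlankLine(sampleCrlf));           // AUTOIMMUNE HEMOLYTIC ANEMIA
console.log(titleAfterBlankLine("no blank line here")); // null
```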

Update

To get the subtitle as well, use

var s = "TITLE X (CD55 and CD59 markers) are positive in paroxysmal nocturnal hemoglobinuria (PNH).\n\nAUTOIMMUNE HEMOLYTIC ANEMIA\nHemolytic anemia\n\nTITLE Z: Warm autoimmune hemolytic anemias are idiopathic or associated with autoimmune processes (SLE)";
var m = /(?:\r?\n){2}([A-Z].*)\r?\n(.*)/.exec(s);
if (m !== null){
  document.write("Title: " + m[1] + "<br/>Subtitle: " + m[2]);
}
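A sketch of the same regex packaged as a function that returns both values (the name titleAndSubtitle is hypothetical). One caveat: exec() returns the first blank-line-plus-capital match, so on the whole TEXTDATA.TXT it would pick up the answer list ("A. ..." / "B. ...") instead of the title. The second helper assumes the title block follows the last line starting with "Choice " and searches only from there.

```javascript
// Returns { title, subtitle } for the first line that starts with A-Z after a
// blank line, plus the line directly below it; null when nothing matches.
function titleAndSubtitle(text) {
  const m = /(?:\r?\n){2}([A-Z].*)\r?\n(.*)/.exec(text);
  return m !== null ? { title: m[1], subtitle: m[2] } : null;
}

// Assumption: the title block comes after the last line that starts with "Choice ".
function titleAndSubtitleAfterChoices(text) {
  const start = text.lastIndexOf("\nChoice ");
  return titleAndSubtitle(start === -1 ? text : text.slice(start));
}

const fileText = [
  "A 57-year-old female presents to the office with fatigue, jaundice and dyspnea.",
  "",
  "A. Glucose-6-phosphate dehydrogenase (G6PD) deficiency",
  "B. Vitamin B12 deficiency",
  "",
  "Choice D (Iron deficiency anemia) is incorrect.",
  "",
  "",
  "AUTOIMMUNE HEMOLYTIC ANEMIA",
  "Hemolytic anemia",
  "",
  "Ax: ",
].join("\n");

console.log(titleAndSubtitle(fileText).title);                // A. Glucose-6-phosphate dehydrogenase (G6PD) deficiency
console.log(titleAndSubtitleAfterChoices(fileText).title);    // AUTOIMMUNE HEMOLYTIC ANEMIA
console.log(titleAndSubtitleAfterChoices(fileText).subtitle); // Hemolytic anemia
```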

Answer 1 (score: 0)

Here I simply split the content on new lines and take the third line (indexing starts at 0, so it is index 2):

var title = fileContent.split("\n")[2]
console.log(title);
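A fixed index only works while the title sits on exactly that line: it does in the sample string from Answer 0, but in the full TEXTDATA.TXT the title is much further down. Splitting on "\n" also leaves a trailing \r on each line of a Windows file; the hypothetical helper lineAt below splits on /\r?\n/ instead.

```javascript
// Returns the line at a zero-based index, or undefined if the text is shorter.
// Splitting on /\r?\n/ handles both Unix (\n) and Windows (\r\n) line endings.
function lineAt(text, index) {
  return text.split(/\r?\n/)[index];
}

const answerSample =
  "TITLE X (CD55 and CD59 markers) are positive in PNH.\r\n\r\n" +
  "AUTOIMMUNE HEMOLYTIC ANEMIA\r\nHemolytic anemia";

console.log(JSON.stringify(answerSample.split("\n")[2])); // "AUTOIMMUNE HEMOLYTIC ANEMIA\r"
console.log(JSON.stringify(lineAt(answerSample, 2)));     // "AUTOIMMUNE HEMOLYTIC ANEMIA"
```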

Answer 2 (score: 0)

I would just match lines written entirely in capital letters and take only the first match: /^[A-Z\W 0-9]{3,}$/m
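A sketch of that approach (the helper name capsTitle is hypothetical). Because \W also matches \r and \n, the match can spill over line breaks; in the excerpt below it starts on the blank lines above the title, and trimming the match removes them. It also assumes no other line in the file consists only of capitals, digits, spaces and punctuation and is at least three characters long.

```javascript
// Returns the first line made only of A-Z, digits, spaces and non-word characters
// (at least 3 characters), trimmed; null when no such line exists.
function capsTitle(text) {
  const m = /^[A-Z\W 0-9]{3,}$/m.exec(text);
  // \W also matches \r and \n, so the match may include neighbouring blank lines.
  return m !== null ? m[0].trim() : null;
}

const excerpt = [
  "Choice D (Iron deficiency anemia) is incorrect.",
  "",
  "",
  "AUTOIMMUNE HEMOLYTIC ANEMIA",
  "Hemolytic anemia",
  "",
  "Ax: ",
].join("\n");

console.log(capsTitle(excerpt)); // AUTOIMMUNE HEMOLYTIC ANEMIA
```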