I want to take all the text from a text file and convert it to an XML file

Time: 2015-07-09 14:50:01

Tags: c# asp.net xml

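This is the format of my text file. I specifically want to extract the text and the headlines, so please suggest how to do this. So far I have tried converting this text into a DataSet first, but I cannot get the Text node values that way.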

BoxID{26027836}
Left{8.346457}
Top{14.296841}
Bottom{20.563504}
Right{12.952756}
Text{8.346457,14.819063,20.563504,12.952737 Tika R Pradhan

Kathmandu, January 24

The central committee of the UCPN-Maoist has told party Chairman Pushpa Kamal Dahal to present his proposal for the revival of the party.

According to senior leader Haribol Gajurel, 98 CC members have already aired their views at the ongoing CC meeting demanding a new lease of life for the party.  The CC decided to resume its meeting on Tuesday, giving the party chief the time to present a proposal on party restructuring. The hiatus also takes into account the Legislature-Parliament meeting slated for Monday. 

Gajurel said discussions on fresh issues will begin after chairman presents his plan on Tuesday. 

He said Dahal will float his plan for party restructuring, adding that the plan will also cover ethics that the party’s rank and file should adhere to. 

The CC meeting that kicked off eight days ago dwelt on the political document that Dahal had presented. The document has raised questions without bothering to offer solutions. 

During today’s meeting, party leader Renu Chand pointed that people have made negative comments about Chairman’s son Prakash Dahal. She urged Dahal to remove his son from his secretariat. Krishna Dhital, a party leader from Gorkha, asked party leader Narayan Kaji Shrestha to explain why he appointed a lady personal secretary. 
  EOF9.000000}
Text{10.038215,15.732951,19.470202,12.834632 Himalayan News Service 

Kathmandu, January 24

UCPN-Maoist leaders have started discussions on selection of the party’s parliamentary party leader. 

With former vice-chairman Baburam announcing that he does not want to hold any political position for at least a year, party Chairman Pushpa Kamal Dahal is under pressure not to stake his claim on the position. Sources claimed Dahal will support his close confidante Krishna Bahadur Mahara for the position. 

Since Bhattarai is against Mahara, he will field his close aide Top Bahadur Rayamajhi for the position. Another former vice-chairman Narayan Kaji Shrestha will back Giriraj Mani Pokhrel. 

Some leaders close to Dahal claim that Dahal himself may take up the responsibility of the PP leader if subordinates start quarrelling for the position. Others say the party will settle the issue by making Mahara the PP leader, Rayamajhi deputy leader and Pokhrel the chief whip of the PP. 

Senior leader Shrestha, however, said top leaders have not yet discussed the selection of the PP leader. He said the meeting of the central office, which will begin tomorrow, will begin discussions on the issue. He said all options, including Dahal and Mahara’s PP leadership, are open for discussion. 
  EOF8.500000}
Headline{10.038215,15.013414,15.851079,12.834632 Parliamentary Party leadership in focus  EOF27.000000}
Headline{8.346457,14.296841,14.819063,12.952738 UCPN-M revival plan sought  EOF38.000000}

BoxID{25861210}
Left{11.161417}
Top{5.569194}
Bottom{5.680180}
Right{12.952756}
Text{11.161417,5.569194,5.680180,12.952756 THT
  EOF5.000000}

BoxID{26027216}
Left{8.346287}
Top{8.552401}
Bottom{14.166286}
Right{10.314763}
Headline{8.346287,8.552401,9.727400,10.314763 TB patients jeopardising their own lives   EOF29.000000}
Text{8.346287,9.727400,14.166286,10.314763 Himalayan News Service

Dipayal, January 24

TB patients in Doti district have been playing with their own lives by not adhering to the Directly Observed Treatment Shortcourse, an intensive treatment method against tuberculosis. 

For eight months, medical personnel administer doses to TB patients and perform health checkups. But this method is proving ineffective in treating the respiratory disease with the patients not bothering to visit designated centres for prescribed doses, says focal person and District Public Health Officer, Bhim Prasad Paudel. “We remind patients to take medicine on time, but they pay no heed,” he says. Forty per cent of around 300 patients on DOTS have not been abiding by the rules. Paudel blames it on the lack of adequate knowledge among patients about DOTS and adverse consequences that result from non-adherence to the medication schedule. 

DPHO Chief Mahendradhwaj Adhikari says only a handful of patients approach designated hospitals for treatment regularly. “In many cases, they come only at the last stage,” he observes. 
  EOF9.000000}

1 answer:

Answer 0 (score: 0)

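The following should achieve what you are after. The solution here converts all of the data in the source text to XML. You may need to adapt it to fit your requirements more precisely, but it should give you a good start.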

using System.IO;
using System.Security;
using System.Text;
using System.Text.RegularExpressions;

string fileContents = File.ReadAllText("input.txt"); // your example source text is in this file
// A field is a name at the start of a line followed by {value}; the value may span several lines.
string pattern = @"^(\w+)\{([^}]+)\}";

MatchCollection matches = Regex.Matches(fileContents, pattern, RegexOptions.Multiline);

StringBuilder sb = new StringBuilder();
sb.AppendLine("<?xml version='1.0' encoding='UTF-8'?>");
sb.AppendLine("<contents>");

foreach (Match match in matches)
{
    string nodeName = match.Groups[1].Value.ToLowerInvariant();
    // Escape &, <, > and quotes so the output stays well-formed XML.
    string nodeValue = SecurityElement.Escape(match.Groups[2].Value);

    sb.AppendFormat("<{0}>{1}</{0}>", nodeName, nodeValue);
    sb.AppendLine();
}
sb.AppendLine("</contents>");

File.WriteAllText("output.xml", sb.ToString());

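The file "input.txt" contains your source text. For the purposes of this example, I build the XML string with a StringBuilder before dumping it to a file. Obviously, if you need to work with the XML immediately after the conversion, you will want to use XDocument (or XmlDocument, depending on which version of .NET you are using). StringBuilder is simpler and faster than XDocument/XmlDocument, which is why I used it to create the output XML file.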
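
As a rough sketch of the XDocument route mentioned above, narrowed to what the question asks for, the following groups fields under their BoxID and keeps only the body of each Text and Headline entry. It rests on assumptions the source does not confirm: that every Text/Headline value starts with four comma-separated numbers (presumably coordinates), ends with an `EOF<number>` marker, and never contains a `}` character. The helper name `ToXml` and the `box`/`id` names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: converts the brace-delimited source text into an XDocument.
XDocument ToXml(string source)
{
    // Normalize Windows line endings so multi-line values contain plain "\n".
    source = source.Replace("\r\n", "\n");

    // A field is a name at the start of a line followed by {value}; values may span lines.
    var field = new Regex(@"^([A-Za-z]\w*)\{([^}]*)\}", RegexOptions.Multiline);

    // Assumed layout of Text/Headline values: "n,n,n,n body  EOF<n>".
    var payload = new Regex(
        @"^\s*([\d.]+),([\d.]+),([\d.]+),([\d.]+)\s(.*?)\s*EOF([\d.]+)\s*$",
        RegexOptions.Singleline);

    var root = new XElement("contents");
    XElement parent = root; // fields seen before any BoxID attach to the root

    foreach (Match m in field.Matches(source))
    {
        string name = m.Groups[1].Value.ToLowerInvariant();
        string value = m.Groups[2].Value;

        if (name == "boxid")
        {
            // Each BoxID starts a new <box> element that collects the fields after it.
            parent = new XElement("box", new XAttribute("id", value));
            root.Add(parent);
            continue;
        }

        Match p = payload.Match(value);
        if ((name == "text" || name == "headline") && p.Success)
        {
            // Keep only the body; the coordinates and EOF number are dropped.
            parent.Add(new XElement(name, p.Groups[5].Value.Trim()));
        }
        else
        {
            // Any other field (or an unexpected payload) is stored verbatim.
            parent.Add(new XElement(name, value));
        }
    }

    return new XDocument(root);
}

// Usage with a small inline sample taken from the question's data.
string sample =
    "BoxID{25861210}\nLeft{11.161417}\n" +
    "Text{11.161417,5.569194,5.680180,12.952756 THT\n  EOF5.000000}\n";
XDocument doc = ToXml(sample);
Console.WriteLine(doc);
```

Because XElement escapes special characters when it serializes, no manual escaping is needed here. To write a file instead of printing, `doc.Save("output.xml")` should work in place of the `Console.WriteLine` call.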