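String handling in C# and VB.NET is easy for me, but understanding how to do the same in F# is not. I am reading two Apress F# books (Foundations and Expert). Most of the samples are number crunching, with, I think, very little string manipulation; in particular, there are few samples that use seq { sequence-expression } and lists on strings.
I have a C# program that I want to convert to F#. Here is what it does:
Here is a simple example of what I can do in C# but cannot yet do in F#.
Suppose this is a text file: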
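Order, Supreme Court, New York County (Paul G Someone), entered March 18, 2008, which, in an action for personal injuries sustained in a trip and fall over a pothole allegedly created by the negligence of defendants City or Consolidated McPherson, and a third-party action by Consolidated McPherson against its contractor (Mallen), insofar as appealed from, denied, as untimely, Mallen's motion for summary judgment dismissing the complaint and third-party complaint, unanimously affirmed, without costs.
Parties are afforded great latitude in charting their procedural course through the courts, by stipulation or otherwise. Thus, we affirm the denial of Mallen's motion as untimely since Mallen offered no excuse for the late filing.
I get this output: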
2 Paragraphs
3 Lines
109 Words
Found Tokens: 2
Token insofar: ocurrence(s) 1: position(s): 52
Token thus: ocurrence(s) 1: position(s): 91
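Lines should have been called Sentences :(
There are several tokens; I would say more than 100, grouped by class. I have to iterate over the same text several times, trying to match each and every token. Here is a portion of the code. It shows how I split sentences and put them in a ListBox, which makes it easy to get the item count. This works for paragraphs, sentences and tokens. It also shows how I rely on for and foreach. That is the approach I want to avoid by using seq { sequence-expression }, lists, Seq.iter or List.iter, and whatever match ... with expressions are necessary.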
/// <summary>
/// Splits the text into sentences and displays
/// the results in a list box
/// </summary>
private void btnParseText_Click(object sender, EventArgs e)
{
    lstLines.Items.Clear();
    ArrayList al = SplitLines(richTextBoxParagraphs.Text);
    for (int i = 0; i < al.Count; i++)
        // populate a list box
        lstLines.Items.Add(al[i].ToString());
}
/// <summary>
/// Parses a body of text into sentences
/// </summary>
private ArrayList SplitLines(string sText)
{
    // array list to hold the sentences
    ArrayList al = new ArrayList();
    // split on whitespace that follows sentence-ending punctuation
    // and precedes a capital letter
    string[] splitLines =
        Regex.Split(sText, @"(?<=['""A-Za-z0-9][\.\!\?])\s+(?=[A-Z])");
    // loop over the sentences
    for (int i = 0; i < splitLines.Length; i++)
    {
        string sOneLine =
            splitLines[i].Replace(Environment.NewLine, string.Empty);
        al.Add(sOneLine.Trim());
    }
    // update statistics
    lblLineCount.Text = "Line Count: " +
        GetLineCount(splitLines).ToString();
    // words
    lblWordCount.Text = "Word Count: " +
        GetWordCount(al).ToString();
    // tokens
    lblTokenCount.Text = "Token Count: " +
        GetTokenCount(al).ToString();
    // return the arraylist
    return al;
}
/// <summary>
/// Counts all words contained in the ArrayList
/// </summary>
public int GetWordCount(ArrayList allLines)
{
    // return value
    int rtn = 0;
    // a single space is the split char
    char[] arrSplitChars = {' '};
    // iterate through the list of sentences
    foreach (string sLine in allLines)
    {
        // split the current sentence into words, skipping empty entries
        string[] arrWords = sLine.Split(arrSplitChars, StringSplitOptions.RemoveEmptyEntries);
        rtn += arrWords.Length;
    }
    // return word count
    return rtn;
}
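In fact, it is a very simple Windows application: a form with one RichTextBox and three ListBoxes (paragraphs, lines, and tokens found), labels to display the output, and one button.
Answer 0 (score: 5)
Brian has a good start, but functional code focuses more on "what" you are trying to do than on "how".
We can start out in a similar way: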
open System
open System.Text.RegularExpressions
let text = @"Order, Supreme Court, New York County (Paul G Someone), entered..."
let lines = text.Split([|Environment.NewLine|], StringSplitOptions.None)
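First, let's look at paragraphs. I like Brian's approach of counting the blank lines that separate paragraphs. So we filter to find only the blank lines, count them, and then return our paragraph count based on that value:
let numParagraphs =
    let blankLines =
        lines
        |> Seq.filter (fun line -> Regex.IsMatch(line, @"^\s*$"))
        |> Seq.length
    blankLines + 1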
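For sentences, we can view the full text as a sequence of characters and count the number of sentence-ending characters. Because this is F#, let's use pattern matching:
let numSentences =
    let isSentenceEndChar c =
        match c with
        | '.' | '!' | '?' -> true
        | _ -> false
    text
    |> Seq.filter isSentenceEndChar
    |> Seq.length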
Matching words can be as easy as a simple regular expression, but the details depend on how you want to handle punctuation:
let words = Regex.Split(text, @"\s+")
let numWords = words.Length
numParagraphs |> printfn "%d paragraphs"
numSentences |> printfn "%d sentences"
numWords |> printfn "%d words"
Finally, we define a function that prints the occurrences of a token, which is easily applied to a list of tokens:
let findToken token =
    let tokenMatch (word : string) = word.Equals(token, StringComparison.OrdinalIgnoreCase)
    words |> Seq.iteri (fun n word ->
        if tokenMatch word then
            printfn "Found %s at word %d" word n
    )
let tokensToFind = ["insofar"; "thus"; "the"]
tokensToFind |> Seq.iter findToken
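Note that this code does not find "Thus" because of its trailing comma: splitting on whitespace leaves the word as "Thus,". You will likely want to adjust either how words is generated or how tokenMatch is defined.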
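One possible way to adjust tokenMatch is sketched below. The names punctuation and normalize are hypothetical helpers introduced here for illustration, not part of the answer above, and the set of punctuation characters is an assumption you would tune for your own text.

```fsharp
open System

// Hypothetical helper (not in the original answer): characters to strip
// from both ends of a word before comparing it with a token.
let punctuation = [| ','; '.'; ';'; ':'; '!'; '?'; '('; ')'; '"'; '\'' |]

let normalize (word: string) = word.Trim(punctuation)

// Case-insensitive match that ignores surrounding punctuation.
let tokenMatch (token: string) (word: string) =
    (normalize word).Equals(token, StringComparison.OrdinalIgnoreCase)

printfn "%b" (tokenMatch "thus" "Thus,")   // prints true
```

Only the ends of each word are trimmed, so an apostrophe inside a word such as Mallen's is left alone.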
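Answer 1 (score: 1)
You should post your C# code in the question. (This sounds a little like homework; people will have more faith if you show that you have already done the work in one language and are really trying to learn more about another.)
There is not necessarily much here that is F#-specific; you can do this similarly in any .NET language. There are multiple strategies you could use. Below, for example, I use a regular expression to pick out the words, and there are only a few F# idioms.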
open System
open System.Text.RegularExpressions
let text = @"Order, Supreme Court, New York County (Paul G Someone), entered
March 18, 2008, which, in an action for personal injuries sustained in a
trip and fall over a pothole allegedly created by the negligence of
defendants City or Consolidated McPherson, and a third-party action by
Consolidated McPherson against its contractor (Mallen), insofar as appealed
from, denied, as untimely, Mallen's motion for summary judgment dismissing
the complaint and third-party complaint, unanimously affirmed, without costs.

Parties are afforded great latitude in charting their procedural course
through the courts, by stipulation or otherwise. Thus, we affirm the denial
of Mallen's motion as untimely since Mallen offered no excuse for the late
filing."
let lines = text.Split([|'\n'|])
// If was in file, could use
//let lines = System.IO.File.ReadAllLines(@"c:\path\filename.txt")
// just like C#. For this example, assume have giant string above
let fullText = String.Join(" ", lines)
let numParagraphs =
    // start at 1: n blank lines separate n + 1 paragraphs
    let mutable count = 1
    for line in lines do
        // look for blank lines, assume each delimits another paragraph
        if Regex.IsMatch(line, @"^\s*$") then
            count <- count + 1
    count
let numSentences =
    // start at 0: every sentence ends with its own terminator
    let mutable count = 0
    for c in fullText do
        if c = '.' || c = '!' || c = '?' then
            count <- count + 1
    count
let words =
    let wordRegex = new Regex(@"\b(\w+)\b")
    let fullText = String.Join(" ", lines)
    [| for m in wordRegex.Matches(fullText) do
           yield m.Value |]
printfn "%d paragraphs" numParagraphs
printfn "%d sentences" numSentences
printfn "%d words" words.Length
let Find token =
    words |> Seq.iteri (fun n word ->
        if 0=String.Compare(word, token,
                            StringComparison.OrdinalIgnoreCase) then
            printfn "Found %s at word %d" word n
    )
let tokensToFind = ["insofar"; "thus"; "the"]
for token in tokensToFind do
    Find token
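The question mentions more than 100 tokens and repeated passes over the same text. As a rough sketch of an alternative, the hypothetical findAll function below (not part of this answer) walks the words once and groups the matching positions by token. It assumes case-insensitive matching is wanted and reports 0-based word indexes; sample is a small stand-in for the words array.

```fsharp
open System

// Hypothetical helper: one pass over the words, returning a map from each
// requested token (lower-cased) to the 0-based word positions where it occurs.
let findAll (tokens: string list) (words: string[]) =
    let wanted = tokens |> List.map (fun t -> t.ToLowerInvariant()) |> Set.ofList
    words
    |> Seq.mapi (fun i w -> w.ToLowerInvariant(), i)
    |> Seq.filter (fun (w, _) -> wanted.Contains w)
    |> Seq.groupBy fst
    |> Seq.map (fun (w, hits) -> w, hits |> Seq.map snd |> List.ofSeq)
    |> Map.ofSeq

// Small stand-in for the words array.
let sample = [| "Thus"; "we"; "affirm"; "the"; "denial"; "of"; "the"; "motion" |]

for KeyValue(token, positions) in findAll ["insofar"; "thus"; "the"] sample do
    printfn "Token %s: %d occurrence(s) at position(s) %A" token positions.Length positions
```

Tokens that never occur are simply absent from the resulting map, so the total number of found tokens is the map's Count.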
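Answer 2 (score: 0)
Could you post your C# program? (Edit your question.)
I think you can implement this in F# in a very similar way, unless your original code relies heavily on mutating variables (and I see no reason for that in the problem description).
If you used String.Split in C#, it is basically the same: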
open System
let results = "Hello World".Split [|' '|]
let results2 = "Hello, World".Split ([| ", "|], StringSplitOptions.None)
To concatenate the resulting sequences, you can combine yield and yield!.
An abstract example:
let list = [ yield! [1..8]; for i in 3..10 do yield i * i ]
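Applied to strings, the same idea might look like the sketch below. The paragraphs list is made-up sample data, and splitting on spaces and commas is only an assumption about what counts as a word separator.

```fsharp
open System

// Made-up sample data for illustration.
let paragraphs = [ "Hello, World"; "Goodbye World" ]

// Flatten the words of every paragraph into one list:
// yield! splices in each array returned by Split.
let allWords =
    [ for p in paragraphs do
        yield! p.Split([| ' '; ',' |], StringSplitOptions.RemoveEmptyEntries) ]

printfn "%A" allWords
```

Here yield! plays the role that nested for/foreach loops with ArrayList.Add play in the C# version.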