Parsing city and state from a string

Time: 2015-02-03 01:59:16

Tags: c# string parsing

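I'm building a search similar to the one on yp.com, and I need to parse a city name and a state name from a text box. The city can be multiple words, and the state can be a full name or an abbreviation. There may or may not be a comma between the city and the state.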

Examples:

Grand Rapids, New Mexico
Grand Rapids New Mexico
Grand Rapids, NM
Grand Rapids NM

This is easy to do when there is a comma, but I'm not at all sure how to do it when there isn't one.

2 answers:

Answer 0: (score: 1)

It actually takes more logic than I expected, but this should work.

var entries = new List<string[]>(); // List of entries
foreach (var e in str.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)) // Splits by line (handles both \n and \r\n) .. Can be modified to whatever ...
{
    if (string.IsNullOrWhiteSpace(e) || !e.Contains(" ")) // If the string is empty, whitespace or doesn't contain a space
        continue; // Skip to next line
    string[] entry; // Entry holder ...
    if (e.Contains(",")) // If the entry contains ","
    {
        entry = e.Split(','); // Split it by ,
        entries.Add(new string[] { entry[1].Trim(), entry[0].Trim() }); // The two entries should be the state and city, so add it to the entries
        continue; // Skip to next line
    }

    entry = e.Split(' '); // Splits the entry by space
    if (entry.Length < 2) // If there is less than two entries
        continue; // Skip to next line

    if (entry.Length > 2) // Checks if there are more than two entries Ex. "Grand Rapids New Mexico"
    {
        var statePart1 = entry[entry.Length - 2]; // Gets the first part of the state
        var statePart2 = entry[entry.Length - 1]; // Gets the second part of the state

        // Note: statePart1 is invalid if the state only has one "word", statePart2 is valid in this case

        if (statePart1 == "North" || statePart1 == "South" || statePart1 == "West" || statePart1 == "New" || statePart1 == "Rhode") // Checks if statePart1 is the first word of a two-word state
        {
            int stateSize = statePart1.Length + statePart2.Length + 2; // Gets the state string size
            var state = string.Format("{0} {1}", statePart1, statePart2); // Creates the state string
            var city = e.Substring(0, e.Length - stateSize); // Gets the city string
            entries.Add(new string[] { state, city }); // Adds the entry to the entries
        }
        else
        {
            // If statePart1 is not valid then the state is a single "word"
            int cityLength = e.LastIndexOf(' '); // Gets the length of the city
            entries.Add(new string[] { statePart2, e.Substring(0, cityLength) }); // Adds the entry to the entries
        }
    }
    else
    {
        // If there is only two entries then both the city and state has only one "word"
        entries.Add(new string[] { entry[1], entry[0] }); // Adds the entry to the entries
    }
}

Afterwards, you can use the entries like this:

foreach (var e in entries)
    Console.WriteLine("{0}, {1}", e[0], e[1]);

For example, given input like this:

string str = @"Grand Rapids New Mexico
Grand Rapids, NM
New York City New York
Jacksonville Florida
Bismarck North Dakota
Las Vegas Nevada";

The output is:

New Mexico, Grand Rapids
NM, Grand Rapids
New York, New York City
Florida, Jacksonville
North Dakota, Bismarck
Nevada, Las Vegas

Of course, this assumes you are parsing US states and cities.
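
One limitation worth noting: a prefix check such as `statePart1 == "West"` also fires when the city itself ends in that word, so "Key West Florida" would be read as the state "West Florida". A possible way around this, sketched below, is to compare the last two words against the full list of two-word state names instead of only the first word. The class and method names (`TrailingStateSplitter`, `Split`) are made up for illustration, and the sketch assumes US states only (no territories or "District of Columbia").

```csharp
using System;
using System.Collections.Generic;

public static class TrailingStateSplitter
{
    // The ten US states whose names consist of two words.
    private static readonly HashSet<string> TwoWordStates =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "New Hampshire", "New Jersey", "New Mexico", "New York",
            "North Carolina", "North Dakota", "Rhode Island",
            "South Carolina", "South Dakota", "West Virginia",
        };

    // Splits a comma-free "City State" line into { state, city },
    // the same order the entries list above uses.
    // Returns an empty array when the line has fewer than two words.
    public static string[] Split(string line)
    {
        string[] words = line.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length < 2)
            return new string[0];

        // Treat the last two words as the state only if they form a known
        // two-word state and at least one word remains for the city.
        int stateWords = 1;
        if (words.Length > 2)
        {
            string lastTwo = words[words.Length - 2] + " " + words[words.Length - 1];
            if (TwoWordStates.Contains(lastTwo))
                stateWords = 2;
        }

        string state = string.Join(" ", words, words.Length - stateWords, stateWords);
        string city = string.Join(" ", words, 0, words.Length - stateWords);
        return new[] { state, city };
    }
}
```

Calling `TrailingStateSplitter.Split("Key West Florida")` yields the state "Florida" and the city "Key West", while "Grand Rapids New Mexico" still yields "New Mexico" and "Grand Rapids". The last word is accepted as the state without validation, exactly as in the code above.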

Answer 1: (score: 1)

Try this code:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        PrintCityState(GetCityState("Grand Rapids, New Mexico"));
        PrintCityState(GetCityState("Sacramento California"));
        PrintCityState(GetCityState("Indianapolis, IN"));
        PrintCityState(GetCityState("Phoenix AZ"));
    }

    public static void PrintCityState(CityState cs)
    {
        Console.WriteLine("{0}, {1} ({2})", cs.City, cs.StateAbbreviation, cs.StateName);
    }

    public static CityState GetCityState(string input)
    {
        input = input.Trim(); // Trim first so the Remove() offsets below agree with the EndsWith() checks
        string truncatedInput = input;
        var statesDictionary = new Dictionary<string, string>
        {
            {"AZ", "Arizona"},
            {"NM", "New Mexico"},
            {"CA", "California"},
            {"WA", "Washington"},
            {"OR", "Oregon"},
            {"MI", "Michigan"},
            {"IN", "Indiana"}
            // And so forth for all 50 states
        };
        var cityState = new CityState();

        foreach (KeyValuePair<string, string> kvp in statesDictionary)
        {
            if (input.Trim().ToLower().EndsWith(" " + kvp.Key.ToLower()))
            {
                cityState.StateName = kvp.Value;
                cityState.StateAbbreviation = kvp.Key;
                truncatedInput = input.Remove(input.Length - 1 - kvp.Key.Length);
                break;
            }
            if (input.Trim().ToLower().EndsWith(" " + kvp.Value.ToLower()))
            {
                cityState.StateName = kvp.Value;
                cityState.StateAbbreviation = kvp.Key;
                truncatedInput = input.Remove(input.Length - 1 - kvp.Value.Length);
                break;
            }
        }

        cityState.City = truncatedInput.Trim().Trim(',').Trim();
        return cityState;
    }
}

public class CityState
{
    public string City { get; set; }
    public string StateName { get; set; }
    public string StateAbbreviation { get; set; }
}

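This code uses a dictionary of state names and abbreviations. For brevity I only added 7 states, but you can add all 50. It checks whether the input string ends with a dictionary key (the abbreviation) or a dictionary value (the full name). If it finds a match, it removes the state, and what remains is the city.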

Make sure you add West Virginia before Virginia so that it is parsed correctly.
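
The ordering advice above relies on the dictionary being enumerated in insertion order, which the `Dictionary<TKey, TValue>` documentation does not guarantee. One way to avoid depending on it, sketched here under that assumption, is to try the longest candidate first so that "West Virginia" always wins over "Virginia". The names `StateSuffixMatcher` and `TrySplit` are hypothetical, and the state table is only an illustrative subset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StateSuffixMatcher
{
    // Illustrative subset only; a real table would list every state.
    private static readonly Dictionary<string, string> States =
        new Dictionary<string, string>
        {
            { "VA", "Virginia" },
            { "WV", "West Virginia" },
            { "NM", "New Mexico" },
        };

    // Every abbreviation and full name, longest first.
    // Key = text to look for, Value = the state's abbreviation.
    private static readonly List<KeyValuePair<string, string>> Candidates =
        States.SelectMany(kvp => new[]
            {
                new KeyValuePair<string, string>(kvp.Key, kvp.Key),
                new KeyValuePair<string, string>(kvp.Value, kvp.Key),
            })
            .OrderByDescending(c => c.Key.Length)
            .ToList();

    // Returns true and fills city/abbreviation when the input ends with a known state.
    public static bool TrySplit(string input, out string city, out string abbreviation)
    {
        string text = input.Trim();
        foreach (var candidate in Candidates)
        {
            string suffix = candidate.Key;
            if (text.Length <= suffix.Length ||
                !text.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                continue;

            // The state must follow a space or comma, so a city that merely
            // ends in the same letters is not split.
            char before = text[text.Length - suffix.Length - 1];
            if (before != ' ' && before != ',')
                continue;

            city = text.Substring(0, text.Length - suffix.Length).TrimEnd(' ', ',');
            abbreviation = candidate.Value;
            return true;
        }

        // No known state found: return the trimmed input as the city.
        city = text;
        abbreviation = string.Empty;
        return false;
    }
}
```

With this ordering, `TrySplit("Charleston West Virginia", out city, out abbr)` gives "Charleston" and "WV" regardless of how the dictionary entries were added, and "Richmond, VA" gives "Richmond" and "VA".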