Parsing a text file using a helper class

Time: 2018-04-17 18:32:42

Tags: c# .net linq

I have a text file that contains a list of movie names with their authors and versions, like this:

xxx, Author1, v6
the net, author1, v7
xxx, author3, v10
DDLJ, author3, v11
the fire, author5, v6
the health, author1, v8
the health, author7, v2
the hero, author9, v11
the hero, author8, v3

I want to get the names of the movies with the latest version. In this case, it should return "DDLJ" and "the hero".

This is what I tried:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
namespace ProgramNamespace
{
    public class Program
    {
        public static List<String> processData(IEnumerable<string> lines)
        {           
            Dictionary<string, int> keyValuePairs = new Dictionary<string, int>();
            foreach (var item in lines)
            {
                string[] readsplitted = item.Split(',');              
                keyValuePairs.Add(readsplitted[0], Convert.ToInt32(
                    Regex.Replace(readsplitted[2], "[^0-9]+", string.Empty)));
            }

            //List<String> retVal = new List<String>();
            return retVal;
        }

        static void Main(string[] args)
        {
            try
            {
                List<String> retVal = processData(File.ReadAllLines(@"D:\input.txt"));
                File.WriteAllLines(@"D:\output.txt", retVal);
            }
            catch (IOException ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }
}

Note that I would like to add a helper class if one is needed.

4 answers:

Answer 0: (score: 1)

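Edit: version for duplicate keys

I rewrote my first solution so that it handles duplicate data. The trick is to prefix each key with a running number, separated by an underscore, so that every key is unique.

E.g., you would fill your dictionary as follows:

  "1_xxx", 6
  "2_the net", 7
  "3_xxx", 10
  "4_DDLJ", 11
  ...

Then strip the number (and the underscore) before returning the results.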

public static List<String> processData(IEnumerable<string> lines)
{
    var keyValuePairs = new Dictionary<string, int>();

    int Position = 0;
    foreach (var item in lines)
    {
        Position++;
        string[] readsplitted = item.Split(',');
        keyValuePairs.Add(Position.ToString() +"_" + readsplitted[0], Convert.ToInt32(Regex.Replace(readsplitted[2], "[^0-9]+", string.Empty)));
    }
    var MaxVersion = keyValuePairs.Values.OrderByDescending(f => f).First();

    return keyValuePairs.Where(f => f.Value == MaxVersion).Select(f => string.Join("_", f.Key.Split('_').Skip(1))).ToList();
}

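In more detail:

  • keyValuePairs.Values returns only the version numbers
  • .OrderByDescending(f => f).First() sorts the version numbers in descending order and takes the first one, i.e. the highest
  • keyValuePairs.Where(f => f.Value == MaxVersion) selects the key-value pairs whose value equals that highest version
  • .Select(f => ...) takes the key of each pair, i.e. the title, and strips the numeric prefix by splitting on '_' and skipping the first part

This way you can also keep the Dictionary. If you are doing this once and do not need to extend your code or reuse your model, you do not have to create extra classes or make it more complex than necessary.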
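For context on why the numeric prefix is needed at all: Dictionary<TKey, TValue>.Add throws an ArgumentException when the key is already present, which is what the code in the question runs into on the second "xxx" line. The following is only a small sketch of that behavior on the sample data; the class and method names (DuplicateKeyDemo, FindDuplicateNames) are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyDemo
{
    // Returns the movie names that could not be added because the key already existed.
    public static List<string> FindDuplicateNames(IEnumerable<string> lines)
    {
        var versions = new Dictionary<string, int>();
        var duplicates = new List<string>();

        foreach (var line in lines)
        {
            string[] parts = line.Split(',');
            string name = parts[0].Trim();
            int version = int.Parse(parts[2].Trim().TrimStart('v'));

            try
            {
                // Add throws ArgumentException if the key is already in the dictionary.
                versions.Add(name, version);
            }
            catch (ArgumentException)
            {
                duplicates.Add(name);
            }
        }

        return duplicates;
    }

    public static void Main()
    {
        var sample = new[]
        {
            "xxx, Author1, v6", "the net, author1, v7", "xxx, author3, v10",
            "DDLJ, author3, v11", "the fire, author5, v6", "the health, author1, v8",
            "the health, author7, v2", "the hero, author9, v11", "the hero, author8, v3",
        };

        // Prints: xxx, the health, the hero
        Console.WriteLine(string.Join(", ", FindDuplicateNames(sample)));
    }
}
```

Prefixing each key with its line position, as described above, avoids the exception without changing the data structure.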

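Answer 1: (score: 1)

For these kinds of tasks, I usually prefer to create a class that represents the data we are collecting, and give it a TryParse method that creates an instance of the class from a line of data: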

public class MovieInfo
{
    public string Name { get; set; }
    public string Author { get; set; }
    public int Version { get; set; }

    public static bool TryParse(string input, out MovieInfo result)
    {
        result = null;
        if (input == null) return false;

        var parts = input.Split(',');
        int version;

        if (parts.Length == 3 &&
            int.TryParse(parts[2].Trim().TrimStart('v'), out version))
        {
            result = new MovieInfo
            {
                Name = parts[0].Trim(),
                Author = parts[1].Trim(),
                Version = version
            };
        }

        return result != null;
    }

    public override string ToString()
    {
        return $"{Name} (v{Version}) - {Author}";
    }
}

Then just read the file, create a list of these objects, and get all the ones that have the highest version number:

public static List<MovieInfo> processData(IEnumerable<string> lines)
{
    if (lines == null) return null;

    var results = new List<MovieInfo>();

    foreach (var line in lines)
    {
        MovieInfo temp;

        if (MovieInfo.TryParse(line, out temp))
        {
            results.Add(temp);
        }
    }

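    // Max() throws on an empty sequence, so return early if nothing was parsed
    if (results.Count == 0) return results;

    var maxVersion = results.Max(result => result.Version);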

    return results.Where(result => result.Version == maxVersion).ToList();
}

For example:

private static void Main()
{
    var lines = new List<string>
    {
        "xxx, Author1, v6",
        "the net, author1, v7",
        "xxx, author3, v10",
        "DDLJ, author3, v11",
        "the fire, author5, v6",
        "the health, author1, v8",
        "the health, author7, v2",
        "the hero, author9, v11",
        "the hero, author8, v3",
    };

    var processed = processData(lines);

    foreach (var movie in processed)
    {
        // Note: this uses the overridden ToString method. You could just do 'movie.Name'
        Console.WriteLine(movie);
    }

    // Wait for a key press before exiting
    Console.WriteLine("\nDone!! Press any key to exit...");
    Console.ReadKey();
}

Output (a screenshot in the original post): the two v11 entries, DDLJ and the hero, are printed.

Answer 2: (score: 0)

This is how I would do it. It accounts for returning all movie names that share the maximum version.

public static List<String> processData(string fileName)
{
    var lines = File.ReadAllLines(fileName);

    // Parse each line once; ToList() keeps Max and Where from re-running the projection
    var values = lines.Select(x =>
    {
        var readsplitted = x.Split(',');
        return new { Name = readsplitted[0].Trim(), Version = int.Parse(readsplitted[2].Replace("v", string.Empty)) };
    }).ToList();

    var maxValue = values.Max(x => x.Version);

    return values.Where(v => v.Version == maxValue)
        .Select(v => v.Name)
        .ToList();
}

static void Main(string[] args)
{
    try
    {
        // Read the input file from the question (the original snippet pointed at output.txt)
        List<String> retVal = processData(@"D:\input.txt");
    }
    catch (IOException ex)
    {
        Console.WriteLine(ex.Message);
    }
}
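The same query can also be sketched against an in-memory sequence, which makes it easy to try on the sample data from the question without touching the disk. This is only an illustration under a few assumptions: the names LatestMovies and GetLatest are hypothetical, the input lines are assumed to be well-formed ("name, author, vN"), and an empty input is assumed to yield an empty list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestMovies
{
    // Returns the names of all movies that share the highest version number.
    public static List<string> GetLatest(IEnumerable<string> lines)
    {
        var values = lines
            .Select(line => line.Split(','))
            .Select(parts => new
            {
                Name = parts[0].Trim(),
                Version = int.Parse(parts[2].Trim().TrimStart('v'))
            })
            .ToList(); // materialize so each line is parsed only once

        // Max() throws on an empty sequence, so handle that case explicitly.
        if (values.Count == 0) return new List<string>();

        int maxVersion = values.Max(v => v.Version);

        return values.Where(v => v.Version == maxVersion)
                     .Select(v => v.Name)
                     .ToList();
    }

    public static void Main()
    {
        var sample = new[]
        {
            "xxx, Author1, v6", "the net, author1, v7", "xxx, author3, v10",
            "DDLJ, author3, v11", "the fire, author5, v6", "the health, author1, v8",
            "the health, author7, v2", "the hero, author9, v11", "the hero, author8, v3",
        };

        // Prints: DDLJ, the hero
        Console.WriteLine(string.Join(", ", GetLatest(sample)));
    }
}
```

Passing File.ReadAllLines(path) to GetLatest gives the file-based behavior of the answer above.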

Answer 3: (score: 0)

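  1. Create a Movie class so that an object can be initialized for each line that represents a movie.
  2. Split the entire string passed to processData() first by new line, then by ','.
  3. Extract the version number of each movie (the part after the "v"); see the extractNumberFromString() method.
  4. Find the maximum version number and get (using a LINQ query) all the movies that share it.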

    public static List<Movie> processData(string s)
    {
        // list to store all movies
        List<Movie> allmovies = new List<Movie>();
    
        // first split by new line, dropping '\r' characters and empty lines
        var splitbynewline = s.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        // split by ',' and initialize object
        foreach (var line in splitbynewline)
        {
            var moviestring = line.Split(',');
            // create new movie object
            Movie obj = new Movie { Name = moviestring[0].Trim(), Author = moviestring[1].Trim(), Version = moviestring[2].Trim() };
            obj.VersionNumber = extractNumberFromString(moviestring[2]);
            allmovies.Add(obj);
        }
    
        // get the max version number
        double maxver = allmovies.Max(x => x.VersionNumber);
        // build and return the list of all movies with the max version
        List<Movie> result = allmovies.Where(x => x.VersionNumber == maxver).ToList();
    
        return result;
    }
    
    /// <summary>
    /// Concatenates the digits found in a string and converts them to a number;
    /// for example, "sdfdf43gn" returns 43.
    /// </summary>
    /// <param name="value">string that contains digits</param>
    /// <returns>the extracted number as a double</returns>
    public static double extractNumberFromString(string value)
    {
        string returnVal = string.Empty;
        System.Text.RegularExpressions.MatchCollection collection = System.Text.RegularExpressions.Regex.Matches(value, "\\d+");
        foreach (System.Text.RegularExpressions.Match m in collection)
        {
            returnVal += m.ToString();
        }
    
        return Convert.ToDouble(returnVal);
    }
    
    public class Movie
    {
        public string Name;
        public String Author;
        public string Version;
        public double VersionNumber;
    }
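One detail of extractNumberFromString() is worth spelling out: it concatenates every run of digits it finds, so a string with several separate numbers collapses into a single value. The sketch below illustrates this with an int-returning variant, on the assumption that version numbers are whole numbers; the names VersionParsing and ExtractNumber are hypothetical and not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class VersionParsing
{
    // Joins all digit runs in the input and parses the result as an int.
    // Throws FormatException if the input contains no digits at all.
    public static int ExtractNumber(string value)
    {
        string digits = string.Concat(
            Regex.Matches(value, @"\d+").Cast<Match>().Select(m => m.Value));

        return int.Parse(digits);
    }

    public static void Main()
    {
        Console.WriteLine(ExtractNumber(" v11"));      // 11
        Console.WriteLine(ExtractNumber("sdfdf43gn")); // 43
        Console.WriteLine(ExtractNumber("v1.2"));      // 12, both digit runs are joined
    }
}
```

For input in the "vN" format from the question this makes no difference, but a version string such as "v1.2" would be read as 12.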