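C# regular expression for movie file names

Time: 2011-02-15 12:24:27

Tags: c# .net regex replace movie

I have been trying, without success, to use a C# Regex to remove certain strings from movie names.

Examples of the file names I am working with: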

EuroTrip (2004) [SD]

Event Horizon (1997) [720]

Fast & Furious (2009) [1080p]

Star Trek (2009) [Unknown]

I want to remove anything in square brackets or parentheses (including the brackets themselves).

So far I am using:

movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([*\\(\\d{4}\\)])", "");

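This seems to remove the year and its parentheses fine, but I can't figure out how to remove the square brackets and their contents without affecting other parts of the name. I have had mixed results, and the closest attempt was: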

movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([?\\[+A-Z+\\]])", "");

which left me with:

urorip (2004)

from:

EuroTrip (2004) [SD]

Any whitespace left at the ends is fine, since I will just run

movieTitleToFetch = movieTitleToFetch.Trim();

at the end.

Thanks in advance,

Alex

7 answers:

Answer 0 (score: 3)

This regex pattern should work, though it may need some tweaking:

"[\[\(].+?[\]\)]"

Regex.Replace(movieTitleToFetch, @"[\[\(].+?[\]\)]", "");

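This should match anything from a "[" or "(" up to the next occurrence of "]" or ")".

If that doesn't work, try removing the escape characters from the parentheses, like so: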

Regex.Replace(movieTitleToFetch, @"[\[(].+?[\])]", "");
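As a quick check, here is a minimal sketch that runs the pattern above over the four sample names from the question. The class and method names (BracketStripper, Strip) are made up for illustration, and the final Trim() is the cleanup step the question already plans to do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Pattern from the answer above: "[" or "(", then one or more characters
    // (as few as possible), then "]" or ")".
    private static readonly Regex Bracketed = new Regex(@"[\[\(].+?[\]\)]");

    // Hypothetical helper name, not part of the original answer.
    public static string Strip(string title)
    {
        return Bracketed.Replace(title, "").Trim();
    }

    public static void Main()
    {
        string[] samples =
        {
            "EuroTrip (2004) [SD]",
            "Event Horizon (1997) [720]",
            "Fast & Furious (2009) [1080p]",
            "Star Trek (2009) [Unknown]"
        };

        foreach (string s in samples)
            Console.WriteLine("'{0}' -> '{1}'", s, Strip(s));
    }
}
```

Note that the opening and closing character classes are independent, so "(" may be closed by "]" and vice versa. For the sample names that makes no difference.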

Answer 1 (score: 1)

@Craigt's answer is very useful, but it may be cleaner to make sure the brackets match up:

([\[].*?[\]]|[\(].*?[\)]) 
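To see where pairing the brackets matters, here is a small sketch comparing the mixed pattern from Answer 0 with the paired pattern above. The input string is a made-up edge case (a stray "]" inside parentheses), not one of the question's file names, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PairedBrackets
{
    // Answer 0: any opener, then any closer.
    private static readonly Regex Mixed = new Regex(@"[\[\(].+?[\]\)]");

    // Answer 1: "[" must close with "]", "(" must close with ")".
    private static readonly Regex Paired = new Regex(@"([\[].*?[\]]|[\(].*?[\)])");

    public static string StripMixed(string s)
    {
        return Mixed.Replace(s, "").Trim();
    }

    public static string StripPaired(string s)
    {
        return Paired.Replace(s, "").Trim();
    }

    public static void Main()
    {
        // Made-up input with a stray "]" inside the parentheses.
        const string input = "Movie (2009] extra) [SD]";

        // The mixed pattern stops at the stray "]", leaving " extra)" behind.
        Console.WriteLine("'{0}'", StripMixed(input));   // 'Movie  extra)'

        // The paired pattern runs on to the matching ")".
        Console.WriteLine("'{0}'", StripPaired(input));  // 'Movie'
    }
}
```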

Answer 2 (score: 0)

Can't we just use this:

if(movieTitleToFetch.Contains("("))
         movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));

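The code above will certainly give you the full movie title for these names:

EuroTrip (2004) [SD]

Event Horizon (1997) [720]

Fast & Furious (2009) [1080p]

Star Trek (2009) [Unknown]

If there may be cases where there is no year but only the type, i.e.:

EuroTrip [SD]

Event Horizon [720]

Fast & Furious [1080p]

Star Trek [Unknown]

then use this: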

if(movieTitleToFetch.Contains("("))
         movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));
else if(movieTitleToFetch.Contains("["))
         movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("["));

Answer 3 (score: 0)

This does the trick:

@"(\[[^\]]*\])|(\([^\)]*\))"

It removes anything from "[" to the next "]" and anything from "(" to the next ")".
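One practical difference from the lazy ".+?" pattern in Answer 0 is that "[^\]]*" also accepts an empty pair of brackets. The sketch below illustrates this with a made-up input containing "()"; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Answer 0: ".+?" needs at least one character between the brackets.
    private static readonly Regex Lazy = new Regex(@"[\[\(].+?[\]\)]");

    // Answer 3: negated character classes, zero or more characters allowed.
    private static readonly Regex Negated = new Regex(@"(\[[^\]]*\])|(\([^\)]*\))");

    public static string StripLazy(string s)
    {
        return Lazy.Replace(s, "").Trim();
    }

    public static string StripNegated(string s)
    {
        return Negated.Replace(s, "").Trim();
    }

    public static void Main()
    {
        // Made-up input with an empty pair of parentheses.
        const string input = "Title () Extra [SD]";

        // ".+?" consumes the ")" as its one required character and keeps
        // going until the "]", so "Extra" is lost.
        Console.WriteLine("'{0}'", StripLazy(input));     // 'Title'

        // The negated classes remove "()" and "[SD]" separately.
        Console.WriteLine("'{0}'", StripNegated(input));  // 'Title  Extra'
    }
}
```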

Answer 4 (score: 0)

Could you use:

string MovieTitle="Star Trek (2009) [Unknown]";
movieTitleToFetch= MovieTitle.IndexOf('(')>MovieTitle.IndexOf('[')?
                    MovieTitle.Substring(0,MovieTitle.IndexOf('[')):
                    MovieTitle.Substring(0,MovieTitle.IndexOf('('));
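The snippet above assumes both "(" and "[" are present: IndexOf returns -1 for a missing character, and Substring(0, -1) throws. A possible variation, sketched here with String.IndexOfAny, cuts at whichever bracket comes first and leaves the name untouched when there is none. The class and method names are hypothetical.

```csharp
using System;

public static class FirstBracketCut
{
    private static readonly char[] Openers = { '(', '[' };

    // Returns everything before the first "(" or "[", trimmed.
    // If neither character is present, the input is returned trimmed.
    public static string Cut(string title)
    {
        int idx = title.IndexOfAny(Openers);
        string head = idx >= 0 ? title.Substring(0, idx) : title;
        return head.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Cut("Star Trek (2009) [Unknown]")); // Star Trek
        Console.WriteLine(Cut("EuroTrip [SD]"));              // EuroTrip
        Console.WriteLine(Cut("Transcendence"));              // Transcendence
    }
}
```

Like the original, this drops everything after the first bracket, so it only suits names where the title comes first.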

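Answer 5 (score: 0)

I came up with .+\s(?<year>\(\d{4}\))\s(?<format>\[\w+\]), which matches any of your examples and includes the year and format as named capture groups to help you replace them.

The pattern translates to:

Any character, one or more repetitions
Whitespace
Literal '(' followed by 4 digits followed by literal ')' (year)
Whitespace
Literal '[' followed by alphanumerics, one or more repetitions, followed by literal ']' (format)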
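Here is a rough sketch of how the named groups could be read back. Two things are my own additions and not part of the pattern above: the "title" group wrapped around the leading ".+", and the "^" and "$" anchors. The class and method names are hypothetical too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleParts
{
    // The answer's pattern, plus an assumed "title" group and anchors.
    private static readonly Regex Pattern = new Regex(
        @"^(?<title>.+)\s(?<year>\(\d{4}\))\s(?<format>\[\w+\])$");

    // Returns false (and null outputs) when the name does not fit the pattern.
    public static bool TryParse(string input, out string title, out string year, out string format)
    {
        Match m = Pattern.Match(input);
        title = m.Success ? m.Groups["title"].Value : null;
        year = m.Success ? m.Groups["year"].Value : null;
        format = m.Success ? m.Groups["format"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string title, year, format;
        if (TryParse("Star Trek (2009) [Unknown]", out title, out year, out format))
        {
            // The year and format groups include their brackets.
            Console.WriteLine("{0} | {1} | {2}", title, year, format);
            // Star Trek | (2009) | [Unknown]
        }
    }
}
```

Because the pattern requires both the year and the format, a name such as "EuroTrip [SD]" does not match at all.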

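Answer 6 (score: 0)

I know I'm late to this thread, but I wrote a simple algorithm to clean up downloaded movie file names.

It performs the following steps:

  1. Removes everything in brackets (if it finds a year there, it tries to keep that information)
  2. Removes a list of commonly used words (720p, bdrip, h264 and so on)
  3. Assumes the title may contain language information and removes it when it sits at the end of the remaining string (just before the special words)
  4. If no year was found in brackets, looks for one at the end of the remaining string (as with languages)
  5. Along the way, dots and underscores are replaced with spaces, so the title is ready to be used, for example, as a query for a search API.

Here is the test in xUnit (I used mostly Italian titles to test it):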

    using Grappachu.Movideo.Core.Helpers.TitleCleaner;
    using SharpTestsEx;
    using Xunit;
    
    namespace Grappachu.MoVideo.Test
    {
        public class TitleCleanerTest
        {
            [Theory]
            [InlineData("Avengers.Confidential.La.Vedova.Nera.E.Punisher.2014.iTALiAN.Bluray.720p.x264 - BG.mkv",
                "Avengers Confidential La Vedova Nera E Punisher", 2014)]
            [InlineData("Fuck You, Prof! (2013) BDRip 720p HEVC ITA GER AC3 Multi Sub PirateMKV.mkv",
                "Fuck You, Prof!", 2013)]
            [InlineData("Il Libro della Giungla(2016)(BDrip1080p_H264_AC3 5.1 Ita Eng_Sub Ita Eng)by siste82.avi",
                "Il Libro della Giungla", 2016)]
            [InlineData("Il primo dei bugiardi (2009) [Mux by Little-Boy]", "Il primo dei bugiardi", 2009)]
            [InlineData("Il.Viaggio.Di.Arlo-The.Good.Dinosaur.2015.DTS.ITA.ENG.1080p.BluRay.x264-BLUWORLD",
                "il viaggio di arlo", 2015)]
            [InlineData("La Mafia Uccide Solo D'estate 2013 .avi",
                "La Mafia Uccide Solo D'estate", 2013)]
            [InlineData("Ip.Man.3.2015.iTA.AC3.5.1.448.Chi.Aac.BluRay.m1080p.x264.Sub.[scambiofile.info].mkv",
                "Ip Man 3", 2015)]
            [InlineData("Inferno.2016.BluRay.1080p.AC3.ITA.AC3.ENG.Subs.x264-WGZ.mkv",
                "Inferno", 2016)]
            [InlineData("Ghostbusters.2016.iTALiAN.BDRiP.EXTENDED.XviD-HDi.mp4",
                "Ghostbusters", 2016)]
            [InlineData("Transcendence.mkv", "Transcendence", null)]
            [InlineData("Being Human (Forsyth, 1994).mkv", "Being Human", 1994)]
            public void Clean_should_return_title_and_year_when_possible(string filename, string title, int? year)
            {
                var res = MovieTitleCleaner.Clean(filename);
    
                res.Title.ToLowerInvariant().Should().Be.EqualTo(title.ToLowerInvariant());
                res.Year.Should().Be.EqualTo(year);
            }
        }
    }
    

And a first version of the code:

    using System;
    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions; 
    
    namespace Grappachu.Movideo.Core.Helpers.TitleCleaner
    {
        public class MovieTitleCleanerResult
        {
            public string Title { get; set; }
            public int? Year { get; set; }
            public string SubTitle { get; set; }
        }
    
        public class MovieTitleCleaner
        {
            private const string SpecialMarker = "§=§";
            private static readonly string[] ReservedWords;
            private static readonly string[] SpaceChars;
            private static readonly string[] Languages;
    
            static MovieTitleCleaner()
            {
                ReservedWords = new[]
                {
                    SpecialMarker, "hevc", "bdrip", "Bluray", "x264", "h264", "AC3", "DTS", "480p", "720p", "1080p"
                };
                var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
                var l = cultures.Select(x => x.EnglishName).ToList();
                l.AddRange(cultures.Select(x => x.ThreeLetterISOLanguageName));
                Languages = l.Distinct().ToArray();
    
    
                SpaceChars = new[] {".", "_", " "};
            }
    
    
            public static MovieTitleCleanerResult Clean(string filename)
            {
                var temp = Path.GetFileNameWithoutExtension(filename);
                int? maybeYear = null;
    
                // Remove what's inside brackets trying to keep year info.
                temp = RemoveBrackets(temp, '{', '}', ref maybeYear);
                temp = RemoveBrackets(temp, '[', ']', ref maybeYear);
                temp = RemoveBrackets(temp, '(', ')', ref maybeYear);
    
                // Skip reserved words (codecs, formats, etc.); stop at the first one found after the title has started
                var tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
                var title = string.Empty;
                for (var i = 0; i < tokens.Length; i++)
                {
                    var tok = tokens[i];
                    if (ReservedWords.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
                    {
                        if (title.Length > 0)
                            break;
                    }
                    else
                    {
                        title = string.Join(" ", title, tok).Trim();
                    }
                }
                temp = title;
    
                // Remove language names found at the end of the remaining string (should not remove "English" if it's inside the title)
                tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
                for (var i = tokens.Length - 1; i >= 0; i--)
                {
                    var tok = tokens[i];
                    if (Languages.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
                        tokens[i] = string.Empty;
                    else
                        break;
                }
                title = string.Join(" ", tokens).Trim();
    
    
                // If no year was found inside brackets, try to take it from the end, just after the title
                if (!maybeYear.HasValue)
                {
                    var resplit = title.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
                    var last = resplit.Last();
                    if (LooksLikeYear(last))
                    {
                        maybeYear = int.Parse(last);
                        title = title.Replace(last, string.Empty).Trim();
                    }
                }
    
    
                // TODO: review this. When there is exactly one dash, it separates the main title from the subtitle
                var res = new MovieTitleCleanerResult();
                res.Year = maybeYear;
                if (title.Count(x => x == '-') == 1)
                {
                    var sp = title.Split('-');
                    res.Title = sp[0];
                    res.SubTitle = sp[1];
                }
                else
                {
                    res.Title = title;
                }
    
    
                return res;
            }
    
            private static string RemoveBrackets(string inputString, char openChar, char closeChar, ref int? maybeYear)
            {
                var str = inputString;
                while (str.IndexOf(openChar) > 0 && str.IndexOf(closeChar) > 0)
                {
                    var dataGraph = str.GetBetween(openChar.ToString(), closeChar.ToString());
                    if (LooksLikeYear(dataGraph))
                    {
                        maybeYear = int.Parse(dataGraph);
                    }
                    else
                    {
                        var parts = dataGraph.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
                        foreach (var part in parts)
                            if (LooksLikeYear(part))
                            {
                                maybeYear = int.Parse(part);
                                break;
                            }
                    }
                    str = str.ReplaceBetween(openChar, closeChar, string.Format(" {0} ", SpecialMarker));
                }
                return str;
            }
    
            private static bool LooksLikeYear(string dataRound)
            {
                // Anchor both ends so tokens like "1994," or "2016x" are rejected before int.Parse sees them
                return Regex.IsMatch(dataRound, "^(19|20)[0-9][0-9]$");
            }
        }
    
    
        public static class StringUtils
        {
            public static string GetBetween(this string src, string a, string b,
                StringComparison comparison = StringComparison.Ordinal)
            {
                var idxStr = src.IndexOf(a, comparison);
                var idxEnd = src.IndexOf(b, comparison);
                if (idxStr >= 0 && idxEnd > 0)
                {
                    if (idxStr > idxEnd)
                        Swap(ref idxStr, ref idxEnd);
                    return src.Substring(idxStr + a.Length, idxEnd - idxStr - a.Length);
                }
                return src;
            }
    
            private static void Swap<T>(ref T idxStr, ref T idxEnd)
            {
                var temp = idxEnd;
                idxEnd = idxStr;
                idxStr = temp;
            }
    
            public static string ReplaceBetween(this string s, char begin, char end, string replacement = null)
            {
                var regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end));
                return regex.Replace(s, replacement ?? string.Empty);
            }
        }
    }