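Python regex to detect newlines and exclude \W

Time: 2019-03-17 13:57:57

Tags: python regex

I am trying to extract the "genre" strings that belong to each individual movie on the IMDb "Coming Soon" page. Regex in Python is a bit different, at least for me. Below is my source string; I have to process it as plain text rather than parse the DOM.

Updated sample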

Shazam! (2019)


132 min
                                  -  
                                Action
|
Adventure
|
Fantasy
|
Sci-Fi




    We all have a superhero inside us, it just takes a bit of magic to bring it out. In Billy Batson's case, by shouting out one word - SHAZAM! - this streetwise 14-year-old foster kid can turn into the adult superhero Shazam.                    

Director:

David F. Sandberg 


Stars:
Zachary Levi, 
Djimon Hounsou, 
Mark Strong, 
Michelle Borth






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Pet Sematary (2019)

Horror
|
Thriller




    Louis Creed, his wife Rachel, and their two children Gage and Ellie move to a rural home where they are welcomed and enlightened about the eerie 'Pet Sematary' located nearby. After the tragedy of their cat being killed by a truck, Louis resorts to burying it in the mysterious pet cemetery, which is definitely not as it seems, as it proves to the Creeds that sometimes, dead is better.                    

Directors:

Kevin Kölsch 
|

Dennis Widmyer 


Stars:
John Lithgow, 
Jason Clarke, 
Amy Seimetz, 
Naomi Frenette






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















The Best of Enemies (2019)


Biography
|
Drama
|
History




    Civil rights activist Ann Atwater faces off against C.P. Ellis, Exalted Cyclops of the Ku Klux Klan, in 1971 Durham, North Carolina over the issue of school integration.                    

Director:

Robin Bissell 


Stars:
Sam Rockwell, 
Taraji P. Henson, 
Wes Bentley, 
Anne Heche






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Peterloo (2018)


154 min
                                  -  
                                Drama
|
History


70        
        Metascore


    The story of the 1819 Peterloo Massacre where British forces attacked a peaceful pro-democracy rally in Manchester.                    

Director:

Mike Leigh 


Stars:
Rory Kinnear, 
Maxine Peake, 
Neil Bell, 
Philip Jackson






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Teen Spirit (2018)


92 min
                                  -  
                                Drama
|
Music


51        
        Metascore


    Violet is a shy teenager who dreams of escaping her small town and pursuing her passion to sing. With the help of an unlikely mentor, she enters a local singing competition that will test her integrity, talent and ambition. Driven by a pop-fueled soundtrack, Teen Spirit is a visceral and stylish spin on the Cinderella story.                    

Director:

Max Minghella 


Stars:
Elle Fanning, 
Rebecca Hall, 
Millie Brady, 
Elizabeth Berrington






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Storm Boy (2019)


99 min
                                  -  
                                Adventure
|
Drama
|
Family




    A beautiful and contemporary retelling of Colin Thiele's classic Australian tale. 'Storm Boy' has grown up to be Michael Kingley, a successful retired businessman and grandfather. When Kingley starts to see images from his past that he can't explain, he is forced to remember his long-forgotten childhood, growing up on an isolated coastline with his father. He recounts to his grand-daughter the story of how, as a boy, he rescued and raised an extraordinary orphaned pelican, Mr Percival. Their remarkable adventures and very special bond has a profound effect on all their lives. Based on the beloved book, Storm Boy is a timeless story of an unusual and unconditional friendship.                    

Director:

Shawn Seet 


Stars:
Jai Courtney, 
Geoffrey Rush, 
David Gulpilil, 
Erik Thomson






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }










April 12 












Hellboy (2019)


Action
|
Adventure
|
Fantasy
|
Sci-Fi




    Based on the graphic novels by Mike Mignola, Hellboy, caught between the worlds of the supernatural and human, battles an ancient sorceress bent on revenge.                    

Director:

Neil Marshall 


Stars:
David Harbour, 
Milla Jovovich, 
Ian McShane, 
Daniel Dae Kim






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Missing Link (2019)


95 min
                                  -  
                                Animation
|
Adventure
|
Comedy
|
Family
|
Fantasy




    Plot kept under wraps.                    

Director:

Chris Butler 


Stars:
Zoe Saldana, 
Hugh Jackman, 
Emma Thompson, 
Matt Lucas






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















After (2019)

Drama
|
Romance




    A young woman falls for a guy with a dark secret and the two embark on a rocky relationship. Based on the novel by Anna Todd.                    

Director:

Jenny Gage 


Stars:
Selma Blair, 
Hero Fiennes Tiffin, 
Peter Gallagher, 
Jennifer Beals






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Little (2019)


Comedy
|
Fantasy




    A woman is transformed into her younger self at a point in her life when the pressures of adulthood become too much to bear.                    

Director:

Tina Gordon 


Stars:
Justin Hartley, 
Regina Hall, 
Marsai Martin, 
Tone Bell






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















High Life (2018)


110 min
                                  -  
                                Adventure
|
Drama
|
Horror
|
Mystery
|
Sci-Fi


81        
        Metascore


    A father and his daughter struggle to survive in deep space where they live in isolation.                    

Director:

Claire Denis 


Stars:
Robert Pattinson, 
Juliette Binoche, 
André Benjamin, 
Mia Goth






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Les filles du soleil (2018)

115 min
                                  -  
                                Drama
|
War


59        
        Metascore


    A Kurdish female battalion prepares to take back their town from extremists.                    

Director:

Eva Husson 


Stars:
Golshifteh Farahani, 
Emmanuelle Bercot, 
Zübeyde Bulut, 
Sinama Alievi






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Sauvage (2018)

99 min
                                  -  
                                Drama


78        
        Metascore


    Leo is 22 and sells his body on the street for a bit of cash. The men come and go, and he stays right here - longing for love. He doesn't know what the future will bring. He hits the road. His heart is pounding.                    

Director:

Camille Vidal-Naquet 


Stars:
Félix Maritaud, 
Eric Bernard, 
Nicolas Dibla, 
Philippe Ohrel






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }










April 19 












The Curse of La Llorona (2019)


93 min
                                  -  
                                Horror
|
Mystery
|
Thriller




    Ignoring the eerie warning of a troubled mother suspected of child endangerment, a social worker and her own small kids are soon drawn into a frightening supernatural realm.                    

Director:

Michael Chaves 


Stars:
Linda Cardellini, 
Raymond Cruz, 
Marisol Ramirez, 
Patricia Velasquez






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Penguins (2019)


Documentary




    The story of Steve, an Adélie penguin, on a quest to find a life partner and start a family. When Steve meets with Wuzzo the emperor penguin they become friends. But nothing comes easy in the icy Antarctic.                    

Directors:

Alastair Fothergill 
|

Jeff Wilson 








    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Under the Silver Lake (2018)


139 min
                                  -  
                                Comedy
|
Crime
|
Drama
|
Mystery
|
Thriller


59        
        Metascore


    Sam, intelligent but without purpose, finds a mysterious woman swimming in his apartment's pool one night. The next morning, she disappears. Sam sets off across LA to find her, and along the way he uncovers a conspiracy far more bizarre.                    

Director:

David Robert Mitchell 


Stars:
Andrew Garfield, 
Riley Keough, 
Topher Grace, 
Callie Hernandez






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Rafiki (2018)

83 min
                                  -  
                                Drama
|
Romance


62        
        Metascore


    "Good Kenyan girls become good Kenyan wives," but Kena and Ziki long for something more. When love blossoms between them, the two girls will be forced to choose between happiness and safety.                    

Director:

Wanuri Kahiu 


Stars:
Samantha Mugatsia, 
Neville Misati, 
Nice Githinji, 
Charlie Karumi






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Breakthrough (2019)


116 min
                                  -  
                                Biography
|
Drama




    When her 14-year-old son drowns in a lake, a faithful mother prays for him to come back from the brink of death and be healed.                    

Director:

Roxann Dawson 


Stars:
Topher Grace, 
Sam Trammell, 
Chrissy Metz, 
Rebecca Staab






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Little Woods (2018)


105 min
                                  -  
                                Crime
|
Drama
|
Western




    A modern Western that tells the story of two sisters, Ollie and Deb, who are driven to work outside the law to better their lives. For years, Ollie has illicitly helped the struggling residents of her North Dakota oil boomtown access Canadian health care and medication. When the authorities catch on, she plans to abandon her crusade, only to be dragged in even deeper after a desperate plea for help from her sister.                    

Director:

Nia DaCosta 


Stars:
Tessa Thompson, 
Lily James, 
Lance Reddick, 
Luke Kirby






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















Fast Color (2018)


100 min
                                  -  
                                Drama
|
Sci-Fi
|
Thriller


56        
        Metascore


    A woman is forced to go on the run when her superhuman abilities are discovered. Years after having abandoned her family, the only place she has left to hide is home.                    

Director:

Julia Hart 


Stars:
Gugu Mbatha-Raw, 
Saniyya Sidney, 
David Strathairn, 
Lorraine Toussaint






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















High on the Hog (2017)

85 min
                                  -  
                                Action
|
Crime
|
Drama
|
Thriller




    With a potent strain of pot sweeping the City, DTA agents attempt to infiltrate a small town farming operation that has a strong leader and interesting family members.                    

Director:

Tony Wash 


Stars:
Sid Haig, 
Joe Estevez, 
Robert Z'Dar, 
Fiona Domenica






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }










April 26 












Avengers: Endgame (2019)

Action
|
Adventure
|
Fantasy
|
Sci-Fi




    After the devastating events of Avengers: Infinity War (2018), the universe is in ruins. With the help of remaining allies, the Avengers assemble once more in order to undo Thanos' actions and restore order to the universe.                    

Directors:

Anthony Russo 
|

Joe Russo 


Stars:
Brie Larson, 
Bradley Cooper, 
Scarlett Johansson, 
Chris Hemsworth






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

Watch Trailer

    if (typeof uet == 'function') {
    uet("be", "SmallTrailerWidget", {wb: 1});
    }


    if (typeof uex == 'function') {
    uex("ld", "SmallTrailerWidget", {wb: 1});
    }






















The White Crow (2018)


127 min
                                  -  
                                Biography
|
Drama




    The story of Rudolf Nureyev's defection to the West.                    

Director:

Ralph Fiennes 


Stars:
Oleg Ivenko, 
Ralph Fiennes, 
Louis Hofmann, 
Adèle Exarchopoulos






    if (typeof uet == 'function') {
    uet("bb", "SmallTrailerWidget", {wb: 1});
    }

The expected result is

Adventure
Fantasy
Sci-Fi

Horror
Thriller

Biography
Drama
History

History

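Each movie starts with \(20..\). Some movies are then followed by blank lines, some by a runtime (min), and then the genres appear separated by "|". I used \W to match the characters in between, but I could not come up with a working regex. Does anyone have an idea? Thanks.

Edit: the page being scraped is https://www.imdb.com/movies-coming-soon/2019-04/. However, I have to work with the text sample given above.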

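2 Answers:

Answer 0: (score: 1)

Now that you have updated the question, hmm, that is beyond me, but the code below handles the original post.

Given the original information, try the following:

Part 1 - the string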

>>> string = '''Shazam! (2019)


132 min
                              -  
                            Action
|
Adventure
|
Fantasy
|
Sci-Fi




We all have a superhero inside us, it just takes a bit of magic to bring it out. In Billy Batson's case, by shouting out one word - SHAZAM! - this streetwise 14-year-old foster kid can turn into the adult superhero Shazam.                    

Director:

David F. Sandberg 


Stars:
Zachary Levi, 
Djimon Hounsou, 
Mark Strong, 
Michelle Borth







Pet Sematary (2019)

Horror
|
Thriller




Louis Creed, his wife Rachel, and their two children Gage and Ellie move to a rural home where they are welcomed and enlightened about the eerie 'Pet Sematary' located nearby. After the tragedy of their cat being killed by a truck, Louis resorts to burying it in the mysterious pet cemetery, which is definitely not as it seems, as it proves to the Creeds that sometimes, dead is better.                    

Directors:

Kevin Kölsch 
|

Dennis Widmyer 


Stars:
John Lithgow, 
Jason Clarke, 
Amy Seimetz, 
Naomi Frenette





The Best of Enemies (2019)


Biography
|
Drama
|
History




Civil rights activist Ann Atwater faces off against C.P. Ellis, Exalted Cyclops of the Ku Klux Klan, in 1971 Durham, North Carolina over the issue of school integration.                    

Director:

Robin Bissell 


Stars:
Sam Rockwell, 
Taraji P. Henson, 
Wes Bentley, 
Anne Heche






Peterloo (2018)


154 min
                              -  
                            Drama
|
History


70        
    Metascore


    The story of the 1819 Peterloo Massacre where British forces attacked a peaceful pro-democracy rally in Manchester.                    

Director:

Mike Leigh 


Stars:
Rory Kinnear, 
Maxine Peake, 
Neil Bell, 
Philip Jackson'''

Part 2 - the code

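>>> import re
>>> # Lazily skip from the "(20xx)" year to the first "Genre | Genre ..." run and capture that run.
>>> categories_group = re.findall(r'\(20\d{2}\)[\S\s]*?((?:\S+\s*\|\s*)+\S*)', string)
>>> for categories in categories_group:
...     print('\n'*3)
...     print(categories)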





#Output

Action
|
Adventure
|
Fantasy
|
Sci-Fi




Horror
|
Thriller




Biography
|
Drama
|
History




Drama
|
History

Part 3 - further code to remove the |\n

>>> # Drop each "|" separator together with the newline that follows it.
>>> categories_eliminate_OR = [categories.replace('|\n', '') for categories in categories_group]
>>> for categories in categories_eliminate_OR:
...     print('\n'*2)
...     print(categories)



#Output
Action
Adventure
Fantasy
Sci-Fi



Horror
Thriller



Biography
Drama
History



Drama
History
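
As a variation on Part 3, the separators can be split on instead of replaced, which gives one list of genre names per film. This is only a sketch: the sample string is a shortened, hypothetical stand-in for the text in Part 1, and the name `genres_per_film` is made up for illustration.

```python
import re

# Hypothetical, shortened sample in the same layout as the string in Part 1.
string = """Shazam! (2019)


132 min
                              -
                            Action
|
Adventure
|
Fantasy
|
Sci-Fi


Pet Sematary (2019)

Horror
|
Thriller
"""

# Same pattern as in Part 2: capture the first "Genre | Genre ..." run after each year.
pattern = r'\(20\d{2}\)[\S\s]*?((?:\S+\s*\|\s*)+\S*)'

# Split every captured run on "|" and the whitespace around it.
genres_per_film = [re.split(r'\s*\|\s*', block) for block in re.findall(pattern, string)]

for genres in genres_per_film:
    print('\n'.join(genres))
    print()
```

With lists, later processing (counting genres, de-duplicating, joining with commas) does not depend on where the newlines happened to be in the page text.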

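Answer 1: (score: 0)

So, if you are willing to accept some matches that capture nothing, I think the following should work. Note that I updated the regex so that it now also captures single terms. You can still use the code I posted above to remove the |\n.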

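>>> import re
>>> # First alternative consumes the Director...Star block without capturing; second captures genres.
>>> categories_group = re.findall(r'Director[\S\s]+?Star|\(20\d{2}\)[\S\s]*?((?:\S+\s*\|\s*)+\S*|[A-Z]+\w+)', string)
>>> for categories in categories_group:
...     print(categories)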



#Output

Action
|
Adventure
|
Fantasy
|
Sci-Fi



Horror
|
Thriller



Biography
|
Drama
|
History



Drama
|
History



Drama
|
Music



Adventure
|
Drama
|
Family



Action
|
Adventure
|
Fantasy
|
Sci-Fi



Animation
|
Adventure
|
Comedy
|
Family
|
Fantasy



Drama
|
Romance



Comedy
|
Fantasy



Adventure
|
Drama
|
Horror
|
Mystery
|
Sci-Fi



Drama
|
War



Drama



Horror
|
Mystery
|
Thriller



Documentary



Drama
|
Romance



Biography
|
Drama



Crime
|
Drama
|
Western



Drama
|
Sci-Fi
|
Thriller



Action
|
Crime
|
Drama
|
Thriller



Action
|
Adventure
|
Fantasy
|
Sci-Fi



With



Biography
|
Drama
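
The matches that capture nothing show up as empty strings: when the `Director...Star` alternative matches, the capture group does not take part, and `re.findall` returns `''` for that match. A small sketch of this behaviour, using a shortened, hypothetical sample (`sample` and `genres_only` are names invented here):

```python
import re

# Hypothetical, shortened sample in the same layout as the question's text.
sample = """Pet Sematary (2019)

Horror
|
Thriller

Directors:

Kevin Kölsch
|

Dennis Widmyer

Stars:
John Lithgow
"""

pattern = r'Director[\S\s]+?Star|\(20\d{2}\)[\S\s]*?((?:\S+\s*\|\s*)+\S*|[A-Z]+\w+)'

matches = re.findall(pattern, sample)
print(matches)        # ['Horror\n|\nThriller', '']

# Filter out the empty strings produced by the non-capturing alternative.
genres_only = [m for m in matches if m]
print(genres_only)    # ['Horror\n|\nThriller']
```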

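One caveat: the sentence below causes With to be captured, because I could not figure out a truly reliable way to make the regex ignore that line, since the movie title itself can be of variable length.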

After the devastating events of Avengers: Infinity War (2018), the universe is in ruins. With the help
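
One possible way around the `With` capture, offered as a sketch rather than a tested solution for the whole page: in the sample, a title line ends right after the year, while the year inside the description is followed by a comma. The pattern below assumes that this holds for every film, that a runtime always looks like `NNN min` followed by `-`, and that every film lists at least one genre. The names `GENRES`, `text`, and `genres_per_film` are invented for this example.

```python
import re

# Hypothetical, shortened sample in the same layout as the question's text.
text = """Avengers: Endgame (2019)

Action
|
Adventure

    After the devastating events of Avengers: Infinity War (2018), the universe is in ruins. With the help of remaining allies.

Directors:

Anthony Russo
|

Joe Russo

The White Crow (2018)


127 min
                                  -
                                Biography
|
Drama

Sauvage (2018)

99 min
                                  -
                                Drama


78
        Metascore
"""

GENRES = re.compile(
    r"\(20\d{2}\)[ \t]*\n"                     # year that ends a title line
    r"(?:\s*\d+ min\s*-)?"                     # optional runtime and dash
    r"\s*"
    r"([A-Z][\w-]*(?:\s*\|\s*[A-Z][\w-]*)*)"   # one or more genres separated by "|"
)

# Split each captured run into a list of genre names.
genres_per_film = [re.split(r"\s*\|\s*", block) for block in GENRES.findall(text)]
print(genres_per_film)
# [['Action', 'Adventure'], ['Biography', 'Drama'], ['Drama']]
```

Because the genre run has to follow the title line directly (apart from the optional runtime), the `|` between director names is never reached, so the `Director...Star` alternative is not needed in this variant.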