Split some string items in a list on a specific word in Python

Time: 2015-12-06 11:29:32

Tags: python string list

I have a list containing some string items:

res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'"]

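I want to iterate over all the list items and:

1. Remove a phrase if it starts with count_ (a phrase being the text between two & characters).

The output should look like: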

res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY'",
 "FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'WORKSHOP'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'",
 "FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'"]

I tried something like:

for x in res:
    regex = re.compile('count_')   #setting a search cateory
    matches = [string for string in res if re.match(regex, string)]  # finding all matches
    resfinal = [x for x in res if x not in matches]

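But it did not work. I know I am missing some regex trick, but I can't figure it out. Please suggest a few lines of code.

3 answers:

Answer 0: (score: 1)

You don't need a regular expression.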

>>> res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'"]
>>> [' & '.join(x for x in i.split(' & ') if not x.startswith('count_')) for i in res]
["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS'", "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY'", "FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL'", "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'WORKSHOP'", "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION'", "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'", "FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'", "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'", "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'"]
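The same split-and-join idea can be wrapped in a small helper. This is only a sketch: the function name drop_count_clauses and its sep/prefix parameters are made up for illustration, and it assumes clauses are always separated by exactly ' & ', as in the sample data.

```python
def drop_count_clauses(expr, sep=' & ', prefix='count_'):
    """Return expr without the clauses that start with prefix."""
    clauses = expr.split(sep)
    # Keep every clause except those beginning with the prefix.
    kept = [c for c in clauses if not c.startswith(prefix)]
    return sep.join(kept)


# Two items taken from the question's list.
res = [
    "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'",
]
cleaned = [drop_count_clauses(s) for s in res]
for s in cleaned:
    print(s)
```

Because the string is split on the separator and rejoined with the same separator, items without a count_ clause come back unchanged.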

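Answer 1: (score: 1)

re.match(regex, string) behaves roughly like re.search('^' + regex, string): it only matches at the beginning of the string.

So re.match(regex, string) checks whether the string starts with count_ rather than searching inside the string. You should therefore use re.search() instead of re.match():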

import re

regex = re.compile('count_')   # the pattern to search for
matches = [string for string in res if regex.search(string)]  # every string that contains count_
resfinal = [x for x in res if x not in matches]  # keep only the strings without count_

Output:

["FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'",
 "FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'",
 "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'"]
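To see the difference between re.match() and re.search() in isolation, here is a minimal sketch using a shortened string (the variable names at_start and anywhere are just for illustration):

```python
import re

s = "EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1"

# re.match only tries to match at position 0, so it finds nothing here.
at_start = re.match('count_', s)

# re.search scans the whole string and finds the later occurrence.
anywhere = re.search('count_', s)

print(at_start)          # None
print(anywhere.group())  # count_
```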

However, if x not in matches drops every string that contains count_ entirely, instead of removing only the count_ clause. I think you should use re.sub() instead:

>>> import re
>>> res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
... "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
... "FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
... "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
... "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
... "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'",
... "FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
... "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'",
... "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'"]
>>> resfinal = [re.sub(r' & count_.*?(?= & |$)', '', x) for x in res]  # remove each ` & count_...` clause, up to the next ` & ` or the end of the string

>>> for i in resfinal:
...     print(i)
... 
FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS'
FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY'
FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL'
FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'WORKSHOP'
FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION'
FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|DRAMA|'
FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'
FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_GENRE == '|COMEDY|'
FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'
>>> 

Answer 2: (score: 1)

I think you are using the wrong data type. If you want to extract more information from these strings, consider building a list of dicts instead. If you insist on keeping the list items as strings, try a regular-expression substitution.

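The magic is in the regular expression. It matches either '&' or the start of the string, then only whitespace characters up to 'count_', then anything else, and finally either a closing '& ' or the end of the string. The '*?' ensures the match is as small as possible.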
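A pattern along these lines can be sketched as follows. This is a reconstruction written to fit the description, not necessarily the answerer's exact code: the names COUNT_CLAUSE and strip_count_clauses are hypothetical, and the sketch assumes that '&' never appears inside a quoted value.

```python
import re

# Hypothetical pattern written to fit the description above.
# Alternative 1: a "& count_..." clause that follows another clause.
# Alternative 2: a "count_..." clause at the very start of the string.
COUNT_CLAUSE = re.compile(
    r"\s*&\s*count_.*?(?=\s*&|$)"   # lazy: stop before the next '&' or at the end
    r"|^\s*count_.*?(?:&\s*|$)"     # leading clause: also consume its trailing '& '
)


def strip_count_clauses(expr):
    """Remove every clause that begins with 'count_' from expr."""
    return COUNT_CLAUSE.sub("", expr)


# Three items taken from the question's list.
res = [
    "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
    "FAV_VENUE_CITY_NAME == 'Mumbai' & FAV_LANGUAGE == 'Hindi'",
]
resfinal = [strip_count_clauses(s) for s in res]
for s in resfinal:
    print(s)
```

The lazy quantifier matters when a count_ clause sits in the middle of a string: a greedy '.*' would swallow everything up to the end, while '.*?' stops at the next '&' so later clauses survive.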