What is the most efficient way to search for a string in a pandas column?

Time: 2018-01-03 13:40:24

Tags: python string list pandas append

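I wrote this code, which takes an input string, gets similar words, builds different combinations of those words, searches a pandas column for each combination, and returns the index of the rows where the keywords are found.

The code below works well for me, but it is slower than I would like, and since the dataframe keeps growing over time, I expect it will only get slower.

So I would like to know whether there is a more efficient approach, and which lines I should change to get there: the regex search, or the list appends?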
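As a rough way to see where the time might go, the sketch below counts how many column searches the combination loop further down would run. It assumes a 12-keyword list like the sample in the code, and it relies on the fact that the search line only uses the first two elements of each 3-keyword combination. The helper name `count_searches` and the placeholder keywords are made up for illustration.

```python
import itertools

def count_searches(n_keywords):
    """Count 3-keyword combinations and the distinct (first, second) pairs in them."""
    # Placeholder keywords; only their number matters here.
    keywords = [f"kw{i}" for i in range(n_keywords)]
    triples = list(itertools.combinations(keywords, 3))
    # The search only looks at the first two elements of each combination.
    distinct_pairs = {(t[0], t[1]) for t in triples}
    return len(triples), len(distinct_pairs)

n_triples, n_pairs = count_searches(12)
print(n_triples, n_pairs)
```

If this count is right, many of the full-column regex scans repeat a pair that has already been searched, so the repeated scanning, rather than the list appends, is the more likely bottleneck.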

This is my dataframe:

    Unnamed: 0  web-scraper-start-url   course-link course-link-href    title   shortDescription    instructor  date    language    subtitle    ... fullDescription requiremens includes    objective   audience    instruct    fullText    full_text   key_words   clean_words
0   0   https://www.udemy.com/courses/business/all-cou...   How To Create A 5 Figure SEO Business-ZERO Exp...   https://www.udemy.com/how-to-create-a-5-figure...   How To Create A 5 Figure SEO Business-ZERO Exp...   Create a 5 figure SEO business by working for ...   Angshuman Dutta Last updated 3/2017 English English [Auto-generated]    ... This course will show you how to create a prof...   You should be willing to profit from selling S...   2 hours on-demand video|2 Supplemental Resourc...   Build a sustainable income selling SEO service...   This course is for internet marketers who want...   Angshuman-Dutta How To Create A 5 Figure SEO Business-ZERO Exp...   ['create', 'figure', 'seo', 'business', 'zero'...   ['freelance', 'experience', 'service', 'websit...   ['income', 'resource', 'corporate', 'absolutel...
1   1   https://www.udemy.com/courses/business/all-cou...   Microsoft Excel for Project Management - Earn...    https://www.udemy.com/microsoft-excel-for-proj...   Microsoft Excel for Project Management - Earn...    Mastering Microsoft Excel for Project Manageme...   Joseph Phillips Last updated 3/2016 English English [Auto-generated]    ... It's been said that project management is 90... Basics of project management|Basics of Microso...   4.5 hours on-demand video|1 Supplemental Resou...   Design reports for your stakeholders|Create a ...   Project managers|PMPs|People learning Microsof...   Joseph-Phillips Microsoft Excel for Project Management - Earn...    ['microsoft', 'excel', 'project', 'management'...   ['project', 'manager', 'excel', 'microsoft', '...   ['project', 'resource', 'people', 'reporting',...

This is what the key_words column in my dataframe looks like:

0       [freelance, experience, service, website, free...
1       [project, manager, excel, microsoft, reporting...
2       [income, informational, english, online, exper...

Here is my code:

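import itertools

def bla_bla(model):
    # `df` is the dataframe shown above, used here as a global.
    # Read the query and split it into words.
    input_string = input()
    title = input_string.split()
    # Keep the first element (the word) of each similar-word result.
    titles = model.most_similar(title)
    titles_list = []
    for keyword in titles:
        titles_list.append(keyword[0])

    recommended_keywords = titles_list + title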


    # This is what recommended_keywords will look like:
    # ['fullstack', 'ror', 'tulsa', 'shrikrishna', 'vanston', 'devtools',
    #  'develoeprs', 'frontend', 'intermidate', 'nunn', 'web', 'developer']


    # Collect every 3-keyword combination.
    coursat = []
    for duo in range(0, len(recommended_keywords)+1):
        for subset in itertools.combinations(recommended_keywords, duo):
            if len(subset) > 2 and len(subset)<=3:
                coursat.append(subset)
    my_list = []
    for g in coursat:
        # Match rows where g[0] and g[1] both appear, in either order (g[2] is not used).
        y  = df[df['key_words'].str.contains(".*"+str(g[0])+".*"+str(g[1])+"|"+".*"+str(g[1])+".*"+str(g[0]))]
        if not y.title.empty:
            my_list.append(y.title)
    return my_list

This is what the output of my function should look like:

[2538    Node with React: Fullstack Web Development
 Name: title, dtype: object,
 2481      Progressive Web Apps (PWA) - The Complete Guide
 3447    Progressive Web Apps - The Concise PWA Masterc...
 4964    Progressive Web Apps (PWA) - From Beginner to ...
 Name: title, dtype: object,
 5691    Yii2 Application Development Solutions–Volume 2
 Name: title, dtype: object,
 3697    HTML5 : Mobile Web App Development
 Name: title, dtype: object]
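One possible direction, offered as a sketch and not a benchmarked answer: scan the column once per keyword with a plain substring test, cache the resulting boolean masks, and combine them per pair with `&`. This assumes the `key_words` cells are (or can be cast to) strings, and that a plain substring match is an acceptable stand-in for the `.*a.*b|.*b.*a` regex; the two can differ when keywords overlap or contain regex metacharacters. It also visits each pair once, so it will not reproduce any repeated results the original loop may produce. The function name, its parameters, and the demo data are hypothetical.

```python
import itertools
import pandas as pd

def find_titles_by_keyword_pairs(df, keywords, text_col="key_words", title_col="title"):
    """Return one Series of titles for each keyword pair that co-occurs in a row."""
    # Cast to str so list-valued or missing cells are handled uniformly.
    text = df[text_col].astype(str)
    # One substring scan of the column per distinct keyword (order preserved).
    masks = {kw: text.str.contains(kw, regex=False) for kw in dict.fromkeys(keywords)}
    results = []
    for a, b in itertools.combinations(masks, 2):
        both = masks[a] & masks[b]  # rows containing both keywords
        if both.any():
            results.append(df.loc[both, title_col])
    return results

# Tiny made-up dataframe, only to show the call.
demo = pd.DataFrame({
    "title": [
        "Node with React: Fullstack Web Development",
        "HTML5 : Mobile Web App Development",
        "Excel for Project Management",
    ],
    "key_words": [
        "['fullstack', 'web', 'developer']",
        "['web', 'mobile', 'developer']",
        "['project', 'excel']",
    ],
})
found = find_titles_by_keyword_pairs(demo, ["fullstack", "web", "developer"])
for titles in found:
    print(titles.tolist())
```

With this layout the number of full-column scans grows with the number of keywords instead of the number of combinations; the per-pair work is a vectorized boolean AND.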

Thanks in advance.

0 answers:

No answers yet