Pandas str.contains regex

Time: 2014-12-11 13:07:01

Tags: python pandas

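I'm trying to learn Pandas and MatPlotLib. As a challenge, I thought it would be interesting to plot the types of occupations people mention in the comments of a Reddit thread. My plan was to fetch the comments, find a small dataset of professions, and check the comments against that dataset. I'm sure there must be a better way; I'm still learning.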
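
One way to do the check described above is sketched here. It assumes the comment texts are already plain strings. The sample texts are abbreviated from the output shown further down, and the keyword list is a hypothetical stand-in for the "small dataset of professions". Each keyword is wrapped in \b word boundaries so that a short keyword like IT only matches as a whole word.

```python
import re
import pandas as pd

# Sample comment texts (abbreviated from the question's printed output)
bodies = pd.Series([
    "Programmer/Project Lead for a railroad company",
    "I work in IT.  I wear many hats",
    "Graduate student and chemist",
    "IT Consultant here",
])

# Hypothetical keyword list standing in for a dataset of professions
professions = ["Programmer", "IT", "student", "chemist"]

# For each keyword, count how many comments mention it as a whole word.
# re.escape makes the keyword literal; na=False turns non-string rows into False.
counts = pd.Series({
    job: bodies.str.contains(r"\b" + re.escape(job) + r"\b", na=False).sum()
    for job in professions
})
print(counts)
```

Matching here is case-sensitive on purpose: str.contains accepts case=False, but that would let IT match the pronoun "it". With matplotlib installed, counts.plot(kind="bar") would draw the counts as a bar chart.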

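Does Pandas match regular expressions differently from ordinary regex matching? If not, shouldn't row 0 be True, since its text starts with "Programmer"?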
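
For string elements there is no separate Pandas regex dialect: with the default regex=True, Series.str.contains applies Python's re.search to each element, so an unanchored pattern matches anywhere in the text. The sketch below compares the two element-wise on a few texts taken from the question's output. Elements that are not strings behave differently: str.contains gives NaN for them instead of True or False.

```python
import re
import pandas as pd

texts = pd.Series([
    "Programmer/Project Lead for a railroad company",
    "Karate instructor",
    "I'm actually a web developer for a company",
])
pattern = r"Programmer"

# Pandas: element-wise regex search over a Series of strings
pandas_result = texts.str.contains(pattern)

# Plain Python: re.search on each string, True if a match was found anywhere
re_result = [re.search(pattern, t) is not None for t in texts]

print(pandas_result.tolist())  # [True, False, False]
print(re_result)               # [True, False, False]
```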

#! /usr/bin/python
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import
import pandas as pd
import matplotlib.pyplot as plt
import praw

r = praw.Reddit(user_agent='my_cool_application')
submissions = r.get_submission(submission_id = '2owaba')
s = pd.Series(submissions.comments)

pattern = r'Programmer'
print (s.str.contains(pattern))
print (s)

The output is not what I expected.

$ python reddit.py 
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
10   NaN
11   NaN
12   NaN
13   NaN
14   NaN
...
57   NaN
58   NaN
59   NaN
60   NaN
61   NaN
62   NaN
63   NaN
64   NaN
65   NaN
66   NaN
67   NaN
68   NaN
69   NaN
70   NaN
71   NaN
Length: 72, dtype: float64
0     Programmer/Project Lead for a railroad company...
1     I deliver pizza part time while I go to colleg...
2      Graduate student (molecular biologist) + cat mom
3     Systems Analyst at a big, boring corporation. ...
4     I work in IT.  I wear many hats at my (small) ...
5                       I'm a professional desk jobber.
6     medical pot producer....pretty much your typic...
7     Research tech for the federal govt. Water leve...
8                                     Karate instructor
9     I own a Vape shop and an E-Liquid manufacturin...
10      Guidance counselor. If only my students knew...
11                         Graduate student and chemist
12    Regulatory Affairs for a medical device manufa...
13    restaurant manager (for the moment, looking to...
14    Logistics and technician manager for a radon m...
...
57    Technical Support for a big credit card proces...
58    Class action settlement administration. Been t...
59    IT Consultant here 8) Lot's of IT folk at EF i...
60    This'll be my first year, staying in the Back ...
61    Research assistant in the epidemiology departm...
62    IT undergrad and this will be my second time a...
63    Commercial construction foreman at a tiny company
64    I'm actually a web developer for a company tha...
65             Install cameras, tv's and phone systems.
66    Animation/design/anything creative. Graduated ...
67                                     Career bartender
68    I work in the Traveling Hospitality Business f...
69    Assisstant Manager at a major retail chain...t...
70                                          Barista :) 
71    Hi, I'm Pasquale Rotella (CEO, Insomniac Event...
Length: 72, dtype: object

1 answer:

Answer 0 (score: 1)

Your Series contains praw.objects.Comment objects, not strings. Extracting each comment's body should give you what you want:

s = pd.Series(comment.body for comment in submissions.comments)
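
A minimal, self-contained reproduction of the problem and the fix is sketched below. It assumes only that each comment object exposes its text as .body, as the answer states for praw.objects.Comment. FakeComment is a hypothetical stand-in so the example runs without Reddit access, and its __repr__ is an assumption that mimics how the question's printed Series shows comment text even though the elements are not strings.

```python
import pandas as pd

class FakeComment:
    """Hypothetical stand-in for a praw Comment; only .body is modeled."""

    def __init__(self, body):
        self.body = body

    def __repr__(self):
        # Printing shows the text, so the Series *looks* like it holds strings
        return self.body

comments = [
    FakeComment("Programmer/Project Lead for a railroad company"),
    FakeComment("Karate instructor"),
]

# Series of objects: the regex cannot be applied, so every row becomes NaN
raw = pd.Series(comments)
raw_result = raw.str.contains(r"Programmer")
print(raw_result)

# Series of strings built from .body: matching works as expected
bodies = pd.Series([c.body for c in comments])
fixed_result = bodies.str.contains(r"Programmer")
print(fixed_result)
```

If some elements may still be missing or non-string, passing na=False to str.contains turns those NaN results into False, which makes the result usable as a boolean mask or for counting with .sum().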