I have a dataframe that looks like this:
Sentence bin_class
"i wanna go to sleep. too late to take seroquel." 1
"Adam and Juliana are leaving me for 43 days take me with youuuu!" 0
I also have a list of regex patterns I want to apply to these sentences. What I want to do is run re.search with every pattern in my list against every sentence in the dataframe, and create a new column in the dataframe that holds 1 if any pattern matches and 0 otherwise. I have been able to run the regex patterns against the sentences to build a list of matches, but I am not sure how to turn that into a new column on the dataframe.
import re

matches = []
for x in df['Sentence']:        # the column is named 'Sentence' (capital S)
    for i in regex:             # regex is my list of pattern strings
        match = re.search(i, x)
        if match:
            matches.append((x, i))
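A minimal sketch of one way to get the 0/1 column straight from a loop like this, assuming regex is a plain list of pattern strings: instead of collecting (sentence, pattern) pairs, compute one value per sentence with any(). The patterns, the helper matches_any and the column name regex_match are hypothetical placeholders.

```python
import re

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Sentence": [
        "i wanna go to sleep. too late to take seroquel.",
        "Adam and Juliana are leaving me for 43 days take me with youuuu!",
    ],
    "bin_class": [1, 0],
})

# Hypothetical patterns; substitute your own list
regex = [r"\bsleep\b", r"seroquel"]


def matches_any(sentence, patterns):
    """Return 1 if any pattern is found anywhere in the sentence, else 0."""
    return int(any(re.search(p, sentence) for p in patterns))


# One value per row instead of one (sentence, pattern) pair per match
df["regex_match"] = [matches_any(s, regex) for s in df["Sentence"]]
print(df[["bin_class", "regex_match"]])
```

any() stops at the first matching pattern, so each sentence is only searched until something matches.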
Answer 0 (score: 4)
You can probably use the str.count string method. A small example:
In [25]: df
Out[25]:
Sentence bin_class
0 i wanna go to sleep. too late to take seroquel. 1
1 Adam and Juliana are leaving me for 43 days ta... 0
In [26]: df['Sentence'].str.count(pat='to')
Out[26]:
0 3
1 0
Name: Sentence, dtype: int64
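To reproduce the session above outside IPython, something like the following should work; it rebuilds the two-row dataframe from the question. Because the pattern is read as a regex, adding word boundaries (\b) changes the result: "too" no longer counts as an occurrence of "to".

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": [
        "i wanna go to sleep. too late to take seroquel.",
        "Adam and Juliana are leaving me for 43 days take me with youuuu!",
    ],
    "bin_class": [1, 0],
})

# Plain substring-style pattern: matches inside "too" as well
counts = df["Sentence"].str.count("to")
print(counts.tolist())  # [3, 0]

# Word boundaries restrict the count to the standalone word "to"
whole_word = df["Sentence"].str.count(r"\bto\b")
print(whole_word.tolist())  # [2, 0]
```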
The pattern passed to str.count is treated as a regular expression, so it works with your regex patterns too. If you only want to know whether a match occurs, not how many times, str.contains is probably enough:
In [27]: df['Sentence'].str.contains(pat='to')
Out[27]:
0 True
1 False
Name: Sentence, dtype: bool
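Since the question asks for 1/0 rather than True/False, here is a small sketch of converting the result of contains with astype(int). The column names has_to and has_adam are hypothetical; case=False makes the match case-insensitive.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": [
        "i wanna go to sleep. too late to take seroquel.",
        "Adam and Juliana are leaving me for 43 days take me with youuuu!",
    ],
    "bin_class": [1, 0],
})

# contains returns booleans; astype(int) maps True -> 1 and False -> 0
df["has_to"] = df["Sentence"].str.contains(r"\bto\b", regex=True).astype(int)

# case=False ignores case, so "Adam" matches the pattern "adam"
df["has_adam"] = df["Sentence"].str.contains("adam", case=False).astype(int)

print(df[["bin_class", "has_to", "has_adam"]])
```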
So you can loop through your regex patterns and, on each iteration, add a column built with one of the methods above (.astype(int) turns the True/False result of contains into 1/0).
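A sketch of that loop, assuming a hypothetical pattern list: it adds one 0/1 column per pattern and then collapses them into a single column that is 1 when at least one pattern matched, which is what the question asks for. The column names pat_0, pat_1, ..., any_match and any_match_combined are placeholders. As an alternative, the patterns can be joined with | into one regex and tested with a single contains call.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": [
        "i wanna go to sleep. too late to take seroquel.",
        "Adam and Juliana are leaving me for 43 days take me with youuuu!",
    ],
    "bin_class": [1, 0],
})

# Hypothetical pattern list; replace with your own
patterns = [r"\bsleep\b", r"seroquel", r"\d+ weeks"]

# One 0/1 column per pattern
pattern_cols = []
for i, pat in enumerate(patterns):
    col = f"pat_{i}"
    df[col] = df["Sentence"].str.contains(pat, regex=True).astype(int)
    pattern_cols.append(col)

# 1 if at least one pattern matched the row, else 0
df["any_match"] = df[pattern_cols].max(axis=1)

# Alternative: a single alternation regex, no per-pattern columns needed
combined = "|".join(patterns)
df["any_match_combined"] = df["Sentence"].str.contains(combined).astype(int)

print(df[pattern_cols + ["any_match", "any_match_combined"]])
```

The per-pattern columns are handy if you later want to see which pattern fired; if you only need the single 0/1 column, the combined regex does it in one pass.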
See the documentation on this for more examples: http://pandas.pydata.org/pandas-docs/stable/text.html#testing-for-strings-that-match-or-contain-a-pattern