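Pandas DataFrame - selecting rows using wildcards

Time: 2017-02-12 11:56:09

Tags: python pandas

I'm new to Python, and my problem is a bit tricky. I want to select rows from a DataFrame if the string in a cell matches any of a set of wildcard rules. Consider this example:

Table to filter: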

df=pd.DataFrame({'Column':[
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})

  Column
0 select rows in pandas DataFrame using comparisons against two columns
1 select rows from a DataFrame based on values in a column in pandas
2 use a list of values to select rows from a pandas dataframe
3 selecting columns from a pandas dataframe based on row conditions
4 select particular columns from inside groups in pandas dataframe

Rules:

Rules=pd.DataFrame({'SearchTerms':['*select*DataFrame*row*','*select*dataframe*row*']})

  SearchTerms
0 *select*DataFrame*row*
1 *select*dataframe*row*

Expected result:

  Column
0 select rows in pandas DataFrame using comparisons against two columns
1 select rows from a DataFrame based on values in a column in pandas
2 use a list of values to select rows from a pandas dataframe

I tried using fnmatch with multiple statements:

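import fnmatch
selection=[]
for row in df['Column']:
    # Rules is a DataFrame, so the patterns are read from its SearchTerms column
    selection.append(fnmatch.fnmatch(row,Rules['SearchTerms'][0])|fnmatch.fnmatch(row,Rules['SearchTerms'][1]))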
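The same fnmatch idea can be extended to any number of rules by testing each cell against every pattern with any(). The following is only a minimal sketch under a few assumptions: matching is meant to be case-sensitive (the two rules differ only in case), so fnmatchcase is used instead of fnmatch.fnmatch, whose case handling depends on the operating system; the names patterns, mask and selection are illustrative.

```python
from fnmatch import fnmatchcase

import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})
Rules = pd.DataFrame({'SearchTerms': ['*select*DataFrame*row*',
                                      '*select*dataframe*row*']})

# Works for any number of rules: just collect them into a list.
patterns = list(Rules['SearchTerms'])

# A row is kept when at least one wildcard pattern matches the whole cell.
mask = df['Column'].apply(
    lambda text: any(fnmatchcase(text, p) for p in patterns))
selection = df[mask]
print(selection)
```

Because the wildcard terms must appear in the given order (select, then DataFrame, then row), only the rows where all three occur in that sequence are kept.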

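Question

How can I select rows from a DataFrame using a variable number of wildcard statements?

I'm getting nowhere. Somebody help me!!! ;)

Thanks in advance,

2 answers:

Answer 0 (score: 1)

"Wildcard" solution:

Data: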

In [53]: df
Out[53]:
                                                                  Column
0  select rows in pandas DataFrame using comparisons against two columns
1     select rows from a DataFrame based on values in a column in pandas
2            use a list of values to select rows from a pandas dataframe
3      selecting columns from a pandas dataframe based on row conditions
4       select particular columns from inside groups in pandas dataframe

In [54]: Rules
Out[54]:
              SearchTerms
0  *select*DataFrame*row*
1  *select*dataframe*row*

Solution (requires import re for the re.I flag):

In [55]: pat = Rules.SearchTerms.str.replace('*', '.*', regex=False).str.cat(sep='|')

In [56]: df[df.Column.str.contains(pat, flags=re.I)]
Out[56]:
                                                              Column
3  selecting columns from a pandas dataframe based on row conditions

The generated RegEx pattern:

In [64]: pat
Out[64]: '.*select.*DataFrame.*row.*|.*select.*dataframe.*row.*'
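For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same approach. It is not the exact code above: wildcard_to_regex is a hypothetical helper introduced here, and it additionally escapes regex metacharacters with re.escape, on the assumption that search terms might contain characters such as '.' or '(' that should be taken literally.

```python
import re

import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})
Rules = pd.DataFrame({'SearchTerms': ['*select*DataFrame*row*',
                                      '*select*dataframe*row*']})


def wildcard_to_regex(term):
    # Hypothetical helper: escape the literal pieces, then join them
    # with '.*' so that every '*' in the wildcard becomes '.*'.
    return '.*'.join(re.escape(part) for part in term.split('*'))


# One alternation covering every rule, however many there are.
pat = '|'.join(wildcard_to_regex(t) for t in Rules['SearchTerms'])

# Case-insensitive search of each cell for any of the rules.
selected = df[df['Column'].str.contains(pat, flags=re.I, regex=True)]
print(pat)
print(selected)
```

For the two rules in the question this builds the same pattern as shown above and selects the same single row.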

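Answer 1 (score: 0)

I think you may have better success with the string-matching functions built into pandas. If you have a pandas Series object (a DataFrame column is a Series) that holds a collection of strings, you can call .str.<method> on it. Many string methods are available, but in this case you can use .str.match(...) or .str.contains(...).

Both methods accept regular expressions, which means converting the wildcard expressions to regex.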

df[df.Column.str.match('select|DataFrame|row', case=False)]

                                          Column
0  select rows in pandas DataFrame using comparis...
1  select rows from a DataFrame based on values i...
3  selecting columns from a pandas dataframe base...
4  select particular columns from inside groups i...
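Row 2 is missing from this output because .str.match only tests the pattern at the start of each string, whereas .str.contains searches anywhere in it. Note also that the alternation 'select|DataFrame|row' matches any one of the words rather than all three in order, so it is looser than the wildcard rules in the question. A small sketch of the difference, using the question's data (the names matched and contained are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})

pattern = 'select|DataFrame|row'

# str.match anchors the pattern at the beginning of each string.
matched = df[df['Column'].str.match(pattern, case=False)]

# str.contains accepts a match at any position in the string.
contained = df[df['Column'].str.contains(pattern, case=False)]

print(list(matched.index))    # [0, 1, 3, 4]
print(list(contained.index))  # [0, 1, 2, 3, 4]
```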