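I'm new to Python, and my problem is a bit tricky. I want to select rows from a DataFrame if the string in a cell matches any of a set of wildcard rules. Take this example:
Table to search: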
import pandas as pd
df = pd.DataFrame({'Column': [
'select rows in pandas DataFrame using comparisons against two columns',
'select rows from a DataFrame based on values in a column in pandas',
'use a list of values to select rows from a pandas dataframe',
'selecting columns from a pandas dataframe based on row conditions',
'select particular columns from inside groups in pandas dataframe']})
Column
0 select rows in pandas DataFrame using comparisons against two columns
1 select rows from a DataFrame based on values in a column in pandas
2 use a list of values to select rows from a pandas dataframe
3 selecting columns from a pandas dataframe based on row conditions
4 select particular columns from inside groups in pandas dataframe
Rules:
Rules=pd.DataFrame({'SearchTerms':['*select*DataFrame*row*','*select*dataframe*row*']})
SearchTerms
0 *select*DataFrame*row*
1 *select*dataframe*row*
Desired result:
Column
0 select rows in pandas DataFrame using comparisons against two columns
1 select rows from a DataFrame based on values in a column in pandas
2 use a list of values to select rows from a pandas dataframe
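I tried using fnmatch with several rules combined:
import fnmatch
selection = []
for row in df['Column']:
    # The number of rules is hard-coded: exactly two are checked here.
    selection.append(fnmatch.fnmatch(row, Rules['SearchTerms'][0]) | fnmatch.fnmatch(row, Rules['SearchTerms'][1]))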
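The loop above checks a fixed number of rules. A minimal sketch of the same fnmatch idea generalized to any number of rules, assuming the rules live in the `SearchTerms` column as shown. The helper name `matches_any` is hypothetical. It uses `fnmatch.fnmatchcase`, which is case-sensitive on every OS; `fnmatch.fnmatch` normalizes case with `os.path.normcase`, so its behavior differs between Windows and POSIX.

```python
import fnmatch

import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})
Rules = pd.DataFrame({'SearchTerms': ['*select*DataFrame*row*', '*select*dataframe*row*']})


def matches_any(text, patterns):
    # Hypothetical helper: True if text matches at least one wildcard pattern.
    # fnmatchcase is case-sensitive regardless of the operating system.
    return any(fnmatch.fnmatchcase(text, p) for p in patterns)


# Works for any number of rules, not just two.
patterns = Rules['SearchTerms'].tolist()
mask = df['Column'].map(lambda text: matches_any(text, patterns))
selected = df[mask]
print(selected)
```

With the two rules exactly as written, only row 3 is selected, because the `*`-separated words must appear in that order (select, then DataFrame, then row). Rows 0 to 2 contain the words in the order select, row, DataFrame.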
Question
How can I select rows from a DataFrame using a variable number of wildcard rules?
I'm getting nowhere. Somebody help me!!! ;)
Thanks in advance,
Answer 0 (score: 1)
A "wildcard" solution:
Data:
In [53]: df
Out[53]:
Column
0 select rows in pandas DataFrame using comparisons against two columns
1 select rows from a DataFrame based on values in a column in pandas
2 use a list of values to select rows from a pandas dataframe
3 selecting columns from a pandas dataframe based on row conditions
4 select particular columns from inside groups in pandas dataframe
In [54]: Rules
Out[54]:
SearchTerms
0 *select*DataFrame*row*
1 *select*dataframe*row*
Solution:
In [55]: pat = Rules.SearchTerms.str.replace('*', '.*', regex=False).str.cat(sep='|')
In [56]: df[df.Column.str.contains(pat, case=False)]
Out[56]:
Column
3 selecting columns from a pandas dataframe based on row conditions
The generated regex pattern:
In [64]: pat
Out[64]: '.*select.*DataFrame.*row.*|.*select.*dataframe.*row.*'
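Replacing only `*` leaves other characters in the rules to be read as regex syntax (a `.` or `+` in a rule would change meaning), and fnmatch's `?` and `[...]` wildcards are not translated. A sketch of a variant, assuming the same `df` and `Rules`, that lets the standard library's `fnmatch.translate` build each regex instead. The translated patterns are anchored at the end, and `str.match` anchors at the start, so each alternative has to cover the whole cell, as fnmatch does.

```python
import fnmatch

import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})
Rules = pd.DataFrame({'SearchTerms': ['*select*DataFrame*row*', '*select*dataframe*row*']})

# fnmatch.translate converts each glob to a regex, escaping regex
# metacharacters and handling *, ? and [...] the way fnmatch does.
pat = '|'.join(fnmatch.translate(p) for p in Rules['SearchTerms'])

# str.match anchors each alternative at the start of the cell;
# case=False makes the search case-insensitive, like the solution above.
result = df[df['Column'].str.match(pat, case=False)]
print(result)
```

This keeps the same single row (index 3) as the hand-built pattern for this data, but stays correct if a rule ever contains regex-special characters.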
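Answer 1 (score: 0)
I think you may have better luck with the built-in string matching functions in pandas. If you have a pandas Series (a DataFrame column is a Series) holding strings, you can call .str.<method> on it. Many string methods are available, but in this case you can use .str.match(...), which requires the pattern to match at the start of each string, or .str.contains(...), which accepts a match anywhere in the string.
Both methods accept regular expressions, which means converting your wildcard expressions to regex. For example, this keeps every row that starts with any one of the words, ignoring case: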
df[df.Column.str.match('select|DataFrame|row', case=False)]
Column
0 select rows in pandas DataFrame using comparis...
1 select rows from a DataFrame based on values i...
3 selecting columns from a pandas dataframe base...
4 select particular columns from inside groups i...
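The alternation `select|DataFrame|row` keeps a row when any single word matches, so it does not express the ordered rule `*select*DataFrame*row*`. The sketch below, assuming the same `df`, only contrasts the two methods named above on that pattern: `str.match` anchors at the start of each cell, while `str.contains` searches anywhere.

```python
import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})

pattern = 'select|DataFrame|row'

# str.match: one of the words must appear at the very start of the cell.
starts = df[df['Column'].str.match(pattern, case=False)]
# str.contains: one of the words may appear anywhere in the cell.
anywhere = df[df['Column'].str.contains(pattern, case=False)]

print(list(starts.index))    # [0, 1, 3, 4]
print(list(anywhere.index))  # [0, 1, 2, 3, 4]
```

Row 2 is the only difference: it contains "select" but starts with "use".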