Question

Suppose I have a table ss_prices with a primary-key column named fund_code, which pandas treats as the index:

>>> import pandas as pd
>>> arr = list(zip(['MM1', 'MM2', '3MM', '4AA'], range(1,5)))
>>> cols = ['fund_code', 'values']
>>> ss_prices = pd.DataFrame(arr, columns=cols).set_index('fund_code')
>>> ss_prices
           values
fund_code
MM1             1
MM2             2
3MM             3
4AA             4

I want to get only the rows whose primary key starts with 'MM'. In SQL I could do this:

select * from ss_prices
where left(fund_code, 2) = 'MM'

But in pandas it seems I have to do this:

import numpy as np
ss_prices[np.vectorize(lambda x: x[:2] == 'MM')(ss_prices.index.values)]

The pandas syntax is clearly messier and harder to read. Without resorting to a tool like pandasql, is there a more readable way to express the WHERE clause?

Answer 1

You can use DataFrame.filter to filter the index with a regular expression.

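A minimal sketch of the DataFrame.filter approach, rebuilding the ss_prices frame from the question (mm_rows is just an illustrative name). filter(regex=...) keeps labels for which re.search finds a match, so the pattern needs a ^ anchor: a bare 'MM' would also keep 3MM. axis=0 makes it match index labels rather than column names.

```python
import pandas as pd

# Rebuild the question's frame: fund_code is the index.
arr = list(zip(['MM1', 'MM2', '3MM', '4AA'], range(1, 5)))
ss_prices = pd.DataFrame(arr, columns=['fund_code', 'values']).set_index('fund_code')

# axis=0 filters on index labels instead of column names;
# "^" anchors the regex so only labels that begin with "MM" are kept.
mm_rows = ss_prices.filter(regex='^MM', axis=0)
print(mm_rows)
```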

Answer 2

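You can use x.startswith("MM") instead of slicing. PEP 8 recommends str.startswith() and str.endswith() over string slicing for prefix and suffix checks.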
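Applied to the question's frame, startswith can replace the x[:2] == 'MM' slice. This sketch also swaps np.vectorize for a plain list comprehension; mask and mm_rows are illustrative names.

```python
import pandas as pd

# Rebuild the question's frame: fund_code is the index.
arr = list(zip(['MM1', 'MM2', '3MM', '4AA'], range(1, 5)))
ss_prices = pd.DataFrame(arr, columns=['fund_code', 'values']).set_index('fund_code')

# One boolean per index label, built with str.startswith rather than slicing.
mask = [code.startswith('MM') for code in ss_prices.index]
mm_rows = ss_prices.loc[mask]
print(mm_rows)
```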

Answer 3

Try df.index.to_series().str[:2]:

In [324]: df
Out[324]:
     a
MMa  1
MMb  2
AAA  3
BBB  4

In [325]: df[df.index.to_series().str[:2] == 'MM']
Out[325]:
     a
MMa  1
MMb  2
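The same idea applied to the question's ss_prices frame, where .str[:2] plays the role of SQL's left(fund_code, 2). A sketch only; left2 and mm_rows are illustrative names.

```python
import pandas as pd

# Rebuild the question's frame: fund_code is the index.
arr = list(zip(['MM1', 'MM2', '3MM', '4AA'], range(1, 5)))
ss_prices = pd.DataFrame(arr, columns=['fund_code', 'values']).set_index('fund_code')

# to_series() turns the index into a Series (indexed by itself) so .str is available;
# .str[:2] takes the first two characters, like SQL's left(fund_code, 2).
left2 = ss_prices.index.to_series().str[:2]
mm_rows = ss_prices[left2 == 'MM']
print(mm_rows)
```

The boolean Series shares ss_prices' index, so the selection aligns row by row.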

Answer 4

Use str.startswith directly on the index; it returns a boolean mask:

In [27]:
df[df.index.str.startswith('MM')]

Out[27]:
     a
MMa  1
MMb  2
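A sketch of the same one-liner on the question's ss_prices frame. Because a string-valued Index exposes the .str accessor itself, no to_series() round trip is needed; mask and mm_rows are illustrative names.

```python
import pandas as pd

# Rebuild the question's frame: fund_code is the index.
arr = list(zip(['MM1', 'MM2', '3MM', '4AA'], range(1, 5)))
ss_prices = pd.DataFrame(arr, columns=['fund_code', 'values']).set_index('fund_code')

# Index.str.startswith yields one boolean per index label.
mask = ss_prices.index.str.startswith('MM')
mm_rows = ss_prices[mask]
print(mm_rows)
```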

Pandas WHERE clause for a string index?