Extracting only certain rows from value_counts

Time: 2016-10-23 14:38:11

Tags: python pandas

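I have simple code that reads a CSV file and gives me the value_counts for a column, but I want to extract only certain rows from the value_counts result. Any suggestions on how to do this?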

The value_counts result looks like this:

domain_a.com            79
domain_b.de             51
domain_c.de             44
domain_d.com            43
domain_e.com            38

I would like to be able to search the result and return only the rows that match certain domain names:

Desired result:

domain_a.com            79
domain_c.de             44
domain_e.com            38

My code so far:

import pandas as pd

# Read the CSV into a DataFrame, using 'Created at' as a datetime index
allData = r'/downloads/data/latest/export-2016-09-30-2039-55502fd6.csv'
tickets_df = pd.read_csv(allData, parse_dates=['Created at'], index_col='Created at')
tickets_df.fillna(0, inplace=True)

# Use 2016 data only
tickets_2016_df = tickets_df.loc['2016-01-01':'2016-10-20']

org_counts = tickets_2016_df['Requester domain'].value_counts()
print(org_counts)

1 answer:

Answer 0 (score: 1)

You can convert the Series to a DataFrame and then use the .query() method:

In [120]: org_counts
Out[120]:
domain_a.com    79
domain_b.de     51
domain_c.de     44
domain_d.com    43
domain_e.com    38
Name: val, dtype: int64

In [121]: org_counts.to_frame('count').query("index in ['domain_a.com','domain_c.de','domain_e.com']")
Out[121]:
              count
domain_a.com     79
domain_c.de      44
domain_e.com     38
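
For reference, here is a minimal self-contained sketch of the same .query() approach. It assumes the counts shown in the question (rebuilt by hand here rather than read from the CSV), and the `domains` list and `filtered` name are only illustrative. Prefixing a name with `@` inside the query string lets it refer to a local Python variable, so the list does not have to be typed into the string itself:

```python
import pandas as pd

# Sample data copied from the question; in the real script this Series
# would come from tickets_2016_df['Requester domain'].value_counts().
org_counts = pd.Series(
    [79, 51, 44, 43, 38],
    index=["domain_a.com", "domain_b.de", "domain_c.de",
           "domain_d.com", "domain_e.com"],
    name="val",
)

# Illustrative list of domains to keep (an assumption, not from the CSV).
domains = ["domain_a.com", "domain_c.de", "domain_e.com"]

# to_frame('count') turns the Series into a one-column DataFrame;
# '@domains' refers to the local variable defined above.
filtered = org_counts.to_frame("count").query("index in @domains")
print(filtered)
```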

Or use the Index.isin() method together with boolean indexing:

In [122]: domains = ['domain_a.com','domain_c.de','domain_e.com']

In [123]: org_counts[org_counts.index.isin(domains)]
Out[123]:
domain_a.com    79
domain_c.de     44
domain_e.com    38
Name: val, dtype: int64
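
The boolean-indexing approach can be sketched as a standalone script as well. The data below is again copied by hand from the question. The second part is an assumption about what "matching" might mean: if you want to match by pattern rather than by an exact list (for example, every .de domain), the string methods on the index produce a boolean mask in the same way:

```python
import pandas as pd

# Sample data copied from the question; normally produced by value_counts().
org_counts = pd.Series(
    [79, 51, 44, 43, 38],
    index=["domain_a.com", "domain_b.de", "domain_c.de",
           "domain_d.com", "domain_e.com"],
    name="val",
)

# Exact matches: isin() returns a boolean array aligned with the index.
domains = ["domain_a.com", "domain_c.de", "domain_e.com"]
exact = org_counts[org_counts.index.isin(domains)]
print(exact)

# Pattern matches (illustrative): keep every domain ending in '.de'.
de_only = org_counts[org_counts.index.str.endswith(".de")]
print(de_only)
```

Both results are still Series, so they keep the original ordering of the counts and can be passed on to further Series methods unchanged.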