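I have some simple code that reads a CSV file and gives me a list of value_counts, but I want to extract only certain rows from the value_counts result. Any suggestions on how to do this?
The value_counts result looks like this: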
domain_a.com 79
domain_b.de 51
domain_c.de 44
domain_d.com 43
domain_e.com 38
I'd like to be able to search the result and return only the rows that match certain domain names.
Desired result:
domain_a.com 79
domain_c.de 44
domain_e.com 38
Code so far:
import pandas as pd
# Read the CSV into a DataFrame, indexed by the parsed 'Created at' timestamps
allData = r'/downloads/data/latest/export-2016-09-30-2039-55502fd6.csv'
tickets_df = pd.read_csv(allData, parse_dates=['Created at'], index_col='Created at')
tickets_df.fillna(0, inplace=True)
# Use 2016 data only
tickets_2016_df = tickets_df.loc['2016-01-01':'2016-10-20']
# Count how many tickets each requester domain has
org_counts = tickets_2016_df['Requester domain'].value_counts()
print(org_counts)
Answer 0 (score: 1)
You can convert the Series to a DataFrame and then use the .query() method:
In [120]: org_counts
Out[120]:
domain_a.com 79
domain_b.de 51
domain_c.de 44
domain_d.com 43
domain_e.com 38
Name: val, dtype: int64
In [121]: org_counts.to_frame('count').query("index in ['domain_a.com','domain_c.de','domain_e.com']")
Out[121]:
count
domain_a.com 79
domain_c.de 44
domain_e.com 38
Or use the Index.isin() method together with boolean indexing:
In [122]: domains = ['domain_a.com','domain_c.de','domain_e.com']
In [123]: org_counts[org_counts.index.isin(domains)]
Out[123]:
domain_a.com 79
domain_c.de 44
domain_e.com 38
Name: val, dtype: int64
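For anyone who wants to try both approaches without the original CSV, here is a minimal self-contained sketch. The Series is built by hand from the counts shown in the question, so the variable names (`via_query`, `via_isin`, `com_only`) and the Series name `count` are assumptions made for illustration. The last line is an extra variation that is not part of the answer above: it filters by a suffix pattern rather than by an explicit list of domains.

```python
import pandas as pd

# Hand-built stand-in for the value_counts() result from the question
# (assumption: the real Series has the domains as its index).
org_counts = pd.Series(
    [79, 51, 44, 43, 38],
    index=["domain_a.com", "domain_b.de", "domain_c.de",
           "domain_d.com", "domain_e.com"],
    name="count",
)

domains = ["domain_a.com", "domain_c.de", "domain_e.com"]

# Approach 1: convert to a DataFrame and filter with .query();
# "@domains" refers to the local Python variable of that name.
via_query = org_counts.to_frame("count").query("index in @domains")

# Approach 2: boolean indexing with Index.isin(); the result stays a Series.
via_isin = org_counts[org_counts.index.isin(domains)]

# Variation (not from the answer): keep every domain ending in ".com".
com_only = org_counts[org_counts.index.str.endswith(".com")]

print(via_query)
print(via_isin)
print(com_only)
```

Both approaches keep the rows in their original order. The isin() version is probably the lighter choice when a Series is all you need, since it skips the conversion to a DataFrame.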