I have the data given below:
import datetime
data = [
    (datetime.datetime(2020, 12, 21, 6, 50, 14, 955551), 'blr', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 7, 6, 0, 242578), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 7, 16, 30, 260692), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 7, 18, 15, 333229), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 7, 29, 0, 839566), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 7, 37, 45, 211979), 'lon', 'del', 'low'),
    (datetime.datetime(2020, 12, 21, 7, 41, 15, 211376), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 7, 48, 16, 26287), 'lon', 'del', 'low'),
    (datetime.datetime(2020, 12, 21, 7, 55, 17, 248074), 'ny', 'del', 'low'),
    (datetime.datetime(2020, 12, 21, 7, 57, 2, 55666), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 8, 4, 2, 319699), 'lon', 'del', 'low'),
    (datetime.datetime(2020, 12, 21, 8, 25, 5, 982621), 'ny', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 8, 26, 50, 997280), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 8, 39, 7, 14287), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 8, 47, 51, 810956), 'lon', 'del', 'medium'),
    (datetime.datetime(2020, 12, 21, 9, 37, 23, 99922), 'ny', 'del', 'low'),
]
This is how I load it in pandas:
import pandas as pd
df = pd.DataFrame(data)
df.columns = ["date", "start", "end", "type"]
df.set_index('date', inplace=True)
Now I can get all the rows with a particular type, say medium, by doing something like:
print(df[df.values == 'medium'])
Now I want to know, for each unique pair of start and end, what the count of the medium type is. Basically, I want one count per (start, end) pair, but I don't know how to get it. How can this be done?
Answer 0 (score: 3)
Use GroupBy.size and specify the columns to group on:
s1 = df[df.values == 'medium'].groupby(['start','end']).size()
print (s1)
start  end
blr    del    1
lon    del    9
ny     del    1
dtype: int64
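Note that `df.values == 'medium'` compares every cell in every column, so it only acts as a row filter here because 'medium' never appears in start or end. A minimal sketch of the same count with the mask restricted to the type column, assuming the question's rows with the timestamps omitted for brevity (`rows` and `is_medium` are illustrative names, not from the original post):

```python
import pandas as pd

# The question's 16 rows as (start, end, type); timestamps omitted for brevity.
rows = [
    ("blr", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"),
    ("ny", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"), ("ny", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("ny", "del", "low"),
]
df = pd.DataFrame(rows, columns=["start", "end", "type"])

# Build the mask from the type column only, so values in other columns cannot match.
is_medium = df["type"] == "medium"

# Count the remaining rows per (start, end) pair.
s1 = df[is_medium].groupby(["start", "end"]).size()
print(s1)
```

This should give the same three counts as above while staying correct if a start or end value ever happened to equal 'medium'.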
Or, if you want all combinations broken down by type as well:
s = df.groupby(['type','start','end']).size()
print(s)
type    start  end
low     lon    del    3
        ny     del    2
medium  blr    del    1
        lon    del    9
        ny     del    1
dtype: int64
print (s.loc['medium'])
start  end
blr    del    1
lon    del    9
ny     del    1
dtype: int64
print (s.loc['low'])
start  end
lon    del    3
ny     del    2
dtype: int64
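If the per-type slices are easier to compare side by side, the grouped counts can also be pivoted so that each type becomes its own column. A sketch, assuming the question's rows with the timestamps omitted for brevity (`rows` and `wide` are illustrative names):

```python
import pandas as pd

# The question's 16 rows as (start, end, type); timestamps omitted for brevity.
rows = [
    ("blr", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"),
    ("ny", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"), ("ny", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("ny", "del", "low"),
]
df = pd.DataFrame(rows, columns=["start", "end", "type"])

# Counts per (type, start, end), as in the answer above.
s = df.groupby(["type", "start", "end"]).size()

# Move the type level into columns; pairs with no rows of a given type get 0.
wide = s.unstack("type", fill_value=0)
print(wide)
```

Each row of `wide` is then one (start, end) pair, with one count column per type.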
Answer 1 (score: 2)
Use value_counts:
res = df[df['type'].eq('medium')].value_counts()
print(res)
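Output
start  end  type
lon    del  medium    9
ny     del  medium    1
blr    del  medium    1
dtype: int64
From the documentation:
"Return a Series containing counts of unique rows in the DataFrame."
If you want to drop type from the output, use droplevel, as suggested by @jezrael: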
res = df[df['type'].eq('medium')].value_counts().droplevel(level=-1)
print(res)
Output
start  end
lon    del    9
ny     del    1
blr    del    1
dtype: int64
This can also be extended to all types, for example with:
res = df.value_counts(subset=['type', 'start', 'end']).sort_index(level=0)
print(res)
Output
type    start  end
low     lon    del    3
        ny     del    2
medium  blr    del    1
        lon    del    9
        ny     del    1
dtype: int64
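value_counts returns a Series with a MultiIndex. When a flat table is more convenient, the index levels can be turned back into ordinary columns. A sketch, assuming pandas 1.1 or later (where DataFrame.value_counts is available) and the question's rows with the timestamps omitted; the column name `count` and the names `rows` and `table` are my own choices:

```python
import pandas as pd

# The question's 16 rows as (start, end, type); timestamps omitted for brevity.
rows = [
    ("blr", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"),
    ("ny", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"), ("ny", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("ny", "del", "low"),
]
df = pd.DataFrame(rows, columns=["start", "end", "type"])

# Count each (type, start, end) combination and sort by the index for a stable order.
res = df.value_counts(subset=["type", "start", "end"]).sort_index()

# Turn the three index levels into columns and name the counts column explicitly.
table = res.reset_index(name="count")
print(table)
```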
Answer 2 (score: 0)
df.where(lambda x:x.type == "medium").dropna().groupby(['start', 'end']).type.agg("count")
start  end
blr    del    1
lon    del    9
ny     del    1
Name: type, dtype: int64
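The one-liner chains three steps: where blanks out every row that fails the condition, dropna removes those blanked rows, and count tallies the non-null type values in each group. A sketch that spells out the intermediate results, assuming the question's rows with the timestamps omitted for brevity (`rows`, `masked`, `kept` and `counts` are illustrative names):

```python
import pandas as pd

# The question's 16 rows as (start, end, type); timestamps omitted for brevity.
rows = [
    ("blr", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"),
    ("ny", "del", "low"), ("lon", "del", "medium"), ("lon", "del", "low"), ("ny", "del", "medium"),
    ("lon", "del", "medium"), ("lon", "del", "medium"), ("lon", "del", "medium"), ("ny", "del", "low"),
]
df = pd.DataFrame(rows, columns=["start", "end", "type"])

# Step 1: rows whose type is not 'medium' become all-NaN; matching rows are kept as-is.
masked = df.where(lambda x: x["type"] == "medium")

# Step 2: drop the rows that were blanked out.
kept = masked.dropna()

# Step 3: count non-null type values per (start, end) pair.
counts = kept.groupby(["start", "end"])["type"].count()
print(counts)
```

Because every surviving row has a non-null type, count and size return the same numbers here; a plain boolean mask avoids the intermediate NaN rows altogether.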