Suppose I have the following DataFrame:
Date ID
2019-06-01 A
2019-06-01 B
2019-06-01 B
2019-06-02 A
2019-06-02 C
2019-06-03 C
2019-06-03 A
What is the most efficient way to get the cumulative count of unique IDs for each date:
Date ID
2019-06-01 2
2019-06-02 3
2019-06-03 3
I could loop over the dates with a for loop and np.isin, but that performs terribly.
Thanks
Answer 0 (score: 3)
Let's do
import csv
events = ["PASSED", "FAILED", "EXCEPTION", "NA", "DEPRECATED"]
with open('data.csv', 'r') as fin, open('data_out.csv', 'w') as fout:
    in_, out = csv.reader(fin), csv.writer(fout)
    out.writerow(next(in_) + events)
    out.writerows(line + [sum(1 if event == entry else 0 for entry in line[1:])
                          for event in events]
                  for line in in_)
Answer 1 (score: 2)
Try using groupby().nunique together with cumsum():
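import pandas as pd
# Assumes df.Date already holds datetimes (e.g. converted with pd.to_datetime)
dates = pd.date_range(df.Date.min(), df.Date.max())
(df.drop_duplicates(['ID'])                    # keep only the first row in which each ID appears
   .groupby('Date')['ID'].nunique().cumsum()   # new IDs per date, then the running total
   .reindex(dates).ffill()                     # carry the total over dates with no new ID
)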
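For anyone who wants to run this end to end, here is a minimal sketch using the sample data from the question. It assumes the Date column starts out as strings and converts it with pd.to_datetime. The names all_dates and result, the extra sort_values call (so that the first occurrence of each ID is its earliest date), and the final astype(int) are my additions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["2019-06-01", "2019-06-01", "2019-06-01",
             "2019-06-02", "2019-06-02", "2019-06-03", "2019-06-03"],
    "ID": ["A", "B", "B", "A", "C", "C", "A"],
})
# Assumption: Date is stored as strings, so convert it to datetimes first
df["Date"] = pd.to_datetime(df["Date"])

# Every calendar day between the first and last date in the data
all_dates = pd.date_range(df["Date"].min(), df["Date"].max())

result = (
    df.sort_values("Date")             # make sure the earliest row of each ID comes first
      .drop_duplicates("ID")           # keep only the first appearance of each ID
      .groupby("Date")["ID"].nunique() # number of new IDs introduced on each date
      .cumsum()                        # running total of distinct IDs seen so far
      .reindex(all_dates)              # add dates on which no new ID appeared
      .ffill()                         # carry the previous total forward
      .astype(int)                     # reindex introduces NaN, which makes the values float
)
print(result)
```

On the sample data the values come out as 2, 3, 3, matching the expected result in the question. The reindex/ffill step matters because 2019-06-03 introduces no new ID and would otherwise be missing from the result.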