How do I count the non-zero rows within a range in Python?

Time: 2017-03-26 08:29:17

Tags: python pandas

I have a pandas Series made up of the numbers 0 and 1.

2016-01-01    0
2016-01-02    1
2016-01-03    1
2016-01-04    0
2016-01-05    1
2016-01-06    1
2016-01-08    1
...

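I want to build a DataFrame from this Series and add a second Series that tells how many 1s occurred within a given period.

For example, if the period is 5 days, the DataFrame would look like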

              Value   1s_for_the_last_5days
2016-01-01    0
2016-01-02    1
2016-01-03    1
2016-01-04    0
2016-01-05    1       3
2016-01-06    1       4
2016-01-08    1       4
...

In addition, I would like to know whether I can count the non-zero rows within a range in the following case.

              Value   Not_0_rows_for_the_last_5days
2016-01-01    0
2016-01-02    1.1
2016-01-03    0.4
2016-01-04    0
2016-01-05    0.6       3
2016-01-06    0.2       4
2016-01-08    10        4

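Thank you for reading. I would appreciate any solution or hint for this problem.

3 answers:

Answer 0: (score: 2)

You can use rolling to create a window of a given size that moves along your column while an aggregation such as sum is applied to it.

First, create some dummy data: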

import pandas as pd
import numpy as np

ser = pd.Series(np.random.randint(0, 2, size=10), 
                index=pd.date_range("2016-01-01", periods=10),
                name="Value")
print(ser)

2016-01-01    1
2016-01-02    0
2016-01-03    0
2016-01-04    0
2016-01-05    0
2016-01-06    0
2016-01-07    0
2016-01-08    0
2016-01-09    1
2016-01-10    0
Freq: D, Name: Value, dtype: int64

Now, use rolling:

summed = ser.rolling(5).sum()
print(summed)

2016-01-01    NaN
2016-01-02    NaN
2016-01-03    NaN
2016-01-04    NaN
2016-01-05    1.0
2016-01-06    0.0
2016-01-07    0.0
2016-01-08    0.0
2016-01-09    1.0
2016-01-10    1.0
Freq: D, Name: Value, dtype: float64

Finally, create the resulting DataFrame:

df = pd.DataFrame({"Value": ser, "Summed": summed})
print(df)

            Summed  Value
2016-01-01     NaN      1
2016-01-02     NaN      0
2016-01-03     NaN      0
2016-01-04     NaN      0
2016-01-05     1.0      0
2016-01-06     0.0      0
2016-01-07     0.0      0
2016-01-08     0.0      0
2016-01-09     1.0      1
2016-01-10     1.0      0

To count arbitrary values, define your own aggregation function and pass it to apply on the rolling window, for example:

# dummy function to count zeros
count_func = lambda x: (x==0).sum()

summed = ser.rolling(5).apply(count_func)
print(summed)

You can replace 0 with any value, or combination of values, from the original Series.

Answer 1: (score: 1)

You want rolling
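For the second case in the question (float values, counting rows that are not 0), one possible sketch is to turn the Series into a 0/1 indicator first and then sum it over the window. The sample values are copied from the question; the variable names `nonzero` and `counts` and the column name are only illustrative. Note that `rolling(5)` counts the last 5 rows, not the last 5 calendar days.

```python
import pandas as pd

# Sample data from the question (2016-01-07 is missing on purpose)
ser = pd.Series(
    [0, 1.1, 0.4, 0, 0.6, 0.2, 10],
    index=pd.to_datetime([
        "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
        "2016-01-05", "2016-01-06", "2016-01-08",
    ]),
    name="Value",
)

# 1 where the value is non-zero, 0 otherwise
nonzero = ser.ne(0).astype(int)

# Sum of the indicator over the last 5 rows = number of non-zero rows
counts = nonzero.rolling(5).sum()

df = pd.DataFrame({"Value": ser, "Not_0_rows_for_the_last_5_rows": counts})
print(df)
```

The first four rows are NaN because a full 5-row window is not yet available; the remaining rows give 3, 4 and 4, as in the table in the question.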

s.rolling('5D').sum()

df = pd.DataFrame({'Value': s, '1s_for_the_last_5days': s.rolling('5D').sum()})
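The snippet above assumes that `s` has a DatetimeIndex, which an offset window such as '5D' requires. As a rough sketch on the data from the question, the time-based window and a plain 5-row window can be compared side by side; the names `by_time` and `by_rows` are only illustrative.

```python
import pandas as pd

# Data from the question; note that 2016-01-07 is missing
s = pd.Series(
    [0, 1, 1, 0, 1, 1, 1],
    index=pd.to_datetime([
        "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
        "2016-01-05", "2016-01-06", "2016-01-08",
    ]),
    name="Value",
)

# Window covering the last 5 calendar days (needs a DatetimeIndex)
by_time = s.rolling("5D").sum()

# Window covering the last 5 rows, regardless of the dates
by_rows = s.rolling(5).sum()

print(pd.DataFrame({"Value": s, "last_5_days": by_time, "last_5_rows": by_rows}))
```

The two differ in two ways. The offset window produces a value for the early rows as well, while the row window leaves them as NaN. And on 2016-01-08 the offset window gives 3, because 2016-01-07 is absent and 2016-01-03 falls outside the 5 days, while the row window gives 4, which is the number shown in the question.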

Answer 2: (score: 1)

pd.Series.rolling is a useful method, but you can also do this in plain Python:

import pandas as pd

def rolling_count(l, rolling_num=5, include_same_day=True):
    """Sum the last rolling_num rows for every position in l."""
    output_list = []
    for index, _ in enumerate(l):
        # The window ends at the current row when include_same_day is True,
        # otherwise at the row before it.
        end = index + int(include_same_day)
        # Clamp at 0 so the first rows use a shorter window.
        start = max(end - rolling_num, 0)
        output_list.append(sum(l[start:end]))
    return output_list

data = {'Value': [0, 1, 1, 0, 1, 1, 1],
        'date': ['2016-01-01','2016-01-02','2016-01-03','2016-01-04','2016-01-05','2016-01-06','2016-01-08']}

df = pd.DataFrame(data).set_index('date')

# Use a plain list so that slicing is purely positional
l = df['Value'].tolist()

df['1s_for_the_last_5days'] = rolling_count(l, rolling_num=5)

print(df)
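A loop like this sums the shorter windows at the start of the data instead of leaving them empty. For comparison, here is a minimal sketch of how a 5-row window with that behaviour might be written in pandas, assuming `min_periods=1` is what is wanted; the names `values` and `partial` are only illustrative.

```python
import pandas as pd

# Values from the question, in date order
values = pd.Series([0, 1, 1, 0, 1, 1, 1])

# min_periods=1 lets the first rows use however many rows are available
partial = values.rolling(5, min_periods=1).sum().astype(int)

print(partial.tolist())
```

Because no window is ever empty here, the result contains no NaN and can be cast back to integers.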