我想创建一个数据集,然后根据插入到第一列中的数字确定第二列的值。
首先,我想创建一个具有2列的数据集(保持一列为空白)
我要创建的数据集:(在数字列中,我要插入从1到50的所有连续数字)
Expected Output:
Numbers Sum
1 55 (this is the sum of first 10 consecutive numbers)
2 65 (this is the sum of consecutive 10 numbers from 2 to 11)
3 75 (this is the sum of consecutive 10 numbers from 3 to 12)
4 85 (this is the sum of consecutive 10 numbers from 4 to 13)
5
6
7
8
9
10
11
12
13
,依此类推,我想通过在列号中插入1到50的值来计算1到50的值的总和,并获取10个连续数字的总和。
答案 0 :(得分:2)
有可能,但是使用rolling
获得第一个NaN
或最后一个df['a'] = df['Numbers'].rolling(10).sum()
print (df)
Numbers a
0 1 NaN
1 2 NaN
2 3 NaN
3 4 NaN
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 NaN
9 10 55.0
10 11 65.0
11 12 75.0
12 13 85.0
:
df['a'] = df['Numbers'].rolling(10).sum().shift(-9)
print (df)
Numbers a
0 1 55.0
1 2 65.0
2 3 75.0
3 4 85.0
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 NaN
9 10 NaN
10 11 NaN
11 12 NaN
12 13 NaN
min_periods=1
或使用参数1
进行间隔为1-2
,1-3
,NaN
...的计数以填充前df['a'] = df['Numbers'].rolling(10, min_periods=1).sum()
print (df)
Numbers a
0 1 1.0
1 2 3.0
2 3 6.0
3 4 10.0
4 5 15.0
5 6 21.0
6 7 28.0
7 8 36.0
8 9 45.0
9 10 55.0
10 11 65.0
11 12 75.0
12 13 85.0
s:
1
编辑:
如果可能,请添加下一个值-在61
到NaN
范围内,然后可以创建滚动,然后删除最后df = pd.DataFrame({'Numbers':range(1, 61)})
df['a'] = df['Numbers'].rolling(10).sum().shift(-9)
df = df.iloc[:50]
的行:
print (df)
Numbers a
0 1 55.0
1 2 65.0
2 3 75.0
3 4 85.0
4 5 95.0
5 6 105.0
6 7 115.0
7 8 125.0
8 9 135.0
9 10 145.0
10 11 155.0
11 12 165.0
12 13 175.0
13 14 185.0
14 15 195.0
15 16 205.0
16 17 215.0
17 18 225.0
18 19 235.0
19 20 245.0
20 21 255.0
21 22 265.0
22 23 275.0
23 24 285.0
24 25 295.0
25 26 305.0
26 27 315.0
27 28 325.0
28 29 335.0
29 30 345.0
30 31 355.0
31 32 365.0
32 33 375.0
33 34 385.0
34 35 395.0
35 36 405.0
36 37 415.0
37 38 425.0
38 39 435.0
39 40 445.0
40 41 455.0
41 42 465.0
42 43 475.0
43 44 485.0
44 45 495.0
45 46 505.0
46 47 515.0
47 48 525.0
48 49 535.0
49 50 545.0
Book