如何将值与循环中的下一个或上一个项目进行比较? 我需要总结列中连续出现的重复次数。
之后我需要创建“频率表”,以便dfoutput schould看起来像在底部图片。
此代码不起作用,因为我无法与其他项目进行比较。
也许有另一种简单的方法可以在没有循环的情况下做到这一点?
sumrep=0
df = pd.DataFrame(data = {'1' : [0,0,1,0,1,1,0,1,1,0,1,1,1,1,0],'2' : [0,0,1,1,1,1,0,0,1,0,1,1,0,1,0]})
df.index= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # It will be easier to assign repetitions in output df - index will be equal to number of repetitions
dfoutput = pd.DataFrame(0,index=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],columns=['1','2'])
#example for column 1
for val1 in df.columns[1]:
if val1 == 1 and val1 ==0: #can't find the way to check NEXT val1 (one row below) in column 1 :/
if sumrep==0:
dfoutput.loc[1,1]=dfoutput.loc[1,1]+1 #count only SINGLE occurences of values and assign it to proper row number 1 in dfoutput
if sumrep>0:
dfoutput.loc[sumrep,1]=dfoutput.loc[sumrep,1]+1 #count repeated occurences greater then 1 and assign them to proper row in dfoutput
sumrep=0
elif val1 == 1 and df[val1+1]==1 :
sumrep=sumrep+1
第1列的所需输出表 - dfoutput:
我不明白为什么没有任何简单的方法来移动Excel中的VBA中的偏移函数等数据框:/
答案 0 :(得分:1)
您可以使用定义的函数here来执行快速运行长度编码:
import numpy as np def rlencode(x, dropna=False): """ Run length encoding. Based on http://stackoverflow.com/a/32681075, which is based on the rle function from R. Parameters ---------- x : 1D array_like Input array to encode dropna: bool, optional Drop all runs of NaNs. Returns ------- start positions, run lengths, run values """ where = np.flatnonzero x = np.asarray(x) n = len(x) if n == 0: return (np.array([], dtype=int), np.array([], dtype=int), np.array([], dtype=x.dtype)) starts = np.r_[0, where(~np.isclose(x[1:], x[:-1], equal_nan=True)) + 1] lengths = np.diff(np.r_[starts, n]) values = x[starts] if dropna: mask = ~np.isnan(values) starts, lengths, values = starts[mask], lengths[mask], values[mask] return starts, lengths, values
使用此功能,您的任务变得更加容易:
import pandas as pd
from collections import Counter
from functools import partial
def get_frequency_of_runs(col, value=1, index=None):
_, lengths, values = rlencode(col)
return pd.Series(Counter(lengths[np.where(values == value)]), index=index)
df = pd.DataFrame(data={'1': [0,0,1,0,1,1,0,1,1,0,1,1,1,1,0],
'2': [0,0,1,1,1,1,0,0,1,0,1,1,0,1,0]})
df.apply(partial(get_frequency_of_runs, index=df.index)).fillna(0)
# 1 2
# 0 0.0 0.0
# 1 1.0 2.0
# 2 2.0 1.0
# 3 0.0 0.0
# 4 1.0 1.0
# 5 0.0 0.0
# 6 0.0 0.0
# 7 0.0 0.0
# 8 0.0 0.0
# 9 0.0 0.0
# 10 0.0 0.0
# 11 0.0 0.0
# 12 0.0 0.0
# 13 0.0 0.0
# 14 0.0 0.0