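I am trying to compute the entropy of a pandas Series. Specifically, I group the strings in Direction into consecutive runs, using:
diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()
This returns a label for each run of identical strings in Direction; the label increases by one every time the string changes. For each run of identical Direction strings, I want to compute the entropy of X and Y.
The run labels produced by this code: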
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
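To make the labelling step concrete, here is a minimal sketch that reproduces the labels above from the Direction values used later in this question. The names direction, changed and run_id are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Same Direction values as the example DataFrame in this question.
direction = pd.Series(
    ['Left', 'Left', 'Left', 'Left', 'Left',
     'Right', 'Right', 'Right', 'Left', 'Left']
)

# True wherever a value differs from the one before it. The first row is
# always True because shift() leaves a missing value there.
changed = direction.ne(direction.shift())

# The cumulative sum of the change flags gives one integer label per run.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]
```

Note that the second block of 'Left' rows gets its own label (3), so it is treated as a separate group from the first one.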
This code used to work, but it now returns an error. I am not sure whether this started after an upgrade.
import pandas as pd
import numpy as np

def ApEn(U, m = 2, r = 0.2):
    '''
    Approximate Entropy
    Quantify the amount of regularity over time-series data.
    Input parameters:
    U = Time series
    m = Length of compared run of data (subseries length)
    r = Filtering level (tolerance). A positive number
    '''
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [U.tolist()[i:i + m] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0)**(-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m + 1) - _phi(m))

def Entropy(df):
    '''
    Calculate entropy for individual direction
    '''
    df = df[['Time','Direction','X','Y']]
    diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()
    # Calculate ApEn grouped by direction.
    df['ApEn_X'] = df.groupby(diff_dir)['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby(diff_dir)['Y'].transform(ApEn)
    return df

df = pd.DataFrame(np.random.randint(0,50, size = (10, 2)), columns=list('XY'))
df['Time'] = range(1, len(df) + 1)
direction = ['Left','Left','Left','Left','Left','Right','Right','Right','Left','Left']
df['Direction'] = direction
# Calculate defensive regularity
entropy = Entropy(df)
Error:
return (N - m + 1.0)**(-1) * sum(np.log(C))
ZeroDivisionError: 0.0 cannot be raised to a negative power
Answer 0 (score: 2)
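The problem is in this expression:
(N - m + 1.0)**(-1)
Consider the case N == 1. Since N = len(U), this happens when groupby produces a group of size 1. With m == 2 the base becomes
(1 - 2 + 1) == 0
and 0**-1 is undefined, hence the error. The same happens for a group of size 2, because ApEn also calls _phi(m + 1), where the base is (2 - 3 + 1) == 0.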
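The arithmetic can be checked in isolation. The sketch below uses a hypothetical helper, window_count, that is not part of the original code; it only mirrors the expression N - m + 1 for the two window lengths that ApEn evaluates.

```python
# Number of length-k windows that _phi(k) builds from a group of N values.
def window_count(N, k):
    return N - k + 1

m = 2  # default used by ApEn

# ApEn evaluates _phi(m) and _phi(m + 1), so check both for small groups.
for N in (1, 2, 3):
    print(N, [window_count(N, k) for k in (m, m + 1)])
# 1 [0, -1]
# 2 [1, 0]
# 3 [2, 1]

# A float zero raised to a negative power raises ZeroDivisionError,
# which is the exception shown in the traceback.
raised = False
try:
    (window_count(2, m + 1) + 0.0) ** (-1)
except ZeroDivisionError:
    raised = True
print(raised)  # True
```

With the default m, a group needs at least three rows before both window counts are positive.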
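Now, from a theoretical point of view, how would you define the approximate entropy of a time series that has only one value? It is highly unpredictable, so the entropy should be as high as possible. For this case, let us set it to np.nan to mark it as undefined (entropy is always greater than or equal to 0):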
import pandas as pd
import numpy as np

def ApEn(U, m = 2, r = 0.2):
    '''
    Approximate Entropy
    Quantify the amount of regularity over time-series data.
    Input parameters:
    U = Time series
    m = Length of compared run of data (subseries length)
    r = Filtering level (tolerance). A positive number
    '''
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [U.tolist()[i:i + m] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        # No windows of length m fit into the group: the result is undefined.
        if (N - m + 1) == 0:
            return np.nan
        return (N - m + 1)**(-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m + 1) - _phi(m))

def Entropy(df):
    '''
    Calculate entropy for individual direction
    '''
    # Work on a copy so that adding columns does not trigger SettingWithCopyWarning.
    df = df[['Time','Direction','X','Y']].copy()
    # Label each run of consecutive identical Direction values.
    diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()
    # Calculate ApEn grouped by direction.
    df['ApEn_X'] = df.groupby(diff_dir)['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby(diff_dir)['Y'].transform(ApEn)
    return df

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,50, size = (10, 2)), columns=list('XY'))
df['Time'] = range(1, len(df) + 1)
direction = ['Left','Left','Left','Left','Left','Right','Right','Right','Left','Left']
df['Direction'] = direction
# Calculate defensive regularity
print(Entropy(df))
Output:
Time Direction X Y ApEn_X ApEn_Y
0 1 Left 6 16 0.287682 0.287682
1 2 Left 22 6 0.287682 0.287682
2 3 Left 16 5 0.287682 0.287682
3 4 Left 5 48 0.287682 0.287682
4 5 Left 11 21 0.287682 0.287682
5 6 Right 44 25 0.693147 0.693147
6 7 Right 14 12 0.693147 0.693147
7 8 Right 43 40 0.693147 0.693147
8 9 Left 46 44 NaN NaN
9 10 Left 49 2 NaN NaN
A larger sample (which triggers the 0**-1 problem):
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,50, size = (100, 2)), columns=list('XY'))
df['Time'] = range(1, len(df) + 1)
direction = ['Left','Right','Up','Down']
df['Direction'] = np.random.choice((direction), len(df))
print (Entropy(df))
Output:
Time Direction X Y ApEn_X ApEn_Y
0 1 Left 44 47 NaN NaN
1 2 Left 0 3 NaN NaN
2 3 Down 3 39 NaN NaN
3 4 Right 9 19 NaN NaN
4 5 Up 21 36 NaN NaN
.. ... ... .. .. ... ...
95 96 Up 19 33 NaN NaN
96 97 Left 40 32 NaN NaN
97 98 Up 36 6 NaN NaN
98 99 Left 21 31 NaN NaN
99 100 Right 13 7 NaN NaN
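With randomly chosen directions most runs are very short, which is why so many rows come out as NaN. As a rough check, and assuming the default m = 2 (so that runs of one or two rows are the undefined cases), the rows affected can be flagged before calling ApEn. The helper rows_in_short_runs and its max_len parameter are hypothetical names used only for this sketch.

```python
import pandas as pd

def rows_in_short_runs(direction, max_len=2):
    """Flag rows whose run of identical consecutive values has at most max_len rows."""
    direction = pd.Series(direction)
    # Label each run of consecutive identical values.
    run_id = direction.ne(direction.shift()).cumsum()
    # Map every row to the length of the run it belongs to.
    run_len = run_id.map(run_id.value_counts())
    return run_len <= max_len

flags = rows_in_short_runs(['Left', 'Left', 'Down', 'Up', 'Up', 'Up', 'Right'])
print(flags.tolist())  # [True, True, True, False, False, False, True]
```

Only the three consecutive 'Up' rows form a run long enough for ApEn to return a number here.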
Answer 1 (score: 1)
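It looks like, when ApEn._phi() is called, particular values of N and m can make the base (N - m + 1.0) equal to 0. That value then has to be raised to the power -1, which is undefined (see also "Why does zero raised to the power of negative one equal infinity?").
To illustrate, I tried to reproduce your scenario. In the first iteration of the transform operation, the following happens: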
U is: 1 0
2 48
(the first group has 2 elements)
N is: 2
m is: 3
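So effectively, when you reach the return statement of _phi(), you are computing (N - m + 1.0)**-1 = (2 - 3 + 1)**-1 = 0**-1, which is undefined. Perhaps the key point is that you say you want to group by individual direction and pass the U array to the approximate entropy function, but you are actually grouping by diff_X and diff_Y, which yields very small groups because of the nature of the method applied. As I understand it, if you want the approximate entropy for each direction, you only need to group by 'Direction':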
def Entropy(df):
    '''
    Calculate entropy for individual direction
    '''
    # Calculate ApEn grouped by direction.
    df['ApEn_X'] = df.groupby('Direction')['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby('Direction')['Y'].transform(ApEn)
    return df
This results in a DataFrame like this:
entropy.head()
Time Direction X Y ApEn_X ApEn_Y
0 1 Left 28 47 0.035091 0.035091
1 2 Up 8 47 0.013493 0.046520
2 3 Up 0 32 0.013493 0.046520
3 4 Right 34 8 0.044452 0.044452
4 5 Right 49 27 0.044452 0.044452
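Grouping by 'Direction' is not equivalent to grouping by runs: it pools all rows with the same label, even when they are not adjacent. A small sketch of the difference, using the Direction values from the question and a placeholder X column (the values of X are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Direction': ['Left', 'Left', 'Left', 'Left', 'Left',
                  'Right', 'Right', 'Right', 'Left', 'Left'],
    'X': range(10),  # placeholder values, only the group sizes matter here
})

# One label per run of consecutive identical directions.
run_id = df['Direction'].ne(df['Direction'].shift()).cumsum()

# Group sizes when grouping by run versus by direction label.
by_run = [int(n) for n in df.groupby(run_id)['X'].size()]
by_label = {k: int(n) for k, n in df.groupby('Direction')['X'].size().items()}

print(by_run)    # [5, 3, 2]
print(by_label)  # {'Left': 7, 'Right': 3}
```

Which grouping is appropriate depends on whether each separate run or each direction as a whole should be treated as one time series.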
Answer 2 (score: 0)
You have to handle the ZeroDivisionError.
You will then run into a length mismatch in the groupby: df and diff_X must have the same length.
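One possible way to handle the exception, sketched under the assumption that an undefined result should become NaN: wrap the unguarded ApEn from the question in a small function that catches ZeroDivisionError. The wrapper name safe_apen and the sample data are hypothetical, and ApEn is restated compactly here only so that the example runs on its own.

```python
import numpy as np
import pandas as pd

def ApEn(U, m=2, r=0.2):
    """Approximate entropy as in the question, without any guard."""
    N = len(U)
    values = U.tolist()

    def _maxdist(x_i, x_j):
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        x = [values[i:i + m] for i in range(N - m + 1)]
        C = [sum(_maxdist(x_i, x_j) <= r for x_j in x) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    return abs(_phi(m + 1) - _phi(m))

def safe_apen(U, m=2, r=0.2):
    """Return NaN instead of raising when a group is too short for ApEn."""
    try:
        return ApEn(U, m, r)
    except ZeroDivisionError:
        return np.nan

# Illustrative data: one run of three rows followed by one run of two rows.
df = pd.DataFrame({
    'Direction': ['Left', 'Left', 'Left', 'Right', 'Right'],
    'X': [1, 5, 9, 2, 8],
})
run_id = df['Direction'].ne(df['Direction'].shift()).cumsum()
result = df.groupby(run_id)['X'].transform(safe_apen)
print(result)
```

The grouping key is built from the same DataFrame that is being grouped, so both have the same length and the mismatch mentioned above does not arise.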