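My data consists of time and flux (4116 rows × 2 columns). I want to find the distribution of brightness changes by computing the flux difference between two consecutive points and counting how many times each difference occurs. I tried normalizing the data (mydata_nor) and then took the difference (d), but I can't work out how to count the occurrences. Also, I'm not sure this code is correct. I'm trying to plot "flux difference" against "count". The following lines show what mydata looks like: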
352.3771366 20458.564
352.3975695 20458.295
352.4384352 20454.715
352.4588681 20468.422
352.4793010 20460.531
352.4997339 20465.701
352.5201667 20463.215
352.5405995 20463.814
352.5610325 20463.986
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
mydata = pd.read_csv('kplr31.txt')
mydata_nor = (mydata - mydata.mean()) / (mydata.max() - mydata.min())
d = np.diff(mydata_nor)
Answer 0 (score: 0)
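I think your d is missing the argument axis=0; otherwise the difference isn't taken along the right axis (np.diff defaults to the last axis, i.e. across the columns of each row):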
d = np.diff(mydata_nor,axis=0)
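To see why the axis matters, here is a small sketch on a made-up two-column array (three rows rounded from the sample in the question; the variable names are mine, not from your code). With the default axis, np.diff subtracts time from flux within each row; with axis=0 it subtracts each row from the next one, which is what you want.

```python
import numpy as np

# Three (time, flux) rows, rounded from the sample above for readability.
arr = np.array([[352.38, 20458.56],
                [352.40, 20458.30],
                [352.44, 20454.72]])

# Default axis=-1: flux minus time within each row, shape (3, 1).
across_columns = np.diff(arr)

# axis=0: each row minus the previous row, shape (2, 2).
down_rows = np.diff(arr, axis=0)

print(across_columns.shape)
print(down_rows.shape)
print(down_rows[:, 1])  # consecutive flux differences
```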
But to do it a bit differently, you could do this:
mydata_nor = (mydata - mydata.mean()) / (mydata.max() - mydata.min())
# create the column diff_flux with diff()
mydata_nor['diff_flux'] = mydata_nor['flux'].diff()
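As a quick check of what diff() produces, here is a minimal sketch using the first four rows from the question. The small DataFrame is built by hand, and I'm assuming your columns are named time and flux. The first entry of diff_flux is NaN because there is no previous row to subtract.

```python
import pandas as pd

# First four rows of the sample; the column names 'time' and 'flux' are assumed.
sample = pd.DataFrame({
    'time': [352.3771366, 352.3975695, 352.4384352, 352.4588681],
    'flux': [20458.564, 20458.295, 20454.715, 20468.422],
})

# Each value minus the one before it; the first row has no predecessor.
sample['diff_flux'] = sample['flux'].diff()
print(sample)
```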
Now, to get a DataFrame with diff_flux and the number of occurrences:
df_output = (mydata_nor.groupby('diff_flux')     # group by diff_flux value
             .count()                            # count the occurrences of each diff_flux
             .rename(columns={'time': 'count'})  # rename time to count
             .drop(columns='flux')               # drop the flux column, it's not needed
             .reset_index())                     # reset_index to have diff_flux as a column
With the data you provided, it gives:
diff_flux count
0 -0.575691 1
1 -0.261180 1
2 -0.181367 1
3 -0.019625 1
4 0.012548 1
5 0.043700 1
6 0.377180 1
7 1.000000 1
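Note that every count above is 1: floating-point differences almost never repeat exactly, so grouping by exact value won't show much of a distribution. If what you're after is a plot of flux difference against count, one option (a different technique from the groupby above) is to bin the differences with np.histogram. A rough sketch using the nine flux values from the question, without the normalization step; the choice of 4 bins is arbitrary and is my assumption, so adjust it for the full 4116 rows.

```python
import numpy as np

# The nine flux values from the question.
flux = np.array([20458.564, 20458.295, 20454.715, 20468.422, 20460.531,
                 20465.701, 20463.215, 20463.814, 20463.986])

# Differences between consecutive points (8 values for 9 points).
d = np.diff(flux)

# Count how many differences fall into each of 4 equal-width bins.
# The bin count is an arbitrary choice for this tiny sample.
counts, edges = np.histogram(d, bins=4)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:9.3f} to {hi:9.3f}: {n}")
```

With the full data you could pass counts and edges to plt.bar, or call plt.hist(d, bins=...) directly to get the plot in one step.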