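I have a dataset in which temperature is one column. Because of the way the heater works, there are many gaps in the data. To make different datasets directly comparable, I want to fill in the missing temperatures and put a corresponding NaN in the other column.
I tried to use the answer given here, which seems to be exactly what I want: link. But it does not work. I get a DataFrame with the new temperature values I want, but the corresponding data has disappeared: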
import pandas as pd
import numpy as np
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2)
A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19]
A1truncated = A1[A1.temp >= 25]; A1truncated=A1truncated[A1truncated.temp <= 350.1]
A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean()
A1averaged = np.around(A1averaged, decimals=1)
A1averaged.set_index('temp')
new_index = pd.Index(np.arange(25, 350, 0.1), name='temp')
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index()
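This reduces my 19 columns to 1, with temperature as the index (A1averaged), and then to 2 columns: the new list of temperatures and a column of empty data (A1indexed). Any idea why this does not work? Or is there another way to do it?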
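Answer 0 (score: 1)
Calling reindex with a float index is problematic: labels that look identical can fail to match because of floating-point precision. So I use a small hack: an Int64Index instead of a Float64Index (in pandas 2.0 and later, both are a plain Index with an int64 or float64 dtype).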
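To see the mismatch concretely, here is a minimal sketch. The `averaged` frame is a hypothetical stand-in for the real data (the freq values are made up); only the way the float grid is built is taken from the question.

```python
import numpy as np
import pandas as pd

# Float grid built the same way as in the question.
grid = np.arange(25, 350, 0.1)

# Many grid points differ from their one-decimal rounding by a tiny amount.
rounded = np.around(grid, decimals=1)
n_inexact = int((grid != rounded).sum())
print(n_inexact, "of", len(grid), "grid values differ from their rounded form")

# Hypothetical stand-in for the averaged data: temps rounded to one decimal.
averaged = pd.DataFrame({"temp": rounded,
                         "freq": np.arange(len(rounded), dtype=float)})

# reindex only keeps rows whose float labels are exactly equal to a grid value.
reindexed = averaged.set_index("temp").reindex(pd.Index(grid, name="temp"))
n_lost = int(reindexed["freq"].isna().sum())
print(n_lost, "rows lost their data after reindexing")
```

Every row here has a counterpart on the grid, yet rows still come back as NaN, which is the symptom described in the question.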
First, the subsetting can be written more simply:
A1truncated = A1[(A1.temp >= 25) & ( A1.temp <= 350.1)]
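Then drop the first set_index call, because the index is set twice (and the result of this first call is never assigned, so it has no effect):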
A1averaged.set_index('temp')
Make new_index an Int64Index:
new_index = pd.Index(np.arange(250, 3500), name='temp')
To match that integer index, multiply the temp column by 10 before reindexing, and divide it by 10 again at the end:
A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed['temp'] = A1indexed['temp'] / 10
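A slightly more defensive variant of the same trick is sketched below. Multiplying a float column by 10 still leaves floats, so this version rounds and casts to int64 before reindexing. The small `averaged` frame and its values are hypothetical, and the explicit round-and-cast step is an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical averaged data with gaps (no rows for 25.2 or 25.4).
averaged = pd.DataFrame({"temp": [25.0, 25.1, 25.3, 25.5],
                         "freq": [100.0, 101.0, 103.0, 105.0]})

# Scale to tenths of a degree and convert to true integers,
# rounding first so a product such as 252.99999999999997 becomes 253.
averaged["temp"] = (averaged["temp"] * 10).round().astype("int64")

# Integer grid covering 25.0 to 25.5 in steps of 0.1.
new_index = pd.Index(np.arange(250, 256), name="temp")

# Integer labels compare exactly, so missing temperatures become NaN rows.
indexed = averaged.set_index("temp").reindex(new_index).reset_index()

# Convert back to degrees.
indexed["temp"] = indexed["temp"] / 10
print(indexed)
```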
All together:
import pandas as pd
import numpy as np
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2)
A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19]
A1truncated = A1[(A1.temp >= 25) & ( A1.temp <= 350.1)]
A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean()
A1averaged = np.around(A1averaged, decimals=1)
new_index = pd.Index(np.arange(250, 3500), name='temp')
A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index()
A1indexed['temp'] = A1indexed['temp'] / 10
print(A1indexed.tail())
# temp freq
#3245 349.5 5830065.6
#3246 349.6 5830043.5
#3247 349.7 5830046.3
#3248 349.8 5830025.3
#3249 349.9 5830015.6
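As a possible alternative that keeps the float index, reindex also accepts method='nearest' together with a tolerance, so labels that differ only by floating-point noise still match. This is a sketch on hypothetical data, not something the original answer used; it assumes the temp index is sorted, which reindex requires when a method is given, and a tolerance of 0.01 is an arbitrary choice well below the 0.1 step.

```python
import numpy as np
import pandas as pd

# Hypothetical averaged data with gaps (no rows for 25.2 or 25.4).
averaged = pd.DataFrame({"temp": [25.0, 25.1, 25.3, 25.5],
                         "freq": [100.0, 101.0, 103.0, 105.0]})

# Float grid built with arange, as in the question (six points: 25.0 to 25.5).
new_index = pd.Index(np.arange(25, 25.55, 0.1), name="temp")

# Match each grid point to the nearest existing label, but only when it is
# within 0.01 degrees; grid points with no such neighbour become NaN rows.
indexed = (averaged.set_index("temp")
           .reindex(new_index, method="nearest", tolerance=0.01)
           .reset_index())

# Tidy the grid labels to one decimal for display.
indexed["temp"] = indexed["temp"].round(1)
print(indexed)
```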