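I am recording data at 2000 Hz, which means I get a new data point every 0.5 ms. However, my recording software can only record timestamps with 1 ms precision, so the float index of my dataframe contains duplicate values.
To fix the duplicates, I want to add 0.0005 to every other row of the index. I tried the following, but so far it does not work: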
c = df.iloc[:,0] # select the first column of the dataframe
c = c.iloc[::-1] # reverse order so that time is increasing not decreasing
pd.set_option('float_format', '{:f}'.format) # change the print output to show the decimals (instead of 15.55567E9)
i = c.index # get the index of c - the length is 20000
rp = np.matlib.repmat([0, 0.0005], 1, 10000) # create an array repeating 0, 0.0005 so that we can add 0.0005 to every other row
df.set_index(c, i+rp).astype(float).applymap('{:,.4f}'.format) # set the index of c to i+rp - attempt to format to 4 decimals
print(c) # see if it worked
Expected output: (trimmed to save space; not all 20,000 rows are shown)
1555677243.401000 4.569000
1555677243.401500 4.569000
1555677243.402000 4.571000
1555677243.402500 4.574000
1555677243.403000 4.574000
1555677243.403500 4.576000
1555677243.404000 4.577000
1555677243.404500 4.577000
1555677243.405000 4.577000
1555677243.405500 4.581000
1555677243.406000 4.581000
1555677243.406500 4.582000
1555677243.407000 4.581000
1555677243.407500 4.582000
1555677243.408000 4.580000
1555677243.408500 4.580000
1555677243.409000 4.582000
1555677243.409500 4.585000
1555677243.410000 4.585000
1555677243.410500 4.585000
Actual output: (note the duplicates in the index)
1555677243.401000 4.569000
1555677243.401000 4.569000
1555677243.402000 4.571000
1555677243.402000 4.574000
1555677243.403000 4.574000
1555677243.403000 4.576000
1555677243.404000 4.577000
1555677243.404000 4.577000
1555677243.405000 4.577000
1555677243.405000 4.581000
1555677243.406000 4.581000
1555677243.406000 4.582000
1555677243.407000 4.581000
1555677243.407000 4.582000
1555677243.408000 4.580000
1555677243.408000 4.580000
1555677243.409000 4.582000
1555677243.409000 4.585000
1555677243.410000 4.585000
1555677243.410000 4.585000
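Answer 0: (score: 2)
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
'B': [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0]})  # floats, so the column can hold the offset values
# add 0.005 to column B on every second row, starting at row 1
df.iloc[1::2, 1] = df.iloc[1::2, :].eval('B + 0.005')
print(df)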
A B
0 1 1.000
1 2 2.005
2 3 3.000
3 4 4.005
4 5 5.000
5 6 6.005
6 7 7.000
7 8 8.005
8 9 9.000
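Just make sure you select the right column with the initial iloc. [1::2] selects every second row starting from index 1 (so 1, 3, etc.). You need to select all columns in the second iloc because eval is available only on a DataFrame, not on a Series. You can then set that column as the index, as in your code. (For the 2000 Hz data in the question, the offset would be 0.0005 rather than 0.005.)
Answer 1: (score: 1)
I don't have your dataframe, but you might consider looping over the rows and treating even and odd positions differently. Could you show us the original DF?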
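As a rough sketch of how this could look on data shaped like the question's, here is the same every-second-row selection applied to a timestamp column and then promoted to the index. The column names `time` and `value` and the four sample rows are made up for illustration, and the sketch uses plain addition instead of eval.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data: duplicated 1 ms timestamps.
df = pd.DataFrame({
    'time': [1555677243.401, 1555677243.401, 1555677243.402, 1555677243.402],
    'value': [4.569, 4.569, 4.571, 4.574],
})

# Positional location of the (hypothetical) 'time' column for use with iloc.
time_pos = df.columns.get_loc('time')

# Add half a millisecond to every second row (positions 1, 3, ...).
df.iloc[1::2, time_pos] = df.iloc[1::2, time_pos].to_numpy() + 0.0005

# Use the corrected column as the index.
df = df.set_index('time')
print(df.index.is_unique)
```

This only helps if the rows really alternate between "first" and "second" sample of each millisecond; if a sample is missing somewhere, the pattern shifts.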
import pandas as pd

data = pd.read_csv('C:/random/d2', sep=',', header=None, names=['W1', 'W2'])
df = pd.DataFrame(data)
dfNew = pd.DataFrame(columns=['W1', 'W2'])
rows, columns = df.shape
for index in range(rows):
    if index % 2 == 0:
        # even position: keep the timestamp as recorded
        tempRow = ['{0:.6f}'.format(df.iat[index, 0]), df.iat[index, 1]]
    else:
        # odd position: shift the timestamp by half a millisecond
        tempRow = ['{0:.6f}'.format(df.iat[index, 0] + 0.0005), df.iat[index, 1]]
    dfNew.loc[len(dfNew)] = tempRow
print(df)
print('#############')
print(dfNew)
Input file:
1555677243.401000,4.569000
1555677243.401000,4.569000
1555677243.402000,4.571000
1555677243.402000,4.574000
1555677243.403000,4.574000
1555677243.403000,4.576000
1555677243.404000,4.577000
1555677243.404000,4.577000
1555677243.405000,4.577000
1555677243.405000,4.581000
1555677243.406000,4.581000
1555677243.406000,4.582000
1555677243.407000,4.581000
1555677243.407000,4.582000
1555677243.408000,4.580000
1555677243.408000,4.580000
1555677243.409000,4.582000
1555677243.409000,4.585000
1555677243.410000,4.585000
1555677243.410000,4.585000
Output (dfNew):
W1 W2
0 1555677243.401000 4.569
1 1555677243.401500 4.569
2 1555677243.402000 4.571
3 1555677243.402500 4.574
4 1555677243.403000 4.574
5 1555677243.403500 4.576
6 1555677243.404000 4.577
7 1555677243.404500 4.577
8 1555677243.405000 4.577
9 1555677243.405500 4.581
10 1555677243.406000 4.581
11 1555677243.406500 4.582
12 1555677243.407000 4.581
13 1555677243.407500 4.582
14 1555677243.408000 4.580
15 1555677243.408500 4.580
16 1555677243.409000 4.582
17 1555677243.409500 4.585
18 1555677243.410000 4.585
19 1555677243.410500 4.585
Answer 2: (score: 1)
You can pull out the index, convert it to a Series, modify it, and then put it back as the index (Indexes are immutable):
import pandas as pd
df = pd.DataFrame(list(range(10)), index=[x/ 1000 for x in range(10)])
new_index = df.index.to_series()
new_index[::2] += 0.0005
result = df.set_index(new_index)
print(result)
Output:
0
0.0005 0
0.0010 1
0.0025 2
0.0030 3
0.0045 4
0.0050 5
0.0065 6
0.0070 7
0.0085 8
0.0090 9
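For an index that actually contains duplicates, as in the question, the same pull-out, modify, put-back idea might look like the sketch below. Two things here are my assumptions rather than part of the answer: the sample values, and the use of a NumPy copy of the index instead of a Series, which sidesteps label-based slicing on a float index when the slice has an integer start.

```python
import pandas as pd

# Hypothetical sample with duplicated 1 ms timestamps, as in the question.
df = pd.DataFrame(
    {'value': [4.569, 4.569, 4.571, 4.574]},
    index=[1555677243.401, 1555677243.401, 1555677243.402, 1555677243.402],
)

# Index objects are immutable, so work on a writable copy of the values.
new_index = df.index.to_numpy(copy=True)

# Shift every second entry (positions 1, 3, ...) by half a millisecond.
new_index[1::2] += 0.0005

# Put the modified values back as the index of a copy of the frame.
result = df.copy()
result.index = new_index
print(result.index.is_unique)
```

Note the slice starts at 1 here, so the second sample of each millisecond is shifted rather than the first.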
Answer 3: (score: 1)
IIUC, using the data from gmds:
import numpy as np
df.index += np.arange(len(df)) % 2 * 0.0005  # add 0.0005 at odd positions only
df
0
0.0000 0
0.0015 1
0.0020 2
0.0035 3
0.0040 4
0.0055 5
0.0060 6
0.0075 7
0.0080 8
0.0095 9
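The alternating pattern assumes every timestamp appears exactly twice. If a sample could ever be dropped, one possible variant (not what this answer does, just a sketch with made-up sample data) is to count occurrences per timestamp with groupby and cumcount and scale that count by 0.0005:

```python
import pandas as pd

# Hypothetical data: the second sample at ...402 is missing.
idx = [1555677243.401, 1555677243.401, 1555677243.402,
       1555677243.403, 1555677243.403]
df = pd.DataFrame({'value': [4.569, 4.569, 4.571, 4.574, 4.574]}, index=idx)

# 0 for the first occurrence of a timestamp, 1 for the second, and so on.
occurrence = df.groupby(level=0).cumcount().to_numpy()

# Shift only repeated timestamps, by half a millisecond per repeat.
df.index = df.index.to_numpy() + occurrence * 0.0005
print(df.index.is_monotonic_increasing)
```

With the plain alternating offset, the row after the gap would be shifted instead of its duplicate, leaving the index out of order.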