Question

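I'm working with a list of points in Python 2.7 and running some interpolation on the data. My list has over 5000 points, and some of the "x" values are repeated. These repeated "x" values have different corresponding "y" values. I want to get rid of the repeated points so that my interpolation function will work: it raises an error when repeated "x" values have different "y" values, because the data then no longer satisfies the definition of a function. Here is a simple example of what I'm trying to do: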

Input:
x = [1,1,3,4,5]
y = [10,20,30,40,50]


Output:
xy = [(1,10),(3,30),(4,40),(5,50)]

The interpolation function I'm using is InterpolatedUnivariateSpline(x, y).

Answer 1

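Keep a variable that stores the previous x value, and skip the current value if it is the same as the previous one. (This only catches duplicates that sit next to each other, so the points need to be sorted by x first.)

For example (pseudocode, you do the Python):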

int previousX = -1

foreach X
{
    if(x == previousX)
    {/*skip*/}
    else
    {
        InterpolatedUnivariateSpline(x, y)
        previousX = x /* store the x value that will be "previous" in the next iteration */
    }
}

I'm assuming you are already iterating, so you don't need actual Python code.
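A minimal Python sketch of the idea above, assuming the points are sorted by x first so that duplicates end up adjacent. The helper name `drop_duplicate_x` is hypothetical, and it keeps the first y seen for each x, which matches the expected output in the question.

```python
def drop_duplicate_x(x, y):
    """Return (x, y) pairs sorted by x, keeping the first y for each x."""
    # sorted() is stable, so points with equal x keep their original order
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    result = []
    previous_x = None  # only compared once result is non-empty
    for xi, yi in pairs:
        if result and xi == previous_x:
            continue  # same x as the previous point: skip it
        result.append((xi, yi))
        previous_x = xi
    return result


x = [1, 1, 3, 4, 5]
y = [10, 20, 30, 40, 50]
xy = drop_duplicate_x(x, y)  # [(1, 10), (3, 30), (4, 40), (5, 50)]
```

The cleaned pairs can be split back into two sequences with `xs, ys = zip(*xy)` before being passed to InterpolatedUnivariateSpline(xs, ys).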

Answer 2

A bit late, but in case anyone is interested, here is a solution using numpy and pandas:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = [1, 1, 3, 4, 5]
y = [10, 20, 30, 40, 50]

# convert the lists into numpy arrays
array_x, array_y = np.array(x), np.array(y)

# sort x and y by x value
order = np.argsort(array_x)
xsort, ysort = array_x[order], array_y[order]

# create a dataframe with one column each for the x and y data
df = pd.DataFrame({'xsort': xsort, 'ysort': ysort})

# group by x: one row per unique x, holding the mean of its y values
mean = df.groupby('xsort').mean()
df_x = mean.index
df_y = mean['ysort']

# fit a polynomial, then wrap the coefficients in a callable poly1d
# note: degree 14 only makes sense for a large dataset; lower it for the small example above
trend = np.polyfit(df_x, df_y, 14)
trendpoly = np.poly1d(trend)

# plot the fitted line; pass color= and figure= as well to match your own plot
plt.plot(df_x, trendpoly(df_x), linestyle=':', dashes=(6, 5), linewidth=0.8, zorder=9)
plt.show()
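To make the groupby step concrete, here is a rough standard-library sketch of what averaging the y values per unique x amounts to. The helper name `mean_y_per_x` is hypothetical and not part of pandas. Note that this averages the duplicates (x = 1 gets y = 15) rather than keeping the first y as in the question's expected output.

```python
from collections import defaultdict


def mean_y_per_x(x, y):
    """Return the sorted unique x values and the mean y for each of them."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)  # collect every y that shares this x
    unique_x = sorted(groups)
    # float() forces true division on Python 2 as well
    mean_y = [sum(groups[xi]) / float(len(groups[xi])) for xi in unique_x]
    return unique_x, mean_y


x = [1, 1, 3, 4, 5]
y = [10, 20, 30, 40, 50]
unique_x, mean_y = mean_y_per_x(x, y)
# unique_x == [1, 3, 4, 5], mean_y == [15.0, 30.0, 40.0, 50.0]
```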

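Also, if you just use argsort() to put the values in order of x, the fit should work even without deleting the duplicate x values. Trying it on my own dataset: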

[Figure: polyfit on its own]

[Figure: sorting data in order of x first, then polyfit]

[Figure: sorting data, delete duplicates, then polyfit]

...I get the same result twice.

Remove duplicate x values and their corresponding y values
