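I have a set of sparse x-y pairs that I need to average. I think I could brute-force it with a series of index lookups (since the x values should match wherever the datasets overlap), but I feel there must be a better solution that I'm missing.
I started writing code to generate random data, but it's probably easier to show a plot here (drawn on a log scale to highlight the mismatched data lengths).
Thanks in advance.
Edit: Here is code that randomly generates data matching the format of the data of interest. What I want is the average of the ys at each x position, keeping the x positions intact.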
from random import random
# Constructing random example data
x1 = range(1,100)
x2 = range(40,150)
x3 = range(30, 200)
xList = [x1, x2, x3]
y1 = [None for x in x1]
y2 = [None for x in x2]
y3 = [None for x in x3]
yList = [y1, y2 ,y3]
for i in range(0,len(xList)):
    for j in range(0,len(xList[i])):
        yList[i][j] = random()
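Below is my brute-force approach. I haven't generalized it into a for loop yet, but at a glance this is what I would do. Sorry, I'm not a programmer by trade, so this may be a very roundabout way of doing it.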
# Brute force method
minX = min(x1[0], x2[0], x3[0])
maxX = max(x1[-1], x2[-1], x3[-1])
xAll = range(minX, maxX + 1)
x1startIndex = xAll.index(x1[0])
x2startIndex = xAll.index(x2[0])
x3startIndex = xAll.index(x3[0])
x1endIndex = xAll.index(x1[-1])
x2endIndex = xAll.index(x2[-1])
x3endIndex = xAll.index(x3[-1])
from numpy import nan, arange, vstack, nanmean
x1EmptyHead = [nan for x in arange(0,x1startIndex)] #create empty head
x2EmptyHead = [nan for x in arange(0,x2startIndex)]
x3EmptyHead = [nan for x in arange(0,x3startIndex)]
x1EmptyTail = [nan for x in arange(x1endIndex+1,len(xAll))] #create empty tail
x2EmptyTail = [nan for x in arange(x2endIndex+1,len(xAll))]
x3EmptyTail = [nan for x in arange(x3endIndex+1,len(xAll))]
y1EqualLength = x1EmptyHead + y1 + x1EmptyTail #create equal length y-data
y2EqualLength = x2EmptyHead + y2 + x2EmptyTail
y3EqualLength = x3EmptyHead + y3 + x3EmptyTail
yConcat = vstack((y1EqualLength, y2EqualLength, y3EqualLength)) # concatenate
yMean = nanmean(yConcat, axis=0) # arithmetic mean ignoring NaNs
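For reference, the same brute-force idea can be written as a loop over any number of series. This is only a sketch, and it assumes every x series is a run of consecutive integers (step 1), as in the example data above; the name padded is made up for illustration.

```python
from random import random

import numpy as np

# Example data in the same format as above
xList = [range(1, 100), range(40, 150), range(30, 200)]
yList = [[random() for _ in xs] for xs in xList]

# Common x axis covering every series
minX = min(xs[0] for xs in xList)
maxX = max(xs[-1] for xs in xList)
xAll = np.arange(minX, maxX + 1)

# One row per series, NaN wherever that series has no data
padded = np.full((len(xList), len(xAll)), np.nan)
for row, (xs, ys) in enumerate(zip(xList, yList)):
    start = xs[0] - minX  # offset of this series on the common axis
    padded[row, start:start + len(ys)] = ys

# Arithmetic mean per x position, ignoring NaNs
yMean = np.nanmean(padded, axis=0)
```

If some x position is covered by no series at all, nanmean returns NaN there and emits a warning, so that case would need separate handling.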
Answer 0 (score: 0)
IIUC, you can do what you're doing very cleanly in pandas, because you can take advantage of its index alignment. For example, if I start from a smaller version of your example, then
import pandas as pd
df = pd.concat([pd.Series(y,index=x) for x,y in zip(xList, yList)], axis=1)
df["avg"] = df.mean(axis=1)
gives me
>>> df
           0         1         2       avg
1   0.956034       NaN       NaN  0.956034
2   0.947827       NaN       NaN  0.947827
3   0.056551       NaN       NaN  0.056551
4   0.084872  0.835499       NaN  0.460185
5        NaN  0.735970       NaN  0.735970
6        NaN  0.669730       NaN  0.669730
7        NaN  0.308136  0.581204  0.444670
8        NaN  0.605944  0.158383  0.382164
9        NaN  0.606802  0.430670  0.518736
10       NaN       NaN  0.393532  0.393532
11       NaN       NaN  0.723012  0.723012
12       NaN       NaN  0.994820  0.994820
13       NaN       NaN  0.949395  0.949395
14       NaN       NaN  0.544177  0.544177
where df.mean automatically ignores the NaN values. This can then be plotted:
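The smaller example data and the plotting code are not reproduced here. A rough end-to-end sketch might look like the following, assuming matplotlib is installed. The three ranges are inferred from the index of the printed table, and the y values are random, so the numbers will differ from those shown.

```python
from random import random

import matplotlib.pyplot as plt
import pandas as pd

# Smaller example; ranges inferred from the printed index (an assumption)
xList = [range(1, 5), range(4, 10), range(7, 15)]
yList = [[random() for _ in xs] for xs in xList]

# Each Series carries its x values as the index, so concat aligns them
df = pd.concat([pd.Series(y, index=x) for x, y in zip(xList, yList)], axis=1)
df = df.sort_index()  # keep x positions in ascending order

# Row-wise mean; NaNs are skipped by default
df["avg"] = df.mean(axis=1)

# One line per input series plus the average
ax = df.plot(marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
# plt.show()  # uncomment when running interactively
```

The x positions stay available as df.index, so the averaged curve can be pulled out as df["avg"] without any manual padding.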