R: curves from gam

Time: 2017-05-18 01:28:18

Tags: r smoothing curves

I have about 80 curves of this general form, produced by gam smoothing with the mgcv package (here are two of the curves as R input):

curve1 = structure(c(7.49350131435014, 9.20913921518434, 10.897558273626, 12.5315396472817, 14.0838644937566, 15.5273139706588, 16.8354309019618, 17.9992764380826, 19.0274300558767, 19.9292328985738, 20.714026109397, 21.3911508315718, 21.9755738920627, 22.5047648527841, 23.0218189593889, 23.5698314575296, 24.1918975928601, 24.9307185773568, 25.8199328484249, 26.8841160688986, 28.1474498679369, 29.6341158746979, 31.3682957183393, 33.3653189702521, 35.6051069707551, 38.0587290024005, 40.6972543477403, 43.4917522893258, 46.412585160685, 49.4138554677925, 52.4334058890734, 55.4083721539251, 58.2758899917481, 60.973095131941, 63.4387734705847, 65.6183115704873, 67.4587461611377, 68.9071139720254, 69.9104517326394, 70.4166212524078, 70.3924611793416, 69.8237870000362, 68.6972392810251, 66.9994585888416, 64.7170854900195, 61.8447094036925, 58.4087151593964, 54.4434364392668, 49.9832069254401, 45.0623603000519, 39.7160061164762, 33.9970989665565, 27.9764384806052, 21.7256001601735, 15.3161595068118, 8.81969202207108, 2.30818949037147, -4.14469117238674, -10.4648767674334, -16.5782940959985, -22.4108699593122, -27.8894639845452, -32.9623907955166, -37.5994200126921, -41.7712540824789, -45.4485954512844, -48.6021465655158, -51.2130051654573, -53.3038501669034, -54.9077557795256, -56.057796212995, -56.7870456769835, -57.1291276417828, -57.130298571955, -56.8494479263324, -56.3460144243683, -55.6794367855146, -54.9091537292243, -54.0902712864655, -53.2605647342642, -52.4534766611619, -51.702449655701, -51.0409263064234, -50.5015466282473, -50.0984914427865, -49.8274823783449, -49.6834384896055, -49.6612788312509, -49.7559224579641, -49.9605546528188, -50.2614256124518, -50.6430517618898, -51.0899495261606, -51.5866353302916, -52.1178475575856, -52.6734296317039, -53.2483300166574, -53.8377191347404, -54.4367674082406, -55.0406452594513), .Dim = c(100L, 1L), .Dimnames = list(NULL, "pd_2"))
curve2 = structure(c(-4.50299508184076, -3.70453890848835,-2.91058337080674, -2.12562910446626, -1.35417674513703, -0.600726928489934, 0.130318651354656, 0.836833955940232, 1.51896860247409, 2.1769711497127, 2.81109015641316, 3.42157418133328, 4.00979189203588, 4.58159239130783, 5.1439448907422, 5.70381860193193, 6.26818273646995, 6.84402482790598, 7.4387538147967, 8.06020004070592, 8.71621217115462, 9.41463887166305, 10.163328807752, 10.9689152542477, 11.8331699231994, 12.7566491359618, 13.7399092138897, 14.783506478338, 15.8877756023413, 17.0479533475697, 18.254178564329, 19.4963684546043, 20.7644402203811, 22.0483110636448, 23.3368154616899, 24.6144569930464, 25.8646565115534, 27.07083487105, 28.2164129253751, 29.2848459145735, 30.2603799614208, 31.1280520714228, 31.8729336362912, 32.4800960477379, 32.9346106974744, 33.2229344326633, 33.33706592227, 33.2703892907107, 33.0162886624017, 32.5681481617592, 31.9195773795413, 31.0693716323692, 30.0215119627273, 28.7802048794415, 27.3496568913384, 25.7340745072439, 23.939275491331, 21.9775226291594, 19.8626899616355, 17.6086515296656, 15.2292813741561, 12.7385709973978, 10.153213513519, 7.49260364848593, 4.77625358964919, 2.02367552435912, -0.745618360033754, -3.5122785079423, -6.25760589083342, -8.96306411193715, -11.6101167744838, -14.1802274817035, -16.6550294974636, -19.0200582802881, -21.2647514833568, -23.3787164204875, -25.3515604054971, -27.1728907522033, -28.8342453001256, -30.3348839915929, -31.6759972946368, -32.8587756772887, -33.8844096075799, -34.7542687297092, -35.4738437397274, -36.0527463855374, -36.5007675912098, -36.8276982808148, -37.0433293784224, -37.157717968495, -37.1819857770595, -37.1275206905347, -37.0057105953392, -36.8279433778919, -36.6055169318738, -36.3476593180057, -36.0615287640475, -35.7541935050218, -35.4327217759511, -35.1041818118577), 
    .Dim = c(100L, 1L), .Dimnames = list(NULL, "pd_2"))

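They are all centred on zero, but the spread around zero differs between curves, i.e. the lowest value varies from curve to curve. So it is not as simple as adding 55 to every value, because ideally the minimum of each curve would sit at zero. The actual values of the curves do not matter; they are only interesting relative to each other. How can I shift all the curves above zero in one batch while keeping their general dimensions relative to each other?

So the goal is to shift each curve so that its most negative value becomes zero, using each curve's own most negative value as the offset.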

Edit: Gavin proposed this solution

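Suppose we have a model m <- gam(y ~ s(x), data = foo). We can then predict from the model at a set of new values of x, call them x', spanning the range of x. The new values go into newd: newd <- with(foo, data.frame(x = seq(min(x), max(x), length = 200))). We predict using predict() and add the predicted values to newd: newd <- transform(newd, fitted = predict(m, newdata = newd, type = "response")). Now you can plot with plot(fitted ~ x, data = newd, type = "l") and you will see the curve on the appropriate scale.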
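The workflow Gavin describes can be sketched end to end as follows. This is only a minimal illustration: the data frame foo is simulated here (an assumption, not the asker's data), and the sine-shaped signal and noise level are arbitrary. Predictions with type = "response" include the model intercept, so the curve appears on the scale of y rather than centred on zero.

```r
library(mgcv)

# Simulated stand-in for the data frame `foo` (an assumption, not the asker's data)
set.seed(1)
foo <- data.frame(x = seq(0, 10, length.out = 300))
foo$y <- 50 + 20 * sin(foo$x) + rnorm(nrow(foo), sd = 3)

# Fit a single smooth of x
m <- gam(y ~ s(x), data = foo)

# 200 evenly spaced values of x over the observed range
newd <- with(foo, data.frame(x = seq(min(x), max(x), length.out = 200)))

# Predictions on the response scale (intercept included), stored alongside x
newd <- transform(newd,
                  fitted = as.numeric(predict(m, newdata = newd, type = "response")))

# The fitted curve on the scale of y
plot(fitted ~ x, data = newd, type = "l")
```

Because the intercept is part of these predictions, curves obtained this way from different models keep their own vertical level, which may already be enough to make them comparable.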

Gavin also corrected my formula (see the comments): test.gam <- gam(ch1 ~ s(row_id), data = test)

newd <- with(test, data.frame(row_id = seq(min(row_id), max(row_id), length = 1197)))

newd <- transform(newd, fitted = predict(test.gam, newdata = newd, type = "response"))

Thanks Gavin!!

1 answer:

Answer 0: (score: 0)

If the curves are in a list l, try lapply to convert it into a list of curves whose minimum is zero:

l.min.zero = lapply(l, function(x) x - min(x))
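To make the effect of the lapply approach concrete, here is a small sketch on two toy curves (hypothetical stand-ins for the roughly 80 one-column matrices in the question). It also shows an alternative that is not part of the answer: subtracting one global minimum from every curve, which might be preferable if the vertical offsets between curves are meant to be preserved.

```r
# Toy one-column matrices standing in for the real curves (an assumption)
l <- list(
  a = matrix(c(-3, 0, 5), ncol = 1),
  b = matrix(c(-10, 2, 4), ncol = 1)
)

# Option 1 (the answer's approach): shift each curve by its own minimum,
# so every curve touches zero
per.curve <- lapply(l, function(x) x - min(x))

# Option 2 (an alternative, not from the answer): shift all curves by the
# single lowest value, so only the lowest curve touches zero and the
# vertical offsets between curves are unchanged
global.min <- min(unlist(l))
common <- lapply(l, function(x) x - global.min)

sapply(per.curve, min)  # a = 0, b = 0
sapply(common, min)     # a = 7, b = 0
```

Both options leave the shape of each curve untouched, since subtracting a constant from a matrix keeps its dimensions and the differences between its values.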
