I've been playing with the tensorflow haskell bindings, but I'm having a hard time getting the basic linear regression example from the readme to work. The task seems simple enough: learn the line y = 2x + 3, a.k.a. simple linear regression, using gradient descent.

I've put up a github repo with a runnable example (built with stack + nix), but here's the gist:
-- | Compute simple linear regression, using gradient descent on tensorflow.
simpleLinearRegression' :: Float -> [Float] -> [Float] -> IO (Float, Float)
simpleLinearRegression' learningRate x y =
    TFL.withEventWriter "test.log" $ \eventWriter -> TF.runSession $ do
        let x' = TF.vector x
            y' = TF.vector y
        b0 <- TF.initializedVariable 0
        b1 <- TF.initializedVariable 0
        let yHat = (x' * TF.readValue b1) + TF.readValue b0
            loss = TFC.square $ yHat - y'
        TFL.histogramSummary "losses" loss
        TFL.scalarSummary "error" $ TF.reduceSum loss
        TFL.scalarSummary "intercept" $ TF.readValue b0
        TFL.scalarSummary "weight" $ TF.readValue b1
        trainStep <- TF.minimizeWith (TF.gradientDescent learningRate) loss [b0, b1]
        summaryT <- TFL.mergeAllSummaries
        forM_ ([1 .. iterations] :: [Int64]) $ \step ->
            if step `mod` logEveryNth == 0
                then do
                    -- TF.run_ trainStep
                    ((), summaryBytes) <- TF.run (trainStep, summaryT)
                    (TF.Scalar beta0, TF.Scalar beta1) <- TF.run
                        (TF.readValue b0, TF.readValue b1)
                    -- liftIO $ putStrLn $ "Y = " ++ show beta1 ++ "X + " ++ show beta0
                    let summary = decodeMessageOrDie (TF.unScalar summaryBytes)
                    TFL.logSummary eventWriter step summary
                else TF.run_ trainStep
        (TF.Scalar b0', TF.Scalar b1') <- TF.run (TF.readValue b0, TF.readValue b1)
        return (b0', b1')
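For context, the snippet relies on imports and two constants defined elsewhere. This is my reconstruction of them; the constant values are placeholders, not necessarily what the repo uses:

import Control.Monad (forM_)
import Data.Int (Int64)
import Data.ProtoLens (decodeMessageOrDie)

import qualified TensorFlow.Core as TF
import qualified TensorFlow.GenOps.Core as TFC
import qualified TensorFlow.Logging as TFL
import qualified TensorFlow.Minimize as TF
import qualified TensorFlow.Ops as TF hiding (initializedVariable)
import qualified TensorFlow.Variable as TF

-- Total number of gradient steps, and how often to emit summaries.
-- (Placeholder values; the repo may use different ones.)
iterations :: Int64
iterations = 1000

logEveryNth :: Int64
logEveryNth = 100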
This is basically the code from the readme, except that I've pulled learningRate out into a parameter and added some logging for TensorBoard (which has not helped me understand the problem).
There is a small test suite that exercises the different cases:
linearRegressionSpec :: Spec
linearRegressionSpec = do
    -- n = 6 vs n = 7 on same x range: PASS vs FAIL (beta0, beta1: NaN)
    linearRegressionTest 0.01 3 2 $ equidist 6 1 6
    linearRegressionTest 0.01 3 2 $ equidist 7 1 6
    -- n = 6, larger x range: PASS vs FAIL
    linearRegressionTest 0.01 3 2 $ equidist 6 1 6
    linearRegressionTest 0.01 3 2 $ equidist 6 1 7
    -- n = 12 vs n = 13: PASS vs FAIL (beta0, beta1: NaN) (reduced learning rate)
    linearRegressionTest 0.005 3 2 $ equidist 12 1 6
    linearRegressionTest 0.005 3 2 $ equidist 13 1 6
    -- another one, different learning rate, but diverges with growing sample size.
    -- this is the learning rate used in the Readme.
    linearRegressionTest 0.001 3 2 $ equidist 26 1 10
    linearRegressionTest 0.001 3 2 $ equidist 27 1 10
    -- n = 99 vs n = 100, ranging from -1 to 1: PASS vs FAIL (beta1 estimate = 0)
    -- this one is different: the failing case does not diverge.
    linearRegressionTest 0.01 3 2 $ equidist 99 (-1) 1
    linearRegressionTest 0.01 3 2 $ equidist 100 (-1) 1
    linearRegressionTest 0.001 3 2 $ equidist 100 (-1) 1
    -- initial goal: fit linear regression on advertising data from ISLR, Chapter 3.1
    islrOLSSpec
-- | Produce a list of n values equally distributed over the range (minX, maxX).
equidist :: Int -> Float -> Float -> [Float]
equidist n minX maxX =
    let n' = fromIntegral $ n - 1
        f k = ((n' - k) * minX + k * maxX) / n'
     in f <$> [0 .. n']
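A quick sanity check of equidist in GHCi (output modulo Float rounding): both endpoints are always included, and the spacing is (maxX - minX) / (n - 1):

-- >>> equidist 6 1 6
-- [1.0,2.0,3.0,4.0,5.0,6.0]
-- >>> equidist 7 1 6   -- step 5/6: 1.0, 1.83.., 2.67.., 3.5, 4.33.., 5.17.., 6.0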
roughlyEqual :: (Num a, Ord a, Fractional a) => a -> a -> Bool
roughlyEqual expected actual = 0.01 > abs (expected - actual)
-- switching between different implementations
-- fitFunction = Readme.fit
fitFunction = simpleLinearRegression'
-- fitFunction = simpleLinearRegressionMMH
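Independently of all the TF variants, the expected coefficients can be cross-checked against the closed-form OLS solution. This helper is my own addition (not in the repo); on the noise-free test data below it should recover (beta0, beta1) exactly, up to Float precision:

-- | Closed-form simple linear regression:
--   beta1 = cov(x, y) / var(x),  beta0 = mean y - beta1 * mean x
olsFit :: [Float] -> [Float] -> (Float, Float)
olsFit xs ys =
    let n = fromIntegral (length xs)
        meanX = sum xs / n
        meanY = sum ys / n
        covXY = sum $ zipWith (\x y -> (x - meanX) * (y - meanY)) xs ys
        varX = sum $ map (\x -> (x - meanX) ^ 2) xs
        beta1 = covXY / varX
     in (meanY - beta1 * meanX, beta1)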
linearRegressionTest :: Float -> Float -> Float -> [Float] -> Spec
linearRegressionTest learnRate beta0 beta1 xs = do
    let ys = (\x -> beta1 * x + beta0) <$> xs
    it ("linear regression on one variable, n = " ++ show (length xs) ++
        ", range (" ++ show (head xs) ++ ", " ++ show (last xs) ++ ")") $ do
        (beta0Hat, beta1Hat) <- fitFunction learnRate (fromList xs) (fromList ys)
        beta0Hat `shouldSatisfy` roughlyEqual beta0
        beta1Hat `shouldSatisfy` roughlyEqual beta1
What I take away from these tests: the fit diverges once the sample size or the x range grows beyond a (small) threshold, and lowering the learning rate only pushes that threshold out a little.

This behavior puzzles me, though. I would not have expected divergence to be an issue at all on datasets that look, to me, very small.
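To rule out the bindings themselves, it helps to replay the updates in plain Haskell. My understanding is that taking the gradient of the non-scalar loss above effectively backpropagates a vector of ones, i.e. the optimizer minimizes the sum of squared residuals, so the replayed updates would look like this (my own sketch, under that assumption):

-- Plain gradient descent on  L(b0, b1) = sum_i (b1*x_i + b0 - y_i)^2
--   dL/db0 = 2 * sum_i r_i            where r_i = b1*x_i + b0 - y_i
--   dL/db1 = 2 * sum_i (r_i * x_i)
descend :: Float -> Int -> [Float] -> [Float] -> (Float, Float)
descend rate steps xs ys = go steps (0, 0)
  where
    go 0 betas = betas
    go k (b0, b1) =
        let rs = zipWith (\x y -> b1 * x + b0 - y) xs ys
            gradB0 = 2 * sum rs
            gradB1 = 2 * sum (zipWith (*) rs xs)
         in go (k - 1) (b0 - rate * gradB0, b1 - rate * gradB1)

If that reading is correct, descend 0.01 1000 should show the same boundary as the tests: converging on equidist 6 1 6 and blowing up (eventually to NaN) on equidist 7 1 6.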
The underlying problem: I started this investigation after trying to fit a simple linear regression on the advertising data from Introduction to Statistical Learning (ISLR), Chapter 3.1. I can get one of those examples (regressing sales onto TV) to converge with a learning rate of 0.0000001, but it then needs a very large number of steps.
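My back-of-the-envelope attempt at an explanation (which I would love to have confirmed or corrected): for a quadratic loss, plain gradient descent converges only if the learning rate stays below 2 / lambda_max of the Hessian. For the unscaled sum-of-squares loss above, the Hessian is H = 2 * [[n, sum x], [sum x, sum x^2]], so the tolerable rate shrinks as either n or the magnitude of the x values grows. Using the trace as a cheap upper bound on lambda_max gives a conservative "safe" rate:

-- Conservative learning-rate bound for gradient descent on
--   L(b0, b1) = sum_i (b1*x_i + b0 - y_i)^2
-- Hessian: H = 2 * [[n, sum x], [sum x, sum x^2]]; convergence needs
-- rate < 2 / lambda_max(H), and for this positive semi-definite H
-- trace H >= lambda_max(H), so 2 / trace H is a sufficient bound.
safeRate :: [Float] -> Float
safeRate xs =
    let n = fromIntegral (length xs)
        sumSq = sum $ map (^ 2) xs
     in 2 / (2 * (n + sumSq))

This lines up with the tests: safeRate (equidist 6 1 6) is about 0.0103, so 0.01 still sits inside the safe region, while safeRate (equidist 7 1 6) is roughly 0.0089 (and the exact lambda_max threshold works out to about 0.009), so the 0.01 used in the failing test is already past it. Likewise, a couple hundred observations with x values reaching into the hundreds (the ISLR TV column) put sum x^2 on the order of 10^7, which would explain why only a rate around 0.0000001 works there.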