Last active
October 13, 2021 13:07
-
-
Save mschauer/d6a9814a7adf48131ad8f81f569a57f9 to your computer and use it in GitHub Desktop.
Non-parametric Bayesian regression in Fourier domain
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| n = 1000 | |
| x = range(0, 1, length=n) | |
| Ο = 1.5 # noise level | |
| ΞΌ = 3*x.*sin.(2pi*x) # periodic signal in time domain | |
| #ΞΌ = 6*sqrt.(abs.(x .- 0.5)) # this one is difficult to estimate | |
| # Model: Signal distorted by white noise | |
| y = ΞΌ + Ο*randn(n) | |
| # Prior power decay of prior samples in frequency domain | |
| Ξ± = 2.0 | |
| # Ξ± = 3.0 for type integrated Brownian motion | |
| # Ξ± = 2.0 for type Brownian motion prior | |
| # Ξ± = 1.0 for a very rough prior (pink noise) | |
| H = (Ξ± - 1)/2 # equivalent Hurst exponent | |
| using FFTW, Statistics, LinearAlgebra, Plots | |
| # Fourier matrix | |
| # β± = [exp(2pi*im*i*j/n) for i in fftshift(-nΓ·2:nΓ·2-1), j in fftshift(-nΓ·2:nΓ·2-1)] | |
| # Define prior on unit interval in frequency domain | |
| # by choosing power decay | |
| function freq_decay(n, Ξ±) #ifftshift(-n/2:n/2-1) | |
| fr = 1 ./ fftfreq(n) | |
| fr[1] = 2n # allow for intercept | |
| fr = abs.(fr) .^(Ξ±/2) # scale frequencies to "unit | |
| fr/norm(fr)*sqrt(n) | |
| end | |
| # Canonical posterior contraction rate | |
| contract(n, H) = n^(-H/(1 + 2H)) | |
| # Compute posterior mean | |
| Ξ³ = Ο^(-2) .+ (freq_decay(n, Ξ±)).^(-2) # posterior precision in frequency domain | |
| ΞΌΜ = real(ifft(Ξ³.\(fft(Ο^(-2)*y)))) | |
| m = ΞΌΜ + sqrt(n)*real(ifft(sqrt.(Ξ³).\randn(Complex{Float64},n))); | |
| # note: `fft(x)/sqrt(length(x))` is isometric | |
| # Compute L2 estimation error, compare with theoretical size | |
| @show norm(ΞΌΜ - ΞΌ)/sqrt(n), Ο*contract(n, H) | |
| # We can even compute the posterior covariance | |
| # Use circulant property of covariance in time domain, could be done with https://github.com/JuliaMatrices/ToeplitzMatrices.jl | |
| Ο = 1.96*sqrt.(abs.((ifft(diag(inv(Diagonal(Ξ³)))))))[1] | |
| # Equivalent to | |
| # Ξ = β±*Diagonal(freq_decay(n, Ξ±).^(-2))*β±'/n/n # β ifft(Diagonal(n. + freq_decay(n, Ξ±).^(-2))) | |
| # Ο = 1.96sqrt.(real(diag(inv((I + Ξ))))) | |
| # Plot | |
| p = scatter(x, y, markersize=0.5, label="obs") | |
| plot!(x, ΞΌΜ, color=:blue, ribbon=Ο,fillalpha=0.2, label="posterior mean") | |
| plot!(x, ΞΌ, color=:green, label="truth") | |
| z = sqrt(n)*real(ifft(randn(Complex{Float64},n).*abs.(freq_decay(n, Ξ±)))); | |
| plot!(x, z, label="prior sample") | |
| plot!(x, m, label="posterior sample") | |
| p |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| ### A Pluto.jl notebook ### | |
| # v0.16.1 | |
| using Markdown | |
| using InteractiveUtils | |
| # This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error). | |
| macro bind(def, element) | |
| quote | |
| local el = $(esc(element)) | |
| global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : missing | |
| el | |
| end | |
| end | |
| # βββ‘ 37146eed-cabd-4024-9e91-4b8fe15139a3 | |
| # Setup | |
| begin | |
| import Pkg | |
| # careful: this is _not_ a reproducible environment | |
| # activate the global environment | |
| Pkg.activate() | |
| Pkg.add("FFTW") | |
| Pkg.add("Plots") | |
| Pkg.add("PlutoUI") | |
| using FFTW, Statistics, LinearAlgebra, PlutoUI, Plots, Random | |
| end | |
| # βββ‘ ad88469d-06d5-4ec7-868f-611e999930b4 | |
| md""" | |
| ## Nonparametric Bayes and Fourier methods in Julia | |
| ##### Moritz Schauer `@mschauer` | |
| This notebook takes a tour through non-parametric Bayesian regression in Fourier domain using Julia. | |
| ### Content: | |
| * A non-parametric regression problem | |
| * Changing perspective to Fourier domain | |
| * Defining smoothing prior and computing the posterior | |
| * Choosing the right smoothness | |
| But before some notebook setup: | |
| """ | |
| # βββ‘ 426085d1-1f7d-4186-a8dc-28ffd8fb5343 | |
| @bind resample Button("Resample observations!") | |
| # βββ‘ 0722b786-402d-4ba9-a427-8c87cbc3df24 | |
| md"Show hidden ground truth? $(@bind show_truth CheckBox())" | |
| # βββ‘ fb55fde1-7669-4f91-bc67-2a0172e5ae3e | |
| md""" | |
| ## Fourier domain and prior | |
| Taking the vector of observations ``y`` and ground truth vector ``f_i = f(x_i)``, | |
| we can apply the discrete Fourier transformation ``\mathcal F`` to our linear equation | |
| ```math | |
| \mathcal F y = \mathcal F (f + \eta) | |
| ``` | |
| In Julia, ``\mathcal F`` is `fft(x)/sqrt(length(x))`, but you can think of it as a matrix | |
| ```math | |
| \mathcal F_{kl} = \exp(2\pi\cdot i \cdot (k l/n)). | |
| ``` | |
| We obtain the equation in frequency domain with ``\tilde y = \mathcal F y``, ``\tilde f = \mathcal F f`` etc. | |
| ```math | |
| \tilde y_i = \tilde f_i + \tilde\eta_i | |
| ``` | |
| ``\mathcal F`` is an ``L_2`` isometry and maps independend and normal noise on the observations to equivalent independent and normal noise in frequency domain, | |
| so here | |
| ```math | |
| \tilde\eta_i \sim N(0, \varsigma^2) | |
| ``` | |
| """ | |
| # βββ‘ 093506f2-5a52-4217-9dfb-d673ce8b4a29 | |
| # Fourier matrix | |
| # β±(x) = [exp(2pi*im*i*j/n) for i in fftshift(-nΓ·2:nΓ·2-1), j in fftshift(-nΓ·2:nΓ·2-1)]*x | |
| β±(x) = (fft(x)/sqrt(length(x))) | |
| # βββ‘ 7f5a4d79-1e64-4c99-a588-63bcca7267fe | |
| md""" | |
| We define our prior as | |
| ```math | |
| \tilde f_i \sim N(0, \varpi_i^{-\frac{\alpha}{2}}) | |
| ``` | |
| where ``\varpi_i`` is the frequency corresponding to Fourier coordinate ``\tilde f_i``. Julia's `fftfreq` helps with that. | |
| Large values of ``\alpha`` imply higher smoothenss of prior samples. | |
| For example: | |
| * Ξ± = 3.0 for type integrated Brownian motion | |
| * Ξ± = 2.0 for type Brownian motion prior | |
| * Ξ± = 1.0 for a very rough prior (pink noise) | |
| ``\alpha`` can be translated into a Hurst parameter related to self similarity of the path under scaling of both axes | |
| ```math | |
| H = (\alpha - 1)/2 | |
| ``` | |
| Having a prior in frequency domains means we get samples from our prior for ``f`` via the inverse Fourier transform, either the matrix inverse ``{\mathcal F}^{-1}`` or `sqrt(n)*real(ifft(x))` in Julia for the real part. | |
| """ | |
| # βββ‘ 012f54e1-01a3-4acd-b7db-d5d3f3bc7d87 | |
| # Define prior on unit interval in frequency domain | |
| # by choosing power decay | |
| function freq_decay(n, Ξ±) #ifftshift(-n/2:n/2-1) | |
| fr = 1 ./ fftfreq(n) | |
| fr[1] = 2n # allow for intercept | |
| fr = abs.(fr) .^(Ξ±/2) # scale frequencies | |
| fr/norm(fr)*sqrt(n) | |
| end | |
| # βββ‘ e7c6b81f-202a-43e0-9bc0-ac5eb64060db | |
| md""" | |
| Prior power decay of prior samples in frequency domain | |
| $(@bind Ξ±1 Slider(0.5:0.5:4.0)) | |
| """ | |
| # βββ‘ 0e4ea7bf-4eb1-4923-a2e8-a7e2e8bc33a5 | |
| @bind resample_prior Button("Resample prior!") | |
| # βββ‘ 12c188f4-5042-4cc4-9de5-89ea87c08c1e | |
| md""" | |
| ## Bayesian regression in Fourier domain | |
| If we have prior, model and data ``\tilde y`` in Fourier domain, we can compute the posterior distribution of the Fourier coefficients | |
| ``\tilde f_i``. | |
| This gives a Gaussian posterior distribution for the signal in Fourier domain | |
| ```math | |
| \tilde f \sim N(\tilde \mu, \tilde \Sigma) \text{ (a posteriori) } | |
| ``` | |
| with mean ``\tilde \mu`` and covariance ``\tilde \Sigma``. | |
| With the inverse transform we estimate of ``f \approx \mathcal F^{-1} \tilde\mu`` and quantify uncertainty of the estimate as ``\Sigma = \mathcal F^{-1} \tilde \Sigma \mathcal F^{-1}``. | |
| The marginal uncertainty (diagonal entries of ``\Sigma`` ) can be computed very efficiently using `ifft` aswell. | |
| """ | |
| # βββ‘ 517c3961-c104-4811-ab4c-7e189761709b | |
| # Equivalent to | |
| # Ξ = β±*Diagonal(freq_decay(n, Ξ±).^(-2))*β±'/n/n # β ifft(Diagonal(n. + freq_decay(n, Ξ±).^(-2))) | |
| # Ο = 1.96sqrt.(real(diag(inv((I + Ξ))))) | |
| # βββ‘ c51271a7-0614-445f-87ec-a5bc2fee1768 | |
| @bind n Slider(10:10:1000) # number of observations | |
| # βββ‘ 64feb3c2-919b-4246-9fbf-e836a0c80ed0 | |
| # Design points | |
| x = range(0, 1, length=n); | |
| # βββ‘ c502c779-e776-490d-9ec7-02a3379aaa28 | |
| # Ground truth | |
| f = 3*x.*sin.(2pi*x); # periodic signal in x (time domain) | |
| # f = 6*sqrt.(abs.(x .- 0.5)) # this one is difficult to estimate | |
| # βββ‘ b7c5ee08-5d2d-4775-a232-472a033b59a1 | |
| begin | |
| resample_prior | |
| z1 = sqrt(n)*real(ifft(randn(Complex{Float64},n).*abs.(freq_decay(n, Ξ±1)))); | |
| plot(x, z1, label="prior sample") | |
| end | |
| # βββ‘ a97a39ff-059e-4622-ae2e-219d8464ecb2 | |
| @bind Ο Slider(0.5:0.5:2.0) # noise level | |
| # βββ‘ c29066a9-574d-4ce1-ba9b-abb44aeb8fa0 | |
| md""" | |
| ## A nonparametric regression problem | |
| We are interested in recovering the function ``f`` from ``n=`` $(n) noisy observations ``y_i`` at points ``x_i`` | |
| ```math | |
| y_i = f(x_i) + \eta_i, \qquad \eta_i \sim N(0,\varsigma^2), | |
| ``` | |
| where | |
| ``\varsigma =`` $Ο is the noise level. | |
| * You can choose the values for ``n`` and ``\varsigma`` with the sliders below! | |
| """ | |
| # βββ‘ a4ff81fe-a9cf-4789-aa68-5838c58c1f69 | |
| # Model: Signal distorted by white noise | |
| begin | |
| resample | |
| y = f + Ο*randn(n) | |
| end; | |
| # βββ‘ 873ba234-b4b1-4678-8c67-086cd3b149bc | |
| begin | |
| p1 = scatter(x, y, markersize=10/sqrt(n), label="obs", title="Data") | |
| show_truth && plot!(p1, x, f, color=:green, label="truth") | |
| p1 | |
| end | |
| # βββ‘ 0071252a-42cb-4967-b426-7d28ce054ba1 | |
| begin | |
| p2 = scatter(x, real.(β±(y)), markersize=10/sqrt(n), color=:black, label="obs", title="Truth and observation in frequency domain") | |
| scatter!(p2, x, imag.(β±(y)), markersize=10/sqrt(n), color=:black, alpha=0.1, label="obs (imag)", title="Truth and observation in frequency domain") | |
| plot!(p2, x, real.(β±(f)), color=:green, label="truth") | |
| plot!(p2, x, imag.(β±(f)), color=:green, alpha=0.3, label="truth (im)") | |
| p2 | |
| end | |
| # βββ‘ af6df9cd-cb2f-48de-9697-db23bb7e3804 | |
| # Prior power decay of prior samples in frequency domain | |
| @bind Ξ± Slider(0.5:0.5:2.0) | |
| # βββ‘ 5a475143-5078-4033-84cc-ec6abd3c86c4 | |
| md"Chosen prior smoothness Ξ± = $(Ξ±1), H = $(H = (Ξ± - 1)/2)" | |
| # βββ‘ 9c48ca1f-7ab3-4b72-a9bd-4b0d6c1ee8db | |
| # posterior precision in frequency domain | |
| Ξ³ = Ο^(-2) .+ (freq_decay(n, Ξ±)).^(-2); | |
| # βββ‘ fb008be0-b823-4eaf-b2c7-eb189041f0c8 | |
| # posterior mean | |
| ΞΌΜ = real(ifft(Ξ³.\(fft(Ο^(-2)*y)))) | |
| # βββ‘ e0e75144-aa8f-433e-90c5-bfa9bbdfac17 | |
| # We can even compute the posterior covariance | |
| # Use circulant property of covariance in time domain, could be done with https://github.com/JuliaMatrices/ToeplitzMatrices.jl | |
| Ο = 1.96*sqrt.(abs.((ifft(diag(inv(Diagonal(Ξ³)))))))[1] | |
| # βββ‘ 8ba316ed-2c77-4ddf-9e57-e673b4ef00ec | |
| @bind resample_post Button("New posterior sample!") | |
| # βββ‘ 23bfdb98-1001-4f49-a841-292ebe4c0246 | |
| # posterior sample | |
| begin | |
| resample_post | |
| m = ΞΌΜ + sqrt(n)*real(ifft(sqrt.(Ξ³).\randn(Complex{Float64},n))) | |
| end; | |
| # βββ‘ 6615b4b2-26fc-4a80-9576-810c7f919270 | |
| md"Show truth? $(@bind show_truth2 CheckBox())" | |
| # βββ‘ c78bd411-c9f4-4e1f-8c96-ea97480d9cd9 | |
| begin | |
| global p = scatter(x, y, markersize=10/sqrt(n), label="obs") | |
| plot!(x, ΞΌΜ, color=:blue, ribbon=Ο, fillalpha=0.2, label="posterior mean and band") | |
| show_truth2 && plot!(x, f, color=:green, alpha=0.8, label="truth") | |
| resample_post | |
| plot!(x, m, label="posterior sample") | |
| end | |
| # βββ‘ 09af6b8a-73a1-4e87-8a89-ef7ac72c72b4 | |
| md""" | |
| ## What is the right choice of ``\alpha`` | |
| That will depend on the smoothness of the truth we want to estimate. If we know that apriori ``f`` is very smooth, we prefer larger values of ``\alpha`` | |
| Dutch Bayesians (and I should count myself as one) like to think about how to choose ``\alpha`` in an optimal way. | |
| One way to reason about this, is to investigate the frequentist/asymptotic properties of Bayesian procedures (insert Galaxy brain meme). | |
| For the example one finds that the posterior puts most of it's mass near the true function ``f`` with increasing probability if ``n`` becomes large. | |
| This happens at a rate depending on ``\alpha`` as long as we match ``\alpha`` to the smoothness of the underlying truth. | |
| ```math | |
| e = \varsigma n^{-H/(1 + 2H)} | |
| ``` | |
| Let's compute our ``L_2`` estimation error and compare with theoretical size | |
| * `norm(ΞΌΜ - f)/sqrt(n)` = $(norm(ΞΌΜ - f)/sqrt(n)) | |
| * `e` = $(Ο*n^(-H/(1 + 2H))) | |
| Harry van Zanten's summer school slides are a nice starting point: | |
| * https://warwick.ac.uk/fac/sci/statistics/crism/workshops/masterclassapril/ | |
| * https://fdnss.files.wordpress.com/2016/03/vanzanten-slides.pdf | |
| """ | |
| # βββ‘ Cell order: | |
| # β βad88469d-06d5-4ec7-868f-611e999930b4 | |
| # β β37146eed-cabd-4024-9e91-4b8fe15139a3 | |
| # β βc29066a9-574d-4ce1-ba9b-abb44aeb8fa0 | |
| # β β64feb3c2-919b-4246-9fbf-e836a0c80ed0 | |
| # β βc502c779-e776-490d-9ec7-02a3379aaa28 | |
| # β βa4ff81fe-a9cf-4789-aa68-5838c58c1f69 | |
| # β β426085d1-1f7d-4186-a8dc-28ffd8fb5343 | |
| # β β0722b786-402d-4ba9-a427-8c87cbc3df24 | |
| # β β873ba234-b4b1-4678-8c67-086cd3b149bc | |
| # β βfb55fde1-7669-4f91-bc67-2a0172e5ae3e | |
| # β β093506f2-5a52-4217-9dfb-d673ce8b4a29 | |
| # β β0071252a-42cb-4967-b426-7d28ce054ba1 | |
| # β β7f5a4d79-1e64-4c99-a588-63bcca7267fe | |
| # β β012f54e1-01a3-4acd-b7db-d5d3f3bc7d87 | |
| # β βe7c6b81f-202a-43e0-9bc0-ac5eb64060db | |
| # β β5a475143-5078-4033-84cc-ec6abd3c86c4 | |
| # β β0e4ea7bf-4eb1-4923-a2e8-a7e2e8bc33a5 | |
| # β βb7c5ee08-5d2d-4775-a232-472a033b59a1 | |
| # β β12c188f4-5042-4cc4-9de5-89ea87c08c1e | |
| # β β9c48ca1f-7ab3-4b72-a9bd-4b0d6c1ee8db | |
| # β βfb008be0-b823-4eaf-b2c7-eb189041f0c8 | |
| # β β23bfdb98-1001-4f49-a841-292ebe4c0246 | |
| # β βe0e75144-aa8f-433e-90c5-bfa9bbdfac17 | |
| # β β517c3961-c104-4811-ab4c-7e189761709b | |
| # β βc51271a7-0614-445f-87ec-a5bc2fee1768 | |
| # β βa97a39ff-059e-4622-ae2e-219d8464ecb2 | |
| # β βaf6df9cd-cb2f-48de-9697-db23bb7e3804 | |
| # β β8ba316ed-2c77-4ddf-9e57-e673b4ef00ec | |
| # β β6615b4b2-26fc-4a80-9576-810c7f919270 | |
| # β βc78bd411-c9f4-4e1f-8c96-ea97480d9cd9 | |
| # β β09af6b8a-73a1-4e87-8a89-ef7ac72c72b4 |
Author
Fixed the marginal 2sigma credibility band
Author
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.