mschauer/nonparabayes.jl

## nonparabayes.jl
n = 1000
x = range(0, 1, length=n)
ς = 1.5 # noise level

μ = 3*x.*sin.(2pi*x) # periodic signal in time domain
#μ = 6*sqrt.(abs.(x .- 0.5)) # this one is difficult to estimate

# Model: Signal distorted by white noise
y = μ + ς*randn(n)

# Prior power decay of prior samples in frequency domain
α = 2.0
# α = 3.0 for type integrated Brownian motion
# α = 2.0 for type Brownian motion prior
# α = 1.0 for a very rough prior (pink noise)

H = (α - 1)/2 # equivalent Hurst exponent
using FFTW, Statistics, LinearAlgebra, Plots


# Fourier matrix
# ℱ = [exp(2pi*im*i*j/n) for i in fftshift(-n÷2:n÷2-1), j in fftshift(-n÷2:n÷2-1)]

# Define prior on unit interval in frequency domain
# by choosing power decay
function freq_decay(n, α)  #ifftshift(-n/2:n/2-1)
    fr = 1 ./ fftfreq(n)
    fr[1] = 2n # allow for intercept
    fr = abs.(fr) .^(α/2) # scale frequencies to "unit
    fr/norm(fr)*sqrt(n)
end

# Canonical posterior contraction rate
contract(n, H) =  n^(-H/(1 + 2H))


# Compute posterior mean
γ = ς^(-2) .+ (freq_decay(n, α)).^(-2) # posterior precision in frequency domain
μ̂ = real(ifft(γ.\(fft(ς^(-2)*y))))
m = μ̂ + sqrt(n)*real(ifft(sqrt.(γ).\randn(Complex{Float64},n)));
# note: `fft(x)/sqrt(length(x))` is isometric


# Compute L2 estimation error, compare with theoretical size
@show norm(μ̂ - μ)/sqrt(n),  ς*contract(n, H)

# We can even compute the posterior covariance


# Use circulant property of covariance in time domain, could be done with https://github.com/JuliaMatrices/ToeplitzMatrices.jl
σ = 1.96*sqrt.(abs.((ifft(diag(inv(Diagonal(γ)))))))[1]

# Equivalent to
# Γ = ℱ*Diagonal(freq_decay(n, α).^(-2))*ℱ'/n/n # ≈ ifft(Diagonal(n. + freq_decay(n, α).^(-2)))
# σ = 1.96sqrt.(real(diag(inv((I + Γ)))))

# Plot
p = scatter(x, y, markersize=0.5, label="obs")
plot!(x, μ̂, color=:blue, ribbon=σ,fillalpha=0.2, label="posterior mean")
plot!(x, μ, color=:green, label="truth")
z = sqrt(n)*real(ifft(randn(Complex{Float64},n).*abs.(freq_decay(n, α))));
plot!(x, z, label="prior sample")
plot!(x, m, label="posterior sample")


p

## plutonotebook.jl
### A Pluto.jl notebook ###
# v0.16.1

using Markdown
using InteractiveUtils

# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
macro bind(def, element)
    quote
        local el = $(esc(element))
        global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : missing
        el
    end
end

# ╔═╡ 37146eed-cabd-4024-9e91-4b8fe15139a3
# Setup
begin
    import Pkg
    # careful: this is _not_ a reproducible environment
    # activate the global environment
    Pkg.activate()
	Pkg.add("FFTW")
	Pkg.add("Plots")
	Pkg.add("PlutoUI")
	using FFTW, Statistics, LinearAlgebra, PlutoUI, Plots, Random
end

# ╔═╡ ad88469d-06d5-4ec7-868f-611e999930b4
md"""
## Nonparametric Bayes and Fourier methods in Julia
##### Moritz Schauer `@mschauer`

This notebook takes a tour through non-parametric Bayesian regression in Fourier domain using Julia.


### Content:


* A non-parametric regression problem

* Changing perspective to Fourier domain

* Defining smoothing prior and computing the posterior

* Choosing the right smoothness


But before some notebook setup:
"""

# ╔═╡ 426085d1-1f7d-4186-a8dc-28ffd8fb5343
@bind resample Button("Resample observations!")

# ╔═╡ 0722b786-402d-4ba9-a427-8c87cbc3df24
md"Show hidden ground truth? $(@bind show_truth CheckBox())"

# ╔═╡ fb55fde1-7669-4f91-bc67-2a0172e5ae3e
md"""
## Fourier domain and prior

Taking the vector of observations ``y`` and ground truth vector ``f_i = f(x_i)``,
we can apply the discrete Fourier transformation ``\mathcal F`` to our linear equation
```math
\mathcal F y = \mathcal F (f +  \eta)
```
In Julia, ``\mathcal F`` is `fft(x)/sqrt(length(x))`, but you can think of it as a matrix
```math
\mathcal F_{kl} = \exp(2\pi\cdot i \cdot (k l/n)).
```

We obtain the equation in frequency domain with ``\tilde y = \mathcal F y``, ``\tilde f = \mathcal F f`` etc.
```math
\tilde y_i = \tilde f_i + \tilde\eta_i
```

``\mathcal F`` is an ``L_2`` isometry and maps independend and normal noise  on the observations to equivalent independent and normal noise in frequency domain,
so here
```math
\tilde\eta_i \sim N(0, \varsigma^2)
```

"""

# ╔═╡ 093506f2-5a52-4217-9dfb-d673ce8b4a29
# Fourier matrix
# ℱ(x) = [exp(2pi*im*i*j/n) for i in fftshift(-n÷2:n÷2-1), j in fftshift(-n÷2:n÷2-1)]*x
ℱ(x) = (fft(x)/sqrt(length(x)))

# ╔═╡ 7f5a4d79-1e64-4c99-a588-63bcca7267fe
md"""

We define our prior as

```math
\tilde f_i \sim N(0, \varpi_i^{-\frac{\alpha}{2}})
```
where ``\varpi_i`` is the frequency corresponding to Fourier coordinate ``\tilde f_i``. Julia's `fftfreq` helps with that.

Large values of ``\alpha`` imply higher smoothenss of prior samples.


For example:

* α = 3.0 for type integrated Brownian motion
* α = 2.0 for type Brownian motion prior
* α = 1.0 for a very rough prior (pink noise)

``\alpha`` can be translated into a Hurst parameter related to self similarity of the path under scaling of both axes
```math
H = (\alpha - 1)/2
```

Having a prior in frequency domains means we get samples from our prior for ``f`` via the inverse Fourier transform, either the matrix inverse ``{\mathcal F}^{-1}`` or `sqrt(n)*real(ifft(x))` in Julia for the real part.


"""

# ╔═╡ 012f54e1-01a3-4acd-b7db-d5d3f3bc7d87
# Define prior on unit interval in frequency domain
# by choosing power decay
function freq_decay(n, α)  #ifftshift(-n/2:n/2-1)
    fr = 1 ./ fftfreq(n)
    fr[1] = 2n # allow for intercept
    fr = abs.(fr) .^(α/2) # scale frequencies
    fr/norm(fr)*sqrt(n)
end

# ╔═╡ e7c6b81f-202a-43e0-9bc0-ac5eb64060db
md"""
Prior power decay of prior samples in frequency domain
$(@bind α1 Slider(0.5:0.5:4.0))
"""

# ╔═╡ 0e4ea7bf-4eb1-4923-a2e8-a7e2e8bc33a5
@bind resample_prior Button("Resample prior!")

# ╔═╡ 12c188f4-5042-4cc4-9de5-89ea87c08c1e
md"""
## Bayesian regression in Fourier domain

If we have prior, model and data ``\tilde y`` in Fourier domain, we can compute the posterior distribution of the Fourier coefficients
``\tilde f_i``.

This gives a Gaussian posterior distribution for the signal in Fourier domain
```math
\tilde f \sim N(\tilde \mu, \tilde \Sigma) \text{ (a posteriori) }
```
with mean ``\tilde \mu`` and covariance ``\tilde \Sigma``.

With the inverse transform we estimate of ``f \approx \mathcal F^{-1} \tilde\mu`` and quantify uncertainty of the estimate as ``\Sigma = \mathcal F^{-1} \tilde \Sigma \mathcal F^{-1}``.

The marginal uncertainty (diagonal entries of ``\Sigma`` ) can be computed very efficiently using `ifft` aswell.

"""

# ╔═╡ 517c3961-c104-4811-ab4c-7e189761709b
# Equivalent to
# Γ = ℱ*Diagonal(freq_decay(n, α).^(-2))*ℱ'/n/n # ≈ ifft(Diagonal(n. + freq_decay(n, α).^(-2)))
# σ = 1.96sqrt.(real(diag(inv((I + Γ)))))

# ╔═╡ c51271a7-0614-445f-87ec-a5bc2fee1768
@bind n Slider(10:10:1000) # number of observations

# ╔═╡ 64feb3c2-919b-4246-9fbf-e836a0c80ed0
# Design points
x = range(0, 1, length=n);

# ╔═╡ c502c779-e776-490d-9ec7-02a3379aaa28
# Ground truth
f = 3*x.*sin.(2pi*x); # periodic signal in x (time domain)
# f = 6*sqrt.(abs.(x .- 0.5)) # this one is difficult to estimate

# ╔═╡ b7c5ee08-5d2d-4775-a232-472a033b59a1
begin
	resample_prior
	z1 = sqrt(n)*real(ifft(randn(Complex{Float64},n).*abs.(freq_decay(n, α1))));
	plot(x, z1, label="prior sample")
end

# ╔═╡ a97a39ff-059e-4622-ae2e-219d8464ecb2
@bind ς Slider(0.5:0.5:2.0)  # noise level

# ╔═╡ c29066a9-574d-4ce1-ba9b-abb44aeb8fa0
md"""
## A nonparametric regression problem

We are interested in recovering the function ``f`` from ``n=`` $(n) noisy observations ``y_i`` at points ``x_i``

```math
y_i = f(x_i) + \eta_i, \qquad \eta_i \sim N(0,\varsigma^2),
```
where
``\varsigma =`` $ς is the noise level.

* You can choose the values for ``n`` and ``\varsigma`` with the sliders below!

"""

# ╔═╡ a4ff81fe-a9cf-4789-aa68-5838c58c1f69
# Model: Signal distorted by white noise
begin
	resample
	y = f + ς*randn(n)
end;

# ╔═╡ 873ba234-b4b1-4678-8c67-086cd3b149bc
begin
	p1 = scatter(x, y, markersize=10/sqrt(n), label="obs", title="Data")
	show_truth && plot!(p1, x, f, color=:green, label="truth")
	p1
end

# ╔═╡ 0071252a-42cb-4967-b426-7d28ce054ba1
begin
	p2 = scatter(x, real.(ℱ(y)), markersize=10/sqrt(n), color=:black, label="obs", title="Truth and observation in frequency domain")
	scatter!(p2, x, imag.(ℱ(y)), markersize=10/sqrt(n), color=:black, alpha=0.1, label="obs (imag)", title="Truth and observation in frequency domain")
	plot!(p2, x, real.(ℱ(f)), color=:green, label="truth")
	plot!(p2, x, imag.(ℱ(f)), color=:green, alpha=0.3, label="truth (im)")
	p2
end

# ╔═╡ af6df9cd-cb2f-48de-9697-db23bb7e3804
# Prior power decay of prior samples in frequency domain
@bind α Slider(0.5:0.5:2.0)

# ╔═╡ 5a475143-5078-4033-84cc-ec6abd3c86c4
md"Chosen prior smoothness α = $(α1), H = $(H = (α - 1)/2)"

# ╔═╡ 9c48ca1f-7ab3-4b72-a9bd-4b0d6c1ee8db
# posterior precision in frequency domain
γ = ς^(-2) .+ (freq_decay(n, α)).^(-2);

# ╔═╡ fb008be0-b823-4eaf-b2c7-eb189041f0c8
# posterior mean
μ̂ = real(ifft(γ.\(fft(ς^(-2)*y))))

# ╔═╡ e0e75144-aa8f-433e-90c5-bfa9bbdfac17
# We can even compute the posterior covariance

# Use circulant property of covariance in time domain, could be done with https://github.com/JuliaMatrices/ToeplitzMatrices.jl
σ = 1.96*sqrt.(abs.((ifft(diag(inv(Diagonal(γ)))))))[1]

# ╔═╡ 8ba316ed-2c77-4ddf-9e57-e673b4ef00ec
@bind resample_post Button("New posterior sample!")

# ╔═╡ 23bfdb98-1001-4f49-a841-292ebe4c0246
# posterior sample
begin
	resample_post
    m = μ̂ + sqrt(n)*real(ifft(sqrt.(γ).\randn(Complex{Float64},n)))
end;

# ╔═╡ 6615b4b2-26fc-4a80-9576-810c7f919270
md"Show truth? $(@bind show_truth2 CheckBox())"

# ╔═╡ c78bd411-c9f4-4e1f-8c96-ea97480d9cd9
begin
	global p = scatter(x, y, markersize=10/sqrt(n), label="obs")
	plot!(x, μ̂, color=:blue, ribbon=σ, fillalpha=0.2, label="posterior mean and band")
	show_truth2 && plot!(x, f, color=:green, alpha=0.8, label="truth")
    resample_post
	plot!(x, m, label="posterior sample")
end

# ╔═╡ 09af6b8a-73a1-4e87-8a89-ef7ac72c72b4
md"""

## What is the right choice of ``\alpha``

That will depend on the smoothness of the truth we want to estimate. If we know that apriori ``f`` is very smooth, we prefer larger values of ``\alpha``

Dutch Bayesians (and I should count myself as one) like to think about how to choose ``\alpha`` in an optimal way.

One way to reason about this, is to investigate the frequentist/asymptotic properties of Bayesian procedures (insert Galaxy brain meme).

For the example one finds that the posterior puts most of it's mass near the true function ``f`` with increasing probability if ``n`` becomes large.

This happens at a rate depending on ``\alpha`` as long as we match ``\alpha`` to the smoothness of the underlying truth.

```math
e = \varsigma n^{-H/(1 + 2H)}
```

Let's compute our ``L_2`` estimation error and compare with theoretical size

* `norm(μ̂ - f)/sqrt(n)` = $(norm(μ̂ - f)/sqrt(n))
* `e` =  $(ς*n^(-H/(1 + 2H)))

Harry van Zanten's summer school slides are a nice starting point:

* https://warwick.ac.uk/fac/sci/statistics/crism/workshops/masterclassapril/
* https://fdnss.files.wordpress.com/2016/03/vanzanten-slides.pdf


"""


# ╔═╡ Cell order:
# ╠═ad88469d-06d5-4ec7-868f-611e999930b4
# ╠═37146eed-cabd-4024-9e91-4b8fe15139a3
# ╠═c29066a9-574d-4ce1-ba9b-abb44aeb8fa0
# ╠═64feb3c2-919b-4246-9fbf-e836a0c80ed0
# ╠═c502c779-e776-490d-9ec7-02a3379aaa28
# ╠═a4ff81fe-a9cf-4789-aa68-5838c58c1f69
# ╠═426085d1-1f7d-4186-a8dc-28ffd8fb5343
# ╠═0722b786-402d-4ba9-a427-8c87cbc3df24
# ╠═873ba234-b4b1-4678-8c67-086cd3b149bc
# ╠═fb55fde1-7669-4f91-bc67-2a0172e5ae3e
# ╠═093506f2-5a52-4217-9dfb-d673ce8b4a29
# ╠═0071252a-42cb-4967-b426-7d28ce054ba1
# ╠═7f5a4d79-1e64-4c99-a588-63bcca7267fe
# ╠═012f54e1-01a3-4acd-b7db-d5d3f3bc7d87
# ╠═e7c6b81f-202a-43e0-9bc0-ac5eb64060db
# ╠═5a475143-5078-4033-84cc-ec6abd3c86c4
# ╠═0e4ea7bf-4eb1-4923-a2e8-a7e2e8bc33a5
# ╠═b7c5ee08-5d2d-4775-a232-472a033b59a1
# ╠═12c188f4-5042-4cc4-9de5-89ea87c08c1e
# ╠═9c48ca1f-7ab3-4b72-a9bd-4b0d6c1ee8db
# ╠═fb008be0-b823-4eaf-b2c7-eb189041f0c8
# ╠═23bfdb98-1001-4f49-a841-292ebe4c0246
# ╠═e0e75144-aa8f-433e-90c5-bfa9bbdfac17
# ╠═517c3961-c104-4811-ab4c-7e189761709b
# ╠═c51271a7-0614-445f-87ec-a5bc2fee1768
# ╠═a97a39ff-059e-4622-ae2e-219d8464ecb2
# ╠═af6df9cd-cb2f-48de-9697-db23bb7e3804
# ╠═8ba316ed-2c77-4ddf-9e57-e673b4ef00ec
# ╠═6615b4b2-26fc-4a80-9576-810c7f919270
# ╠═c78bd411-c9f4-4e1f-8c96-ea97480d9cd9
# ╠═09af6b8a-73a1-4e87-8a89-ef7ac72c72b4
	n = 1000
	x = range(0, 1, length=n)
	ς = 1.5 # noise level

	μ = 3x.sin.(2pi*x) # periodic signal in time domain
	#μ = 6*sqrt.(abs.(x .- 0.5)) # this one is difficult to estimate

	# Model: Signal distorted by white noise
	y = μ + ς*randn(n)

	# Prior power decay of prior samples in frequency domain
	α = 2.0
	# α = 3.0 for type integrated Brownian motion
	# α = 2.0 for type Brownian motion prior
	# α = 1.0 for a very rough prior (pink noise)

	H = (α - 1)/2 # equivalent Hurst exponent
	using FFTW, Statistics, LinearAlgebra, Plots


	# Fourier matrix
	# ℱ = [exp(2piimi*j/n) for i in fftshift(-n÷2:n÷2-1), j in fftshift(-n÷2:n÷2-1)]

	# Define prior on unit interval in frequency domain
	# by choosing power decay
	function freq_decay(n, α) #ifftshift(-n/2:n/2-1)
	fr = 1 ./ fftfreq(n)
	fr[1] = 2n # allow for intercept
	fr = abs.(fr) .^(α/2) # scale frequencies to "unit
	fr/norm(fr)*sqrt(n)
	end

	# Canonical posterior contraction rate
	contract(n, H) = n^(-H/(1 + 2H))


	# Compute posterior mean
	γ = ς^(-2) .+ (freq_decay(n, α)).^(-2) # posterior precision in frequency domain
	μ̂ = real(ifft(γ.\(fft(ς^(-2)*y))))
	m = μ̂ + sqrt(n)*real(ifft(sqrt.(γ).\randn(Complex{Float64},n)));
	# note: `fft(x)/sqrt(length(x))` is isometric


	# Compute L2 estimation error, compare with theoretical size
	@show norm(μ̂ - μ)/sqrt(n), ς*contract(n, H)

	# We can even compute the posterior covariance



	# Use circulant property of covariance in time domain, could be done with https://github.com/JuliaMatrices/ToeplitzMatrices.jl
	σ = 1.96*sqrt.(abs.((ifft(diag(inv(Diagonal(γ)))))))[1]

	# Equivalent to
	# Γ = ℱDiagonal(freq_decay(n, α).^(-2))ℱ'/n/n # ≈ ifft(Diagonal(n. + freq_decay(n, α).^(-2)))
	# σ = 1.96sqrt.(real(diag(inv((I + Γ)))))

	# Plot
	p = scatter(x, y, markersize=0.5, label="obs")
	plot!(x, μ̂, color=:blue, ribbon=σ,fillalpha=0.2, label="posterior mean")
	plot!(x, μ, color=:green, label="truth")
	z = sqrt(n)real(ifft(randn(Complex{Float64},n).abs.(freq_decay(n, α))));
	plot!(x, z, label="prior sample")
	plot!(x, m, label="posterior sample")


	p
	### A Pluto.jl notebook ###
	# v0.16.1

	using Markdown
	using InteractiveUtils

	# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
	macro bind(def, element)
	quote
	local el = $(esc(element))
	global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : missing
	el
	end
	end

	# ╔═╡ 37146eed-cabd-4024-9e91-4b8fe15139a3
	# Setup
	begin
	import Pkg
	# careful: this is _not_ a reproducible environment
	# activate the global environment
	Pkg.activate()
	Pkg.add("FFTW")
	Pkg.add("Plots")
	Pkg.add("PlutoUI")
	using FFTW, Statistics, LinearAlgebra, PlutoUI, Plots, Random
	end

	# ╔═╡ ad88469d-06d5-4ec7-868f-611e999930b4
	md"""
	## Nonparametric Bayes and Fourier methods in Julia
	##### Moritz Schauer `@mschauer`

	This notebook takes a tour through non-parametric Bayesian regression in Fourier domain using Julia.


	### Content:



	* A non-parametric regression problem

	* Changing perspective to Fourier domain

	* Defining smoothing prior and computing the posterior

	* Choosing the right smoothness


	But before some notebook setup:
	"""

	# ╔═╡ 426085d1-1f7d-4186-a8dc-28ffd8fb5343
	@bind resample Button("Resample observations!")

	# ╔═╡ 0722b786-402d-4ba9-a427-8c87cbc3df24
	md"Show hidden ground truth? $(@bind show_truth CheckBox())"

	# ╔═╡ fb55fde1-7669-4f91-bc67-2a0172e5ae3e
	md"""
	## Fourier domain and prior

	Taking the vector of observations ``y`` and ground truth vector ``f_i = f(x_i)``,
	we can apply the discrete Fourier transformation ``\mathcal F`` to our linear equation
	```math
	\mathcal F y = \mathcal F (f + \eta)
	```
	In Julia, ``\mathcal F`` is `fft(x)/sqrt(length(x))`, but you can think of it as a matrix
	```math
	\mathcal F_{kl} = \exp(2\pi\cdot i \cdot (k l/n)).
	```

	We obtain the equation in frequency domain with ``\tilde y = \mathcal F y``, ``\tilde f = \mathcal F f`` etc.
	```math
	\tilde y_i = \tilde f_i + \tilde\eta_i
	```

	``\mathcal F`` is an ``L_2`` isometry and maps independend and normal noise on the observations to equivalent independent and normal noise in frequency domain,
	so here
	```math
	\tilde\eta_i \sim N(0, \varsigma^2)
	```

	"""

	# ╔═╡ 093506f2-5a52-4217-9dfb-d673ce8b4a29
	# Fourier matrix
	# ℱ(x) = [exp(2piimij/n) for i in fftshift(-n÷2:n÷2-1), j in fftshift(-n÷2:n÷2-1)]x
	ℱ(x) = (fft(x)/sqrt(length(x)))

	# ╔═╡ 7f5a4d79-1e64-4c99-a588-63bcca7267fe
	md"""

	We define our prior as

	```math
	\tilde f_i \sim N(0, \varpi_i^{-\frac{\alpha}{2}})
	```
	where ``\varpi_i`` is the frequency corresponding to Fourier coordinate ``\tilde f_i``. Julia's `fftfreq` helps with that.

	Large values of ``\alpha`` imply higher smoothenss of prior samples.


	For example:

	* α = 3.0 for type integrated Brownian motion
	* α = 2.0 for type Brownian motion prior
	* α = 1.0 for a very rough prior (pink noise)

	``\alpha`` can be translated into a Hurst parameter related to self similarity of the path under scaling of both axes
	```math
	H = (\alpha - 1)/2
	```

	Having a prior in frequency domains means we get samples from our prior for ``f`` via the inverse Fourier transform, either the matrix inverse ``{\mathcal F}^{-1}`` or `sqrt(n)*real(ifft(x))` in Julia for the real part.


	"""

	# ╔═╡ 012f54e1-01a3-4acd-b7db-d5d3f3bc7d87
	# Define prior on unit interval in frequency domain
	# by choosing power decay
	function freq_decay(n, α) #ifftshift(-n/2:n/2-1)
	fr = 1 ./ fftfreq(n)
	fr[1] = 2n # allow for intercept
	fr = abs.(fr) .^(α/2) # scale frequencies
	fr/norm(fr)*sqrt(n)
	end

	# ╔═╡ e7c6b81f-202a-43e0-9bc0-ac5eb64060db
	md"""
	Prior power decay of prior samples in frequency domain
	$(@bind α1 Slider(0.5:0.5:4.0))
	"""

	# ╔═╡ 0e4ea7bf-4eb1-4923-a2e8-a7e2e8bc33a5
	@bind resample_prior Button("Resample prior!")

	# ╔═╡ 12c188f4-5042-4cc4-9de5-89ea87c08c1e
	md"""
	## Bayesian regression in Fourier domain

	If we have prior, model and data ``\tilde y`` in Fourier domain, we can compute the posterior distribution of the Fourier coefficients
	``\tilde f_i``.

	This gives a Gaussian posterior distribution for the signal in Fourier domain
	```math
	\tilde f \sim N(\tilde \mu, \tilde \Sigma) \text{ (a posteriori) }
	```
	with mean ``\tilde \mu`` and covariance ``\tilde \Sigma``.

	With the inverse transform we estimate of ``f \approx \mathcal F^{-1} \tilde\mu`` and quantify uncertainty of the estimate as ``\Sigma = \mathcal F^{-1} \tilde \Sigma \mathcal F^{-1}``.

	The marginal uncertainty (diagonal entries of ``\Sigma`` ) can be computed very efficiently using `ifft` aswell.

	"""

	# ╔═╡ 517c3961-c104-4811-ab4c-7e189761709b
	# Equivalent to
	# Γ = ℱDiagonal(freq_decay(n, α).^(-2))ℱ'/n/n # ≈ ifft(Diagonal(n. + freq_decay(n, α).^(-2)))
	# σ = 1.96sqrt.(real(diag(inv((I + Γ)))))

	# ╔═╡ c51271a7-0614-445f-87ec-a5bc2fee1768
	@bind n Slider(10:10:1000) # number of observations

	# ╔═╡ 64feb3c2-919b-4246-9fbf-e836a0c80ed0
	# Design points
	x = range(0, 1, length=n);

	# ╔═╡ c502c779-e776-490d-9ec7-02a3379aaa28
	# Ground truth
	f = 3x.sin.(2pi*x); # periodic signal in x (time domain)
	# f = 6*sqrt.(abs.(x .- 0.5)) # this one is difficult to estimate

	# ╔═╡ b7c5ee08-5d2d-4775-a232-472a033b59a1
	begin
	resample_prior
	z1 = sqrt(n)real(ifft(randn(Complex{Float64},n).abs.(freq_decay(n, α1))));
	plot(x, z1, label="prior sample")
	end

	# ╔═╡ a97a39ff-059e-4622-ae2e-219d8464ecb2
	@bind ς Slider(0.5:0.5:2.0) # noise level

	# ╔═╡ c29066a9-574d-4ce1-ba9b-abb44aeb8fa0
	md"""
	## A nonparametric regression problem

	We are interested in recovering the function ``f`` from ``n=`` $(n) noisy observations ``y_i`` at points ``x_i``

	```math
	y_i = f(x_i) + \eta_i, \qquad \eta_i \sim N(0,\varsigma^2),
	```
	where
	``\varsigma =`` $ς is the noise level.

	* You can choose the values for ``n`` and ``\varsigma`` with the sliders below!

	"""

	# ╔═╡ a4ff81fe-a9cf-4789-aa68-5838c58c1f69
	# Model: Signal distorted by white noise
	begin
	resample
	y = f + ς*randn(n)
	end;

	# ╔═╡ 873ba234-b4b1-4678-8c67-086cd3b149bc
	begin
	p1 = scatter(x, y, markersize=10/sqrt(n), label="obs", title="Data")
	show_truth && plot!(p1, x, f, color=:green, label="truth")
	p1
	end

	# ╔═╡ 0071252a-42cb-4967-b426-7d28ce054ba1
	begin
	p2 = scatter(x, real.(ℱ(y)), markersize=10/sqrt(n), color=:black, label="obs", title="Truth and observation in frequency domain")
	scatter!(p2, x, imag.(ℱ(y)), markersize=10/sqrt(n), color=:black, alpha=0.1, label="obs (imag)", title="Truth and observation in frequency domain")
	plot!(p2, x, real.(ℱ(f)), color=:green, label="truth")
	plot!(p2, x, imag.(ℱ(f)), color=:green, alpha=0.3, label="truth (im)")
	p2
	end

	# ╔═╡ af6df9cd-cb2f-48de-9697-db23bb7e3804
	# Prior power decay of prior samples in frequency domain
	@bind α Slider(0.5:0.5:2.0)

	# ╔═╡ 5a475143-5078-4033-84cc-ec6abd3c86c4
	md"Chosen prior smoothness α = $(α1), H = $(H = (α - 1)/2)"

	# ╔═╡ 9c48ca1f-7ab3-4b72-a9bd-4b0d6c1ee8db
	# posterior precision in frequency domain
	γ = ς^(-2) .+ (freq_decay(n, α)).^(-2);

	# ╔═╡ fb008be0-b823-4eaf-b2c7-eb189041f0c8
	# posterior mean
	μ̂ = real(ifft(γ.\(fft(ς^(-2)*y))))

	# ╔═╡ e0e75144-aa8f-433e-90c5-bfa9bbdfac17
	# We can even compute the posterior covariance

	# Use circulant property of covariance in time domain, could be done with https://github.com/JuliaMatrices/ToeplitzMatrices.jl
	σ = 1.96*sqrt.(abs.((ifft(diag(inv(Diagonal(γ)))))))[1]

	# ╔═╡ 8ba316ed-2c77-4ddf-9e57-e673b4ef00ec
	@bind resample_post Button("New posterior sample!")

	# ╔═╡ 23bfdb98-1001-4f49-a841-292ebe4c0246
	# posterior sample
	begin
	resample_post
	m = μ̂ + sqrt(n)*real(ifft(sqrt.(γ).\randn(Complex{Float64},n)))
	end;

	# ╔═╡ 6615b4b2-26fc-4a80-9576-810c7f919270
	md"Show truth? $(@bind show_truth2 CheckBox())"

	# ╔═╡ c78bd411-c9f4-4e1f-8c96-ea97480d9cd9
	begin
	global p = scatter(x, y, markersize=10/sqrt(n), label="obs")
	plot!(x, μ̂, color=:blue, ribbon=σ, fillalpha=0.2, label="posterior mean and band")
	show_truth2 && plot!(x, f, color=:green, alpha=0.8, label="truth")
	resample_post
	plot!(x, m, label="posterior sample")
	end

	# ╔═╡ 09af6b8a-73a1-4e87-8a89-ef7ac72c72b4
	md"""

	## What is the right choice of ``\alpha``

	That will depend on the smoothness of the truth we want to estimate. If we know that apriori ``f`` is very smooth, we prefer larger values of ``\alpha``

	Dutch Bayesians (and I should count myself as one) like to think about how to choose ``\alpha`` in an optimal way.

	One way to reason about this, is to investigate the frequentist/asymptotic properties of Bayesian procedures (insert Galaxy brain meme).

	For the example one finds that the posterior puts most of it's mass near the true function ``f`` with increasing probability if ``n`` becomes large.

	This happens at a rate depending on ``\alpha`` as long as we match ``\alpha`` to the smoothness of the underlying truth.

	```math
	e = \varsigma n^{-H/(1 + 2H)}
	```

	Let's compute our ``L_2`` estimation error and compare with theoretical size

	* `norm(μ̂ - f)/sqrt(n)` = $(norm(μ̂ - f)/sqrt(n))
	* `e` = $(ς*n^(-H/(1 + 2H)))

	Harry van Zanten's summer school slides are a nice starting point:

	* https://warwick.ac.uk/fac/sci/statistics/crism/workshops/masterclassapril/
	* https://fdnss.files.wordpress.com/2016/03/vanzanten-slides.pdf


	"""



	# ╔═╡ Cell order:
	# ╠═ad88469d-06d5-4ec7-868f-611e999930b4
	# ╠═37146eed-cabd-4024-9e91-4b8fe15139a3
	# ╠═c29066a9-574d-4ce1-ba9b-abb44aeb8fa0
	# ╠═64feb3c2-919b-4246-9fbf-e836a0c80ed0
	# ╠═c502c779-e776-490d-9ec7-02a3379aaa28
	# ╠═a4ff81fe-a9cf-4789-aa68-5838c58c1f69
	# ╠═426085d1-1f7d-4186-a8dc-28ffd8fb5343
	# ╠═0722b786-402d-4ba9-a427-8c87cbc3df24
	# ╠═873ba234-b4b1-4678-8c67-086cd3b149bc
	# ╠═fb55fde1-7669-4f91-bc67-2a0172e5ae3e
	# ╠═093506f2-5a52-4217-9dfb-d673ce8b4a29
	# ╠═0071252a-42cb-4967-b426-7d28ce054ba1
	# ╠═7f5a4d79-1e64-4c99-a588-63bcca7267fe
	# ╠═012f54e1-01a3-4acd-b7db-d5d3f3bc7d87
	# ╠═e7c6b81f-202a-43e0-9bc0-ac5eb64060db
	# ╠═5a475143-5078-4033-84cc-ec6abd3c86c4
	# ╠═0e4ea7bf-4eb1-4923-a2e8-a7e2e8bc33a5
	# ╠═b7c5ee08-5d2d-4775-a232-472a033b59a1
	# ╠═12c188f4-5042-4cc4-9de5-89ea87c08c1e
	# ╠═9c48ca1f-7ab3-4b72-a9bd-4b0d6c1ee8db
	# ╠═fb008be0-b823-4eaf-b2c7-eb189041f0c8
	# ╠═23bfdb98-1001-4f49-a841-292ebe4c0246
	# ╠═e0e75144-aa8f-433e-90c5-bfa9bbdfac17
	# ╠═517c3961-c104-4811-ab4c-7e189761709b
	# ╠═c51271a7-0614-445f-87ec-a5bc2fee1768
	# ╠═a97a39ff-059e-4622-ae2e-219d8464ecb2
	# ╠═af6df9cd-cb2f-48de-9697-db23bb7e3804
	# ╠═8ba316ed-2c77-4ddf-9e57-e673b4ef00ec
	# ╠═6615b4b2-26fc-4a80-9576-810c7f919270
	# ╠═c78bd411-c9f4-4e1f-8c96-ea97480d9cd9
	# ╠═09af6b8a-73a1-4e87-8a89-ef7ac72c72b4