NonWhite/IA_Lista_2.pdf

## example.png

      
    Raw
  

              example.png
            
          
## grid.png

      
    Raw
  

              grid.png
            
          
## IA_Lista_2.pdf

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              IA_Lista_2.pdf
            
          
            View raw
        
    
## IA_Lista_2.tex
\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[portuguese]{babel}
\usepackage{listings}
\usepackage{subfigure}

\lstset{
	basicstyle=\footnotesize,
	breaklines=true,
	language=Python,
	numbers=left,
	numbersep=-20pt,
	stepnumber=1,
	tabsize=2,
	showstringspaces=false,
	escapechar=|
}

\usepackage{tikz}
\usetikzlibrary{arrows}

\title{Lista 2 de Inteligência Artificial}
\author{9410313 - Walter Perez Urcia}

\begin{document}

\maketitle

\section{Quantificando as incertezas}
	\subsection{Declaração}
		Após seu \emph{checkup} médico anual, o doutor lhe informa que tem uma boa e uma má notícia. A má notícia é que seu exame deu positivo para uma doença grave e que o exame tem acurácia de 99\%, isto é, a probabilidade do teste dar positivo dado que se tem a doença é 0.99, da mesma forma que a probabilidade do resultado ser negativo dado que não se tem a doença é 0.99. A boa  notícia é que se trata de uma doença rara, que ocorre somente em 1 caso a cada 10.000 pessoas de sua idade.\\
		\textbf{Questões:}
		\begin{itemize}
			\item Por qué a doença ser rara é uma boa notícia?
			\item Quais são as chances que você de fato tenha a doença?
		\end{itemize}
	\subsection{Solução a}
		Definimos as variáveis booleanas:
		\begin{itemize}
			\item $T$: Teste deu positivo ($T = 1$) ou negativo ($T = 0$)
			\item $D$: A pessoa tem a doença ($D = 1$) ou não tem ($D = 0$)
		\end{itemize}
		Então traduzindo o enunciado em probabilidades temos:
		\begin{itemize}
			\item $P( T = 1  \mid D = 1 ) = 0.99$
			\item $P( T = 0 \mid D = 0 ) = 0.99$
			\item $P( D = 1 ) = X = 0.0001$
		\end{itemize}
		Podemos calcular a probabilidade do teste dar positivo para qualquer pessoa:
		\begin{itemize}
			\item $P( T = 1 ) = P( T = 1 \mid D = 0 ) P( D = 0 ) + P( T = 1 \mid D = 1 ) P( D = 1 )$
			\item $P( T = 1 ) = 0.01 ( 1 - X ) + 0.99X$
			\item $P( T = 1 ) = 0.01 - 0.01X + 0.99X$
			\item $P( T = 1 ) = 0.01 + 0.98X$
			\item $P( T = 1 ) = 0.010098$
		\end{itemize}
		No penúltimo passo, podemos ver que se a doença não fosse rara, a probabilidade que o teste de positivo poderia ser até 99\%. Portanto, é uma boa notícia que a doença seja rara porque diminui aquela probabilidade até 1\%.

	\subsection{Solução b}
		As chances de ter a doença dado que o teste deu positivo são:
		\begin{itemize}
			\item $P( D = 1 \mid T = 1 ) = \frac{P( T = 1 \mid D = 1 ) P( D = 1 )}{P( T = 1 )}$
			\item $P( D = 1 \mid T = 1 ) = \frac{0.99 \times 0.0001}{0.010098}$
			\item $P( D = 1 \mid T = 1 ) = 0.00980392156863$
		\end{itemize}

\section{Raciocínio Probabilístico}
	\subsection{Declaração}
		Em uma usina nuclear, existe um alarme que dispara quando a temperatura de um sensor excede um determinado limiar. O sensor mede a temperatura no núcleo da planta. Considere as variáveis booleanas $A$ (alarme dispara), $F_A$ (alarme está com defeito) e $F_G$ (sensor está com defeito) e as variáveis multivaloradas $G$ (leitura do sensor de temperatura) e $T$ (temperatura real do núcleo).\\
		\textbf{Questões:}
		\begin{itemize}
			\item Desenhe uma rede Bayesiana para esse domínio (utilizando as variáveis descritas anteriormente), dado que se torna mais provável que o sensor esteja com defeito quando a temperatura do núcleo estiver muito alta.
			\item Considere que existem apenas duas possibilidades da temperatura real e da medida de temperatura: normal e alta. Suponha que a probabilidade de que o sensor marque a leitura correta da temperatura do núcleo seja $x$ quando não estiver com defeito e $y$ quando estiver com defeito. Determine a tabela de probabilidades condicionais associada à variável $G$.
			\item Suponha que o alarme funcione normalmente quando não esteja com defeito e que pare totalmente de disparar quando estiver com defeito. Determine a tabela de probabilidades condicionadas à variável $A$.
		\end{itemize}

	\subsection{Solução a}
		Temos que:
		\begin{itemize}
			\item Alarme dispara dependendo do sensor, então existe um arco $G \rightarrow A$
			\item A leitura do sensor depende da temperatura no núcleo ($T \rightarrow G$)
			\item Alarme dispara dependendo se está com defeito ($F_A \rightarrow A$)
			\item Leitura é correta se o sensor não tem defeito ($F_G \rightarrow G$)
		\end{itemize}
		A rede Bayesiana com aquelas relações é a seguinte:
		\begin{figure}[h]
			\centering
			\begin{tikzpicture}[ scale = 0.5 ]
				\tikzset{ vertex/.style = { shape = circle , draw , minimum size = 2.5em } }
				\tikzset{ edge/.style = { ->,> = latex' } }
				% vertices
				\node[ vertex ] (A) at  ( 6 , 0 ) { ${A}$ } ;
				\node[ vertex ] (F_A) at  ( 3 , 3 ) { ${F_A}$ } ;
				\node[ vertex ] (G) at  ( 6 , 3 ) { ${G}$ } ;
				\node[ vertex ] (F_G) at  ( 3 , 6 ) { ${F_G}$ } ;
				\node[ vertex ] (T) at  ( 6 , 6 ) { ${T}$ } ;
			%edges
				\draw[ edge ] (T) to (G) ;
				\draw[ edge ] (F_G) to (G) ;
				\draw[ edge ] (G) to (A) ;
				\draw[ edge ] (F_A) to (A) ;
			\end{tikzpicture}
		\end{figure}

	\subsection{Solução b}
		\begin{table}[h]
			\centering
			\begin{tabular}{ l | c | c | c | c | }
				\cline{2-5}
				& \multicolumn{2}{|c|}{$T = normal$} & \multicolumn{2}{|c|}{$T = alta$} \\ \cline{2-5}
				& $F_G = 0$ & $F_G = 1$ & $F_G = 0$ & $F_G = 1$ \\ \hline
				$G = normal$ & $x$ & $y$ & $1 - x$ & $1 - y$ \\ \hline
				$G = alta$ & $1 - x$ & $1 - y$ & $x$ & $y$ \\ \hline
			\end{tabular}
			\caption{Tabela de probabilidades para a variável $G$}
			\label{fig:variableg}
		\end{table}

	\subsection{Solução c}
		\begin{table}[h]
			\centering
			\begin{tabular}{ l | c | c | c | c | }
				\cline{2-5}
				& \multicolumn{2}{|c|}{$G = normal$} & \multicolumn{2}{|c|}{$G = alta$} \\ \cline{2-5}
				& $F_A = 0$ & $F_A = 1$ & $F_A = 0$ & $F_A = 1$ \\ \hline
				$A = 0$ & 1 & 1 & 0 & 1 \\ \hline
				$A = 1$ & 0 & 0 & 1 & 0 \\ \hline
			\end{tabular}
			\caption{Tabela de probabilidades para a variável $A$}
			\label{fig:variablea}
		\end{table}

\section{Raciocínio Probabilístico Temporal}
	\subsection{Declaração}
		Um professor quer saber se seus estudantes estão tendo tempos suficientes de sono. Diariamente, o professor observa se os estudantes dormem em sala de aula e/ou se eles apresentam os olhos vermelhos de sono. O professor tem a seguinte teoria:
		\begin{itemize}
			\item A probabilidade \emph{a priori} de se ter tempo suficiente de sono, isto é, sem nenhuma observação, é 0.7.
			\item A probabilidade de se ter tempo suficiente de sono em uma noite $t$ é 0.8, dado que o estudante teve tempo suficiente de sono na noite anterior $t - 1$, e 0.3 se não teve.
			\item A probabilidade de ter olhos vermelhos de sono em classe é 0.2, se o estudante teve tempo suficiente de sono na noite anterior, e 0.7, se não teve.
			\item A probabilidade de se dormir em classe é 0.1 se o estudante teve tempo suficiente de sono na noite anterior, e 0.3 se não teve.
		\end{itemize}
		\textbf{Questões:}
		\begin{itemize}
			\item Formule ese problema através de uma rede bayesiana dinâmica que o professor possa usar para estimar se o estudante teve tempo de sono suficiente na última noite (\emph{filtering}) dado uma sequência de observações.
			\item Apresente todas as tabelas de probabilidades condicionais do modelo
			\item Dado as seguintes evidências para um aluno:
			\begin{itemize}
				\item $e_1 :=$ \emph{não olhos vermelhos, não dormiu em classe}
				\item $e_2 :=$ \emph{olhos vermelhos, não dormiu em classe}
				\item $e_3 :=$ \emph{olhos vermelhos, dormiu em classe}
			\end{itemize}
			calcule as probabilidades $P( {TempoSuficienteDeSono}_t \mid e_{1:t} )$ para cada $t = 1,2,3$, onde $e_{1:t}$ representa a sequência de evidências do tempo 1 até o tempo $t$
		\end{itemize}

	\subsection{Solução a}
		A figura~\ref{fig:dynamic} mostra uma versão geral da rede Bayesiana dinâmica na parte esquerda, mas para o problema será usada a rede na direita porque inicialmente o professor não tem observações.
		\begin{figure}[h]
			\centering
			\subfigure[Rede Bayesiana dinâmica]{
				\begin{tikzpicture}[ scale = 0.5 ]
					\tikzset{ vertex/.style = { shape = circle , draw , minimum size = 3em } }
					\tikzset{ edge/.style = { ->,> = latex' } }
					% vertices
					\node[ vertex ] (Tt0) at  ( 0 , 4 ) { ${T_{t-1}}$ } ;
					\node[ vertex ] (Ot0) at  ( 0 , 0 ) { ${O_{t-1}}$ } ;
					\node[ vertex ] (Dt0) at  ( 4 , 4 ) { ${D_{t-1}}$ } ;
					\node[ vertex ] (Tt1) at  ( 8 , 0 ) { ${T_{t}}$ } ;
					\node[ vertex ] (Ot1) at  ( 4 , 0 ) { ${O_{t}}$ } ;
					\node[ vertex ] (Dt1) at  ( 8 , 4 ) { ${D_{t}}$ } ;
					%edges
					\draw[ edge ] (Tt0) to (Tt1) ;
					\draw[ edge ] (Ot0) to (Ot1) ;
					\draw[ edge ] (Dt0) to (Dt1) ;
					\draw[ edge ] (Tt0) to (Ot0) ;
					\draw[ edge ] (Tt0) to (Dt0) ;
					\draw[ edge ] (Tt1) to (Ot1) ;
					\draw[ edge ] (Tt1) to (Dt1) ;
				\end{tikzpicture}
			}
			\subfigure[Representação para o problema]{
				\centering
				\begin{tikzpicture}[ scale = 0.5 ]
					\tikzset{ vertex/.style = { shape = circle , draw , minimum size = 3em } }
					\tikzset{ edge/.style = { ->,> = latex' } }
					% vertices
					\node[ vertex ] (Tt0) at  ( 0 , 4 ) { ${T_0}$ } ;
					\node[ vertex ] (Tt1) at  ( 4 , 4 ) { ${T_1}$ } ;
					\node[ vertex ] (Ot1) at  ( 0 , 0 ) { ${O_1}$ } ;
					\node[ vertex ] (Dt1) at  ( 4 , 0 ) { ${D_1}$ } ;
					%edges
					\draw[ edge ] (Tt0) to (Tt1) ;
					\draw[ edge ] (Tt1) to (Ot1) ;
					\draw[ edge ] (Tt1) to (Dt1) ;
				\end{tikzpicture}
				\label{fig:representation}
			}
			\caption{Rede Bayesiana dinâmica}
			\label{fig:dynamic}
		\end{figure}

	\subsection{Solução b}
		As tabelas~\ref{tab:t0},~\ref{tab:t1},~\ref{tab:o1} e~\ref{tab:d1} mostram as probabilidades condicionais para as variáveis $T_0$, $T_1$, $O_1$ e $D_1$ respectivamente.
		\begin{table}[h]
			\centering
			\subtable[$T_0$]{
				\centering
				\begin{tabular}{ | l | c | }
					\hline
					$v$ & $P( T_0 = v )$ \\ \hline
					0 & 0.3 \\ \hline
					1 & 0.7 \\ \hline
				\end{tabular}
				\label{tab:t0}
			}
			\subtable[$T_1$]{
				\centering
				\begin{tabular}{ | l | c | c | }
					\hline
					$T_1$ & $T_0 = 0$ & $T_0 = 1$ \\ \hline
					0 & 0.7 & 0.2 \\ \hline
					1 & 0.3 & 0.8 \\ \hline
				\end{tabular}
				\label{tab:t1}
			}
			\subtable[$O_1$]{
				\centering
				\begin{tabular}{ | l | c | c | }
					\hline
					$O_1$ & $T_1 = 0$ & $T_1 = 1$ \\ \hline
					0 & 0.3 & 0.8 \\ \hline
					1 & 0.7 & 0.2 \\ \hline
				\end{tabular}
				\label{tab:o1}
			}
			\subtable[$D_1$]{
				\centering
				\begin{tabular}{ | l | c | c | }
					\hline
					$D_1$ & $T_1 = 0$ & $T_1 = 1$ \\ \hline
					0 & 0.7 & 0.9 \\ \hline
					1 & 0.3 & 0.1 \\ \hline
				\end{tabular}
				\label{tab:d1}
			}
			\caption{Probabilidades condicionais das variáveis em~\ref{fig:representation}}
		\end{table}

\section{Tomando decisões simples}
	\subsection{Declaração}
		Considere um estudante que tem a escolha de comprar ou não comprar um livro-texto para um curso. Vamos modelar essa situação como um problema de decisão com as seguintes variáveis booleanas $B$ (${BuyBook}$), indicando se o estudante escolhe comprar o livro, $M$(${Mastery}$), indicando se o estudante dominou o conteúdo do livro e $P$ (${Pass}$), indicando se o estudante passa no curso. Além disso, considere a variável de utilidade $U$. Você pode pensar que $P$ seria independente de $B$ dado $M$, mas para esse curso existe uma prova final com consulta ao livro, então ter o livro pode ajudar. O modelo de decisão que ilustra esse problema é dado na Figura~\ref{fig:model}.
		\begin{figure}[h]
			\centering
			\includegraphics[height=4cm]{model}
			\caption{Modelo de decisão de se comprar ou não o livro-texto}
			\label{fig:model}
		\end{figure}
		Um estudante, João, tem a seguinte função de utilidade aditiva: \$0 para não comprar o livro e -\$100 para comprar o livro; e \$2000 para passar o curso e \$0 para não passar. As estimativas de probabilidade condicionais são as seguintes:
		\begin{itemize}
			\item $P( p \mid b , m ) = 0.9$
			\item $P( p \mid b , \lnot m ) = 0.5$
			\item $P( p \mid \lnot b , m ) = 0.8$
			\item $P( p \mid \lnot b , \lnot m ) = 0.3$
			\item $P( m \mid b ) = 0.9$
			\item $P( m \mid \lnot b ) = 0.7$
		\end{itemize}
		\textbf{Questões:}
		\begin{itemize}
			\item Calcule a utilidade esperada de João decidir comprar o livro e de decidir não comprar o livro
			\item Racionalmente, o que João deveria fazer
		\end{itemize}

	\subsection{Solução a}
		A utilidade esperada de comprar o livro é:
		\begin{itemize}
			\item $E( U( b , p ) ) + E( U( b , \lnot p ) )$
			\item O primer termo pode ser calculado da seguinte forma:
			\begin{itemize}
				\item $U( b ) P( b ) + U( p ) P( p \mid b , m ) P( m \mid b ) + U( p ) P( p \mid b , \lnot m ) P( \lnot m \mid b )$
				\item $-100 (0.5) + 2000 (0.9)(0.9) + 2000 (0.5)(0.1)$
				\item 1,670
			\end{itemize}
			\item Da mesma forma para o segundo termo:
			\begin{itemize}
				\item $U( b ) P( b ) + U( \lnot p ) P( \lnot p \mid b , m ) P( m \mid b ) + U( \lnot p ) P( \lnot p \mid b , \lnot m ) P( \lnot m \mid b ) $
				\item $-100 (0.5) + 0 (0.1)(0.9) + 0 (0.5)(0.1)$
				\item -100
			\end{itemize}
			\item Então a utilidade esperada será: 1,570
		\end{itemize}
		Por outro lado a utilidade esperada de \textbf{NÃO} comprar o livro é:
		\begin{itemize}
			\item $E( U( \lnot b , p ) ) + E( U( \lnot b , \lnot p ) )$
			\item Primer termo:
			\begin{itemize}
				\item $U( \lnot b ) P( \lnot b ) + U( p ) P( p \mid \lnot b , m ) P( m \mid \lnot b ) + U( p ) P( p \mid \lnot b , \lnot m ) P( \lnot m \mid \lnot b )$
				\item $0 (0.5) + 2000 (0.8)(0.7) + 2000 (0.3)(0.3)$
				\item 1,300
			\end{itemize}
			\item Segundo termo:
			\begin{itemize}
				\item $U( \lnot b ) P( \lnot b ) + U( \lnot p ) P( \lnot p \mid \lnot b , m ) P( m \mid \lnot b ) + U( \lnot p ) P( \lnot p \mid \lnot b , \lnot m ) P( \lnot m \mid \lnot b )$
				\item $0 (0.5) + 0 (0.1) + 0 (0.5)$
				\item 0
			\end{itemize}
			\item Então a utilidade esperada será: 1,300
		\end{itemize}

	\subsection{Solução b}
		Tomando em consideração as utilidades esperadas de comprar ou não o livro, João deveria comprar o livro porque tem uma utilidade esperada maior.

\section{Tomando decisões complexas}
	\subsection{Declaração}
		Considere o problema do \emph{grid world} de tamanho 3x3 apresentado na Figura~\ref{fig:grid}. As ações do agente em cada estado são \emph{Up} (mover-se para acima), \emph{Down} (mover-se para baixo), \emph{Right} (mover-se para direita) e \emph{Left} (mover-se para esqueda). O modelo de transição é o seguinte: 80\% do tempo o agente se move na direção desejada; e no resto do tempo o agente se move em um ângulo de 90 graus com a direção desejada com 10\% de chance para cada direção. Uma colisão com a parede resulta em ficar parado.
		\begin{figure}[h]
			\centering
			\includegraphics[height=3cm]{grid}
			\caption{\footnotesize{Grid world 3 x 3: cada estado é uma posição $( i , j ), 1 \leq i , j \leq 3$ no
			grid; as recompensas de cada estado estão marcadas nas respectivas posições, isto é, $R(1, 3) = +10, R(1, 1) = r, \ldots .$}}
			\label{fig:grid}
		\end{figure}
		\\
		\textbf{Questões:}
		\begin{itemize}
			\item Implemente o algoritmo de \textbf{Iteração de Valor} para esse problema para cada valor de $r \in \{-1000, -100, -3, 0\}$. Use um fator de desconto $\gamma = 0, 99$. Copie no relatório apenas o código do algoritmo (não é necessário entregar arquivos de código fonte).
OBS: a implementação do algoritmo pode ser feita em qualquer linguagem.
			\item Apresente a política obtida para cada valor de $r$ em um mapa de ações em um grid de 3x3 (conforme exemplo da Figura~\ref{fig:example}) e explique intuitivamente o porquê de cada valor de $r$ implicar em cada política.
		\end{itemize}
		\begin{figure}[h]
			\centering
			\includegraphics[height=3cm]{example}
			\caption{\footnotesize{Exemplo de mapa de ações para um \emph{grid} 3x4}}
			\label{fig:example}
		\end{figure}

	\subsection{Solução a}
		A implementação do algoritmo se mostra a continuação:
		\begin{lstlisting}
			def transition( state , R , v_previous ) :
				values = []
				for direction in actions :
					v = 0.0
					for A in probs[ direction ] :
						new_state = s_line( state , A )
						v += probs[ direction ][ A ] * ( R[ new_state ] + GAMMA * v_previous[ new_state ] )
					values.append( v )
				return values

			def policy( state , V_opt ) :
				max_val = -1e6
				poli = None
				for direction in actions :
					v = 0.0
					for A in probs[ direction ] :
						new_state = s_line( state , A )
						v += probs[ direction ][ A ] * V_opt[ new_state ]
					if v > max_val :
						max_val = v
						poli = direction
				return poli

			def iteration_value( fpath ) :
				R = read_values( fpath )
				states = R.keys()
				# Initialize
				V = dict( [ ( ( x , y ) , 0.0 ) for ( x , y ) in states ] )
				# Repeat until convergence
				currentLayer = copy( V )
				while True :
					nextLayer = {}
					for s in states :
						transition_values = transition( s , R , currentLayer )
						nextLayer[ s ] = max( transition_values )
					max_diff = max( [ fabs( nextLayer[ s ] - currentLayer[ s ] ) for s in states ] )
					if max_diff <= EPS : break
					currentLayer = copy( nextLayer )
				V = copy( currentLayer )
				# Compute policy
				P = {}
				for s in states : P[ s ] = policy( s , V )
				return P
		\end{lstlisting}
		onde $s\_line$ retorna o estado seguinte dada uma ação $A$, $GAMMA = 0.9$, $EPS = 1e-6$, $actions$ contem todas as ações que pode escolher o agente e $probs$ as probabilidades de ir numa direção dado que escolheu alguma ação.

	\subsection{Solução b}
		Para os tres primeiros valores de $r$, as políticas obtidas tentam não mover-se para a celda com valor $r$ porque estes são valores negativos e fazem que a utilidade esperada para ir até ela seja bastante menor comparada com outras ações, mas quando o valor deixa de ser negativo ($r = 0$) a melhor política obtém que é melhor passar por ela. Os gráficos mostram que para os primeiros casos ($r = -1000, -100 , -3$) sempre vai para a direita, ou seja ficando mais longe daquela celda, mas para o último valor ($r=0$), tenta ficar sempre mais perto dela.
		\begin{figure}[h]
			\centering
			\subfigure[Políticas usando $r = -1000$]{
				\centering
				\begin{tikzpicture}[ scale = 1.5 ]
					\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
					\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
					\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
					\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
					\draw[thick,->] (0.25,1.5) -- (0.75,1.5);
					\draw[thick,->] (1.25,1.5) -- (1.75,1.5);
					\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
					\draw[thick,->] (0.25,0.5) -- (0.75,0.5);
					\draw[thick,->] (1.25,0.5) -- (1.75,0.5);
					\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
				\end{tikzpicture}
				\label{fig:it1}
			}
			\subfigure[Políticas usando $r = -100$]{
				\centering
				\begin{tikzpicture}[ scale = 1.5 ]
					\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
					\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
					\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
					\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
					\draw[thick,->] (0.25,1.5) -- (0.75,1.5);
					\draw[thick,->] (1.25,1.5) -- (1.75,1.5);
					\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
					\draw[thick,->] (0.25,0.5) -- (0.75,0.5);
					\draw[thick,->] (1.25,0.5) -- (1.75,0.5);
					\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
				\end{tikzpicture}
				\label{fig:it2}
			}
			\centering
			\subfigure[Políticas usando $r = -3$]{
				\centering
				\begin{tikzpicture}[ scale = 1.5 ]
					\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
					\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
					\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
					\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
					\draw[thick,->] (0.25,1.5) -- (0.75,1.5);
					\draw[thick,->] (1.25,1.5) -- (1.75,1.5);
					\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
					\draw[thick,->] (0.25,0.5) -- (0.75,0.5);
					\draw[thick,->] (1.25,0.5) -- (1.75,0.5);
					\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
				\end{tikzpicture}
				\label{fig:it3}
			}
			\subfigure[Políticas usando $r = 0$]{
				\centering
				\begin{tikzpicture}[ scale = 1.5 ]
					\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
					\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
					\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
					\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
					\draw[thick,->] (0.5,1.25) -- (0.5,1.75);
					\draw[thick,->] (1.5,1.25) -- (1.5,1.75);
					\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
					\draw[thick,->] (0.5,0.25) -- (0.5,0.75);
					\draw[thick,->] (1.5,0.25) -- (1.5,0.75);
					\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
				\end{tikzpicture}
				\label{fig:it4}
			}
		\end{figure}
\end{document}

## iteration_value.py
import sys
from math import fabs
from copy import deepcopy as copy

GAMMA = 0.9
EPS = 1e-6
actions = [ 'Up' , 'Down' , 'Left' , 'Right' ]
probs = {
	'Up' : { 'Up' : 0.8 , 'Left' : 0.1 , 'Right' : 0.1 } ,
	'Down' : { 'Down' : 0.8 , 'Left' : 0.1 , 'Right' : 0.1 } ,
	'Left' : { 'Up' : 0.1 , 'Left' : 0.8 , 'Down' : 0.1 } ,
	'Right' : { 'Up' : 0.1 , 'Down' : 0.1 , 'Right' : 0.8 }
}

def read_values( fpath ) :
	R = []
	with open( fpath , 'r' ) as f :
		for line in f :
			row = line[ :-1 ].split()
			row = [ float( x ) for x in row ]
			R.append( row )
	states = [ ( i , j ) for i in xrange( len( R ) ) for j in xrange( len( R[ i ] ) ) ]
	R = dict( [ ( ( x , y ) , R[ x ][ y ] ) for ( x , y ) in states ] )
	return R

def s_line( state , action ) :
	x , y = state[ 0 ] , state[ 1 ]
	if action == 'Down' :
		x = ( x + 1 if x + 1 < 3 else x )
	elif action == 'Up' :
		x = ( x - 1 if x - 1 >= 0 else x )
	elif action == 'Left' :
		y = ( y - 1 if y - 1 >= 0 else y )
	elif action == 'Right' :
		y = ( y + 1 if y + 1 < 3 else y )
	return ( x , y )

def transition( state , R , v_previous ) :
	values = []
	#print "Calculating values for V_i[ %s ] =================================" % ( state ,)
	for direction in actions :
		v = 0.0
		#print "ACTION = %s" % direction
		for A in probs[ direction ] :
			new_state = s_line( state , A )
			#print "P( %s , %s ) * ( R[ %s ] + %s * V_i-1[ %s ] ) = %s ( %s + %s * %s )" % ( new_state , A , new_state , GAMMA , new_state , probs[ direction ][ A ] , R[ new_state ] , GAMMA , v_previous[ new_state ] )
			v += probs[ direction ][ A ] * ( R[ new_state ] + GAMMA * v_previous[ new_state ] )
		values.append( v )
	return values

def debug( l ) :
	print "NEXT"
	print "%10.3f %10.3f %10.3f" % ( l[ ( 0 , 0 ) ] , l[ ( 0 , 1 ) ] , l[ ( 0 , 2 ) ] )
	print "%10.3f %10.3f %10.3f" % ( l[ ( 1 , 0 ) ] , l[ ( 1 , 1 ) ] , l[ ( 1 , 2 ) ] )
	print "%10.3f %10.3f %10.3f" % ( l[ ( 2 , 0 ) ] , l[ ( 2 , 1 ) ] , l[ ( 2 , 2 ) ] )

def policy( state , V_opt ) :
	max_val = -1e6
	poli = None
	for direction in actions :
		v = 0.0
		for A in probs[ direction ] :
			new_state = s_line( state , A )
			v += probs[ direction ][ A ] * V_opt[ new_state ]
		if v > max_val :
			max_val = v
			poli = direction
	return poli

def iteration_value( fpath ) :
	R = read_values( fpath )
	states = R.keys()
	# Initialize
	V = dict( [ ( ( x , y ) , 0.0 ) for ( x , y ) in states ] )
	# Repeat until convergence
	currentLayer = copy( V )
	i = 0
	while True :
		#print "ITERATION %s" % i
		nextLayer = {}
		for s in states :
			transition_values = transition( s , R , currentLayer )
			nextLayer[ s ] = max( transition_values )
		#debug( nextLayer )
		max_diff = max( [ fabs( nextLayer[ s ] - currentLayer[ s ] ) for s in states ] )
		#print "DIFF = %s" % max( diff )
		if max_diff <= EPS : break
		currentLayer = copy( nextLayer )
		i += 1
	V = copy( currentLayer )
	debug( V )
	# Compute policy
	P = {}
	for s in states : P[ s ] = policy( s , V )
	return P

def to_latex( P ) :
	for i in range( 3 ) : print "%s %s %s" % ( P[ ( i , 0 ) ] , P[ ( i , 1 ) ] , P[ ( i , 2 ) ] )
	for i in range( 3 ) :
		for j in range( 3 ) :
			#print P[ ( i , j ) ]
			if P[ ( i , j ) ] == 'Right' or P[ ( i , j ) ] == 'Left' :
				left = True if P[ ( i , j ) ] == 'Left' else False
				x_ini = j + ( 0.25 if not left else 0.75 )
				x_end = j + ( 0.75 if not left else 0.25 )
				y = 2 - i + 0.5
				print "\t\t\t\t\t\draw[thick,->] (%s,%s) -- (%s,%s);" % ( x_ini , y , x_end , y )
			else :
				down = True if P[ ( i , j ) ] == 'Down' else False
				y_ini = 2 - i + ( 0.25 if not left else 0.75 )
				y_end = 2 - i + ( 0.75 if not left else 0.25 )
				x = j + 0.5
				print "\t\t\t\t\t\draw[thick,->] (%s,%s) -- (%s,%s);" % ( x , y_ini , x , y_end )


if __name__ == '__main__' :
	vfile = 'r1.txt'
	if len( sys.argv ) > 1 :
		vfile = sys.argv[ 1 ]
	P = iteration_value( vfile )
	#to_latex( P )

## model.png

      
    Raw
  

              model.png
            
          
## r1.txt
-1000 -1 10
-1 -1 -1
-1 -1 -1

## r2.txt
-100 -1 10
-1 -1 -1
-1 -1 -1

## r3.txt
-3 -1 10
-1 -1 -1
-1 -1 -1

## r4.txt
0 -1 10
-1 -1 -1
-1 -1 -1

## r5.txt
0 0 -0.1
0 10 -0.1
0 0 -0.1

## r6.txt
9 -1 10
-1 -1 -1
-1 -1 -1
	\documentclass{article}

	\usepackage[utf8]{inputenc}
	\usepackage[portuguese]{babel}
	\usepackage{listings}
	\usepackage{subfigure}

	\lstset{
	basicstyle=\footnotesize,
	breaklines=true,
	language=Python,
	numbers=left,
	numbersep=-20pt,
	stepnumber=1,
	tabsize=2,
	showstringspaces=false,
	escapechar=\|
	}

	\usepackage{tikz}
	\usetikzlibrary{arrows}

	\title{Lista 2 de Inteligência Artificial}
	\author{9410313 - Walter Perez Urcia}

	\begin{document}

	\maketitle

	\section{Quantificando as incertezas}
	\subsection{Declaração}
	Após seu \emph{checkup} médico anual, o doutor lhe informa que tem uma boa e uma má notícia. A má notícia é que seu exame deu positivo para uma doença grave e que o exame tem acurácia de 99\%, isto é, a probabilidade do teste dar positivo dado que se tem a doença é 0.99, da mesma forma que a probabilidade do resultado ser negativo dado que não se tem a doença é 0.99. A boa notícia é que se trata de uma doença rara, que ocorre somente em 1 caso a cada 10.000 pessoas de sua idade.\\
	\textbf{Questões:}
	\begin{itemize}
	\item Por qué a doença ser rara é uma boa notícia?
	\item Quais são as chances que você de fato tenha a doença?
	\end{itemize}
	\subsection{Solução a}
	Definimos as variáveis booleanas:
	\begin{itemize}
	\item $T$: Teste deu positivo ($T = 1$) ou negativo ($T = 0$)
	\item $D$: A pessoa tem a doença ($D = 1$) ou não tem ($D = 0$)
	\end{itemize}
	Então traduzindo o enunciado em probabilidades temos:
	\begin{itemize}
	\item $P( T = 1 \mid D = 1 ) = 0.99$
	\item $P( T = 0 \mid D = 0 ) = 0.99$
	\item $P( D = 1 ) = X = 0.0001$
	\end{itemize}
	Podemos calcular a probabilidade do teste dar positivo para qualquer pessoa:
	\begin{itemize}
	\item $P( T = 1 ) = P( T = 1 \mid D = 0 ) P( D = 0 ) + P( T = 1 \mid D = 1 ) P( D = 1 )$
	\item $P( T = 1 ) = 0.01 ( 1 - X ) + 0.99X$
	\item $P( T = 1 ) = 0.01 - 0.01X + 0.99X$
	\item $P( T = 1 ) = 0.01 + 0.98X$
	\item $P( T = 1 ) = 0.010098$
	\end{itemize}
	No penúltimo passo, podemos ver que se a doença não fosse rara, a probabilidade que o teste de positivo poderia ser até 99\%. Portanto, é uma boa notícia que a doença seja rara porque diminui aquela probabilidade até 1\%.

	\subsection{Solução b}
	As chances de ter a doença dado que o teste deu positivo são:
	\begin{itemize}
	\item $P( D = 1 \mid T = 1 ) = \frac{P( T = 1 \mid D = 1 ) P( D = 1 )}{P( T = 1 )}$
	\item $P( D = 1 \mid T = 1 ) = \frac{0.99 \times 0.0001}{0.010098}$
	\item $P( D = 1 \mid T = 1 ) = 0.00980392156863$
	\end{itemize}

	\section{Raciocínio Probabilístico}
	\subsection{Declaração}
	Em uma usina nuclear, existe um alarme que dispara quando a temperatura de um sensor excede um determinado limiar. O sensor mede a temperatura no núcleo da planta. Considere as variáveis booleanas $A$ (alarme dispara), $F_A$ (alarme está com defeito) e $F_G$ (sensor está com defeito) e as variáveis multivaloradas $G$ (leitura do sensor de temperatura) e $T$ (temperatura real do núcleo).\\
	\textbf{Questões:}
	\begin{itemize}
	\item Desenhe uma rede Bayesiana para esse domínio (utilizando as variáveis descritas anteriormente), dado que se torna mais provável que o sensor esteja com defeito quando a temperatura do núcleo estiver muito alta.
	\item Considere que existem apenas duas possibilidades da temperatura real e da medida de temperatura: normal e alta. Suponha que a probabilidade de que o sensor marque a leitura correta da temperatura do núcleo seja $x$ quando não estiver com defeito e $y$ quando estiver com defeito. Determine a tabela de probabilidades condicionais associada à variável $G$.
	\item Suponha que o alarme funcione normalmente quando não esteja com defeito e que pare totalmente de disparar quando estiver com defeito. Determine a tabela de probabilidades condicionadas à variável $A$.
	\end{itemize}

	\subsection{Solução a}
	Temos que:
	\begin{itemize}
	\item Alarme dispara dependendo do sensor, então existe um arco $G \rightarrow A$
	\item A leitura do sensor depende da temperatura no núcleo ($T \rightarrow G$)
	\item Alarme dispara dependendo se está com defeito ($F_A \rightarrow A$)
	\item Leitura é correta se o sensor não tem defeito ($F_G \rightarrow G$)
	\end{itemize}
	A rede Bayesiana com aquelas relações é a seguinte:
	\begin{figure}[h]
	\centering
	\begin{tikzpicture}[ scale = 0.5 ]
	\tikzset{ vertex/.style = { shape = circle , draw , minimum size = 2.5em } }
	\tikzset{ edge/.style = { ->,> = latex' } }
	% vertices
	\node[ vertex ] (A) at ( 6 , 0 ) { ${A}$ } ;
	\node[ vertex ] (F_A) at ( 3 , 3 ) { ${F_A}$ } ;
	\node[ vertex ] (G) at ( 6 , 3 ) { ${G}$ } ;
	\node[ vertex ] (F_G) at ( 3 , 6 ) { ${F_G}$ } ;
	\node[ vertex ] (T) at ( 6 , 6 ) { ${T}$ } ;
	%edges
	\draw[ edge ] (T) to (G) ;
	\draw[ edge ] (F_G) to (G) ;
	\draw[ edge ] (G) to (A) ;
	\draw[ edge ] (F_A) to (A) ;
	\end{tikzpicture}
	\end{figure}

	\subsection{Solução b}
	\begin{table}[h]
	\centering
	\begin{tabular}{ l \| c \| c \| c \| c \| }
	\cline{2-5}
	& \multicolumn{2}{\|c\|}{$T = normal$} & \multicolumn{2}{\|c\|}{$T = alta$} \\ \cline{2-5}
	& $F_G = 0$ & $F_G = 1$ & $F_G = 0$ & $F_G = 1$ \\ \hline
	$G = normal$ & $x$ & $y$ & $1 - x$ & $1 - y$ \\ \hline
	$G = alta$ & $1 - x$ & $1 - y$ & $x$ & $y$ \\ \hline
	\end{tabular}
	\caption{Tabela de probabilidades para a variável $G$}
	\label{fig:variableg}
	\end{table}

	\subsection{Solução c}
	\begin{table}[h]
	\centering
	\begin{tabular}{ l \| c \| c \| c \| c \| }
	\cline{2-5}
	& \multicolumn{2}{\|c\|}{$G = normal$} & \multicolumn{2}{\|c\|}{$G = alta$} \\ \cline{2-5}
	& $F_A = 0$ & $F_A = 1$ & $F_A = 0$ & $F_A = 1$ \\ \hline
	$A = 0$ & 1 & 1 & 0 & 1 \\ \hline
	$A = 1$ & 0 & 0 & 1 & 0 \\ \hline
	\end{tabular}
	\caption{Tabela de probabilidades para a variável $A$}
	\label{fig:variablea}
	\end{table}

	\section{Raciocínio Probabilístico Temporal}
	\subsection{Declaração}
	Um professor quer saber se seus estudantes estão tendo tempos suficientes de sono. Diariamente, o professor observa se os estudantes dormem em sala de aula e/ou se eles apresentam os olhos vermelhos de sono. O professor tem a seguinte teoria:
	\begin{itemize}
	\item A probabilidade \emph{a priori} de se ter tempo suficiente de sono, isto é, sem nenhuma observação, é 0.7.
	\item A probabilidade de se ter tempo suficiente de sono em uma noite $t$ é 0.8, dado que o estudante teve tempo suficiente de sono na noite anterior $t - 1$, e 0.3 se não teve.
	\item A probabilidade de ter olhos vermelhos de sono em classe é 0.2, se o estudante teve tempo suficiente de sono na noite anterior, e 0.7, se não teve.
	\item A probabilidade de se dormir em classe é 0.1 se o estudante teve tempo suficiente de sono na noite anterior, e 0.3 se não teve.
	\end{itemize}
	\textbf{Questões:}
	\begin{itemize}
	\item Formule ese problema através de uma rede bayesiana dinâmica que o professor possa usar para estimar se o estudante teve tempo de sono suficiente na última noite (\emph{filtering}) dado uma sequência de observações.
	\item Apresente todas as tabelas de probabilidades condicionais do modelo
	\item Dado as seguintes evidências para um aluno:
	\begin{itemize}
	\item $e_1 :=$ \emph{não olhos vermelhos, não dormiu em classe}
	\item $e_2 :=$ \emph{olhos vermelhos, não dormiu em classe}
	\item $e_3 :=$ \emph{olhos vermelhos, dormiu em classe}
	\end{itemize}
	calcule as probabilidades $P( {TempoSuficienteDeSono}_t \mid e_{1:t} )$ para cada $t = 1,2,3$, onde $e_{1:t}$ representa a sequência de evidências do tempo 1 até o tempo $t$
	\end{itemize}

	\subsection{Solução a}
	A figura~\ref{fig:dynamic} mostra uma versão geral da rede Bayesiana dinâmica na parte esquerda, mas para o problema será usada a rede na direita porque inicialmente o professor não tem observações.
	\begin{figure}[h]
	\centering
	\subfigure[Rede Bayesiana dinâmica]{
	\begin{tikzpicture}[ scale = 0.5 ]
	\tikzset{ vertex/.style = { shape = circle , draw , minimum size = 3em } }
	\tikzset{ edge/.style = { ->,> = latex' } }
	% vertices
	\node[ vertex ] (Tt0) at ( 0 , 4 ) { ${T_{t-1}}$ } ;
	\node[ vertex ] (Ot0) at ( 0 , 0 ) { ${O_{t-1}}$ } ;
	\node[ vertex ] (Dt0) at ( 4 , 4 ) { ${D_{t-1}}$ } ;
	\node[ vertex ] (Tt1) at ( 8 , 0 ) { ${T_{t}}$ } ;
	\node[ vertex ] (Ot1) at ( 4 , 0 ) { ${O_{t}}$ } ;
	\node[ vertex ] (Dt1) at ( 8 , 4 ) { ${D_{t}}$ } ;
	%edges
	\draw[ edge ] (Tt0) to (Tt1) ;
	\draw[ edge ] (Ot0) to (Ot1) ;
	\draw[ edge ] (Dt0) to (Dt1) ;
	\draw[ edge ] (Tt0) to (Ot0) ;
	\draw[ edge ] (Tt0) to (Dt0) ;
	\draw[ edge ] (Tt1) to (Ot1) ;
	\draw[ edge ] (Tt1) to (Dt1) ;
	\end{tikzpicture}
	}
	\subfigure[Representação para o problema]{
	\centering
	\begin{tikzpicture}[ scale = 0.5 ]
	\tikzset{ vertex/.style = { shape = circle , draw , minimum size = 3em } }
	\tikzset{ edge/.style = { ->,> = latex' } }
	% vertices
	\node[ vertex ] (Tt0) at ( 0 , 4 ) { ${T_0}$ } ;
	\node[ vertex ] (Tt1) at ( 4 , 4 ) { ${T_1}$ } ;
	\node[ vertex ] (Ot1) at ( 0 , 0 ) { ${O_1}$ } ;
	\node[ vertex ] (Dt1) at ( 4 , 0 ) { ${D_1}$ } ;
	%edges
	\draw[ edge ] (Tt0) to (Tt1) ;
	\draw[ edge ] (Tt1) to (Ot1) ;
	\draw[ edge ] (Tt1) to (Dt1) ;
	\end{tikzpicture}
	\label{fig:representation}
	}
	\caption{Rede Bayesiana dinâmica}
	\label{fig:dynamic}
	\end{figure}

	\subsection{Solução b}
	As tabelas~\ref{tab:t0},~\ref{tab:t1},~\ref{tab:o1} e~\ref{tab:d1} mostram as probabilidades condicionais para as variáveis $T_0$, $T_1$, $O_1$ e $D_1$ respectivamente.
	\begin{table}[h]
	\centering
	\subtable[$T_0$]{
	\centering
	\begin{tabular}{ \| l \| c \| }
	\hline
	$v$ & $P( T_0 = v )$ \\ \hline
	0 & 0.3 \\ \hline
	1 & 0.7 \\ \hline
	\end{tabular}
	\label{tab:t0}
	}
	\subtable[$T_1$]{
	\centering
	\begin{tabular}{ \| l \| c \| c \| }
	\hline
	$T_1$ & $T_0 = 0$ & $T_0 = 1$ \\ \hline
	0 & 0.7 & 0.2 \\ \hline
	1 & 0.3 & 0.8 \\ \hline
	\end{tabular}
	\label{tab:t1}
	}
	\subtable[$O_1$]{
	\centering
	\begin{tabular}{ \| l \| c \| c \| }
	\hline
	$O_1$ & $T_1 = 0$ & $T_1 = 1$ \\ \hline
	0 & 0.3 & 0.8 \\ \hline
	1 & 0.7 & 0.2 \\ \hline
	\end{tabular}
	\label{tab:o1}
	}
	\subtable[$D_1$]{
	\centering
	\begin{tabular}{ \| l \| c \| c \| }
	\hline
	$D_1$ & $T_1 = 0$ & $T_1 = 1$ \\ \hline
	0 & 0.7 & 0.9 \\ \hline
	1 & 0.3 & 0.1 \\ \hline
	\end{tabular}
	\label{tab:d1}
	}
	\caption{Probabilidades condicionais das variáveis em~\ref{fig:representation}}
	\end{table}

	\section{Tomando decisões simples}
	\subsection{Declaração}
	Considere um estudante que tem a escolha de comprar ou não comprar um livro-texto para um curso. Vamos modelar essa situação como um problema de decisão com as seguintes variáveis booleanas $B$ (${BuyBook}$), indicando se o estudante escolhe comprar o livro, $M$(${Mastery}$), indicando se o estudante dominou o conteúdo do livro e $P$ (${Pass}$), indicando se o estudante passa no curso. Além disso, considere a variável de utilidade $U$. Você pode pensar que $P$ seria independente de $B$ dado $M$, mas para esse curso existe uma prova final com consulta ao livro, então ter o livro pode ajudar. O modelo de decisão que ilustra esse problema é dado na Figura~\ref{fig:model}.
	\begin{figure}[h]
	\centering
	\includegraphics[height=4cm]{model}
	\caption{Modelo de decisão de se comprar ou não o livro-texto}
	\label{fig:model}
	\end{figure}
	Um estudante, João, tem a seguinte função de utilidade aditiva: \$0 para não comprar o livro e -\$100 para comprar o livro; e \$2000 para passar o curso e \$0 para não passar. As estimativas de probabilidade condicionais são as seguintes:
	\begin{itemize}
	\item $P( p \mid b , m ) = 0.9$
	\item $P( p \mid b , \lnot m ) = 0.5$
	\item $P( p \mid \lnot b , m ) = 0.8$
	\item $P( p \mid \lnot b , \lnot m ) = 0.3$
	\item $P( m \mid b ) = 0.9$
	\item $P( m \mid \lnot b ) = 0.7$
	\end{itemize}
	\textbf{Questões:}
	\begin{itemize}
	\item Calcule a utilidade esperada de João decidir comprar o livro e de decidir não comprar o livro
	\item Racionalmente, o que João deveria fazer
	\end{itemize}

	\subsection{Solução a}
	A utilidade esperada de comprar o livro é:
	\begin{itemize}
	\item $E( U( b , p ) ) + E( U( b , \lnot p ) )$
	\item O primer termo pode ser calculado da seguinte forma:
	\begin{itemize}
	\item $U( b ) P( b ) + U( p ) P( p \mid b , m ) P( m \mid b ) + U( p ) P( p \mid b , \lnot m ) P( \lnot m \mid b )$
	\item $-100 (0.5) + 2000 (0.9)(0.9) + 2000 (0.5)(0.1)$
	\item 1,670
	\end{itemize}
	\item Da mesma forma para o segundo termo:
	\begin{itemize}
	\item $U( b ) P( b ) + U( \lnot p ) P( \lnot p \mid b , m ) P( m \mid b ) + U( \lnot p ) P( \lnot p \mid b , \lnot m ) P( \lnot m \mid b ) $
	\item $-100 (0.5) + 0 (0.1)(0.9) + 0 (0.5)(0.1)$
	\item -100
	\end{itemize}
	\item Então a utilidade esperada será: 1,570
	\end{itemize}
	Por outro lado a utilidade esperada de \textbf{NÃO} comprar o livro é:
	\begin{itemize}
	\item $E( U( \lnot b , p ) ) + E( U( \lnot b , \lnot p ) )$
	\item Primer termo:
	\begin{itemize}
	\item $U( \lnot b ) P( \lnot b ) + U( p ) P( p \mid \lnot b , m ) P( m \mid \lnot b ) + U( p ) P( p \mid \lnot b , \lnot m ) P( \lnot m \mid \lnot b )$
	\item $0 (0.5) + 2000 (0.8)(0.7) + 2000 (0.3)(0.3)$
	\item 1,300
	\end{itemize}
	\item Segundo termo:
	\begin{itemize}
	\item $U( \lnot b ) P( \lnot b ) + U( \lnot p ) P( \lnot p \mid \lnot b , m ) P( m \mid \lnot b ) + U( \lnot p ) P( \lnot p \mid \lnot b , \lnot m ) P( \lnot m \mid \lnot b )$
	\item $0 (0.5) + 0 (0.1) + 0 (0.5)$
	\item 0
	\end{itemize}
	\item Então a utilidade esperada será: 1,300
	\end{itemize}

	\subsection{Solução b}
	Tomando em consideração as utilidades esperadas de comprar ou não o livro, João deveria comprar o livro porque tem uma utilidade esperada maior.

	\section{Tomando decisões complexas}
	\subsection{Declaração}
	Considere o problema do \emph{grid world} de tamanho 3x3 apresentado na Figura~\ref{fig:grid}. As ações do agente em cada estado são \emph{Up} (mover-se para acima), \emph{Down} (mover-se para baixo), \emph{Right} (mover-se para direita) e \emph{Left} (mover-se para esqueda). O modelo de transição é o seguinte: 80\% do tempo o agente se move na direção desejada; e no resto do tempo o agente se move em um ângulo de 90 graus com a direção desejada com 10\% de chance para cada direção. Uma colisão com a parede resulta em ficar parado.
	\begin{figure}[h]
	\centering
	\includegraphics[height=3cm]{grid}
	\caption{\footnotesize{Grid world 3 x 3: cada estado é uma posição $( i , j ), 1 \leq i , j \leq 3$ no
	grid; as recompensas de cada estado estão marcadas nas respectivas posições, isto é, $R(1, 3) = +10, R(1, 1) = r, \ldots .$}}
	\label{fig:grid}
	\end{figure}
	\\
	\textbf{Questões:}
	\begin{itemize}
	\item Implemente o algoritmo de \textbf{Iteração de Valor} para esse problema para cada valor de $r \in \{-1000, -100, -3, 0\}$. Use um fator de desconto $\gamma = 0, 99$. Copie no relatório apenas o código do algoritmo (não é necessário entregar arquivos de código fonte). OBS: a implementação do algoritmo pode ser feita em qualquer linguagem.
	\item Apresente a política obtida para cada valor de $r$ em um mapa de ações em um grid de 3x3 (conforme exemplo da Figura~\ref{fig:example}) e explique intuitivamente o porquê de cada valor de $r$ implicar em cada política.
	\end{itemize}
	\begin{figure}[h]
	\centering
	\includegraphics[height=3cm]{example}
	\caption{\footnotesize{Exemplo de mapa de ações para um \emph{grid} 3x4}}
	\label{fig:example}
	\end{figure}

	\subsection{Solução a}
	A implementação do algoritmo se mostra a continuação:
	\begin{lstlisting}
	def transition( state , R , v_previous ) :
	values = []
	for direction in actions :
	v = 0.0
	for A in probs[ direction ] :
	new_state = s_line( state , A )
	v += probs[ direction ][ A ] * ( R[ new_state ] + GAMMA * v_previous[ new_state ] )
	values.append( v )
	return values

	def policy( state , V_opt ) :
	max_val = -1e6
	poli = None
	for direction in actions :
	v = 0.0
	for A in probs[ direction ] :
	new_state = s_line( state , A )
	v += probs[ direction ][ A ] * V_opt[ new_state ]
	if v > max_val :
	max_val = v
	poli = direction
	return poli

	def iteration_value( fpath ) :
	R = read_values( fpath )
	states = R.keys()
	# Initialize
	V = dict( [ ( ( x , y ) , 0.0 ) for ( x , y ) in states ] )
	# Repeat until convergence
	currentLayer = copy( V )
	while True :
	nextLayer = {}
	for s in states :
	transition_values = transition( s , R , currentLayer )
	nextLayer[ s ] = max( transition_values )
	max_diff = max( [ fabs( nextLayer[ s ] - currentLayer[ s ] ) for s in states ] )
	if max_diff <= EPS : break
	currentLayer = copy( nextLayer )
	V = copy( currentLayer )
	# Compute policy
	P = {}
	for s in states : P[ s ] = policy( s , V )
	return P
	\end{lstlisting}
	onde $s\_line$ retorna o estado seguinte dada uma ação $A$, $GAMMA = 0.9$, $EPS = 1e-6$, $actions$ contem todas as ações que pode escolher o agente e $probs$ as probabilidades de ir numa direção dado que escolheu alguma ação.

	\subsection{Solução b}
	Para os tres primeiros valores de $r$, as políticas obtidas tentam não mover-se para a celda com valor $r$ porque estes são valores negativos e fazem que a utilidade esperada para ir até ela seja bastante menor comparada com outras ações, mas quando o valor deixa de ser negativo ($r = 0$) a melhor política obtém que é melhor passar por ela. Os gráficos mostram que para os primeiros casos ($r = -1000, -100 , -3$) sempre vai para a direita, ou seja ficando mais longe daquela celda, mas para o último valor ($r=0$), tenta ficar sempre mais perto dela.
	\begin{figure}[h]
	\centering
	\subfigure[Políticas usando $r = -1000$]{
	\centering
	\begin{tikzpicture}[ scale = 1.5 ]
	\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
	\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
	\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
	\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
	\draw[thick,->] (0.25,1.5) -- (0.75,1.5);
	\draw[thick,->] (1.25,1.5) -- (1.75,1.5);
	\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
	\draw[thick,->] (0.25,0.5) -- (0.75,0.5);
	\draw[thick,->] (1.25,0.5) -- (1.75,0.5);
	\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
	\end{tikzpicture}
	\label{fig:it1}
	}
	\subfigure[Políticas usando $r = -100$]{
	\centering
	\begin{tikzpicture}[ scale = 1.5 ]
	\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
	\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
	\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
	\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
	\draw[thick,->] (0.25,1.5) -- (0.75,1.5);
	\draw[thick,->] (1.25,1.5) -- (1.75,1.5);
	\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
	\draw[thick,->] (0.25,0.5) -- (0.75,0.5);
	\draw[thick,->] (1.25,0.5) -- (1.75,0.5);
	\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
	\end{tikzpicture}
	\label{fig:it2}
	}
	\centering
	\subfigure[Políticas usando $r = -3$]{
	\centering
	\begin{tikzpicture}[ scale = 1.5 ]
	\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
	\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
	\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
	\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
	\draw[thick,->] (0.25,1.5) -- (0.75,1.5);
	\draw[thick,->] (1.25,1.5) -- (1.75,1.5);
	\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
	\draw[thick,->] (0.25,0.5) -- (0.75,0.5);
	\draw[thick,->] (1.25,0.5) -- (1.75,0.5);
	\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
	\end{tikzpicture}
	\label{fig:it3}
	}
	\subfigure[Políticas usando $r = 0$]{
	\centering
	\begin{tikzpicture}[ scale = 1.5 ]
	\draw[step=1cm,gray,very thin] (0,0) grid (3,3);
	\draw[thick,->] (0.25,2.5) -- (0.75,2.5);
	\draw[thick,->] (1.25,2.5) -- (1.75,2.5);
	\draw[thick,->] (2.5,2.25) -- (2.5,2.75);
	\draw[thick,->] (0.5,1.25) -- (0.5,1.75);
	\draw[thick,->] (1.5,1.25) -- (1.5,1.75);
	\draw[thick,->] (2.5,1.25) -- (2.5,1.75);
	\draw[thick,->] (0.5,0.25) -- (0.5,0.75);
	\draw[thick,->] (1.5,0.25) -- (1.5,0.75);
	\draw[thick,->] (2.5,0.25) -- (2.5,0.75);
	\end{tikzpicture}
	\label{fig:it4}
	}
	\end{figure}
	\end{document}
	import sys
	from math import fabs
	from copy import deepcopy as copy

	GAMMA = 0.9
	EPS = 1e-6
	actions = [ 'Up' , 'Down' , 'Left' , 'Right' ]
	probs = {
	'Up' : { 'Up' : 0.8 , 'Left' : 0.1 , 'Right' : 0.1 } ,
	'Down' : { 'Down' : 0.8 , 'Left' : 0.1 , 'Right' : 0.1 } ,
	'Left' : { 'Up' : 0.1 , 'Left' : 0.8 , 'Down' : 0.1 } ,
	'Right' : { 'Up' : 0.1 , 'Down' : 0.1 , 'Right' : 0.8 }
	}

	def read_values( fpath ) :
	R = []
	with open( fpath , 'r' ) as f :
	for line in f :
	row = line[ :-1 ].split()
	row = [ float( x ) for x in row ]
	R.append( row )
	states = [ ( i , j ) for i in xrange( len( R ) ) for j in xrange( len( R[ i ] ) ) ]
	R = dict( [ ( ( x , y ) , R[ x ][ y ] ) for ( x , y ) in states ] )
	return R

	def s_line( state , action ) :
	x , y = state[ 0 ] , state[ 1 ]
	if action == 'Down' :
	x = ( x + 1 if x + 1 < 3 else x )
	elif action == 'Up' :
	x = ( x - 1 if x - 1 >= 0 else x )
	elif action == 'Left' :
	y = ( y - 1 if y - 1 >= 0 else y )
	elif action == 'Right' :
	y = ( y + 1 if y + 1 < 3 else y )
	return ( x , y )

	def transition( state , R , v_previous ) :
	values = []
	#print "Calculating values for V_i[ %s ] =================================" % ( state ,)
	for direction in actions :
	v = 0.0
	#print "ACTION = %s" % direction
	for A in probs[ direction ] :
	new_state = s_line( state , A )
	#print "P( %s , %s ) * ( R[ %s ] + %s * V_i-1[ %s ] ) = %s ( %s + %s * %s )" % ( new_state , A , new_state , GAMMA , new_state , probs[ direction ][ A ] , R[ new_state ] , GAMMA , v_previous[ new_state ] )
	v += probs[ direction ][ A ] * ( R[ new_state ] + GAMMA * v_previous[ new_state ] )
	values.append( v )
	return values

	def debug( l ) :
	print "NEXT"
	print "%10.3f %10.3f %10.3f" % ( l[ ( 0 , 0 ) ] , l[ ( 0 , 1 ) ] , l[ ( 0 , 2 ) ] )
	print "%10.3f %10.3f %10.3f" % ( l[ ( 1 , 0 ) ] , l[ ( 1 , 1 ) ] , l[ ( 1 , 2 ) ] )
	print "%10.3f %10.3f %10.3f" % ( l[ ( 2 , 0 ) ] , l[ ( 2 , 1 ) ] , l[ ( 2 , 2 ) ] )

	def policy( state , V_opt ) :
	max_val = -1e6
	poli = None
	for direction in actions :
	v = 0.0
	for A in probs[ direction ] :
	new_state = s_line( state , A )
	v += probs[ direction ][ A ] * V_opt[ new_state ]
	if v > max_val :
	max_val = v
	poli = direction
	return poli

	def iteration_value( fpath ) :
	R = read_values( fpath )
	states = R.keys()
	# Initialize
	V = dict( [ ( ( x , y ) , 0.0 ) for ( x , y ) in states ] )
	# Repeat until convergence
	currentLayer = copy( V )
	i = 0
	while True :
	#print "ITERATION %s" % i
	nextLayer = {}
	for s in states :
	transition_values = transition( s , R , currentLayer )
	nextLayer[ s ] = max( transition_values )
	#debug( nextLayer )
	max_diff = max( [ fabs( nextLayer[ s ] - currentLayer[ s ] ) for s in states ] )
	#print "DIFF = %s" % max( diff )
	if max_diff <= EPS : break
	currentLayer = copy( nextLayer )
	i += 1
	V = copy( currentLayer )
	debug( V )
	# Compute policy
	P = {}
	for s in states : P[ s ] = policy( s , V )
	return P

	def to_latex( P ) :
	for i in range( 3 ) : print "%s %s %s" % ( P[ ( i , 0 ) ] , P[ ( i , 1 ) ] , P[ ( i , 2 ) ] )
	for i in range( 3 ) :
	for j in range( 3 ) :
	#print P[ ( i , j ) ]
	if P[ ( i , j ) ] == 'Right' or P[ ( i , j ) ] == 'Left' :
	left = True if P[ ( i , j ) ] == 'Left' else False
	x_ini = j + ( 0.25 if not left else 0.75 )
	x_end = j + ( 0.75 if not left else 0.25 )
	y = 2 - i + 0.5
	print "\t\t\t\t\t\draw[thick,->] (%s,%s) -- (%s,%s);" % ( x_ini , y , x_end , y )
	else :
	down = True if P[ ( i , j ) ] == 'Down' else False
	y_ini = 2 - i + ( 0.25 if not left else 0.75 )
	y_end = 2 - i + ( 0.75 if not left else 0.25 )
	x = j + 0.5
	print "\t\t\t\t\t\draw[thick,->] (%s,%s) -- (%s,%s);" % ( x , y_ini , x , y_end )


	if __name__ == '__main__' :
	vfile = 'r1.txt'
	if len( sys.argv ) > 1 :
	vfile = sys.argv[ 1 ]
	P = iteration_value( vfile )
	#to_latex( P )