## Neural Networks and the Simplest XOR Problem
- This was adapted from the PyTorch Tutorials.
- Simple supervised machine learning.
- http://pytorch.org/tutorials/beginner/pytorch_with_examples.html

## Neural Networks 
- Neural networks are the foundation of deep learning.

> In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.

- A simple task that neural networks can solve but simple linear models cannot is the [XOR problem](https://medium.com/@jayeshbahire/the-xor-problem-in-neural-networks-50006411840b).

- In the XOR problem, the output is 1 if exactly one of the two inputs is 1, and 0 otherwise.
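
To see why a linear model struggles here, the sketch below fits an ordinary least-squares model to the four-row XOR truth table. The variable names (`inputs`, `labels`, `design`) are illustrative and not part of the original notebook. The best linear fit predicts approximately 0.5 for every row, so it cannot separate the two classes.

```python
import numpy as np

# All four combinations of two binary inputs and their XOR labels.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.logical_xor(inputs[:, 0], inputs[:, 1]).astype(float)

# Best linear fit (with an intercept column) in the least-squares sense.
design = np.hstack([np.ones((4, 1)), inputs])
coef, *_ = np.linalg.lstsq(design, labels, rcond=None)
linear_pred = design @ coef

print("XOR labels:        ", labels)
print("Linear predictions:", linear_pred)
```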

### Generate Fake Data
- `D_in` is the number of dimensions of an input variable.
- `D_out` is the number of dimensions of an output variable.
- Here we are learning some special "fake" data that represents the XOR problem.
- Here, the dependent variable (dv) is 1 if either the first or second variable is 1, but not both.


In [27]:
# -*- coding: utf-8 -*-
import numpy as np

# These are our independent (x) and dependent (y) variables.
# Note: the fourth row of x repeats [0,0,0]; a complete XOR truth table would have [1,1,0] there.
x = np.array([ [0,0,0],[1,0,0],[0,1,0],[0,0,0] ])
y = np.array([[0,1,1,0]]).T
print("Input data:\n",x,"\n Output data:\n",y)

Input data:
 [[0 0 0]
 [1 0 0]
 [0 1 0]
 [0 0 0]] 
 Output data:
 [[0]
 [1]
 [1]
 [0]]
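
As a small sketch, `D_in` and `D_out` can be read off the array shapes instead of being typed by hand. The name `n_samples` is introduced here for illustration only.

```python
import numpy as np

x = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]])
y = np.array([[0, 1, 1, 0]]).T

# Rows are samples; columns are input (or output) dimensions.
n_samples, D_in = x.shape
_, D_out = y.shape
print(n_samples, D_in, D_out)
```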


### A Simple Neural Network 
- Here we are going to build a neural network.
- The size of the first layer (`D_in`) has to match the length of an input row.
- `H` is the size of the hidden layer.
- `D_out` is 1 because the network produces a single number per row, which should be close to the label (0 or 1).

In [0]:
np.random.seed(seed=83832)
#D_in is the number of input variables. 
#H is the hidden dimension.
#D_out is the number of dimensions for the output. 
D_in, H, D_out = 3, 2, 1

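# Randomly initialize the weights of our network, which has one hidden layer of H = 2 units.
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
# Note: bias is initialized here but is not used in the forward pass below.
bias = np.random.randn(H, 1)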
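
The sketch below traces the array shapes through one forward pass, which may help when checking that `D_in`, `H`, and `D_out` are consistent. It uses its own random generator with an arbitrary seed (an assumption, not the notebook's seed), so the weight values differ from those in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
D_in, H, D_out = 3, 2, 1
x = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]])
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

h = x @ w1                  # (4, 3) @ (3, 2) -> (4, 2)
h_relu = np.maximum(h, 0)   # same shape, negatives set to 0
y_pred = h_relu @ w2        # (4, 2) @ (2, 1) -> (4, 1)
print(h.shape, h_relu.shape, y_pred.shape)
```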

### But "Hidden Layers" Aren't Hidden
- Let's take a look at the weights.
- Before training, these are just random numbers.

In [29]:
print(w1, w2)

[[-0.20401151  0.62388689]
 [-0.10186284  1.47372825]
 [ 1.07856887  0.01873049]] [[ 0.49346731]
 [-1.19376828]]


### Update the Weights using Gradient Descent
- Calculate the predicted value
- Calculate the loss function
- Compute the gradients of the loss function with respect to `w1` and `w2`
- Update the weights using the learning rate
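
The four steps above can be written compactly for the network in this section (squared-error loss, one ReLU hidden layer, no bias term). The symbols are notation introduced here for illustration: $X$ is the input matrix `x`, $Y$ the labels `y`, $W_1$ and $W_2$ the weights, $A$ the ReLU output, and $\eta$ the learning rate.

$$
\begin{aligned}
H &= X W_1, \qquad A = \max(H, 0), \qquad \hat{Y} = A W_2 \\
L &= \sum_{i} (\hat{Y}_i - Y_i)^2 \\
\frac{\partial L}{\partial \hat{Y}} &= 2(\hat{Y} - Y), \qquad
\frac{\partial L}{\partial W_2} = A^\top \frac{\partial L}{\partial \hat{Y}} \\
\frac{\partial L}{\partial H} &= \left(\frac{\partial L}{\partial \hat{Y}} W_2^\top\right) \odot \mathbb{1}[H \ge 0], \qquad
\frac{\partial L}{\partial W_1} = X^\top \frac{\partial L}{\partial H} \\
W_k &\leftarrow W_k - \eta \frac{\partial L}{\partial W_k}, \qquad k \in \{1, 2\}
\end{aligned}
$$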

In [30]:
learning_rate = .01
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)

    # ReLU is the activation function: it replaces negative values with 0.
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    # ReLU blocks the gradient wherever its input was negative
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 10.65792615907139
1 9.10203339892777
2 7.928225580610054
3 7.016030709608875
4 6.289798199184453
5 5.699847385692147
6 5.2123305302347624
7 4.803466247932402
8 4.456102755004962
9 4.1575876890269665
10 3.898402733982808
11 3.671262676836925
12 3.4705056296083194
13 3.291670966818706
14 3.1312013137273507
15 2.9862283397788603
16 2.854416299096229
17 2.733846078586037
18 2.622928124188624
19 2.5203362600714687
20 2.4249568284296723
21 2.335849203166264
22 2.2522148435722413
23 2.173372827242625
24 2.0987403459205147
25 2.0278170362586616
26 1.9601722976944669
27 1.8954349540796849
28 1.8332847664299674
29 1.773445416375481
30 1.7156786642283006
31 1.6597794495384952
32 1.6055717509418743
33 1.5529050598636533
34 1.5016513520352168
35 1.4517024638575724
36 1.402967798918812
37 1.3553723045677533
38 1.3088546702007282
39 1.2633657084618402
40 1.2188668883615361
41 1.175328995740111
42 1.1327309018102432
43 1.0910584249086992
44 1.0503032742251954
45 1.0104620672725408
46 0.9715354153057

- Both layers are fully connected: every input feeds every hidden unit, and every hidden unit feeds the output.
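
One way to gain confidence in hand-written backprop like the training loop above is a numerical gradient check. The sketch below is not part of the original notebook: the helper names `loss_and_grads` and `numeric_grad` are hypothetical, and the weights come from an arbitrary seed. It compares the analytic gradients with central finite differences.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Squared-error loss and backprop gradients, mirroring the training loop."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central finite differences of the scalar f() with respect to each entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old  # restore the original value
        grad[idx] = (up - down) / (2 * eps)
    return grad

x = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]])
y = np.array([[0, 1, 1, 0]]).T
rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
w1 = rng.standard_normal((3, 2))
w2 = rng.standard_normal((2, 1))

_, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)
num_w1 = numeric_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w1)
num_w2 = numeric_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w2)

print("w1 gradients match:", np.allclose(grad_w1, num_w1, atol=1e-4))
print("w2 gradients match:", np.allclose(grad_w2, num_w2, atol=1e-4))
```

A check like this only samples one point in weight space, so it is evidence that the formulas are right rather than a proof.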

### Verify the Predictions 
- Obtain predicted values from our model and compare them to the original labels.

In [31]:
pred = np.maximum(x.dot(w1),0).dot(w2)

print(pred, "\n", y)

[[0.        ]
 [0.99992661]
 [1.00007337]
 [0.        ]] 
 [[0]
 [1]
 [1]
 [0]]
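
The raw predictions are real numbers rather than class labels. A minimal sketch of turning them into labels, assuming a 0.5 decision threshold (a choice made here for illustration, not something the notebook specifies):

```python
import numpy as np

# Predicted values copied from the printed output above.
pred = np.array([[0.0], [0.99992661], [1.00007337], [0.0]])
y = np.array([[0, 1, 1, 0]]).T

# Assumed decision rule: predict 1 when the output exceeds 0.5.
pred_label = (pred > 0.5).astype(int)
accuracy = (pred_label == y).mean()
print(pred_label.ravel(), accuracy)
```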


In [32]:
y


array([[0],
       [1],
       [1],
       [0]])

In [33]:
# We can see that the weights have been updated (compare with the initial values printed earlier).
w1

array([[-0.20401151,  1.01377406],
       [-0.10186284,  1.01392285],
       [ 1.07856887,  0.01873049]])

In [34]:

w2

array([[0.49346731],
       [0.98634069]])
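
As a sanity check, the predictions can be recomputed from the printed weights alone. The sketch below copies `w1` and `w2` from the outputs above (rounded to the printed precision, so results agree only to about six decimal places). The first hidden unit is negative for both nonzero input rows, so ReLU zeroes it and only the second hidden unit contributes.

```python
import numpy as np

x = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]])

# Trained weights copied from the printed outputs above.
w1 = np.array([[-0.20401151, 1.01377406],
               [-0.10186284, 1.01392285],
               [1.07856887, 0.01873049]])
w2 = np.array([[0.49346731], [0.98634069]])

hidden = np.maximum(x.dot(w1), 0)  # ReLU zeroes the first hidden unit
pred_from_weights = hidden.dot(w2)
print(pred_from_weights)
```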

In [35]:
# ReLU just replaces the negative numbers with zero.
h_relu

array([[0.        , 0.        ],
       [0.        , 1.01377258],
       [0.        , 1.01392433],
       [0.        , 0.        ]])
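
A tiny standalone sketch of ReLU and the gradient mask used during backprop, on made-up values (the array `h_demo` and the incoming gradient of ones are assumptions for illustration):

```python
import numpy as np

h_demo = np.array([[-1.5, 0.0, 2.0]])

# Forward: negatives become 0, everything else passes through unchanged.
relu_out = np.maximum(h_demo, 0)

# Backward: start from a hypothetical upstream gradient of ones and
# zero it wherever the ReLU input was negative, as the training loop does.
grad_demo = np.ones_like(h_demo)
grad_demo[h_demo < 0] = 0

print(relu_out, grad_demo)
```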