#Forward propagation goes from the input layer to the output layer, computing and storing the intermediate results of each layer.
#Backward propagation computes the derivatives with respect to each weight matrix (W1, W2); forward propagation, by contrast, is just the forward calculation: no derivatives, only XW, then ReLU(XW), then on to the next layer until we reach the output.
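#A minimal sketch of that forward pass (X, W1, W2 below are hypothetical placeholders; biases are omitted):
import torch

X = torch.randn(2, 4)    #a batch of 2 examples with 4 features
W1 = torch.randn(4, 8)   #first-layer weights
W2 = torch.randn(8, 3)   #second-layer weights
H = torch.relu(X @ W1)   #hidden layer: ReLU(XW1), stored for later use in backprop
O = H @ W2               #output layer: HW2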
4.8 numerical stability and model initialization
#Previously we assigned a distribution to initialize parameters like W and b, but the initialization scheme matters a great deal: it is essential for keeping training numerically stable, and a poor choice can cause exploding or vanishing gradients.
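#A minimal sketch of such distribution-based initialization using PyTorch's nn.init (the layer sizes are arbitrary examples):
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def init_normal(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)  #small Gaussian weights
        nn.init.zeros_(m.bias)                         #zero biases

net.apply(init_normal)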
4.8.1 vanishing and exploding gradients
#Let's generate 100 Gaussian random matrices and multiply them with some initial matrix.
import torch

M = torch.normal(0, 1, size=(4, 4))  #with variance this large, the product keeps growing as we multiply
print('a single matrix \n', M)
for i in range(100):  #that is, multiply it by 100 such matrices
    M = torch.mm(M, torch.normal(0, 1, size=(4, 4)))
print('after multiplying by 100 similar matrices: \n', M)
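#The opposite failure mode is vanishing gradients: the sigmoid's derivative is at most 0.25 and is nearly zero away from the origin, so chaining many such layers shrinks gradients toward zero (the x values below are an arbitrary illustration).
x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = torch.sigmoid(x)
y.backward(torch.ones_like(x))  #gradient of the sigmoid at each point in x
print(x.grad.max())             #about 0.25, attained only near x = 0
print(x.grad[0], x.grad[-1])    #essentially zero in the tails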