"Deep Learning" 2021

Hung-yi Lee (李宏毅)

1. Predicting This Channel's View Count

Changing w, b, and c gives different results.
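
A minimal sketch of how w, b, and c shape the output. It assumes these are the parameters of the sigmoid pieces the model is built from, which the notes do not state explicitly. The function names `curve` and `model` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def curve(x, w, b, c):
    """One sigmoid-shaped piece: c * sigmoid(b + w * x).
    w changes the slope, b shifts the curve, c changes its height."""
    return c * sigmoid(b + w * x)

def model(x, pieces, bias):
    """Constant bias plus a sum of sigmoid pieces.
    `pieces` is a list of (w, b, c) tuples (an assumed layout)."""
    return bias + sum(curve(x, w, b, c) for w, b, c in pieces)
```

Evaluating `curve` over a range of x for a few (w, b, c) settings shows how each parameter changes the curve.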

2. General Guidance for Machine Learning Tasks

Ways to address overfitting:

Enlarge the training set
Do not give the model too much flexibility; constrain the model

3. General Guidance for Machine Learning Tasks

An optimization issue is not overfitting
Overfitting -> the training loss is small but the testing loss is large
Overfitting happens when the model is too flexible

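Data augmentation (enlarges the dataset to avoid overfitting):

Flip images left to right, mirror them, and so on.
Constrain the model: fewer neurons, early stopping, dropout, fewer features.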
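
The left-right flip mentioned above can be sketched as follows. This is an illustration only: images are assumed to be plain lists of pixel rows, and `flip_horizontal` and `augment` are hypothetical helper names.

```python
def flip_horizontal(image):
    """Return a left-right mirrored copy of an image given as a list of rows."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Return the original (image, label) pairs plus their flipped copies.
    The label is kept unchanged, which only makes sense when a flip
    does not change what the image shows."""
    return dataset + [(flip_horizontal(img), label) for img, label in dataset]
```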

N-fold Cross Validation

Split the data into N parts and train several times, each time with a different train/validation split
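
A small sketch of the splitting step, assuming the data fits in a Python list. The name `n_fold_splits` is made up for this example.

```python
def n_fold_splits(data, n):
    """Yield (train, val) pairs; each of the n folds is the validation set once."""
    folds = [data[i::n] for i in range(n)]  # every n-th item goes to the same fold
    for k in range(n):
        val = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, val
```

Each candidate model is trained and validated once per split, and its validation results can then be averaged over the N runs before the models are compared.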

Mismatch

The training data and the testing data are not alike (they follow different distributions)

4. Local Minima and Saddle Points

A local minimum is the lowest point in its neighborhood; a saddle point is not, so the loss can still be lowered from there

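At a critical point the first derivative (the gradient) is 0, so only the last term (marked in red on the slide), the second-order term, needs to be examined.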
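
For reference, this is the standard second-order Taylor expansion around a point. The symbols \theta' (the point being examined) and g (the gradient there) are notation assumed here; H is the Hessian.

```latex
L(\theta) \approx L(\theta') + (\theta - \theta')^{\top} g
          + \frac{1}{2} (\theta - \theta')^{\top} H \, (\theta - \theta')

% At a critical point g = 0, so only the second-order term remains:
L(\theta) \approx L(\theta') + \frac{1}{2} (\theta - \theta')^{\top} H \, (\theta - \theta')
```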

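Three cases

H is the Hessian matrix

If H has an eigenvalue below zero, a point with lower loss can be found by moving along the corresponding eigenvector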
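
The three cases can be told apart from the signs of the eigenvalues of H. The sketch below handles only a 2x2 symmetric Hessian so that it needs no libraries. The function names are invented for illustration.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius

def classify_critical_point(a, b, d):
    """Classify a critical point from the Hessian [[a, b], [b, d]]."""
    lo, hi = eigenvalues_2x2_symmetric(a, b, d)
    if lo > 0:
        return "local minimum"   # all eigenvalues positive
    if hi < 0:
        return "local maximum"   # all eigenvalues negative
    if lo < 0 < hi:
        return "saddle point"    # mixed signs: an escape direction exists
    return "inconclusive"        # a zero eigenvalue: second order is not enough
```

For example, the Hessian [[0, -2], [-2, 0]] has eigenvalues -2 and 2, so the point is a saddle point and the loss can still be lowered.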

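A local minimum in a two-dimensional space is not necessarily a true minimum; viewed in a higher-dimensional space it may turn out to be a saddle point
Local minima are not that common; most critical points are saddle points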

5. Batch and Momentum

Batch

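epoch: the loss is not computed on all the data; each update uses only one batch, and one pass over all the batches is called an epoch
shuffle: the batches are different in every epoch
With batches -> the parameters are updated after every batch
Without batches (full batch) -> the parameters are updated once after all the data has been seen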
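
A toy sketch of the epoch / batch / shuffle loop. It fits y = w * x with plain gradient descent on squared error; the task, the names `make_batches` and `train`, and the default values are all assumptions chosen to keep the example small.

```python
import random

def make_batches(data, batch_size, rng):
    """Shuffle a copy of the data and cut it into batches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

def train(data, batch_size, epochs, lr=0.1, seed=0):
    """Fit y = w * x; one parameter update per batch.
    Returns the learned w and the total number of updates."""
    rng = random.Random(seed)
    w, updates = 0.0, 0
    for _ in range(epochs):                                # one epoch = all batches once
        for batch in make_batches(data, batch_size, rng):  # reshuffled every epoch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
            updates += 1
    return w, updates
```

Setting `batch_size` to `len(data)` gives the full-batch case, with exactly one update per epoch.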

Without batches (full batch)	With batches
Long charge-up time, but more powerful	Fires quickly, but less accurate

With parallel computation, not using batches does not necessarily take more time
For one epoch, a large batch size actually takes less time
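A large batch size gives poor results, while a small batch size gives better ones. This is not overfitting

A possible explanation: with full batch, training can reach a point where the parameters can no longer be updated, whereas with small batches the loss differs from batch to batch, so updates can continue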

A small batch also does better on testing: when a small batch and a large batch reach the same training accuracy, the large batch has the lower testing accuracy. Good training accuracy with worse testing accuracy -> overfitting

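large batch -> tends to end up inside a narrow valley (a sharp minimum)
small batch -> tends to end up inside a wide basin (a flat minimum), because its update directions vary more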

large and small batch

Momentum

Each movement is the (negative) gradient direction plus the direction of the previous movement
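
A one-step sketch of gradient descent with momentum. The names `lr` (learning rate) and `lam` (weight on the previous movement) and their default values are assumptions for illustration.

```python
def momentum_step(theta, movement, grad, lr=0.1, lam=0.9):
    """One update with momentum.
    new movement = lam * previous movement - lr * gradient
    new theta    = theta + new movement"""
    movement = lam * movement - lr * grad
    return theta + movement, movement
```

Starting with `movement = 0.0`, the first step is ordinary gradient descent. Later steps keep moving even where the gradient is close to zero, as long as the previous movement is not.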

6. Adaptive Learning Rate

In a direction where the gradient is small (very flat) -> make the learning rate larger
In a direction where the gradient is large (very steep) -> make the learning rate smaller
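
One common way to get this behavior is to divide each gradient by the root mean square of that parameter's past gradients (the Adagrad-style rule). The sketch below tracks a single parameter. The function name and the small constant `eps` are assumptions.

```python
import math

def adagrad_style_steps(grads, lr=1.0, eps=1e-8):
    """Return the step taken for each gradient of one parameter.
    Each gradient is divided by the root mean square of all gradients so far."""
    steps, sum_sq = [], 0.0
    for t, g in enumerate(grads, start=1):
        sum_sq += g * g
        sigma = math.sqrt(sum_sq / t)
        steps.append(-lr * g / (sigma + eps))  # eps avoids division by zero
    return steps
```

A parameter whose gradients are all around 0.01 and one whose gradients are all around 100 end up taking steps of similar size, which is the effect described above.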

Adaptive

RMSProp
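
A sketch of the RMSProp idea for a single parameter: recent gradients count more than old ones, controlled by a weight `alpha`. The function name and the default values are assumptions for illustration.

```python
import math

def rmsprop_steps(grads, lr=0.1, alpha=0.9, eps=1e-8):
    """Return the step taken for each gradient of one parameter."""
    steps, sigma_sq = [], None
    for g in grads:
        if sigma_sq is None:
            sigma_sq = g * g  # first step: sigma is simply |g|
        else:
            # exponential moving average of the squared gradients
            sigma_sq = alpha * sigma_sq + (1 - alpha) * g * g
        steps.append(-lr * g / (math.sqrt(sigma_sq) + eps))
    return steps
```

A smaller `alpha` makes the step size react faster when the gradient suddenly becomes large or small.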

Adam: RMSProp + Momentum

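Result with an adaptive learning rate:
The result can be improved with Learning Rate Decay

Solutions:

Learning Rate Decay: as training goes on and the parameters keep being updated, make the learning rate smaller and smaller
Warm Up: let the learning rate increase first, then decrease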
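
The two scheduling ideas can be combined in one function. The linear warm-up, the inverse-time decay, and every default value below are assumptions picked for illustration, not the schedule used in the course.

```python
def learning_rate(step, base_lr=0.1, warmup_steps=10, decay=0.01):
    """Hypothetical schedule: linear warm-up, then inverse-time decay."""
    if step < warmup_steps:
        # Warm Up: grow linearly up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # Learning Rate Decay: shrink as training goes on
    return base_lr / (1.0 + decay * (step - warmup_steps))
```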

7. Loss Function

Loss of Classification

Minimizing cross-entropy is equivalent to maximizing likelihood
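The two are exactly the same thing
Cross-entropy is better suited to classification problems (in PyTorch, the cross-entropy loss already applies softmax internally, so no softmax layer should be added at the end of the network)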
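
A plain-Python sketch of softmax followed by cross-entropy for one example, to show what the loss computes. It is not PyTorch's implementation; `logits` here means the raw network outputs before softmax.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])
```

Minimizing this value means maximizing the probability the model gives to the correct class, which is the likelihood view mentioned above.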

8. Batch Normalization

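Batch normalization is only suitable when the batch is fairly large

Normalization: values are rescaled to have mean 0 and variance 1

At testing time there is no batch from which to compute the mean and standard deviation, so the moving averages accumulated during training are used instead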
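
A stripped-down sketch for a single feature, showing the training/testing difference. It leaves out the learnable scale and shift that a real batch normalization layer has. The class name and the `momentum` and `eps` defaults are assumptions.

```python
import math

class BatchNorm1D:
    """Normalizes one feature; keeps moving averages for use at testing time."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum, self.eps = momentum, eps
        self.running_mean, self.running_var = 0.0, 1.0

    def forward(self, batch, training):
        if training:
            # statistics come from the current batch
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # testing: fall back on the moving averages
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]
```

With a very small batch the per-batch mean and variance are noisy estimates, which is one reason a fairly large batch is preferred.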

9. Convolutional Neural Networks (CNN)

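Simplification 1: receptive field

Define a receptive field; each neuron only cares about what happens inside its own receptive field

The receptive field is yours to define: it can be rectangular, overlap with others, cover only the red channel, and so on. Typically it is 3*3, and each receptive field is watched by a group of neurons

stride: the step size by which the receptive field moves.
padding: fill in values for the part that falls outside the image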

Simplification 2: parameter sharing

Neurons watching different receptive fields share the same parameters

convolutional layer = receptive field + parameter sharing
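
A minimal single-channel sketch that puts the two simplifications together: one small kernel (the shared parameters) slides over every receptive field. Zero padding, a square kernel, and the function name `conv2d` are assumptions of this example.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide one shared square kernel over a 2D image (a list of rows)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    # zero padding: values added around the border of the image
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out = []
    for r in range(0, len(padded) - k + 1, stride):          # move down by stride
        row = []
        for c in range(0, len(padded[0]) - k + 1, stride):   # move right by stride
            # the same kernel weights are reused at every position
            row.append(sum(padded[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out
```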

Pooling

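Max Pooling: keep the largest value in each group; the size of the group is yours to decide

Pooling is equivalent to lowering the image resolution. The network for playing Go does not use pooling, because a Go board is not suited to it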
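
A short sketch of max pooling over non-overlapping windows. The function name and the choice to drop rows or columns that do not fill a whole window are assumptions.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value in each size x size window.
    Trailing rows or columns that do not fill a window are dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, w - size + 1, size)]
            for r in range(0, h - size + 1, size)]
```

A 4x4 input becomes a 2x2 output with `size=2`, which is the resolution reduction described above.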

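A CNN does not cope well with images that have been cropped, enlarged, or rotated, so such versions of the images should be included in training so that it learns to recognize them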

10. Self-attention (Part 1)

The problem it addresses: the input is a set of vectors whose size can change

Sequence Labeling

The number of outputs is the same as the number of inputs (one label per input vector)