2021_05 KAIST AI 603 ML Theory

Statistical learning theory, Computational learning theory

Why this course is needed: Statistical Learning Theory >> Motivation (전상균, NAVER AI Lab Tech Lead)



Brief

Professor: Seyoung Yun, Graduate School of AI, KAIST

Syllabus: URL (Public) (cf. all courses offered by KAIST)

Textbook: UML, Understanding Machine Learning: From Theory to Algorithms, 2014, by Shai Shalev-Shwartz and Shai Ben-David (PDF, free)

Readings

Not provided by the professor

Similar course

Useful sites in Korean

Search query

A Google search query over the materials listed above. (If you get too many results, remove the last 'OR site:github.io'.)

(
site:cs.cornell.edu/courses/cs6781/2020sp/ OR 
site:stat.cmu.edu/~ryantibs/statml/ OR 
site:stat.cmu.edu/~larry/=sml/ OR 
site:alex.smola.org/teaching/ OR 
site:stat.purdue.edu/~jianzhan/STAT598Y/ OR
site:parkgeonyeong.github.io OR 
site:sanghyukchun.github.io OR 
site:davidsilver.uk OR 
site:stanford.edu OR 
site:github.com/pilsung-kang OR
site:cs.huji.ac.il/~shais/UnderstandingMachineLearning/ OR
site:github.io
) AND shatter


Syllabus

Week | Content | HW | Detail | Chap, Readings | My Anki Card

1

Introduction





2

Concentration Inequalities

HW1

Markov Inequality
Chernoff Bound
Hoeffding’s Inequality
Bernstein Inequality
Azuma-Hoeffding inequality
McDiarmid’s Inequality
Talagrand Concentration Inequality

UML appendix B

PARK-CONC-VC
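
A quick empirical check of one bound from this week's list may help. Below is a minimal sketch (mine, not from the lecture) comparing the observed deviation probability of a Bernoulli sample mean against the two-sided Hoeffding bound 2·exp(−2nt²); the sample size, threshold t, and trial count are arbitrary choices.

```python
# Minimal sketch: empirically check Hoeffding's inequality
#   P(|mean - mu| >= t) <= 2 * exp(-2 * n * t^2)
# for i.i.d. Bernoulli(0.5) samples. n, t, trials are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 100, 0.1, 20_000
mu = 0.5

# empirical frequency of a deviation of at least t over many repetitions
means = rng.binomial(1, mu, size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - mu) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(f"empirical: {empirical:.4f}  <=  Hoeffding bound: {bound:.4f}")
```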


3

PAC learning, Learning via uniform convergence


Statistical Learning Framework (notation, setting)
- Empirical risk (training error)
- Generalization error

Sample Complexity

ERM (Empirical Risk Minimization)

Learnability; PAC Learnability
- the (ε, δ) technique
- other notions of learnability are covered in later sessions

Finite Hypothesis Classes
Union Bound

Agnostic PAC
- the prior assumption (restricting the hypothesis class) plays the role of regularization

Uniform Convergence Theorem

UML chap 03, 04

CHUN-PAC-VC

3Q on COLT

RA, Realizability Assumption

PAC

Agnostic PAC

Uniform Convergence
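
To make the finite-class results above concrete, here is a minimal sketch (mine) of the standard sample-complexity bounds for a finite hypothesis class from UML chapters 2 and 4: roughly log(|H|/δ)/ε samples suffice in the realizable case, versus 2·log(2|H|/δ)/ε² in the agnostic case. The plugged-in numbers are arbitrary examples.

```python
# Minimal sketch: finite-hypothesis-class sample complexity (UML ch. 2 & 4).
# Realizable: m >= log(|H| / delta) / eps
# Agnostic:   m >= 2 * log(2|H| / delta) / eps^2
import math

def m_realizable(H_size, eps, delta):
    return math.ceil(math.log(H_size / delta) / eps)

def m_agnostic(H_size, eps, delta):
    return math.ceil(2 * math.log(2 * H_size / delta) / eps**2)

# arbitrary example: |H| = 10^6, eps = 0.01, delta = 0.05
print(m_realizable(10**6, 0.01, 0.05))  # 1682
print(m_agnostic(10**6, 0.01, 0.05))    # 350088 -- note the 1/eps^2 blowup
```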

4

Bias-Complexity tradeoff

HW2

NFL, No-Free-Lunch Theorem
- there is no universal learning algorithm

Error Decomposition

VC-Dim: classes over an infinite domain set can still be learnable

UML chap 05

Error Decomposition
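
As a toy illustration of the error decomposition above (approximation vs. estimation error), here is a minimal sketch (mine, not from the lecture) fitting polynomials of increasing degree: low degrees underfit, while high degrees drive training error down as test error grows.

```python
# Minimal sketch: bias-complexity tradeoff on a toy regression problem.
# The polynomial degree controls hypothesis-class complexity.
import numpy as np

rng = np.random.default_rng(1)
x_tr = rng.uniform(-1, 1, 30)
y_tr = np.sin(3 * x_tr) + 0.3 * rng.normal(size=30)
x_te = rng.uniform(-1, 1, 1000)
y_te = np.sin(3 * x_te) + 0.3 * rng.normal(size=1000)

for deg in (1, 3, 15):
    coef = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    print(f"degree {deg:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```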

5

VC-Dimension


Learnable Classes

PAC learnable using ERM rule with n samples

Shattering

Growth Function
- a.k.a. the shatter function

VC-Dimension

VC-Dimension of Some Set Systems

Sauer’s Lemma

ε-Net Theorem

ε-Sample and Error

PAC learning and VC-dimension

Uniform Convergence

UML chap 06

CHUN-PAC-VC

PARK-CONC-VC

Shattering

VC-Dimension
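
The shattering definition above is easy to verify by brute force on small classes. Below is a minimal sketch (mine) for intervals on the real line, a standard set system with VC-dimension 2: any 2 points are shattered, but no 3 points are.

```python
# Minimal sketch: brute-force shattering check for intervals [a, b] on R.
from itertools import product

def interval_realizes(points, labels):
    pos = [p for p, l in zip(points, labels) if l == 1]
    if not pos:
        return True  # an interval away from all points labels everything 0
    a, b = min(pos), max(pos)
    # the tightest interval covering all positives must exclude all negatives
    return all(not (a <= p <= b) for p, l in zip(points, labels) if l == 0)

def shattered(points):
    return all(interval_realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered([1.0, 2.0]))        # True
print(shattered([1.0, 2.0, 3.0]))   # False: labeling (1, 0, 1) is impossible
```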

6

Nonuniform learnability


Nonuniform Learnability

Agnostic PAC learnability

SRM (Structural Risk Minimization)
- we can specify preferences over hypotheses

Weight Function and Union Bound

SRM Sample Complexity

Model Selection
- Occam's Razor: prefer the simplest hypothesis
- assign higher weights to simpler hypotheses

Model Selection and Validation Set

UML chap 07

YNGIE Shatter-VCdim-SRM

Nonuniform Learnability

SRM
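
To see how the weight function interacts with the union bound, here is a minimal sketch (mine, with made-up class sizes) of the SRM penalty for nested finite classes using the weights w(n) = 6/(π²n²): each H_n pays a uniform-convergence term at confidence w(n)·δ, so richer classes receive larger penalties.

```python
# Minimal sketch: SRM penalty for nested finite classes H_1 ⊆ H_2 ⊆ ...
# Weights w(n) = 6 / (pi^2 * n^2) sum to 1; H_n is charged the agnostic
# finite-class uniform-convergence term at confidence w(n) * delta.
import math

def srm_penalty(H_size, m, n, delta):
    w = 6 / (math.pi**2 * n**2)
    return math.sqrt(math.log(2 * H_size / (w * delta)) / (2 * m))

m, delta = 1000, 0.05
for n, size in enumerate([10, 10**3, 10**6], start=1):  # made-up sizes
    print(f"H_{n}, |H| = {size:>7}: penalty = {srm_penalty(size, m, n, delta):.3f}")
```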


7

The runtime of learning

HW3

Big O notation

Computational Complexity of Learning

Implementing the ERM rule

Efficient Search
Not Efficiently Learnable (NP-hard)
- the ERM problem in the agnostic setting is NP-hard

Boolean conjunction

Learning 3-Term DNF

UML chap 08
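
For contrast with the hardness results above, Boolean conjunctions are the classic efficiently learnable case. Here is a minimal sketch (mine) of the standard consistent learner in the realizable setting, with a made-up target; it starts from the conjunction of all 2d literals and drops every literal falsified by a positive example, running in O(md) time.

```python
# Minimal sketch: efficient ERM for Boolean conjunctions (realizable case).
def learn_conjunction(examples, d):
    # literals kept as (index, is_positive); start with all 2d of them
    literals = {(i, b) for i in range(d) for b in (True, False)}
    for x, y in examples:
        if y == 1:
            # a positive example falsifies literal (i, b) when x[i] != b
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals

def predict(literals, x):
    return int(all(x[i] == b for (i, b) in literals))

# made-up target: x0 AND NOT x2 over d = 3 variables
data = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(data, 3)
print(all(predict(h, x) == y for x, y in data))  # True: consistent on data
```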
8

Midterm week at KAIST (no midterm exam for this course)



9

Rademacher Complexity


Rep(): Approximate Representativeness

R(): Rademacher Complexity
- the random signs split the sample into two disjoint sets
- estimates the representativeness

R() and Rep()
- Rademacher complexity bounds the expected representativeness

R() and Generalization Error

Generalization Bounds with the Confidence Parameter

Rademacher Calculus

Rademacher Complexity of Linear Classes

UML chap 26

PARK-RADAM

ε-representative

ε/2-representative

representativeness of S


Complexity is measured through the notion of "correlation" with the data: Rademacher complexity can be read as how strongly, on average, a classifier's predictions can be maximally correlated with arbitrary random binary labels.


Generalization Bounds with the Confidence Parameter


Massart Lemma
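
The "correlation with random signs" intuition above can be simulated directly. Below is a minimal sketch (mine) of a Monte Carlo estimate of the empirical Rademacher complexity of a finite set of prediction vectors; consistent with Massart's lemma, the estimate for ±1 predictors stays below sqrt(2·log|A| / m). Set sizes are arbitrary.

```python
# Minimal sketch: Monte Carlo estimate of empirical Rademacher complexity
#   R(A) = E_sigma[ max_{a in A} (1/m) * sum_i sigma_i * a_i ]
# for finite sets A of +-1 prediction vectors.
import numpy as np

rng = np.random.default_rng(0)
m, trials = 50, 5000
A_one = rng.choice([-1, 1], size=(1, m))     # a single fixed predictor
A_many = rng.choice([-1, 1], size=(200, m))  # 200 random predictors

for name, A in [("|A| = 1", A_one), ("|A| = 200", A_many)]:
    sigma = rng.choice([-1, 1], size=(trials, m))
    est = np.mean(np.max(sigma @ A.T, axis=1)) / m   # avg of max correlations
    massart = np.sqrt(2 * np.log(A.shape[0]) / m)
    print(f"{name}: R estimate = {est:.3f}, Massart bound = {massart:.3f}")
```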

10

Online learning


Online Learning
- Batch Learning vs Online Learning

Online Classification in the Realizable Case (settings)

Adversarial Game
- Learner vs. Adversary

Consistent Algorithm

Halving Algorithm

Online Learnability
- shattered tree
- Ldim, Littlestone dimension

Shattered Tree

VC-dim vs. Ldim

SOA, Standard Optimal Algorithm

Ldim and M_SOA

Online Learning in the Unrealizable Case

Randomization

Weighted Majority

Regret Bound

UML chap 21

Regret
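
Here is a minimal sketch (mine) of the Halving algorithm from the list above: predict the majority vote of the version space and discard inconsistent hypotheses, so each mistake at least halves the version space and the total number of mistakes is at most log2|H|. The threshold class and the example stream are made up.

```python
# Minimal sketch: Halving algorithm (realizable online classification).
# Each mistake at least halves the version space => mistakes <= log2(|H|).
import math

def halving(H, stream):
    version_space = list(H)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        pred = int(2 * votes >= len(version_space))  # majority vote
        mistakes += int(pred != y)
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# made-up class: thresholds h_t(x) = 1[x >= t]; target threshold t = 4
H = [lambda x, t=t: int(x >= t) for t in range(10)]
stream = [(x, int(x >= 4)) for x in [7, 1, 5, 3, 4]]
print(halving(H, stream), "<=", math.floor(math.log2(len(H))))  # 1 <= 3
```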

11

Online convex optimization

HW4

Online Convex Optimization
- Regret minimization

FTL, Follow-The-Leader
- Regret of FTL
- Failure of FTL

FoReL, Follow-the-Regularized-Leader
- Gradient Descent and FoReL

OGD, Online Gradient Descent
- Regret of OGD

Bregman divergence
Mirror map
Mirror Descent

Online Mirror Descent
- Regret of OMD

UML chap 21
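
A minimal sketch (mine) of projected Online Gradient Descent on quadratic losses f_t(w) = (w − z_t)², using the standard step size η = D/(G√T) that yields O(√T) regret; the loss sequence and the constants are toy choices.

```python
# Minimal sketch: projected Online Gradient Descent with O(sqrt(T)) regret.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
z = rng.uniform(-1, 1, T)        # toy sequence of loss parameters
D, G = 2.0, 4.0                  # diameter of [-1, 1], gradient bound
eta = D / (G * np.sqrt(T))       # standard fixed step size

w, losses = 0.0, []
for t in range(T):
    losses.append((w - z[t]) ** 2)
    grad = 2 * (w - z[t])
    w = np.clip(w - eta * grad, -1.0, 1.0)   # projected gradient step

# regret against the best fixed point on a grid of [-1, 1]
best = np.min([np.sum((u - z) ** 2) for u in np.linspace(-1, 1, 201)])
print(f"regret = {np.sum(losses) - best:.1f}  (D*G*sqrt(T) = {D*G*np.sqrt(T):.0f})")
```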

12-13

Multi-Armed Bandit

HW5

Stochastic MAB

Bernoulli Bandit
Asymptotically Optimal Algorithm

Exploration vs. Exploitation

ε-greedy
- exploration with probability ε
- exploitation with probability 1 − ε

Optimistic Algorithm
- UCB, Upper Confidence Bound

AlphaZero MCTS

UCB Algorithm

KL Divergence

KL-UCB

Thompson Sampling

KL-LUCB
- LUCB: Lower and Upper Confidence Bounds

Adversarial Bandit
- the adversarial setting

EXP3
- Online Mirror Descent
- FoReL for adversarial bandits
- Stability
- FoReL Regret

Regret Bounds

one-armed bandit (Cambridge dictionary, picture)

CORNELL2020-MAB

CHUN-MAB


KLD
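
Closing the bandit block, here is a minimal sketch (mine) of the classic UCB1 index policy on a Bernoulli bandit with made-up arm means: the index mean + sqrt(2·ln t / pulls) implements the "optimism in the face of uncertainty" idea behind the Optimistic Algorithm above.

```python
# Minimal sketch: UCB1 on a stochastic Bernoulli bandit (made-up arm means).
# Pull the arm maximizing  empirical_mean + sqrt(2 * ln(t) / pulls).
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.7])   # unknown to the learner
K, T = len(means), 10_000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    if t <= K:                      # initialize: pull each arm once
        a = t - 1
    else:
        index = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(index))
    sums[a] += rng.binomial(1, means[a])
    counts[a] += 1

pseudo_regret = T * means.max() - (counts * means).sum()
print("pulls per arm:", counts, " pseudo-regret:", pseudo_regret)
```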

14

Kernel Method





15

Q&A

HW6




16

Final Exam





etc

Suggestion on the class: participate by scribing lectures (in Korean)