2021_05 KAIST AI 603 ML Theory
Statistical learning theory, computational learning theory
Why this course is needed: Statistical Learning Theory >> Motivation (Sanghyuk Chun, NAVER AI Lab Tech Lead)
Brief
Professor: Seyoung Yun, Graduate School of AI, KAIST
Syllabus: URL (public) (cf. all courses offered by KAIST)
Textbook: UML, Understanding Machine Learning: From Theory to Algorithms, 2014, by Shai Shalev-Shwartz and Shai Ben-David (PDF, free)
Readings
Not provided by the professor
Similar courses
- CORNELL2020, Foundations of Modern Machine Learning, 2020 Cornell, Nika Haghtalab - 3-4 pages of lecture notes (LN) per class ⭐
- CMU2013, Machine Learning, 2013, Alexander J. Smola (formerly CMU, now AWS) ( YouTube ) ⭐
- CMU2017, Statistical Machine Learning, 2017 CMU, Ryan Tibshirani
- CMU2019, Statistical Methods for Machine Learning, 2019 CMU, Larry Wasserman
- PURDUE2009, Statistical Learning Theory, 2009 Purdue
Useful resources in Korean
- CHUN, sanghyukchun.github.io, Sanghyuk Chun, NAVER AI Lab Tech Lead
- CHUN-PAC-VC, PAC Learning & Statistical Learning Theory: PAC with a finite H class → infinite H class → shattering, VC dim
- CHUN-MAB, MAB: 4 algorithms - Gittins index, ε-greedy, UCB, Thompson Sampling
- PARK, parkgeonyeong.github.io, Geonyeong Park
- PARK-CONC-VC, Concentration Inequality → VC Bound: Markov's inequality (via the 1st moment), Chebyshev's inequality (2nd moment), Chernoff bound (MGF); see the reference block after this list
- PARK-RADAM, Rademacher Complexity
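For quick reference alongside PARK-CONC-VC, here is that chain of bounds, each obtained by applying Markov's inequality to a transformed variable (standard statements, not quoted from the linked post):

```latex
% Markov (1st moment): X \ge 0, a > 0
\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}

% Chebyshev (2nd moment): apply Markov to (X - \mathbb{E}[X])^2
\Pr\big[\,|X - \mathbb{E}[X]| \ge a\,\big] \le \frac{\operatorname{Var}(X)}{a^2}

% Chernoff (MGF): apply Markov to e^{\lambda X}, then optimize over \lambda
\Pr[X \ge a] \le \inf_{\lambda > 0} e^{-\lambda a}\,\mathbb{E}\big[e^{\lambda X}\big]
```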
Search query
Google search query restricted to the materials mentioned above (if it returns too many results, remove the trailing 'OR site:github.io'):
( site:cs.cornell.edu/courses/cs6781/2020sp/ OR site:stat.cmu.edu/~ryantibs/statml/ OR site:stat.cmu.edu/~larry/=sml/ OR site:alex.smola.org/teaching/ OR site:stat.purdue.edu/~jianzhan/STAT598Y/ OR site:parkgeonyeong.github.io OR site:sanghyukchun.github.io OR site:davidsilver.uk OR site:stanford.edu OR site:github.com/pilsung-kang OR site:cs.huji.ac.il/~shais/UnderstandingMachineLearning/ OR site:github.io ) AND shatter
Syllabus
Week | Content | HW | Detail | Chap, Readings | My Anki Card |
---|---|---|---|---|---|
1 | Introduction | | | | |
2 | Concentration Inequalities | HW1 | Markov Inequality, Chernoff Bound, Hoeffding's Inequality (formula below), Bernstein Inequality, Azuma-Hoeffding Inequality, McDiarmid's Inequality, Talagrand Concentration Inequality | UML appendix B, PARK-CONC-VC | |
3 | PAC Learning, Learning via Uniform Convergence | | Statistical Learning Framework (notation, setting), Sample Complexity, ERM (Empirical Risk Minimization), Learnability: PAC Learning, Finite Hypothesis Classes (sample-complexity bound below), Agnostic PAC, Uniform Convergence Theorem | UML chap 03, 04, CHUN-PAC-VC | Uniform Convergence |
4 | Bias-Complexity Tradeoff | HW2 | NFL (No Free Lunch) Theorem, Error Decomposition (formula below), VC-Dim: learnability with an infinite domain set | UML chap 05 | Error Decomposition |
5 | VC-Dimension | | Learnable Classes, PAC Learnable using the ERM Rule with n Samples, Shattering, Growth Function, VC-Dimension, VC-Dimension of Some Set Systems, Sauer's Lemma (statement below), ε-Net Theorem, ε-Sample and Error, PAC Learning and VC-Dimension, Uniform Convergence | UML chap 06, CHUN-PAC-VC, PARK-CONC-VC | Shattering, VC-Dimension |
6 | Nonuniform Learnability | | Nonuniform Learnability, Agnostic PAC Learnability, SRM (Structural Risk Minimization), Weight Function and Union Bound, SRM Sample Complexity, Model Selection, Model Selection and Validation Set | UML chap 07, YNGIE Shatter-VCdim-SRM | Nonuniform Learnability, SRM |
7 | The Runtime of Learning | HW3 | Big-O Notation, Computational Complexity of Learning, Implementing the ERM Rule, Efficient Search, Boolean Conjunctions, Learning 3-Term DNF | UML chap 08 | |
8 | KAIST midterm week (no midterm in this course) | | | | |
9 | Rademacher Complexity | | Rep(): Approximate Representativeness, R(): Rademacher Complexity (definition below), R() and Rep(), R() and Generalization Error, Generalization Bounds with the Confidence Parameter, Rademacher Calculus, Rademacher Complexity of Linear Classes | UML chap 26, PARK-RADAM | ε-representative, ε/2-representative, representativeness of S; complexity is measured via "correlation" with the data: Rademacher complexity is how maximally correlated, on average, a classifier's predictions can be with arbitrary random binary labels; Generalization Bounds, Massart Lemma |
10 | Online Learning | | Online Learning, Online Classification in the Realizable Case (setting), Adversary Game, Consistent Algorithm, Halving Algorithm (mistake bound below), Online Learnability, Shattered Tree, VC-dim vs. Ldim, SOA (Standard Optimal Algorithm), Ldim and M_SOA, Online Learning in the Unrealizable Case, Randomization, Weighted Majority, Regret Bound | UML chap 21 | Regret |
11 | Online Convex Optimization | HW4 | Online Convex Optimization, FTL (Follow-The-Leader), FoReL (Follow-the-Regularized-Leader), OGD (Online Gradient Descent; sketch below), Bregman Divergence, Online Mirror Descent | UML chap 21 | |
12-13 | Multi-Armed Bandits | HW5 | Stochastic MAB, Bernoulli Bandit, Exploration vs. Exploitation, ε-greedy, Optimistic Algorithms, AlphaZero MCTS, UCB Algorithm (sketch below), KL Divergence, KL-UCB, Thompson Sampling, KL-LUCB, Adversarial Bandits, EXP3, Regret Bounds | one-armed bandit (Cambridge dictionary, picture), CHUN-MAB | |
14 | Kernel Methods | | | | |
15 | Q&A | HW6 | | | |
16 | Final Exam | | | | |
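Notes & sketches

Week 2, Hoeffding's inequality (the formula referenced in the table; standard form as in UML appendix B, not copied from the lecture notes):

```latex
% Hoeffding: X_1, ..., X_m independent with X_i \in [a_i, b_i] almost surely
\Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m} X_i - \frac{1}{m}\sum_{i=1}^{m}\mathbb{E}[X_i]\right| \ge t\right]
\le 2\exp\left(-\frac{2 m^2 t^2}{\sum_{i=1}^{m}(b_i - a_i)^2}\right)
```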
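Week 3, sample complexity of a finite hypothesis class: uniform convergence plus a union bound over H gives the agnostic PAC bound below (as in UML chap 4; the exact constants vary by convention):

```latex
% Agnostic PAC sample complexity of a finite class H (via uniform convergence)
m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{2\,\log\!\big(2|\mathcal{H}|/\delta\big)}{\epsilon^{2}} \right\rceil
```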
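Week 4, error decomposition: the bias-complexity tradeoff in UML chap 5 splits the risk of the ERM output h_S into an approximation term (shrinks as H grows) and an estimation term (grows with the complexity of H):

```latex
L_{\mathcal{D}}(h_S) = \underbrace{\min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)}_{\text{approximation error}}
+ \underbrace{\left(L_{\mathcal{D}}(h_S) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)\right)}_{\text{estimation error}}
```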
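Week 5, Sauer's lemma: if VCdim(H) = d, the growth function is polynomial in m rather than exponential, which is the key step from shattering to learnability of infinite classes:

```latex
% Sauer-Shelah lemma
\tau_{\mathcal{H}}(m) \le \sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{em}{d}\right)^{d} \quad \text{for } m \ge d
```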
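Week 9, Rademacher complexity: the "maximal average correlation with random signs" intuition in the Anki column corresponds to the definition (UML chap 26), with the σ_i i.i.d. uniform on {−1, +1}:

```latex
% Rademacher complexity of a set A of vectors (e.g., loss values on a sample)
R(A) = \frac{1}{m}\,\mathbb{E}_{\sigma}\left[\sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i\right],
\qquad A \subseteq \mathbb{R}^{m}
```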
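Week 10, Halving: predicting with the majority vote of the current version space and discarding every inconsistent hypothesis halves the version space on each mistake, giving the mistake bound referenced in the table (realizable case, finite H):

```latex
M_{\text{Halving}}(\mathcal{H}) \le \log_2 |\mathcal{H}|
```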
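Week 11, OGD: a minimal sketch, assuming a hypothetical per-round gradient callback `grad_fn(t, w)` supplied by the environment and projection onto an L2 ball; the fixed step size η = R/(L√T) is the textbook choice giving O(√T) regret for convex L-Lipschitz losses. This is an illustration, not the course's reference implementation.

```python
import numpy as np

def online_gradient_descent(grad_fn, dim, T, radius=1.0, lipschitz=1.0):
    """Online Gradient Descent over the L2 ball of the given radius.

    grad_fn(t, w) -> subgradient of the round-t convex loss f_t at w
    (hypothetical callback name; any callable with this shape works).
    """
    eta = radius / (lipschitz * np.sqrt(T))  # fixed step size -> O(sqrt(T)) regret
    w = np.zeros(dim)
    iterates = []
    for t in range(T):
        iterates.append(w.copy())      # play w_t, then observe the loss
        g = grad_fn(t, w)              # subgradient of f_t at the current point
        w = w - eta * g                # gradient step
        norm = np.linalg.norm(w)       # project back onto the L2 ball
        if norm > radius:
            w = w * (radius / norm)
    return iterates

# Toy usage: quadratic losses f_t(w) = ||w - z_t||^2 with random targets z_t
rng = np.random.default_rng(0)
targets = rng.normal(size=(100, 2))
ws = online_gradient_descent(lambda t, w: 2 * (w - targets[t]), dim=2, T=100)
```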
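Weeks 12-13, UCB: a minimal UCB1 sketch for Bernoulli arms. The exploration bonus √(2 ln t / n_i) is the classic Auer, Cesa-Bianchi & Fischer form; the arm means in the usage example are made-up toy values.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """UCB1 on Bernoulli arms: play each arm once, then pick the arm
    maximizing (empirical mean + sqrt(2 ln t / pulls))."""
    random.seed(seed)
    k = len(arm_means)
    pulls = [0] * k
    rewards = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(range(k), key=lambda i: rewards[i] / pulls[i]
                      + math.sqrt(2 * math.log(t) / pulls[i]))
        r = 1.0 if random.random() < arm_means[arm] else 0.0  # Bernoulli reward
        pulls[arm] += 1
        rewards[arm] += r
        total += r
    return total, pulls

# Toy example: three arms with made-up means; UCB1 should mostly pull arm 2
total, pulls = ucb1([0.3, 0.5, 0.7], horizon=5000)
print(total, pulls)
```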
etc
Suggestion on the class: participate by scribing lectures (in Korean)