Capable Machine

Selected Papers

Yann LeCun et al., 1998, Efficient BackProp

Xavier Glorot et al., 2011, Deep sparse rectifier neural networks

Cross Validated, 2015, A list of cost functions used in neural networks, alongside applications

Andrew Trask, 2015, A Neural Network in 13 lines of Python (Part 2 – Gradient Descent)

Michael Nielsen, 2015, Neural Networks and Deep Learning

Yann LeCun et al., 1998, Gradient-Based Learning Applied to Document Recognition

Jianxin Wu, 2017, Introduction to Convolutional Neural Networks

C.-C. Jay Kuo, 2016, Understanding Convolutional Neural Networks with A Mathematical Model

Kaiming He et al., 2015, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Dominik Scherer et al., 2010, Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition

Adit Deshpande, 2016, The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)

Rob DiPietro, 2016, A Friendly Introduction to Cross-Entropy Loss

Peter Roelants, 2016, How to implement a neural network (Intermezzo 2)

Oscar Sharp & Benjamin, 2016, Sunspring

Sepp (Josef) Hochreiter, 1991, Untersuchungen zu dynamischen neuronalen Netzen (Investigations on Dynamic Neural Networks)

Yoshua Bengio et al., 1994, Learning Long-Term Dependencies with Gradient Descent is Difficult

Razvan Pascanu et al., 2013, On the difficulty of training recurrent neural networks

Sepp Hochreiter & Jürgen Schmidhuber, 1997, Long Short-Term Memory

Christopher Olah, 2015, Understanding LSTM Networks

Shi Yan, 2016, Understanding LSTM and its diagrams

Andrej Karpathy, 2015, The Unreasonable Effectiveness of Recurrent Neural Networks

Andrej Karpathy, 2015, Visualizing and Understanding Recurrent Networks

Klaus Greff et al., 2015, LSTM: A Search Space Odyssey

Recent Papers

Estimating Approximate Incentive Compatibility. With Tuomas Sandholm and Ellen Vitercik. ACM EC 2019.

Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization. With Travis Dick and Ellen Vitercik. FOCS 2018.

Learning to Branch. With Travis Dick, Tuomas Sandholm, and Ellen Vitercik. ICML 2018.

A General Theory of Sample Complexity for Multi-Item Profit Maximization. With Tuomas Sandholm and Ellen Vitercik. ACM EC 2018.

Submodular Functions: Learnability, Structure, and Optimization. With Nick Harvey. SIAM Journal on Computing 2018.

Earlier version titled Learning Submodular Functions in STOC 2011.
Also a NECTAR track paper at ECML-PKDD 2012 (for “significant machine learning results”).

Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems. With Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. COLT 2017.

The Power of Localization for Efficiently Learning Linear Separators with Noise. With Pranjal Awasthi and Phil Long. Journal of the ACM 2017.
Earlier version in STOC 2014.

Data Driven Resource Allocation for Distributed Learning. With Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, and Alex Smola. AISTATS 2017.

Label Efficient Learning by Exploiting Multi-class Output Codes. With Travis Dick and Yishay Mansour. AAAI 2017.

Clustering under Perturbation Resilience. With Yingyu Liang. SIAM Journal on Computing 2016.

Efficient Algorithms for Learning and 1-bit Compressed Sensing under Asymmetric Noise. With Pranjal Awasthi, Nika Haghtalab, and Hongyang Zhang. COLT 2016.

Statistical Active Learning Algorithms for Noise Tolerance and Differential Privacy. With Vitaly Feldman. Algorithmica 2015 (special issue, invited).
Earlier version in NIPS 2013.

Robust Hierarchical Clustering. With Yingyu Liang and Pramod Gupta. Journal of Machine Learning Research 2014.
Earlier version in COLT 2010.

Scalable Kernel Methods via Doubly Stochastic Gradients. With Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, and Le Song. NIPS 2014.

Influence Function Learning in Information Diffusion Networks. With Nan Du, Yingyu Liang, and Le Song. ICML 2014.

Active and passive learning of linear separators under log-concave distributions. With Phil Long. COLT 2013.

Finding Endogenously Formed Communities. With Christian Borgs, Mark Braverman, Jennifer Chayes, and Shang-Hua Teng. SODA 2013.

Older Papers

Clustering under Approximation Stability. With Avrim Blum and Anupam Gupta. Journal of the ACM 2013.
Earlier version Approximate Clustering without the Approximation in SODA 2009.

Learning Valuation Functions. With Florin Constantin, Satoru Iwata, and Lei Wang. COLT 2012.

Robust Interactive Learning. With Steve Hanneke. COLT 2012.

Distributed Learning, Communication Complexity, and Privacy. With Avrim Blum, Shai Fine, and Yishay Mansour. COLT 2012.

Active Clustering of Biological Sequences. With Heiko Röglin, Shang-Hua Teng, Konstantin Voevodski, and Yu Xia. Journal of Machine Learning Research 2012.
Earlier version in UAI 2010.

The True Sample Complexity of Active Learning. With Steve Hanneke and Jennifer Wortman. Machine Learning Journal 2010 (special issue, invited).
Earlier version in COLT 2008.

A Discriminative Model for Semi-Supervised Learning. With Avrim Blum. Journal of the ACM 2010.
Earlier version in COLT 2005.

Agnostic Active Learning. With Alina Beygelzimer and John Langford. Journal of Computer and System Sciences 2009 (special issue, invited).
Earlier version in ICML 2006.

A Discriminative Framework for Clustering via Similarity Functions. With Avrim Blum and Santosh Vempala. STOC 2008.

Reducing Mechanism Design to Algorithm Design via Machine Learning. With Avrim Blum, Jason Hartline, and Yishay Mansour. Journal of Computer and System Sciences 2008 (special issue, invited).
Earlier version in FOCS 2005.

On a Theory of Learning with Similarity Functions. With Avrim Blum. ICML 2006.

Note – These papers were selected from various websites.
