Understanding Black-box Predictions via Influence Functions
Pang Wei Koh and Percy Liang. ICML'17: Proceedings of the 34th International Conference on Machine Learning, Volume 70, 2017.
How can we explain the predictions of a black-box model? With the rapid adoption of machine learning systems in sensitive applications, there is an increasing need to make black-box models explainable. This paper, which received a best paper award at ICML 2017, applies influence functions to neural networks, taking advantage of the accessibility of their gradients. To scale up influence functions to modern machine learning settings, the authors develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products.

For a point $z$ and parameters $\theta \in \Theta$, let $L(z; \theta)$ be the loss, and let $\frac{1}{n} \sum_{i=1}^{n} L(z_i; \theta)$ be the empirical risk, with minimizer $\hat{\theta}$. Given gradients and Hessian-vector products, measuring the effect of a training point on the final predictions is straightforward: the influence of upweighting a training point $z$ on the loss at a test point $z_{\text{test}}$ is

$I_{\text{up,loss}}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}; \hat{\theta})^\top H_{\hat{\theta}}^{-1} \nabla_\theta L(z; \hat{\theta})$,

where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i; \hat{\theta})$ is the Hessian of the empirical risk.

The reference implementation can be found here: link. Its documentation covers requirements, installation, usage, background and documentation, the config, and miscellaneous parameters; for details and examples, look here. The default mode is comparatively slow because grad_z has to be calculated twice: once for the first approximation in s_test and once to combine with the s_test value. That first approximation, the test-loss gradient itself, also serves as the initial value of the inverse-Hessian estimate during the s_test calculation.
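Below is a minimal PyTorch sketch of this recipe, using the grad_z / s_test terminology from the implementation notes above. It is an illustration under stated assumptions, not the reference implementation: model, loss_fn, the train_loader, and the damp / scale / recursion_depth values are all placeholders, and the recursion implements the paper's stochastic (LiSSA-style) estimator of $H^{-1} v$.

```python
import torch

def grad_z(z, t, model, loss_fn):
    # Gradient of the loss at a single (batched) training point (z, t)
    # with respect to the model parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(z), t)
    return torch.autograd.grad(loss, params)

def s_test(z_test, t_test, model, loss_fn, train_loader,
           damp=0.01, scale=25.0, recursion_depth=1000):
    # Stochastic estimate of s_test = H^{-1} v, where v is the test-loss
    # gradient. Starts from h = v and iterates
    #   h <- v + (1 - damp) * h - (H h) / scale,
    # whose fixed point (after the final division by scale) is a damped
    # version of H^{-1} v.
    params = [p for p in model.parameters() if p.requires_grad]
    v = grad_z(z_test, t_test, model, loss_fn)
    h = [vi.clone() for vi in v]
    # One mini-batch per recursion step (stops early if the loader runs out).
    for (x, t), _ in zip(train_loader, range(recursion_depth)):
        loss = loss_fn(model(x), t)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=h)  # H h
        h = [vi + (1 - damp) * hi - hvi / scale
             for vi, hi, hvi in zip(v, h, hv)]
    return [hi / scale for hi in h]

def influence_up_loss(z, t, stest, model, loss_fn, n_train):
    # I_up,loss(z, z_test) = -grad_z(z)^T s_test, divided by n to approximate
    # the loss change from removing z, as in the implementation notes.
    g = grad_z(z, t, model, loss_fn)
    dot = sum((gi * si).sum() for gi, si in zip(g, stest))
    return -dot.item() / n_train
```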
Stepping back, we have two ways of measuring influence. Our first option is to delete the instance from the training data, retrain the model on the reduced training dataset, and observe the difference in the model parameters or predictions (either individually or over the complete dataset). This is exact, but it requires one retraining run per training point. Fortunately, influence functions give us an efficient approximation: the authors use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction. They show that even on non-convex and non-differentiable models, where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, they demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually indistinguishable training-set attacks.

The implementation offers two modes. The first calculates everything on the fly when testing a single test image. The second mode, calc_all_grad_then_test, calculates the grad_z values for all images first and saves them to disk; the algorithm will then calculate the influence functions for all images by reading both values from disk and computing the influence from them. This can speed up the calculation significantly, as no duplicate calculations take place. In both modes the approach is generic and applies as long as you have a supervised learning problem. For each processed test sample, the training images are returned ordered by helpfulness or harmfulness with respect to the prediction outcome; surfacing the harmful ones can, for example, be used to detect dataset errors.
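A usage sketch tying this together. The entry points (init_logging, get_default_config, calc_img_wise) follow my best recollection of the package's API and should be treated as assumptions; get_my_model and get_my_dataloaders are user-supplied placeholders.

```python
import pytorch_influence_functions as ptif

# Supplied by the user: a trained PyTorch model plus train/test dataloaders.
model = get_my_model()
trainloader, testloader = get_my_dataloaders()

ptif.init_logging()
config = ptif.get_default_config()  # recursion depth, damping, scaling, mode, ...

# Influence of every training image on each processed test image:
influences, harmful, helpful = ptif.calc_img_wise(config, model, trainloader,
                                                  testloader)

# do something with influences/harmful/helpful, e.g. display the training
# images ordered by helpfulness (or harmfulness) for a given test image
```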
Why does this matter in practice? Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. A memorable experiment in the paper contrasts Inception-V3 with an RBF SVM using the SmoothHinge loss on an image classification task involving fish: judging by the most influential training images, the Inception network picked up on the distinctive characteristics of the fish. [Figure: training and test examples of the digits 1 and 7.] In another reported experiment, the model was ResNet-110.

The simplest concrete setting is logistic regression, where $p(y \mid x) = \sigma(y\,\theta^\top x)$ with labels $y \in \{-1, +1\}$ and $\sigma$ the sigmoid function. A natural exercise: (a) what is the effect of the training loss and the $H_{\hat{\theta}}^{-1}$ terms in $I_{\text{up,loss}}$?
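To make the pieces of that question concrete, here is a self-contained numpy sketch that evaluates $I_{\text{up,loss}}$ exactly for logistic regression (all names and the small damping term are mine; the formulas follow the definitions above). Note how the two factors enter: the training-loss gradient $\nabla_\theta L(z; \hat{\theta})$ is large for poorly fit, high-loss training points, while $H_{\hat{\theta}}^{-1}$ amplifies components along low-curvature directions of parameter space.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def influence_logreg(theta, X, y, x_test, y_test, x_z, y_z, damp=1e-3):
    # Exact I_up,loss for logistic regression p(y|x) = sigmoid(y * theta^T x),
    # labels y in {-1, +1}. X, y: the training set used to form the Hessian.
    n, d = X.shape

    def grad_L(x, y_):
        # grad of L(z; theta) = log(1 + exp(-y theta^T x))
        return -sigmoid(-y_ * (theta @ x)) * y_ * x

    # Hessian of the empirical risk: (1/n) sum_i s_i (1 - s_i) x_i x_i^T,
    # with a small damping term so it is safely invertible.
    s = sigmoid(X @ theta)
    H = (X * (s * (1 - s))[:, None]).T @ X / n + damp * np.eye(d)

    # I_up,loss(z, z_test) = -grad L(z_test)^T H^{-1} grad L(z)
    return -grad_L(x_test, y_test) @ np.linalg.solve(H, grad_L(x_z, y_z))
```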
Back in the image setting, visualised, the output can look like this: the test image on the top left is the test image for which the influences were calculated, followed by the training images ordered by helpfulness and by harmfulness. Helpful images are those that push the model toward the correct test outcome (here, the class "ship").

Often we want to identify an influential group of training samples behind a particular test prediction, not just a single point. Follow-up work ("On the accuracy of influence functions for measuring group effects" and "On second-order group influence functions for black-box predictions") studies when summing the individual influences over the group is an accurate approximation; see the sketch at the end of this section.

This paper is also a reading in a course on neural-net training dynamics, and the surrounding notes frame it well. I'll attempt to convey our best modern understanding, as incomplete as it may be: some of the ideas have been established decades ago (and perhaps forgotten by much of the community), and others are just beginning to be understood today. This is not the sort of theory class where we prove theorems for the sake of proving theorems: in order to have any hope of understanding the solutions a learning algorithm comes up with, we need to understand the problems, and hopefully this understanding will let us improve the algorithms. We'll consider the two most common techniques for bilevel optimization, implicit differentiation and unrolling, as well as self-tuning networks, which try to solve bilevel optimization problems by training a network to locally approximate the best-response function. Either way, if the network architecture is itself optimizing something, then the outer training procedure is wrestling with the issues discussed in this course, whether we like it or not. We'll then consider how the gradient noise in SGD optimization can contribute an implicit regularization effect, Bayesian or non-Bayesian; many of the answers boil down to an observation that neural net training seems to have two distinct phases: a small-batch, noise-dominated phase, and a large-batch, curvature-dominated one. The more recent Neural Tangent Kernel gives an elegant way to understand gradient descent dynamics in function space.

Delivery is online: we have 3 hours scheduled for lecture and/or tutorial, and students are encouraged to attend synchronous lectures to ask questions, but may also attend office hours or use Piazza. Some JAX code examples for algorithms covered in the course will be available here; there are various full-featured deep learning frameworks built on top of JAX, designed to resemble other frameworks you might be familiar with, such as PyTorch or Keras, but for simple architectures (e.g. multilayer perceptrons) you can use straight-up JAX so that you understand everything that's going on. The details of the assignment are here; the final report is due April 7 and will be done in groups of 2-3 (not necessarily the same groups as for the Colab notebook). Presentations of this paper were given by Haoping Xu, Zhihuan Yu, and Jingcheng Niu, and by Borys Bryndak, Sergio Casas, and Sean Segal.

Acknowledgements: the authors of the conference paper "Understanding Black-box Predictions via Influence Functions", Pang Wei Koh and Percy Liang.
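Finally, the group-influence sketch promised above. Under the first-order approximation studied in those papers, the influence of removing a whole group is simply the sum of the per-point influences. influence_up_loss and s_test are the helpers from the first sketch; the group argument (an iterable of (z, t) pairs) is an assumed interface.

```python
def group_influence(group, stest, model, loss_fn, n_train):
    # First-order approximation: the effect of removing the whole group is the
    # sum of the individual influences; second-order group influence functions
    # add a correction for interactions between group members.
    return sum(influence_up_loss(z, t, stest, model, loss_fn, n_train)
               for z, t in group)
```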