Talks and materials

Veronika Cheplygina

In this talk I will discuss strategies for scenarios with limited labeled data, focusing on examples from medical imaging. These strategies use other data, labels and/or assumptions to improve generalization. In particular, I will discuss transfer learning with related datasets and/or labels, and crowdsourcing additional labels from non-expert annotators. Finally, I will discuss some general considerations when choosing and working with datasets.

Dima Damen

Title: From Holistic to Fine-Grained Video Understanding

Abstract: Following its success with images, the community has now moved on to the challenge of video, which brings additional redundancy, a significant increase in input size and ill-posed problems. In this talk, I’ll introduce the current SOTA techniques and problems in action recognition, detection, anticipation and retrieval, showcasing current limitations and future directions.

Gaël Varoquaux

Title: The forgotten practicalities of machine learning: dirty data and model evaluation

This course will cover important practical “details” of machine learning often overlooked, focusing here on dirty data and model evaluation.

Cleaning the data to analyze it is often reported as the number one hassle of data scientists. I will survey what kinds of “dirtiness” force time-consuming data cleaning or curation. I will then cover two specific aspects of dirty data: non-normalized entries and missing values. I will show how, for these two problems, machine-learning practice can be adapted to work directly on a data table without curation. The normalization problem can be tackled by adapting methods from natural language processing. The missing-values problem will lead us to revisit classic statistical results in the setting of supervised learning.

Model evaluation is, in my opinion, the most overlooked step of the machine-learning pipeline. Reliably estimating a model’s performance for a given purpose is crucial and difficult. I will first discuss choosing a metric that is informative for the application, stressing the importance of class prevalence in classification settings. I will then discuss procedures to estimate the generalization performance, drawing a distinction between evaluating a learning procedure and evaluating a prediction rule, and discussing how to give confidence intervals for the performance estimates.
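As a rough illustration of two points in this abstract (not taken from the course materials), the sketch below shows how class prevalence changes the positive predictive value of a classifier with fixed sensitivity and specificity, and how a percentile bootstrap can attach an interval to the accuracy of a fixed prediction rule on a test set. The function names and the example numbers are assumptions made for this sketch.

```python
import random


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive predictions that are truly positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the accuracy of a fixed prediction rule.

    Only the test samples are resampled; the model is not retrained, so this
    does not capture the variability of the learning procedure itself.
    """
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    scores = []
    for _ in range(n_boot):
        # Resample test items with replacement and recompute accuracy.
        resampled = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(resampled) / n)
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return sum(correct) / n, lower, upper


# Same classifier (90% sensitivity, 90% specificity), two prevalences.
ppv_balanced = positive_predictive_value(0.9, 0.9, prevalence=0.5)
ppv_rare = positive_predictive_value(0.9, 0.9, prevalence=0.01)

# Hypothetical test set: 100 samples, 80 predicted correctly.
y_true = [1] * 80 + [0] * 20
y_pred = [1] * 100
accuracy, ci_low, ci_high = bootstrap_accuracy_ci(y_true, y_pred)
```

With these numbers the positive predictive value drops from 0.9 in the balanced case to about 0.083 when only 1% of samples are positive, which is why a metric should be read together with the prevalence expected in the application.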

Serge Belongie

Title: Representation Learning for Narratives in Social Media

Abstract: While advances in automated fact-checking are critical in the fight against the spread of misinformation in social media, we argue that more attention is needed in the domain of unfalsifiable claims. In this talk, we outline some promising directions for identifying the prevailing narratives in shared content (image & text) and explore how the associated learned representations can be used to identify misinformation campaigns and sources of polarization.

Jens Petersen

Title: Introduction to human-in-the-loop and active learning

Abstract: Machine learning and in particular deep learning methods often need enormous amounts of human-labeled data to learn to solve tasks acceptably. Providing such data can be costly and in some cases impossible. Interactive machine learning is a research field that deals with machine learning methods that interact with agents, such as humans and other machines. Human-in-the-loop learning can be seen as a subset of this field, in which the agents are human. Active learning, in addition to this, implies that the learner is actively engaging in the learning process, such as by selecting which input samples should be used for training. All three paradigms are often discussed in relation to improving the learning behavior and reducing the need for human-labeled samples. In this presentation, I plan to introduce basic concepts from these fields with a particular focus on applications in image segmentation and ways of reducing the time spent labeling by providing fewer but more informative labels.
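One common way for a learner to select which samples to label is uncertainty sampling: ask a human to annotate the unlabeled samples on which the current model is least certain. The sketch below is a minimal illustration of that idea using predictive entropy. It is not taken from the talk, and the function names and the toy probability pool are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(predicted_probs, budget):
    """Return indices of the `budget` unlabeled samples with highest entropy."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled samples (two classes).
pool = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
query = select_most_uncertain(pool, budget=2)
print(query)  # [3, 1]: the two samples closest to a 50/50 prediction
```

In a full loop, the selected samples would be labeled by an annotator, added to the training set, and the model retrained before the next round of selection.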

Oliver Hulme

Title: Reward learning via dopamine

Abstract: I will give a brief primer on the role of dopamine in the learning of rewards. I will chart the history of the connection between reinforcement learning and dopaminergic signaling. I will then fast forward to today’s frontier, where ideas from distributional reinforcement learning are offering insight into the long-observed heterogeneity of dopamine signals. Finally, I will outline a generative modelling approach that may help interrogate these ideas in the human brain.
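The connection between reinforcement learning and dopamine is usually framed in terms of a reward prediction error: the difference between the reward received and the reward expected. The sketch below is a textbook-style delta-rule update, given only as a minimal illustration and not as the model used in the talk. The function name, learning rate and reward sequence are assumptions.

```python
def learn_cue_value(rewards, learning_rate=0.2):
    """Update the expected reward of a cue from a sequence of outcomes.

    Returns the final value estimate and the prediction error on each trial.
    """
    value = 0.0
    errors = []
    for reward in rewards:
        delta = reward - value          # reward prediction error
        value += learning_rate * delta  # move the expectation toward the outcome
        errors.append(delta)
    return value, errors


# A cue that is always followed by a reward of 1.0 for 50 trials.
final_value, prediction_errors = learn_cue_value([1.0] * 50)
```

The prediction error is large on the first trial, when the reward is unexpected, and shrinks toward zero as the expectation converges on the delivered reward.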

Dimitrios Papadopoulos

Abstract: Training a visual recognition system, such as an object class detector or an instance segmentation model, requires a large set of training images with manually annotated objects. Obtaining such data requires human annotation, which is tedious and time-consuming. In this talk, I will first present alternative and efficient human annotation schemes for training such systems that reduce the annotation cost while still obtaining high-quality models. I will also discuss useful human-in-the-loop and active learning strategies used when crowdsourcing data for computer vision.

Exercises

The exercises will take the form of a team challenge, in which each team gets access to a set of training, validation and test data and a simple classification network. The goal is to use the topics of the summer school to achieve the best classification results using sparsely annotated and non-trivial data.
The challenge will be based on Python and PyTorch, and we will divide the teams so that every team has computers with sufficient GPU resources.
Data, template code and further instructions will be available before the school.

Summer school on human-in-the-loop and learning with limited labels

15. – 19. August, 2022

Talks, materials and exercises
