Part 2/3: Foundations

Fundamentals of Uncertainty & Robustness Methods

Neural Networks with SGD

Nearly all models find a single setting of the parameters, θ*, that maximizes the probability of the parameters conditioned on the data. A common special case is softmax cross-entropy with L2 regularization, optimized with SGD.
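As a rough sketch, the objective described above can be written out as a maximum a posteriori (MAP) problem. The symbols f_θ (the network's logits), N (the number of training examples), and λ (the regularization strength) are notation assumed here for illustration, not taken from the original slides.

```latex
\theta^{*}
  = \arg\max_{\theta} \; p(\theta \mid x, y)
  = \arg\min_{\theta} \; \Big[ -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta) \;-\; \log p(\theta) \Big]

% Special case: softmax likelihood with a zero-mean Gaussian prior
% p(\theta) = \mathcal{N}(0, \sigma^{2} I), so that \lambda = 1 / (2\sigma^{2}):
\theta^{*}
  = \arg\min_{\theta} \; \Big[ -\sum_{i=1}^{N} \log \operatorname{softmax}\big(f_{\theta}(x_i)\big)_{y_i} \;+\; \lambda \, \lVert \theta \rVert_2^{2} \Big]
```

Under these assumptions, the first term is the softmax cross-entropy loss and the second is the L2 penalty, which is why the two views coincide.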

How do we get uncertainty?

  • Probabilistic approach: Estimate a full distribution for p(θ|x, y).
  • Intuitive approach: Ensembling. Obtain multiple good settings for θ*.
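To make the ensembling bullet concrete, here is a minimal sketch that averages the softmax outputs of several members for one input. The function names and the example logits are hypothetical; in practice each member would be a network trained from a different random initialization.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Average the softmax outputs of several ensemble members.

    member_logits: one list of class logits per member, all for the same input.
    Returns (predicted_label, confidence, averaged_probabilities).
    """
    member_probs = [softmax(logits) for logits in member_logits]
    num_classes = len(member_probs[0])
    avg = [
        sum(p[k] for p in member_probs) / len(member_probs)
        for k in range(num_classes)
    ]
    label = max(range(num_classes), key=lambda k: avg[k])
    return label, avg[label], avg


# Hypothetical logits from three independently trained members.
logits_per_member = [[2.0, 0.5, -1.0], [1.5, 1.4, -0.5], [0.2, 2.1, -1.2]]
label, confidence, probs = ensemble_predict(logits_per_member)
```

When the members disagree, as they do here, the averaged confidence drops below what any single confident member reports, which is the intuition behind using ensembles for uncertainty.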

Probabilistic Machine Learning

Model: A probabilistic model is a joint distribution of outputs y and parameters θ given inputs x.
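One common way to write this joint distribution, assuming the prior over parameters does not depend on the inputs x, is as a likelihood times a prior. The posterior then follows from Bayes' rule. The notation below is a sketch under that assumption.

```latex
p(y, \theta \mid x) = p(y \mid x, \theta) \, p(\theta)

% Posterior over parameters given observed data (x, y):
p(\theta \mid x, y)
  = \frac{p(y \mid x, \theta) \, p(\theta)}
         {\int p(y \mid x, \theta') \, p(\theta') \, d\theta'}
```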

Part 1/3: Why uncertainty & robustness

Why uncertainty & robustness

What do we mean by Uncertainty?

Return a distribution over predictions rather than a single prediction.

  • Classification: Output label along with its confidence.
  • Regression: Output mean along with its variance.

Good uncertainty estimates quantify when we can trust the model’s predictions.

What do we mean by Out-of-Distribution Robustness?

I.I.D. (Independent and Identically Distributed): In probability theory and statistics, a collection of random variables is independent and identically distributed if every variable has the same probability distribution and all are mutually independent.

O.O.D. (Out-of-Distribution):

  • Covariate shift. Distribution of features p(x) changes and p(y|x) is fixed.
  • Open-set recognition. New classes may appear at test time.
  • Label shift. Distribution of labels p(y)…

Part III: Privacy for Federated Learning and Analytics; Part IV: Open Problems and Other Topics

Part III: Privacy for Federated Learning and Analytics

ML on sensitive data: privacy vs. utility (?)

Make achieving high privacy and utility possible with less work.

What private information might an actor learn

Part 1: What is Federated Learning?

What is Federated Learning?

The definition proposed in Advances and Open Problems in Federated Learning:

Federated learning is a machine learning setting where multiple entities (clients) collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client’s raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective.
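As an illustration of "focused updates intended for immediate aggregation", the sketch below shows one server round that combines client updates by an example-weighted average, in the style of federated averaging. This aggregation rule is one possible choice and is not prescribed by the definition above. The function names and the numbers are hypothetical.

```python
def federated_average(client_updates):
    """Example-weighted average of client model updates.

    client_updates: list of (update_vector, num_local_examples) pairs.
    Only these updates reach the server; raw client data never does.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(update[i] * n for update, n in client_updates) / total
        for i in range(dim)
    ]


def server_round(global_weights, client_updates):
    """Apply the aggregated update to the current global model."""
    delta = federated_average(client_updates)
    return [w + d for w, d in zip(global_weights, delta)]


# Two hypothetical clients holding 10 and 30 local examples.
updates = [([0.1, -0.2], 10), ([0.3, 0.0], 30)]
new_weights = server_round([1.0, 1.0], updates)
```

Weighting by the number of local examples lets clients with more data contribute proportionally more to the aggregate; an unweighted mean would be an equally valid sketch.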

FL terminology

  • Clients: Compute nodes also holding local data, usually belonging to one entity: IoT devices, Mobile devices, Data silos, Data centers in different geographic regions, etc.
  • Server: Additional compute nodes that coordinate the FL process but…

Papers I read when I did a “Complete the Look” project

P-Companion: A Principled Framework for Diversified Complementary Product Recommendation (2020)

Figure 3: P-Companion model architecture for CPR.

Given one product, how to recommend complementary products of different types is the key problem we tackle in this work.

  • We first conduct an analysis to correct the inaccurate assumptions adopted by existing work to show that co-purchased products are not always complementary and further propose a new strategy to generate clean distant supervision labels for CPR modeling.
  • Moreover, to bridge the gap from existing work that CPR does not only need relevance modeling but also requires diversity to fulfill the whole purchase demand, we develop a deep learning framework, P-Companion, to explicitly model both relevance and diversity.
  • The whole…

This paper is an important reference for CornerNet, CenterNet, and CornerNet-Lite because all three methods use the Hourglass network as their backbone.


We introduce a novel “stacked hourglass” network design for predicting human pose. The network captures and consolidates information across all scales of the image. Like many convolutional approaches that produce pixel-wise outputs, the hourglass network pools down to a very low resolution, then upsamples and combines features across multiple resolutions. On the other hand, the hourglass differs from prior designs primarily in its more symmetric topology.

We expand on a single hourglass by consecutively…


In object detection, one of the most popular approaches is anchor-based: a set of pre-defined boxes is regressed toward the ground-truth objects. However, these approaches often need a large number of anchors to ensure a sufficiently high IoU with the ground-truth objects, and the size and aspect ratio of each anchor box must be designed manually. In addition, anchors are usually not aligned with the ground-truth boxes, which is not conducive to the bounding box classification task.
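For reference, the IoU (intersection over union) between an anchor and a ground-truth box can be computed as in the sketch below. It assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; that box format and the function name are choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# A hypothetical anchor that overlaps a ground-truth box in one corner.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A small offset between an anchor and the object already lowers the IoU sharply, which is one reason anchor-based detectors need many anchors of varying size and aspect ratio.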

To overcome the drawbacks of anchor-based approaches, a keypoint-based object detection…


CornerNet allows a simplified design that eliminates the need for anchor boxes and has achieved state-of-the-art accuracy on COCO among single-stage detectors.

However, a major drawback of CornerNet is its inference speed. It achieves an average precision (AP) of 42.2% on COCO at an inference cost of 1.1s per image, which is too slow for video applications that require real-time or interactive rates. This makes CornerNet less competitive with alternatives in terms of the accuracy-efficiency tradeoff.

We seek to improve the inference efficiency of CornerNet along two orthogonal directions and introduce two efficient variants of CornerNet:

  • CornerNet-Saccade speeds up…

Jiangchun Li

Senior Data Scientist@ViSenze. If I don’t create, I don’t understand.
