Pre-train the CNN (AlexNet) on the ImageNet dataset using image-level annotations only (bounding-box labels are not available for this data).
To adapt the CNN to the new task (detection) and the new domain (warped proposal windows), the pre-trained CNN is…
Pedestrian detection, as an important object detection application, has received extensive attention in many areas such as autonomous driving, video surveillance, and criminal investigation. Some early pedestrian detection methods laid a solid foundation for general object detection in terms of:
As the accuracy of a detector depends heavily on its feature extraction network, we refer to backbone networks, e.g., ResNet and VGG, as the “engine” of a detector. Here we introduce some of the important detection engines of the deep learning era.
AlexNet, an eight-layer deep network, was the first CNN model that started the deep learning revolution in computer vision.
Among the different computational stages of an object detector, feature extraction usually dominates the amount of computation. For a sliding-window detector, the computational redundancy arises from both positions and scales: the former is caused by the overlap between adjacent windows, while the latter is caused by the feature correlation between adjacent scales.
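To get a feel for the positional redundancy, the sketch below counts how many sliding windows cover each pixel. The function name `window_coverage` and the image, window, and stride sizes are illustrative assumptions, not values from the text.

```python
import numpy as np

def window_coverage(height, width, win, stride):
    """Count how many win x win sliding windows cover each pixel.

    Hypothetical helper for illustration: a naive detector that extracts
    features per window would touch each pixel this many times.
    """
    cover = np.zeros((height, width), dtype=int)
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            cover[top:top + win, left:left + win] += 1
    return cover

# Assumed setup: 256 x 256 image, 64 x 64 window, stride of 8 pixels.
cover = window_coverage(256, 256, win=64, stride=8)
print(cover.max())  # interior pixels are covered by (64 / 8) ** 2 windows
```

With a stride much smaller than the window, interior pixels are revisited by dozens of windows, which is the overlap that shared computation is meant to avoid.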
Feature map shared computation means computing the feature map of the whole image only once, before sliding the window over it. The “image pyramid” of a traditional detector can then be considered a “feature pyramid”.
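A minimal sketch of the idea follows. It assumes a toy feature (mean intensity per non-overlapping cell) and windows aligned to the cell grid. The names `cell_features`, `windows_shared`, and `windows_naive` are hypothetical and only serve to contrast the two strategies.

```python
import numpy as np

def cell_features(image, cell=8):
    """Toy feature: mean intensity of each non-overlapping cell x cell block."""
    h, w = image.shape
    h, w = h - h % cell, w - w % cell
    blocks = image[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.mean(axis=(1, 3))

def windows_shared(image, win=4, cell=8):
    """Compute the feature map once, then slide a win x win window (in cells) over it."""
    fmap = cell_features(image, cell)
    rows, cols = fmap.shape
    return {(r, c): fmap[r:r + win, c:c + win]
            for r in range(rows - win + 1)
            for c in range(cols - win + 1)}

def windows_naive(image, win=4, cell=8):
    """Baseline: crop every window from the image and recompute its features."""
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    out = {}
    for r in range(rows - win + 1):
        for c in range(cols - win + 1):
            crop = image[r * cell:(r + win) * cell, c * cell:(c + win) * cell]
            out[(r, c)] = cell_features(crop, cell)
    return out

image = np.random.default_rng(0).random((64, 48))
shared, naive = windows_shared(image), windows_naive(image)
print(all(np.allclose(shared[k], naive[k]) for k in naive))
```

Both functions return the same per-window features in this setup, but the shared version runs the feature extractor once instead of once per window. Real detectors use richer features (HOG, CNN activations), where the equivalence holds only approximately near window borders.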
Early object detection (before 2000) did not follow a unified detection philosophy. Detectors at that time were usually designed based on low-level and mid-level vision.
Some early researchers framed object detection as a measurement of similarity between object components, shapes, and contours. Despite promising initial results, these approaches did not work well on more complicated detection problems.
Therefore, machine learning based detection methods began to prosper. Machine learning based detection has gone through multiple periods, including statistical models of appearance (before 1998), wavelet feature representations (1998–2005), and gradient-based representations (2005–2012).
Building statistical models of an…
Building larger datasets with less bias is critical for developing advanced computer vision algorithms. In object detection, a number of well-known datasets and benchmarks have been released in the past 10 years.
In the deep learning era, object detection can be grouped into two genres: “two-stage detection” and “one-stage detection”. The former frames detection as a “coarse-to-fine” process, while the latter aims to “complete in one step”.
Original paper: You Only Look Once: Unified, Real-Time Object Detection
We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.
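To make the "single regression" framing concrete, here is a hedged sketch of decoding a YOLO-style output tensor. It assumes the original paper's setup of an S × S grid with B boxes per cell and C classes, so each cell holds B·5 + C numbers. The exact memory layout, the function name `decode_yolo_grid`, and the threshold are assumptions made for illustration; this is not the authors' implementation, and non-maximum suppression is omitted.

```python
import numpy as np

def decode_yolo_grid(pred, S=7, B=2, C=20, threshold=0.2):
    """Decode an (S, S, B*5 + C) prediction into a list of detections.

    Assumed per-cell layout: B boxes of [x, y, w, h, conf], then C class
    probabilities. x, y are offsets within the cell; w, h are relative to
    the whole image.
    """
    assert pred.shape == (S, S, B * 5 + C)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                scores = conf * class_probs  # class-specific confidence
                cls = int(np.argmax(scores))
                if scores[cls] < threshold:
                    continue
                cx = (col + x) / S  # box center in normalized image coordinates
                cy = (row + y) / S
                detections.append((cx, cy, w, h, cls, float(scores[cls])))
    return detections

# Usage with a synthetic prediction: one confident box of class 7 in cell (3, 2).
pred = np.zeros((7, 7, 30))
pred[3, 2, 0:5] = [0.5, 0.5, 0.2, 0.4, 0.9]
pred[3, 2, 10 + 7] = 1.0
print(decode_yolo_grid(pred))
```

The point of the sketch is that one forward pass yields every box and class score at once; there is no separate proposal stage.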
In 2012, the world saw the rebirth of convolutional neural networks. As a deep convolutional network is able to learn robust and high-level feature representations of an image, a natural question is whether we can bring it to object detection. R. Girshick et al. broke the deadlock in 2014 by proposing Regions with CNN features (RCNN) for object detection. Since then, object detection has evolved at an unprecedented speed.
Not all debt is bad, but all debt needs to be serviced.
Developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive. This dichotomy can be understood through the lens of technical debt, a metaphor introduced by Ward Cunningham in 1992 to help reason about the long-term costs incurred by moving quickly in software engineering.
Technical debt may be paid down by refactoring code, improving unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation. The goal is not to add new functionality, but to enable future improvements, reduce…
Senior Data Scientist@ViSenze. If I don’t create, I don’t understand.