Neural Patches and Artificial Visual Perception

published:10/4/2019 (last updated 11/1/2019)


Commissioned from artist Tomashevskaya


The High Fidelity Sub-Type Classification

The objective of this project is to advance the breadth and accuracy of artificial visual perception and sub-type classification. Rather than refining and tweaking full neural network topologies (the standard approach currently taken in the field), we use a large number of small neural patches, each trained for fine-grained features. Their outputs can be logically aggregated to identify details about what is being observed and to deduce the sub-type with high accuracy.


We have developed a framework with which developers can train fine-grain neural patches on top of a proprietary highly-portable implementation of a base neural network. The neural patches can be instantly applied as plugins to a companion mobile app (iOS and Android) to test in realistic dynamic settings.


The patch training tool is a self-contained, single-installer, click-button solution. It does not require advanced knowledge of neural networks or access to a high-end graphics card (any decent discrete or integrated graphics with OpenCL support is sufficient). Once training is complete, a plugin file is generated, which can be deployed to mobile devices. The mobile apps can also send the developer sample images of objects that the neural patches got wrong, so that these can be added to the training dataset.


How the Inference Works

The free mobile app can be accessed here for iOS and Android. Note that the Android version does require a device with OpenCL support - most recent Samsung, LG, Sony and Motorola devices do.


Tapping the camera button in the top right corner of the app enables live on-device object detection with the base neural network for the 1000 ImageNet categories. A progress bar at the bottom of the app screen indicates progress through the neural network layers, and at the end of each inference the main identified object’s name is indicated above the progress bar.


There is also a plugin button (with a plus sign) in the bottom right corner of the app. Tapping this button takes you to a pre-trained neural patch download page. Long pressing this button opens a file browser for plugging in locally stored neural patches. Patches can have an optional indicator image in their definition that is displayed to notify when the patch has been triggered at the end of a base network pass. A snap button in the bottom left corner of the app can be used to take pictures of objects that are incorrectly identified by a patch. Backing out of live camera mode takes you to the start screen where the taken image can be forwarded for dataset enhancement.

The mobile app layout


For a neural patch to be triggered (after the initial pass), its potential categories need to overlap with the base neural network’s categories in at least one category. Patches can also define categories solely for triggering purposes; these are never reported as an outcome of the patch. Such categories are not among the outcome categories when the patch is trained, but are appended to its possible outcomes only so that the patch triggers on certain base categories. This is useful for identifying super-categories among a group of base categories (e.g. telling big dogs from small dogs regardless of breed).
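The triggering rule can be sketched in a few lines of Python. This is only an illustration, not the app's implementation: the `NeuralPatch` class, its field names and the `patches_to_run` helper are hypothetical, the tags `n90000001` and `n90000002` are made up, and the sketch assumes the overlap is checked against the category (or categories) the base network identified in the pass.

```python
from dataclasses import dataclass, field


@dataclass
class NeuralPatch:
    """Hypothetical stand-in for a patch definition (not the app's real format)."""
    name: str
    outcome_tags: list                      # categories the patch can report
    trigger_only_tags: set = field(default_factory=set)  # used only to trigger

    def trigger_tags(self):
        # Outcome categories plus the categories appended purely for triggering.
        return set(self.outcome_tags) | self.trigger_only_tags


def patches_to_run(base_tags, patches):
    """Return the patches sharing at least one tag with the base network result."""
    base = set(base_tags)
    return [p for p in patches if p.trigger_tags() & base]


# A dog-size patch: it reports two made-up categories (big / small dog) and
# lists a base-network breed tag only so that it triggers on that breed.
size_patch = NeuralPatch(
    name="dog_size",
    outcome_tags=["n90000001", "n90000002"],   # hypothetical tags
    trigger_only_tags={"n02085620"},           # a base-network dog breed tag
)

print([p.name for p in patches_to_run(["n02085620"], [size_patch])])
print([p.name for p in patches_to_run(["n01440764"], [size_patch])])
```

The first call selects the patch because the base result matches a trigger-only tag; the second does not, since the base result is unrelated to the patch.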



Example of a contained dog breed patch, and a spider type patch that extends the categories


How the Training Works

Traditional neural network training frameworks focus on flexibility in defining and experimenting with the details of the network topology, and so require experience with a scripting language (usually Python) and familiarity with numerous support packages. Our neural patch training framework, however, relies on simple predefined topologies and thus does not need the extensive flexibility of a scripting language. Instead, the emphasis is on ease of use: a single installable lets the neural patch developer concentrate on data collection and get up and running quickly. It has a simple graphical user interface, with only the essential parameters to tweak.


The only inputs that the developer really needs to provide are the path to the training image data and a prior network snapshot to start from. For the first run, download and unzip this file and point to the .txt in the folder as the base network’s configuration (note: the .txt and .data files need to be kept in the same folder when loading the base network).



There are fields to set the batch size, learning rate, momentum and decay rates. That’s it, with a button to start training. The main window of the application displays the images randomly being fed to the training pipeline in real time. A side window logs the progression of the training loss. The rand-tilt-shift checkmark will enable/disable image augmentation, with random tilts and color shifts. There’s also a stop-N-show checkmark to pause training at any point. This will also display the current inference outcome, and provide the option to save a snapshot of the neural patch and export the final distributable patch.


Specifically, setting the stop-N-show checkmark at any point during the training process will display the message box below, which indicates the outcome of the last inference. It also asks whether to save the current snapshot. If No is selected, training proceeds to the next inference and the same message box is shown again. If Yes is selected, the current full snapshot (which can be used for further training) and a patch snapshot (which can be used in the mobile app) are written to the location of the base network files (tagged with a time-based hash). To resume training, remove the stop-N-show checkmark in the main window and then select No in the message box (watch out for the message box getting hidden behind the main window).



When training of a new neural patch starts from a given base network snapshot, the number of output neurons of the patch will likely not match the 1000 of the base network, so the last layer needs reconfiguring to match the number of categories of the patch. The training tool does this automatically after displaying a dialog box about it. Selecting Yes starts the training process with the last layer reconfigured for the correct number of output categories.
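Conceptually, reconfiguring the last layer means keeping every earlier layer of the snapshot and replacing the 1000-output final layer with a freshly initialised one sized for the patch. The sketch below illustrates the idea in plain Python. The layer representation (a dict with `weights` and `biases`), the `reconfigure_last_layer` name and the random initialisation are all assumptions made for illustration; the training tool's actual snapshot format and initialisation are not described here.

```python
import random


def reconfigure_last_layer(layers, num_categories, seed=0):
    """Return a new layer list whose final layer has `num_categories` outputs.

    Each layer is assumed to be a dict with "weights" (one row per output
    neuron) and "biases". Earlier layers are reused unchanged.
    """
    *body, last = layers
    num_inputs = len(last["weights"][0])     # inputs feeding the final layer
    rng = random.Random(seed)
    scale = num_inputs ** -0.5                # small random starting weights
    new_last = {
        "weights": [[rng.gauss(0.0, scale) for _ in range(num_inputs)]
                    for _ in range(num_categories)],
        "biases": [0.0] * num_categories,
    }
    return body + [new_last]


# Toy base network: a 4 -> 3 layer followed by a 3 -> 1000 output layer.
base = [
    {"weights": [[0.1] * 4 for _ in range(3)], "biases": [0.0] * 3},
    {"weights": [[0.0] * 3 for _ in range(1000)], "biases": [0.0] * 1000},
]
patch_net = reconfigure_last_layer(base, num_categories=2)
print(len(base[-1]["weights"]), "->", len(patch_net[-1]["weights"]))
```

Only the new final layer starts from scratch; the features learned by the earlier layers are carried over from the base snapshot.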



The developer defines the number of output neurons of the patch simply by specifying the number of different categories in the map file of the training data. The next section looks at how the training data needs to be formatted.


Structuring the Training Image Data

The training images need to be structured in a folder with a map_clsloc.txt in its root. This file defines the categories of images in the folder that will be used for training, with their displayable names, in an order that determines the neural network output index assigned to each. For reference, this is the base neural network’s map_clsloc.txt file.

The file consists of three columns, with each row corresponding to one image class. The first column is the class tag, the second is an index number, and the last is a space-less name for the class. The tags of the base network are what ImageNet refers to as the Synset. For consistency, it is good practice to stick with this convention when adding new classes that are not in the base network, but it is not required. All that is required of a tag is that it starts with the character ‘n’ followed by a unique integer.

The file name of every image needs to start with the tag of its category followed by a ‘_’ (underscore) character. Any extra identification for individual images can follow that. The images can all be placed in the root of this folder, but it is advisable to place each category in a separate folder for manageability. The training tool searches recursively through all folders. This is an example of a simple image data folder (for dog breed detection).
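Before starting a training run it can help to check a data folder against these rules. The following Python sketch parses map_clsloc.txt and collects the images for each tag by recursive search. It is a hedged illustration rather than part of the training tool: the function names, the assumption that columns are whitespace-separated, and the list of accepted image extensions are all hypothetical.

```python
import re
from pathlib import Path

TAG_RE = re.compile(r"n\d+")                  # 'n' followed by an integer
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}    # assumed set of image types


def load_class_map(root):
    """Parse <root>/map_clsloc.txt into {tag: (index, name)}."""
    classes = {}
    text = (Path(root) / "map_clsloc.txt").read_text()
    for line_no, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue                          # ignore blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns")
        tag, index, name = parts
        if not TAG_RE.fullmatch(tag):
            raise ValueError(f"line {line_no}: bad tag {tag!r}")
        if tag in classes:
            raise ValueError(f"line {line_no}: duplicate tag {tag!r}")
        classes[tag] = (int(index), name)
    return classes


def find_training_images(root, classes):
    """Recursively collect images whose file name starts with '<tag>_'."""
    images = {tag: [] for tag in classes}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        tag, sep, _ = path.name.partition("_")
        if sep and tag in images:
            images[tag].append(path)
    return images
```

Calling `load_class_map(folder)` followed by `find_training_images(folder, classes)` gives a per-tag list of files, which makes it easy to spot a category with no images or a file whose prefix does not match any tag.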


This is a walkthrough (no audio) of how to organize the data and train a neural patch from the base network, for detecting whether a shark is being observed from under water or above water.


Stay tuned for more details on the desktop training application as we flesh it out, or contact us if you are interested in helping with beta testing.


contact: admin-at-our-domain



* Not a funding pitch.
