Converting makesense.ai JSON labels to label (mask) imagery for an image segmentation project

October 14, 2020

Annotate images on makesense.ai

makesense.ai is pretty great and the tool I generally recommend for labeling images because it:

works well and has a well designed interface
is free and open source
requires no account or uploading of data.

Press 'Get started'

Load images (in my example, I am using two pictures of coins and other objects on sand). Select 'object detection', which gives you the point, line, box and polygon toolsets.

Create a list of labels (mine are 'coin', 'sand' and 'other')

Use the polygon tool to start delineating the scene, and select the label from the drop down list for each annotation

Image two (these sorts of scenes are tricky to label because sand is really a background class)

Actions > Export annotations

Export in VGG JSON format in the polygon category

This is what your JSON format file looks like
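As a rough guide to the layout, here is a hand-written sketch of the structure that the code in this post relies on. It is not real makesense.ai output: the file name and coordinates are made up, only the keys that get read later on are shown, and a real export may carry extra fields.

```python
import json

# Hypothetical, hand-written example of the layout assumed in this post:
# one entry per image file, each with a 'regions' dictionary of polygons.
example_labels = {
    "example_image.jpg": {                              # made-up file name
        "regions": {
            "0": {
                "shape_attributes": {
                    "all_points_x": [10, 60, 60, 10],   # made-up coordinates
                    "all_points_y": [20, 20, 80, 80],
                },
                "region_attributes": {"label": "coin"},
            },
            "1": {
                "shape_attributes": {
                    "all_points_x": [0, 100, 100, 0],
                    "all_points_y": [90, 90, 120, 120],
                },
                "region_attributes": {"label": "sand"},
            },
        }
    }
}

# round-trip through JSON text, which is what reading the file amounts to
text = json.dumps(example_labels, indent=2)
parsed = json.loads(text)
print(list(parsed.keys()))
print(list(parsed["example_image.jpg"]["regions"].keys()))
```

Each polygon keeps its x and y coordinates in two separate lists of equal length, and its class name under 'label'.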

Let's read it into python and convert it into a label mask

Create label images

Load the libraries we need

import json, os, glob
from PIL import Image, ImageDraw
import numpy as np

Define a class dictionary that maps class names (strings) to integers. Avoid zero, which is usually reserved for null/background pixels, as in a binary segmentation. 'Other' means something different in this context: it covers rulers and other objects in the scene.

class_dict = {'coin':1, 'sand':2, 'other':3}

Load the contents of the VGG JSON file downloaded from makesense.ai into the dictionary, all_labels

json_file = 'labels_my-project-name_2020-10-15-03-40-44.json'
with open(json_file) as f:
    all_labels = json.load(f)

The keys of the dictionary are the image filenames

print(all_labels.keys())

And these are the quantities defined for each image

rawfile = '20181223_133712.jpg'

print(all_labels[rawfile].keys())

This function will strip the image coordinates (X and Y) of the polygons, and their associated class labels (L), from the dictionary of annotations for a single image

def get_data(data):
    X = []; Y = []; L = [] # empty lists to fill in the for loop
    for k in data['regions']: # cycle through each polygon
        # get the x and y points, and the class label, from the dictionary
        X.append(data['regions'][k]['shape_attributes']['all_points_x'])
        Y.append(data['regions'][k]['shape_attributes']['all_points_y'])
        L.append(data['regions'][k]['region_attributes']['label'])
    # note the order: y coordinates first, then x, because image arrays
    # are indexed (row, column), i.e. (y, x)
    return Y,X,L

Use it to extract the polygons from the first image:

X, Y, L = get_data(all_labels[rawfile])
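The return order is easy to trip over, so here is a small check on a made-up annotation. The toy dictionary and the deliberate label typo are my own invention, and get_data is repeated only so that the snippet runs on its own. It also shows one way (an assumption about what you might want, not part of the original workflow) to catch labels that have no entry in class_dict before they cause a KeyError later.

```python
def get_data(data):
    # same logic as the function above, repeated so this snippet runs alone
    X = []; Y = []; L = []
    for k in data['regions']:
        X.append(data['regions'][k]['shape_attributes']['all_points_x'])
        Y.append(data['regions'][k]['shape_attributes']['all_points_y'])
        L.append(data['regions'][k]['region_attributes']['label'])
    return Y, X, L

class_dict = {'coin': 1, 'sand': 2, 'other': 3}

# hypothetical annotation for one image (made-up coordinates)
toy = {'regions': {
    '0': {'shape_attributes': {'all_points_x': [10, 60, 60],
                               'all_points_y': [20, 20, 80]},
          'region_attributes': {'label': 'coin'}},
    '1': {'shape_attributes': {'all_points_x': [0, 100, 100],
                               'all_points_y': [90, 90, 120]},
          'region_attributes': {'label': 'snad'}},   # deliberate typo
}}

X, Y, L = get_data(toy)
print(X[0])   # the y coordinates of the first polygon
print(Y[0])   # the x coordinates of the first polygon

# labels that have no integer code in class_dict
unknown = sorted(set(L) - set(class_dict))
print(unknown)
```

So after unpacking as X, Y, L, the variable X actually holds the y (row) coordinates and Y holds the x (column) coordinates.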

Open an image to get its dimensions (np.shape returns rows, columns and bands, so nx is the image height and ny is its width):

image = Image.open(rawfile)

nx, ny, nz = np.shape(image)
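PIL and numpy report image dimensions in opposite orders, which matters for the mask code that follows. A quick sketch with a blank stand-in image (the 40 by 30 size is arbitrary) shows the difference:

```python
import numpy as np
from PIL import Image

# a blank stand-in image, 40 pixels wide and 30 pixels tall
image = Image.new('RGB', (40, 30))

print(image.size)        # PIL reports (width, height)
nx, ny, nz = np.shape(image)
print(nx, ny, nz)        # numpy reports (rows, columns, bands)
```

Note that unpacking three values only works for images with a band dimension. A greyscale image has a two-dimensional shape, so the same line would raise a ValueError.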

Next we need a function that will create a label image from the polygon vector data (coordinates and labels)

def get_mask(X, Y, nx, ny, L, class_dict):
    # nx is the number of rows (image height), ny the number of columns (width)
    mask = np.zeros((nx,ny))

    for y,x,l in zip(X,Y,L):
        # the ImageDraw.Draw().polygon function we will use to create the mask
        # requires the x's and y's to be interleaved (x0, y0, x1, y1, ...),
        # which is what the following one-liner does
        polygon = np.vstack((x,y)).reshape((-1,),order='F').tolist()

        # PIL sizes are (width, height), so an image of size (ny, nx) converts
        # to a numpy array of shape (nx, ny), the same shape as the mask
        img = Image.new('L', (ny, nx), 0)
        ImageDraw.Draw(img).polygon(polygon, outline=0, fill=1)
        # turn into a numpy array and burn the class code into the mask
        m = np.array(img)
        mask[m==1] = class_dict[l]

    return mask

Apply it to get the label mask for the first image

mask = get_mask(X, Y, nx, ny, L, class_dict)
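If the interleaving one-liner looks cryptic, this toy example may help. It uses a made-up rectangle on a tiny blank canvas (both are my own choices for illustration) and prints the flattened coordinate list and the resulting binary array.

```python
import numpy as np
from PIL import Image, ImageDraw

# a made-up rectangle, as separate x and y coordinate lists
x = [2, 7, 7, 2]
y = [2, 2, 6, 6]

# column-major ('F') flattening of the stacked rows interleaves x and y
polygon = np.vstack((x, y)).reshape((-1,), order='F').tolist()
print(polygon)

# draw it on a blank image that is 10 pixels wide and 8 pixels tall
img = Image.new('L', (10, 8), 0)
ImageDraw.Draw(img).polygon(polygon, outline=0, fill=1)
m = np.array(img)

print(m.shape)   # rows (height) first, then columns (width)
print(m)
```

Because outline=0 is used, the boundary pixels of each polygon are drawn with the background value, so the filled region is slightly smaller than the polygon itself.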

Next we'll define a function that rescales our integer codes into 8-bit values that span the full 0-255 range. This 8-bit scaling makes it possible to view the label images with an ordinary operating system image viewer

def rescale(dat,mn,mx):
    '''
    rescales an input dat between mn and mx
    '''
    # numpy's own min/max are much faster than the builtins on large arrays
    m = dat.min()
    M = dat.max()
    return (mx-mn)*(dat-m)/(M-m)+mn
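A worked example makes the behaviour clearer, and also shows a caveat: because the stretch uses the minimum and maximum of each mask, the grey value a class ends up with depends on which codes are present in that particular image. The to_grey function below is a hypothetical alternative of my own, not something used elsewhere in this post, that fixes the mapping using the largest class code instead.

```python
import numpy as np

def rescale(dat, mn, mx):
    # same min-max stretch as above, repeated so this snippet runs alone
    m = dat.min()
    M = dat.max()
    return (mx - mn) * (dat - m) / (M - m) + mn

# all four codes present: background (0) plus the three classes
full = np.array([0., 1., 2., 3.])
print(rescale(full, 0, 255))

# only 'coin' (1) and 'sand' (2) present in this hypothetical image
partial = np.array([1., 2.])
print(rescale(partial, 0, 255))

# hypothetical alternative: a fixed stretch that depends only on the
# largest class code, so a class always gets the same grey value
def to_grey(dat, max_code):
    return np.round(255 * dat / max_code).astype(np.uint8)

print(to_grey(partial, 3))
```

With the min-max version, 'coin' is 85 in the first case but 0 in the second. If you need to read the label images back into class codes later, a fixed mapping is easier to invert.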

Rescale the mask and convert it into a greyscale Image object, then save to file

mask = Image.fromarray(rescale(mask,0,255)).convert('L')

# write to a new .png file so that the original photo is not overwritten
mask.save(rawfile.replace('.jpg','_label.png'), format='PNG')

Loop through several files at once:

for rawfile in all_labels.keys():
    X, Y, L = get_data(all_labels[rawfile])
    image = Image.open(rawfile)
    nx, ny, nz = np.shape(image)
    mask = get_mask(X, Y, nx, ny, L, class_dict)
    mask = Image.fromarray(rescale(mask,0,255)).convert('L')
    # the file is written as PNG, so give it a matching .png extension
    mask.save(rawfile.replace('.jpg','_label.png'), format='PNG')

Here are the label images:

Clearly, I'm a careless labeller. How could you make these labels better? Read on ...

Refine label images with a CRF

A conditional random field (CRF) is a model that we will introduce and use in Week 3. It is useful for pre-processing manual labels, as here, or for post-processing model estimates.

It works by examining the label of each pixel in the label image and assessing how likely that label is, given the distribution of image values it observes in the same and other classes in the scene. It is a probabilistic assessment based on both the image features that it extracts and the spatial locations of the pixels.

A CRF is not a deep learning model, or a neural network at all, but it is a network-based model (a so-called graphical model). You can read more about it in this paper, where it was used as a post-processing rather than a pre-processing step.
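To make the 'how likely is this label' idea a little more concrete, here is a small numpy sketch of how hard labels can be turned into per-class costs (unary energies), which is the starting point for the CRF. It is an illustration under my own assumptions, not pydensecrf's code: the name labels_to_unary is made up, and I am assuming the convention that code 0 means 'unlabelled' and that class codes 1..n map to rows 0..n-1.

```python
import numpy as np

def labels_to_unary(labels, n_classes, gt_prob=0.51):
    """Hypothetical helper: negative log-probabilities, one row per class."""
    labels = labels.flatten()
    # cost of a class the annotator did NOT choose for a pixel
    U = np.full((n_classes, labels.size),
                -np.log((1.0 - gt_prob) / (n_classes - 1)))
    # cheaper cost for the class the annotator DID choose (codes start at 1)
    labelled = labels > 0
    U[labels[labelled] - 1, np.flatnonzero(labelled)] = -np.log(gt_prob)
    # unlabelled pixels (code 0): every class equally likely
    U[:, ~labelled] = -np.log(1.0 / n_classes)
    return U

labels = np.array([[1, 2],
                   [3, 0]])          # made-up 2x2 label image
U = labels_to_unary(labels, n_classes=3)
print(np.round(U, 2))

# the cheapest class per pixel, before any smoothing by the CRF
print(np.argmin(U, axis=0))
```

A gt_prob only just above 0.5 says that the manual label is a weak hint rather than ground truth, which leaves the image evidence room to overrule it. Note also the shift by one under this convention: class code 1 ends up in row 0.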

These are the extra python libraries we need (within the mlmondays conda environment)

import pydensecrf.densecrf as dcrf
from pydensecrf.utils import create_pairwise_bilateral, unary_from_labels

Next we define a function that will use the CRF to process the label with respect to the image, and provide a new refined label

def crf_refine(label, img):
    """
    "crf_refine(label, img)"
    This function refines a label image based on an input label image and the associated image
    Uses a conditional random field algorithm using spatial and image features
    INPUTS:
        * label [ndarray]: label image 2D matrix of integers
        * img [ndarray]: image 3D matrix of integers
    OPTIONAL INPUTS: None
    GLOBAL INPUTS: None
    OUTPUTS: label [ndarray]: label image 2D matrix of integers
    """
    H = label.shape[0]
    W = label.shape[1]
    n = 1+len(np.unique(label))
    U = unary_from_labels(label,n,gt_prob=0.51)
    # DenseCRF2D expects (width, height, number of labels)
    d = dcrf.DenseCRF2D(W, H, n)
    d.setUnaryEnergy(U)

    # to add the color-independent term, where features are the locations only:
    d.addPairwiseGaussian(sxy=(3, 3),
                 compat=3,
                 kernel=dcrf.DIAG_KERNEL,
                 normalization=dcrf.NORMALIZE_SYMMETRIC)
    feats = create_pairwise_bilateral(
                          sdims=(100, 100),
                          schan=(2,2,2),
                          img=img,
                          chdim=2)

    d.addPairwiseEnergy(feats, compat=120,kernel=dcrf.DIAG_KERNEL,normalization=dcrf.NORMALIZE_SYMMETRIC)
    Q = d.inference(10)
    return np.argmax(Q, axis=0).reshape((H, W)).astype(np.uint8)
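For readers who want the maths, the parameters above can be related to the fully connected CRF energy as it is usually written. This is my reading of how the two pairwise terms map onto that formulation, using assumed symbols ($p_i$ for the position of pixel $i$, $I_i$ for its colour, $x_i$ for its label), so treat it as a guide rather than a specification of the library.

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

\psi_p(x_i, x_j) = [x_i \neq x_j] \left(
    w_1 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2}
                     -\frac{\lVert I_i - I_j \rVert^2}{2\theta_\beta^2} \right)
  + w_2 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\gamma^2} \right)
\right)
```

Under that reading, the bilateral term corresponds to compat=120 ($w_1$), sdims=(100, 100) ($\theta_\alpha$) and schan=(2,2,2) ($\theta_\beta$), and the location-only Gaussian term corresponds to compat=3 ($w_2$) and sxy=(3, 3) ($\theta_\gamma$). The large weight and wide spatial range on the bilateral term mean that pixels with similar colour are strongly encouraged to share a label, even when they are far apart.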

Now we modify the get_mask function from before with the post-processing step


def get_mask_crf(X, Y, nx, ny, L, class_dict, image):
    # nx is the number of rows (image height), ny the number of columns (width)
    mask = np.zeros((nx,ny))

    for y,x,l in zip(X,Y,L):
        # the ImageDraw.Draw().polygon function we will use to create the mask
        # requires the x's and y's to be interleaved (x0, y0, x1, y1, ...),
        # which is what the following one-liner does
        polygon = np.vstack((x,y)).reshape((-1,),order='F').tolist()

        # PIL sizes are (width, height), so an image of size (ny, nx) converts
        # to a numpy array of shape (nx, ny), the same shape as the mask
        img = Image.new('L', (ny, nx), 0)
        ImageDraw.Draw(img).polygon(polygon, outline=0, fill=1)
        # turn into a numpy array and burn the class code into the mask
        m = np.array(img)
        mask[m==1] = class_dict[l]

    # np.int was removed in NumPy 1.24; the builtin int does the same job
    mask = crf_refine(np.array(mask, dtype=int), np.array(image, dtype=np.uint8))

    return mask

And use a similar loop as before to apply this CRF processing

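for rawfile in all_labels.keys():
    X, Y, L = get_data(all_labels[rawfile])
    image = Image.open(rawfile)
    nx, ny, nz = np.shape(image)
    mask = get_mask_crf(X, Y, nx, ny, L, class_dict, image)
    # dividing by 255 converts the uint8 mask to float, so the
    # multiplication inside rescale cannot overflow
    mask = Image.fromarray(rescale(mask/255,0,255)).convert('L')
    # the file is written as PNG, so give it a matching .png extension
    mask.save(rawfile.replace('.jpg','_label_crf.png'), format='PNG')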

Here are the CRF-refined label images. There is no longer a black (0) background class: black (0) is now class 1, grey (127) is class 2 and white (255) is class 3.
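If you later need the original class codes back from one of these refined label images, the grey values can be inverted. The sketch below is a hypothetical helper of my own (grey_to_class is not defined anywhere else in this post), and it assumes that all three classes are present in the image, since the rescaling is a per-image stretch.

```python
import numpy as np

def grey_to_class(grey, n_classes):
    """Hypothetical helper: map 0..255 greys back to class codes 1..n_classes."""
    index = np.round(grey.astype(float) / 255 * (n_classes - 1)).astype(int)
    return index + 1

# made-up pixel values as they would be read back from a refined label image
grey = np.array([[0, 127], [255, 127]], dtype=np.uint8)
print(grey_to_class(grey, 3))
```

The final + 1 undoes the shift described above, so black maps back to 'coin' (1), grey to 'sand' (2) and white to 'other' (3).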