.. _tutorial:

MUSCIMA++ Tutorial
==================

This is a tutorial for using the ``muscima`` package to work with the
MUSCIMA++ dataset. We assume you have already gone through the README
and downloaded the dataset.

Let's load it.

.. code:: python

    import os
    from muscima.io import parse_cropobject_list

    # Change this to reflect wherever your MUSCIMA++ data lives
    CROPOBJECT_DIR = os.path.join(os.environ['HOME'], 'data/MUSCIMA++/v0.9/data/cropobjects')

    cropobject_fnames = [os.path.join(CROPOBJECT_DIR, f) for f in os.listdir(CROPOBJECT_DIR)]
    docs = [parse_cropobject_list(f) for f in cropobject_fnames]
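Before we move on, it is worth a quick look at what we actually loaded.
Each document is a list of ``CropObject`` instances, and each of those
carries a ``clsname`` attribute with its symbol class. The following
snippet is just a small sanity check (the exact counts depend on your
copy of the data):

.. code:: python

    import collections

    # There should be 140 documents in the full v0.9 dataset.
    print('Loaded {} documents.'.format(len(docs)))

    # Tally the symbol classes across all documents.
    class_counts = collections.Counter(c.clsname
                                       for cropobjects in docs
                                       for c in cropobjects)
    print(class_counts.most_common(10))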
Let's do something straightforward: symbol classification.

Symbol Classification
---------------------

Let's try to tell apart quarter notes from half notes. However, notes
are recorded as individual primitives in MUSCIMA++, so we need to
extract notehead-stem pairs from the data using their relationships.

Quarter notes are all ``full-notehead``-``stem`` pairs with no beam or
flag. Half notes are all ``empty-notehead``-``stem`` pairs.

After we extract the note classes, we will need to compute features for
classification. To do that, we first need to "draw" the symbols in the
appropriate relative positions. Then, we can extract whatever features
we need.

Finally, we train a classifier and evaluate it.

Extracting notes
~~~~~~~~~~~~~~~~

.. code:: python

    # Bear in mind that the outlinks are integers, only valid within the same document.
    # Therefore, we define a function per-document, not per-dataset.

    def extract_notes_from_doc(cropobjects):
        """Finds all ``(full-notehead, stem)`` pairs that form
        quarter or half notes. Returns two lists of CropObject tuples:
        one for quarter notes, one for half notes.

        :returns: quarter_notes, half_notes
        """
        _cropobj_dict = {c.objid: c for c in cropobjects}

        notes = []
        for c in cropobjects:
            if (c.clsname == 'notehead-full') or (c.clsname == 'notehead-empty'):
                _has_stem = False
                _has_beam_or_flag = False
                stem_obj = None
                for o in c.outlinks:
                    _o_obj = _cropobj_dict[o]
                    if _o_obj.clsname == 'stem':
                        _has_stem = True
                        stem_obj = _o_obj
                    elif _o_obj.clsname == 'beam':
                        _has_beam_or_flag = True
                    elif _o_obj.clsname.endswith('flag'):
                        _has_beam_or_flag = True
                if _has_stem and (not _has_beam_or_flag):
                    # We also need to check against quarter-note chords.
                    # Stems only have inlinks from noteheads, so checking
                    # for multiple inlinks will do the trick.
                    if len(stem_obj.inlinks) == 1:
                        notes.append((c, stem_obj))

        quarter_notes = [(n, s) for n, s in notes if n.clsname == 'notehead-full']
        half_notes = [(n, s) for n, s in notes if n.clsname == 'notehead-empty']
        return quarter_notes, half_notes

    qns_and_hns = [extract_notes_from_doc(cropobjects) for cropobjects in docs]

Now, we don't need the ``objid`` anymore, so we can lump the notes from
all 140 documents together.

.. code:: python

    import itertools

    qns = list(itertools.chain(*[qn for qn, hn in qns_and_hns]))
    hns = list(itertools.chain(*[hn for qn, hn in qns_and_hns]))

    len(qns), len(hns)

.. parsed-literal::

    (4320, 1181)

It seems that we have some 4320 isolated quarter notes and 1181
isolated half notes in our data. Let's create their images now.

Creating note images
~~~~~~~~~~~~~~~~~~~~

Each notehead and stem CropObject has its own mask and its bounding box
coordinates. We need to combine these two things in order to create a
binary image of the note.
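It may help to first look at what these attributes hold for one
extracted pair. This is only a quick inspection (the printed numbers
will of course depend on which document the pair comes from):

.. code:: python

    notehead, stem = qns[0]

    # The bounding box coordinates are in the coordinate frame of the
    # whole document image; the mask covers just the symbol's own
    # bounding box, so its shape is (height, width).
    print(notehead.clsname, notehead.top, notehead.left, notehead.bottom, notehead.right)
    print(stem.clsname, stem.top, stem.left, stem.bottom, stem.right)
    print(notehead.mask.shape, (notehead.height, notehead.width))

The function below pastes such masks onto one shared canvas: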
.. code:: python

    import numpy

    def get_image(cropobjects, margin=1):
        """Paste the cropobjects' masks onto a shared canvas.
        There will be a given margin of background on the edges."""

        # Get the bounding box into which all the objects fit
        top = min([c.top for c in cropobjects])
        left = min([c.left for c in cropobjects])
        bottom = max([c.bottom for c in cropobjects])
        right = max([c.right for c in cropobjects])

        # Create the canvas onto which the masks will be pasted
        height = bottom - top + 2 * margin
        width = right - left + 2 * margin
        canvas = numpy.zeros((height, width), dtype='uint8')

        for c in cropobjects:
            # Get coordinates of upper left corner of the CropObject
            # relative to the canvas
            _pt = c.top - top + margin
            _pl = c.left - left + margin
            # We have to add the mask, so as not to overwrite
            # previous nonzeros when symbol bounding boxes overlap.
            canvas[_pt:_pt+c.height, _pl:_pl+c.width] += c.mask

        canvas[canvas > 0] = 1
        return canvas

    qn_images = [get_image(qn) for qn in qns]
    hn_images = [get_image(hn) for hn in hns]

Let's visualize some of these notes, to check whether everything
worked. (For this, we assume you have matplotlib. If not, you can skip
this step.)

.. code:: python

    import matplotlib.pyplot as plt

    def show_mask(mask):
        plt.imshow(mask, cmap='gray', interpolation='nearest')
        plt.show()

    def show_masks(masks, row_length=5):
        n_masks = len(masks)
        n_rows = n_masks // row_length + 1
        n_cols = min(n_masks, row_length)
        fig = plt.figure()
        for i, mask in enumerate(masks):
            plt.subplot(n_rows, n_cols, i+1)
            plt.imshow(mask, cmap='gray', interpolation='nearest')
        # Let's remove the axis labels, they clutter the image.
        for ax in fig.axes:
            ax.set_yticklabels([])
            ax.set_xticklabels([])
            ax.set_yticks([])
            ax.set_xticks([])
        plt.show()

.. code:: python

    show_masks(qn_images[:25])
    show_masks(hn_images[:25])

.. image:: output_12_0.png

.. image:: output_12_1.png

It seems that the extraction went all right.

Feature Extraction
~~~~~~~~~~~~~~~~~~

Now, we need to somehow turn the note images into classifier inputs.

Let's get some inspiration from the setup of the HOMUS dataset. In
their baseline classification experiments, the authors just resized
their images to 20x20. For notes, however, this may not be such a good
idea, because it will make them too short. Let's instead resize to
40x10.

.. code:: python

    from skimage.transform import resize

    qn_resized = [resize(qn, (40, 10)) for qn in qn_images]
    hn_resized = [resize(hn, (40, 10)) for hn in hn_images]

    # And re-binarize, to compensate for interpolation effects
    for qn in qn_resized:
        qn[qn > 0] = 1
    for hn in hn_resized:
        hn[hn > 0] = 1

How do the resized notes look?

.. code:: python

    show_masks(qn_resized[:25])
    show_masks(hn_resized[-25:])

.. image:: output_17_0.png

.. image:: output_17_1.png

Classification
~~~~~~~~~~~~~~

We now need to add the output labels and make a train/test split out of
this. Let's make a balanced dataset, to keep things simpler.

.. code:: python

    # Randomly pick an equal number of quarter-notes.
    n_hn = len(hn_resized)

    import random
    random.shuffle(qn_resized)
    qn_selected = qn_resized[:n_hn]

Now, create the output labels and merge the data into one dataset.

.. code:: python

    Q_LABEL = 1
    H_LABEL = 0

    qn_labels = [Q_LABEL for _ in qn_selected]
    hn_labels = [H_LABEL for _ in hn_resized]

    notes = qn_selected + hn_resized
    # Flatten data
    notes_flattened = [n.flatten() for n in notes]
    labels = qn_labels + hn_labels

Let's use the ``sklearn`` package for the experimental setup. Normally,
we would do cross-validation on a dataset this small, but for the
purposes of the tutorial, we will stick to just one train/test split.

.. code:: python

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        notes_flattened, labels, test_size=0.25, random_state=42,
        stratify=labels)

What could we use to classify this data? Perhaps a k-NN classifier
might work.

.. code:: python

    from sklearn.neighbors import KNeighborsClassifier

    K = 5

    # Trying the defaults first.
    clf = KNeighborsClassifier(n_neighbors=K)
    clf.fit(X_train, y_train)

.. parsed-literal::

    KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
               metric_params=None, n_jobs=1, n_neighbors=5, p=2,
               weights='uniform')

Let's run the classifier now and evaluate the results.

.. code:: python

    y_test_pred = clf.predict(X_test)

.. code:: python

    from sklearn.metrics import classification_report
    print(classification_report(y_test, y_test_pred, target_names=['half', 'quarter']))

.. parsed-literal::

                 precision    recall  f1-score   support

           half       0.98      0.87      0.92       296
        quarter       0.88      0.98      0.93       295

    avg / total       0.93      0.93      0.93       591

NOT BAD.
^^^^^^^^

Apparently, most mistakes happen when half notes are classified as
quarter notes.

Also, remember that we made the train/test split randomly, so there are
almost certainly notes from each writer both in the test set and in the
training data. That makes things easy for the k-NN classifier: a test
note can often be matched against training notes written by the same
hand. Can we perhaps quantify that effect?

...and that is beyond the scope of this tutorial.
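As a closing aside: the cross-validation mentioned above is only a few
extra lines with scikit-learn. The sketch below reuses the
``notes_flattened`` and ``labels`` lists built earlier; note that the
folds are still writer-mixed, so it does not answer the question above,
and your scores will differ from the table we got.

.. code:: python

    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # 5-fold cross-validation with the same k-NN settings as above.
    # The default scoring for a classifier is accuracy.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                             notes_flattened, labels, cv=5)
    print(scores, scores.mean())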