Roman Numeral Analysis with Graph Neural Networks

An Introductory Guide

Towards Data Science

In this article, I would like to describe my journey in developing a model for automatic harmonic analysis. Personally, I’m fascinated by understanding music deeply. Questions like “Why are things structured the way they are?” and “What was the composer or artist thinking when writing the piece?” are important to me. Naturally, the place for me to start was to analyze the underlying harmony of a piece.

Going through my old notebooks from the conservatory, I stumbled upon the technique we used to annotate and analyze short musical excerpts. It is known as Roman Numeral analysis. The concept can be a bit complicated if you have never heard of it before, but please bear with me.

My goal is to build a system that can automatically analyze musical scores. Given a score, the system returns the same score with an additional staff containing the chords in Roman numeral notation. This should work mainly for classical tonal music, but it is not necessarily limited to it.

In the rest of this article, I’ll introduce the concepts of Roman Numerals and Graph Neural Networks, and discuss some details about the model I developed and its results. I hope you enjoy it!

Introduction to Roman Numerals

Roman Numeral analysis is a technique for understanding and analyzing the chords and harmonic progressions in music, particularly in Western classical and popular music. Chords are represented with Roman numerals instead of traditional musical notation.

In Roman Numeral analysis, each chord is assigned a Roman numeral based on its position and function within a given key. The Roman numerals represent the scale degrees of the key. Uppercase numerals represent major chords and lowercase numerals represent minor chords.

For example, in the key of C major, the C major chord is represented by the Roman numeral “I” (uppercase “I” denotes a major chord). The D minor chord is represented by “ii” (lowercase “ii” denotes a minor chord). The G major chord is represented by “V” (uppercase “V” denotes a major chord) because it is built on the fifth degree of the C major scale.
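To make this mapping concrete, here is a minimal Python sketch. It is my own illustration, not part of ChordGNN, and it turns a chord root and quality into a Roman numeral in a major key. It assumes pitch classes are encoded as integers (C=0, C#=1, ..., B=11). It only handles major and minor qualities, so diminished chords such as vii° are out of scope.

```python
# Semitone offsets of the seven major-scale degrees above the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]


def roman_numeral(key_tonic, chord_root, quality):
    """Return the Roman numeral of a diatonic chord in a major key.

    key_tonic and chord_root are pitch classes (0-11); quality is
    "major" (uppercase numeral) or "minor" (lowercase numeral).
    """
    interval = (chord_root - key_tonic) % 12
    if interval not in MAJOR_SCALE_STEPS:
        raise ValueError("chord root is not a diatonic scale degree")
    numeral = NUMERALS[MAJOR_SCALE_STEPS.index(interval)]
    return numeral if quality == "major" else numeral.lower()


C, D, G = 0, 2, 7
print(roman_numeral(C, C, "major"))  # I
print(roman_numeral(C, D, "minor"))  # ii
print(roman_numeral(C, G, "major"))  # V
```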

A Roman Numeral analysis example for two bars of four-part harmony in C major.

Roman numerals are always relative to a key. So if the key is C major, the Roman numeral “V” is the dominant, i.e. the G major chord. Chords also have different qualities, for instance minor or major. In Roman numerals, capital letters stand for major quality and lowercase letters for minor quality.

In music analysis, the bass note is often a point of reference for the character of a chord, and Roman numerals can convey this information too. In the example above, the bass (lowest chord note) of the second chord is F sharp, but the root of the chord is D. The chord is therefore in first inversion, which is indicated by the number 6.
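Once we know which chord tone sits in the bass, the inversion figure can be read off mechanically. The sketch below is an illustration, not code from ChordGNN. It uses the common figured-bass shorthand: 6 and 64 for triads, and 7, 65, 43 and 42 for seventh chords. The hypothetical D major triad with F# in the bass reproduces the first-inversion 6 described above.

```python
# Figured-bass shorthand indexed by inversion (0 = root position).
TRIAD_FIGURES = ["", "6", "64"]
SEVENTH_FIGURES = ["7", "65", "43", "42"]  # "42" is often shortened to "2"


def inversion_figure(chord_tones, bass):
    """Return (inversion, figure) for a chord given its bass pitch class.

    chord_tones lists pitch classes stacked in thirds, root first:
    three tones for a triad, four for a seventh chord.
    """
    # Raises ValueError if the bass is not a chord tone.
    inversion = chord_tones.index(bass % 12)
    figures = SEVENTH_FIGURES if len(chord_tones) == 4 else TRIAD_FIGURES
    return inversion, figures[inversion]


D_MAJOR = [2, 6, 9]  # D, F#, A
D7 = [2, 6, 9, 0]    # D, F#, A, C
print(inversion_figure(D_MAJOR, 6))  # (1, '6')
print(inversion_figure(D7, 6))       # (1, '65')
print(inversion_figure(D7, 0))       # (3, '42')
```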

Another interesting notational capability of Roman numerals relates to borrowed chords. This effect is called the secondary degree. Implicitly, every Roman numeral (the primary degree) has a secondary degree of the tonic (i.e. I or i). When the secondary degree is annotated explicitly, it tells us which scale degree is momentarily acting as the tonic. In the example above, the third chord has a dominant seventh as its primary degree and the dominant of C major as its secondary degree. The V65 indicates a dominant seventh chord (a major triad with a minor seventh) in first inversion.
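Secondary degrees can be handled the same way. The sketch below uses hypothetical helper names, not a real library API. It assumes major-scale degree steps in both the home key and the temporarily tonicized key. The implicit /I is left out of the label, and the root of V/V in C major comes out as D (pitch class 2).

```python
# Semitone offsets of major-scale degrees (an assumption for both keys).
MAJOR_SCALE_STEPS = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 7, "VI": 9, "VII": 11}


def format_roman_numeral(primary, figure="", secondary=None):
    """Join primary degree, inversion figure and an optional secondary degree.

    A secondary degree of I or i is the implicit default, so it is omitted.
    """
    label = primary + figure
    if secondary is not None and secondary.upper() != "I":
        label += "/" + secondary
    return label


def chord_root(key_tonic, primary, secondary="I"):
    """Pitch class of the root of primary/secondary in the key on key_tonic."""
    temporary_tonic = (key_tonic + MAJOR_SCALE_STEPS[secondary.upper()]) % 12
    return (temporary_tonic + MAJOR_SCALE_STEPS[primary.upper()]) % 12


print(format_roman_numeral("V", "65", "V"))  # V65/V
print(format_roman_numeral("I", "6", "I"))   # I6
print(chord_root(0, "V", "V"))               # 2, i.e. D: the dominant of G
```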

Roman Numeral analysis helps musicians and music theorists understand the structure of, and the relationships between, the chords in a piece of music. It allows them to identify common chord progressions, analyze harmonic patterns, and compare different musical compositions. It is a useful tool for composers, arrangers, and performers who want to understand the underlying harmony and make musical decisions based on that knowledge.

Automatic Roman Numeral Analysis

Now that we know what Roman Numeral analysis looks like in practice, we can discuss how to automate it. In this article, we’ll cover a way to predict Roman Numerals from symbolic music, i.e. digital scores (MusicXML, MIDI, MEI, Kern, MuseScore, etc.). You can obtain some of these formats from score editors such as Finale, Sibelius, or MuseScore. Usually, the software allows you to export to (uncompressed) MusicXML. If you don’t have any of these editors, I suggest using MuseScore.

Let’s now discuss these representations in more depth. In audio representations, music can be seen as a digital sequence at the waveform level or as a 2-D spectrogram in the frequency domain. In contrast, a symbolic representation consists of individual note events that carry information such as onset time, duration, and pitch spelling (note names). Symbolic representations have often been treated as a pseudo-audio representation by splitting the score into quantized time frames, for example a pianoroll (like the figure shown below). However, some recent works have proposed a graph representation of a score, where every note is a vertex in the graph and edges represent relations between notes. Transforming scores into this graph structure is especially useful when a Machine Learning model is involved.

Different representations of the score excerpt shown in the center. Top: quantized timeframe representation; bottom: graph representation.
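For contrast, here is roughly what the quantized pianoroll view looks like in code. This is a minimal NumPy sketch with a made-up three-note excerpt and an assumed resolution of four frames per quarter note. The binary matrix keeps only MIDI pitch numbers, so pitch spelling (C# vs. Db) is lost. Onsets and offsets that fall between frames are also rounded to the grid.

```python
import numpy as np

# Hypothetical toy excerpt: (onset, duration, MIDI pitch), times in quarter notes.
notes = [(0.0, 1.0, 60), (0.0, 2.0, 48), (1.0, 1.0, 64)]


def pianoroll(notes, frames_per_quarter=4, n_pitches=128):
    """Quantize note events into a binary (pitch x frame) matrix."""
    end = max(onset + duration for onset, duration, _ in notes)
    n_frames = int(round(end * frames_per_quarter))
    roll = np.zeros((n_pitches, n_frames), dtype=np.uint8)
    for onset, duration, pitch in notes:
        start = int(round(onset * frames_per_quarter))
        stop = int(round((onset + duration) * frames_per_quarter))
        roll[pitch, start:stop] = 1  # mark the frames where the note sounds
    return roll


roll = pianoroll(notes)
print(roll.shape)  # (128, 8)
```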

So, given a symbolic score, the graph is constructed by modelling three relationships between notes:

  • Notes starting at the same time, i.e. same onset.
  • Notes starting when another note ends, i.e. consecutive notes.
  • Notes starting while another note is sounding, i.e. during connection.
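The three relations can be computed directly from note onsets and durations. The following is a simple quadratic-time sketch, not the implementation used for ChordGNN. The edge directions and the use of `np.isclose` to compare floating-point times are my assumptions. Real implementations would vectorize this and may add further relation types.

```python
import numpy as np


def score_graph_edges(onsets, durations):
    """Return directed (src, dst) edge lists for the three note relations."""
    onsets = np.asarray(onsets, dtype=float)
    offsets = onsets + np.asarray(durations, dtype=float)
    edges = {"onset": [], "consecutive": [], "during": []}
    n = len(onsets)
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            if np.isclose(onsets[u], onsets[v]):
                edges["onset"].append((u, v))        # same onset
            elif np.isclose(offsets[u], onsets[v]):
                edges["consecutive"].append((u, v))  # v starts when u ends
            elif onsets[u] < onsets[v] < offsets[u]:
                edges["during"].append((u, v))       # v starts while u sounds
    return edges


# Toy excerpt: note 0 (onset 0, 1 beat), note 1 (onset 0, 2 beats), note 2 (onset 1, 1 beat).
edges = score_graph_edges([0.0, 0.0, 1.0], [1.0, 2.0, 1.0])
print(edges)  # {'onset': [(0, 1), (1, 0)], 'consecutive': [(0, 2)], 'during': [(1, 2)]}
```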

The score graph can be used as input to a Graph Neural Network, which implicitly learns by propagating information along the edges of the graph. But before we explain how such a model works on scores, let’s first briefly explain how Graph Neural Networks work.

Graph Neural Networks

So, what exactly are Graph Neural Networks? At their core, GNNs are a class of deep learning models designed to handle data represented as graphs. Like real-world networks, graphs consist of interconnected nodes (vertices), each with its own features. GNNs leverage this interconnectedness to capture rich relationships and dependencies, which lets them perform analysis and prediction tasks.

But how do GNNs work? Imagine a musical score where each note is a node and the note relations are the connections between them. Traditional models would treat each note individually, ignoring the musical context. GNNs, however, embrace this context by considering both the individual notes’ features (e.g., pitch spelling, duration) and their relationships (same onset, consecutive) simultaneously. By aggregating information from neighbouring nodes, GNNs allow us to understand not only individual notes but also the dynamics and patterns within the whole network.

To achieve this, GNNs employ a series of iterative message-passing steps. During each step, nodes gather information from their neighbours, update their own representations, and propagate these updated features further through the network. With each additional step, information travels one more hop. This iterative process allows GNNs to capture and refine information from nearby nodes, progressively building an understanding of the whole graph.
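The effect of iterating message passing can be seen even without any learning. This NumPy sketch is a simplification: it has no learnable weights and uses plain mean aggregation with self-loops on a toy path graph. A feature placed on node 0 reaches one more hop with every step. This is why stacking layers widens each note’s receptive field.

```python
import numpy as np


def propagate(adjacency, features, steps):
    """Replace each node's features with the mean over itself and its neighbours, `steps` times."""
    a_hat = adjacency + np.eye(len(adjacency))        # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize, so a_hat @ h is a mean
    h = features
    for _ in range(steps):
        h = a_hat @ h
    return h


# Path graph 0 - 1 - 2 - 3; only node 0 starts with a non-zero feature.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.array([[1.0], [0.0], [0.0], [0.0]])
for k in (1, 2, 3):
    reached = (propagate(adj, x, k)[:, 0] > 0).astype(int)
    print(k, reached)
# 1 [1 1 0 0]
# 2 [1 1 1 0]
# 3 [1 1 1 1]
```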

When message passing is applied iteratively within the network, the process is often called graph convolution. A popular graph convolution block, which we also used in our music analysis model, is SAGEConv, from the well-known GraphSAGE paper. We won’t cover the details here, but there are many sources that explain how GraphSAGE works.
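For reference, here is the mean-aggregator update commonly used for GraphSAGE-style layers. It is the form documented, for example, for PyTorch Geometric’s SAGEConv, followed here by a nonlinearity. In it, $\mathcal{N}(v)$ is the set of neighbours of node $v$, $\mathbf{W}_1^{(k)}$ and $\mathbf{W}_2^{(k)}$ are learned matrices of layer $k$, and $\sigma$ is a nonlinearity. Any further details of ChordGNN’s layers, such as normalization, are not covered by this formula.

```latex
\mathbf{h}_v^{(k)} = \sigma\!\left( \mathbf{W}_1^{(k)} \mathbf{h}_v^{(k-1)}
  + \mathbf{W}_2^{(k)} \, \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} \mathbf{h}_u^{(k-1)} \right)
```

The original GraphSAGE formulation concatenates the node’s own vector with the aggregated neighbour vector and multiplies the result by a single matrix $\mathbf{W}^{(k)}$. Splitting that matrix column-wise into $\mathbf{W}_1^{(k)}$ and $\mathbf{W}_2^{(k)}$ gives exactly the sum above.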

The beauty of GNNs lies in their ability to extract meaningful representations from graph data. By learning from the local context and combining it with global information, GNNs can uncover hidden patterns, make accurate predictions, and even generate new insights. This makes them valuable in a wide range of domains: social network analysis, drug discovery, traffic prediction, fraud detection, and now music analysis.

The model used for Roman Numeral analysis is called ChordGNN.
As the name suggests, ChordGNN is a model for automatic Roman Numeral analysis based on Graph Neural Networks. A particularity of this model is that it leverages note-wise information but produces onset-wise predictions, i.e. a Roman Numeral is predicted for every unique onset event of the score. This means that all notes sharing an onset also share the same Roman Numeral, just as when annotating a musical score by hand. Through graph convolution, however, information from every note is still propagated to its neighbouring notes and onsets.

ChordGNN model architecture illustration.

ChordGNN is based on a Graph Convolutional Recurrent Neural Network architecture. It consists of stacked GraphSAGE convolutional blocks that operate at the note level.

The graph convolution is followed by an Onset-Pooling Layer that contracts the note representations to the onset level, producing one vector embedding for every unique onset of the score. This is an important step because it moves the representation from a graph to a sequence.
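Here is a minimal sketch of onset pooling. It assumes mean pooling over the notes that share an onset, which may differ from the pooling operation ChordGNN actually uses. It also assumes onsets that compare exactly equal. Because `np.unique` returns sorted values, the pooled rows come out in time order, ready to be fed to a sequence model.

```python
import numpy as np


def onset_pool(note_embeddings, onsets):
    """Mean-pool note embeddings that share an onset.

    Returns (unique_onsets, pooled), where pooled[i] is the mean embedding
    of all notes whose onset equals unique_onsets[i].
    """
    unique_onsets, inverse = np.unique(np.asarray(onsets).ravel(), return_inverse=True)
    sums = np.zeros((len(unique_onsets), note_embeddings.shape[1]))
    np.add.at(sums, inverse.ravel(), note_embeddings)  # accumulate each note into its onset row
    counts = np.bincount(inverse.ravel(), minlength=len(unique_onsets))
    return unique_onsets, sums / counts[:, None]


# Four notes with 2-D embeddings; notes 0 and 1 start together.
note_embeddings = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
onsets = np.array([0.0, 0.0, 1.0, 0.5])
unique_onsets, pooled = onset_pool(note_embeddings, onsets)
print(unique_onsets)
print(pooled)
```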

The embeddings produced by the Onset-Pooling are ordered by time and are then fed to a sequential model, such as a GRU stack. Finally, simple Multi-Layer Perceptron classifiers are added, one for each of the attributes that describe a Roman Numeral. ChordGNN is therefore also a multi-task model.
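To show how the pieces fit together after pooling, here is a toy NumPy forward pass. A single unidirectional GRU runs over the onset embeddings, and one small MLP head per attribute produces per-onset logits. Everything here is an illustrative assumption. The weights are random and untrained, and the class counts in `TASK_CLASSES` are made up. ChordGNN’s actual GRU stack, layer sizes, and directionality may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class counts per task; ChordGNN's real label vocabularies differ.
TASK_CLASSES = {"degree": 7, "localkey": 24, "quality": 5, "inversion": 4, "root": 12}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyGRU:
    """Single-layer, unidirectional GRU with random (untrained) weights."""

    def __init__(self, in_dim, hidden):
        self.hidden = hidden
        # Stacked input/recurrent weights for the update (0), reset (1) and candidate (2) gates.
        self.W = rng.normal(0.0, 0.1, (3, hidden, in_dim))
        self.U = rng.normal(0.0, 0.1, (3, hidden, hidden))
        self.b = np.zeros((3, hidden))

    def __call__(self, seq):
        h = np.zeros(self.hidden)
        outputs = []
        for x in seq:  # one step per onset, in time order
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + r * (self.U[2] @ h) + self.b[2])
            h = (1.0 - z) * n + z * h
            outputs.append(h)
        return np.stack(outputs)  # shape (num_onsets, hidden)


def make_mlp(in_dim, hidden, out_dim):
    return rng.normal(0.0, 0.1, (in_dim, hidden)), rng.normal(0.0, 0.1, (hidden, out_dim))


def mlp(params, h):
    w1, w2 = params
    return np.maximum(h @ w1, 0.0) @ w2  # ReLU hidden layer, then per-class logits


onset_embeddings = rng.normal(size=(6, 16))  # stand-in for pooled features: 6 onsets, 16 dims
gru = TinyGRU(16, 32)
heads = {task: make_mlp(32, 32, k) for task, k in TASK_CLASSES.items()}

states = gru(onset_embeddings)
logits = {task: mlp(params, states) for task, params in heads.items()}
print({task: out.shape for task, out in logits.items()})
# {'degree': (6, 7), 'localkey': (6, 24), 'quality': (6, 5), 'inversion': (6, 4), 'root': (6, 12)}
```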

ChordGNN does not predict the Roman numeral for each position of the score directly. Instead, it predicts the degree, local key, quality, inversion, and root. The predictions of the individual attribute tasks are then combined into a single Roman Numeral prediction. Let’s see what the output predictions look like.
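One simple way to combine the per-task outputs is to take the most likely class for each attribute and assemble a label from them. Impossible combinations, such as a triad in third inversion, are masked out. The vocabularies below are drastically reduced and hypothetical. This decoder also ignores the root head, which could serve as a consistency check. It is a sketch of the idea, not ChordGNN’s actual decoding procedure.

```python
import numpy as np

# Hypothetical, heavily reduced label vocabularies for illustration only.
DEGREES = ["I", "II", "III", "IV", "V", "VI", "VII"]
QUALITIES = ["M", "m", "o", "D7"]  # major, minor, diminished triad, dominant seventh
LOCAL_KEYS = ["C", "G", "a"]
TRIAD_FIGURES = ["", "6", "64"]
SEVENTH_FIGURES = ["7", "65", "43", "42"]


def decode(logits):
    """Take the most likely class of each task and assemble one Roman numeral."""
    key = LOCAL_KEYS[int(np.argmax(logits["localkey"]))]
    degree = DEGREES[int(np.argmax(logits["degree"]))]
    quality = QUALITIES[int(np.argmax(logits["quality"]))]
    if quality == "D7":
        figure = SEVENTH_FIGURES[int(np.argmax(logits["inversion"]))]
    else:
        # Triads have no third inversion, so only the first three classes are valid.
        figure = TRIAD_FIGURES[int(np.argmax(logits["inversion"][:3]))]
    numeral = degree.lower() if quality in ("m", "o") else degree
    if quality == "o":
        numeral += "o"
    return f"{key}: {numeral}{figure}"


example = {
    "localkey": np.array([2.1, 0.3, -1.0]),
    "degree": np.array([0.1, 0.2, 0.0, 0.0, 1.7, 0.0, -0.5]),
    "quality": np.array([0.2, -0.1, 0.0, 1.4]),
    "inversion": np.array([0.3, 1.2, 0.1, -0.2]),
}
print(decode(example))  # C: V65
```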

Examples of ChordGNN predictions

In this section, we’ll look at some of ChordGNN’s predictions and compare them with an analysis done by a human. Below is an example of the first bars of Haydn’s String Quartet Op. 20 №3, movement 4.

A comparison between the human annotation and ChordGNN on a passage of Haydn’s String Quartet Op. 20 №3, movement 4.

This example shows several things. In measure 2, the human annotation marks a tonic in first inversion. However, the viola at that point is lower than the cello, so the chord is actually in root position, and ChordGNN predicts this correctly. In addition, ChordGNN predicts a harmonic rhythm of eighth notes, which disagrees with the annotator’s half-note marking. By analyzing the underlying harmony in that passage, we can justify ChordGNN’s decisions.

The human annotation suggests that the entire second half of the second measure is a viio chord. This chord should not be in first inversion, however, because the cello plays an F# as the bass note, and F# is the root of viio. There are also two conflicting interpretations of the segment. In the first, the viio on the third beat is a passing chord between the surrounding tonic chords, leading to a dominant chord in the next measure. In the second, the viio is already part of a prolonged dominant harmony (with passing chords on the offbeats) leading to the V7. The ChordGNN solution accommodates both interpretations. It does not try to group chords at a higher level and instead treats each eighth note as an individual chord rather than a passing event.

A comparison between the human annotation and ChordGNN on a passage of Mozart’s Piano Sonata K. 279, movement 1. Image by the author.

Above is another example, comparing the predictions of ChordGNN with the original analysis of a Mozart piano sonata. In this case, ChordGNN’s analysis is somewhat more simplistic and omits some chords. This happens on two occasions with the dominant seventh in third inversion (V2). Since the bass is missing there, this is a reasonable assumption for ChordGNN. Another disagreement between the annotation and the prediction occurs at the half cadence towards the end. ChordGNN treats the C# in the melody as a passing note, whereas the annotator specifies the extension #11.

Conclusions

In this article, we discussed a new method for automating Roman Numeral analysis using Graph Neural Networks. We explained how the ChordGNN model works and showcased some of its predictions.

References

E. Karystinaios, G. Widmer. Roman Numeral Analysis with Graph Neural Networks: Onset-wise Predictions from Note-wise Features. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.

All images and graphics in this article were created by the author.

