# Supervised approach of rhythm quantification based on tree series enumeration

A new quantization system is currently under development. It proposes a new interactive approach, which breaks with the previous single-solution systems. It is run through a dedicated interface, that allows to visualize and edit the transcriptions of the input sequence. It is implemented in the newest version of the "rhythm" library.

## Download

To download the sources, go to : https://forge.ircam.fr/p/omlibraries/downloads/

Download the zip file, uncompress it, and move it to you library folder (either OM X.X/libraries , or the file specified in Preferences/Libraries)

**NOTE** : This library is compatible with OM 6.10 and further. Make sure you have a compatible version.

## The rq Class

The new quantization system is implemented in the form of a new class : ** rq**. The class has two inputs :

: to load a**chord-seq**`chord-seq`

into the`quant-system`

: to load segmentation marks into the**marks**`quant-system`

By double-clicking the `quant-system`

, the quantization interface can be displayed.

The interface has three views :

**chord-seq view**(top-left) : It shows the input`chord-seq`

along with the segmentation marks.**k-best view**(right) : It shows the various transcriptions given by the algorithm, among which the user can choose.**editor view**(bottom-left) : It shows the chosen transcription, and allows to edit it.

## The k-best parsing algorithm

The quantization algorithm is a recursive, dynamic-programming, lazy algorithm. It works by recursively subdividing segments into equal parts, and then aligning the input points to the closest segment boundary. The algorithm enumerates the subdivisions that give the best results, ranked according to a criterion that combines two quality measures :

: it describes how close the output is to the input. It is given by the sum of the differences between each input point and the point to which it is aligned*dist*: it describes the*comp**complexity*of the output, in other words, how easy it is to read. It takes into account the size of the output rhythm tree, the arities of the tree nodes, and the number of grace notes in the final notation.

See the Further Reading section for more details about its in-depth functioning.

## Example patch

A simple example patch can be found here : patch exemple quant-system.omp

## Typical workflow

#### Charge a ''chord-seq'' into the quant-system

To do so, plug the `chord-seq`

you want to quantify into the "chord-seq" entry of the `rq`

box, then evaluate it.

The input `chord-seq`

should be monophonic. That means that the various `chord`

objects inside the `chord-seq`

should not overlap (if they do, the algorithm will remove the overlap). But the system will perform correctly when the `chord-seq`

contains `chord`

objects that contain more than one note. If the notes inside a chord are not all of same length, the algorithm will consider the `chord`

's length to be that of the longest note.

#### Segment the input

You can then either :

- Provide a list of segmentation marks as an input to the
`quant-system`

before evaluating it - Run an automatic segmentation algorithm by clicking the "Segmentation" button
- Place the segmentation marks directly in the chord-seq view (either by selecting a chord and hit "s", or with CMD+click)

Segmentation marks can be placed anywhere, not necessarily on a chord. The "Slur" field indicates if the last note of the previous segment overlaps with the current, and if so, displays the amount of overlap (in ms).

The segments should not be too long (about the size of a bar), and should correspond to constant-tempo regions for best results. Each segment will correspond to a bar in the final transcription.

#### Click "quantify"

Click "quantify" to run the algorithm on each segment. Computations can take a long time when the segments are long, or when the schema is big. See section How to make it faster.

#### Select in the k-best panel

Select in the k-best panel the transcriptions you want to keep. To display the right panel, tick "Show solutions". To navigate through the solutions you can either :

- Click on the k-best panel, and use the arrow keys (up-down to change solutions, left-right to change segments).
- Double-click a particular segment in the chord-seq panel or in the editor panel to display its transcriptions in the k-best panel and double-click a transcription in the k-best panel.

The chosen solution will be automatically updated in the editor view.

#### Edit the transcription

Double clicking a chord or a group will display a list of other possible transcriptions for this chord/group. You can then choose a solution from this list by double-clicking it. You can also compute more solutions by clicking "More".

You can also replace a subtree by any valid OM rhythm tree by selecting a chord or group, press "e" and type the subtree you want.

**Be careful** of the cursor mode you are in : chord to select a single chord, group to select a sub-tree.

Edition is only available when in "Edit mode".

#### Retrieve the final transcription

To get the final transcription as an independent `voice`

object, use the function

, which takes as an input a **get-voice**`quant-system`

and outputs a `voice`

. It has an optional input to state if you want the transcription as displayed in Edition mode or in Render mode (default : Render).

Do not forget to block the `rq`

box before evaluating, otherwise you will lose your work !

#### Video demo

A video of the system being used following the previous workflow can be found here : Video presentation.

## Tempo smoothing

Our system performs a tempo estimation on each segment independently, in order to find for each segment the tempo that gives the best solution. As a result, the tempo is likely to change tremendously from one measure to another, which is not satisfactory in many cases. In order to solve this problem, we added a tempo-smoothing algorithm, in order to find solutions that keep an approximately constant tempo over all segments, or at least over sequences of segments that are as long as possible.

The fitness of a sequence of tempi is assessed according to two criteria :

- The sum of absolute differences between successive tempi
- The sum of the weights (as defined in the k-best parsing algorithm) of the chosen solutions

The balance between these two criteria can be set with the **Smoothing** parameter.

For each segment, the algorithm determines if there is a tempo that is close to that of the previous segment.
If, in the current segment, no tempo is close enough to the previous, the algorithm can change the tempo and start a new sequence of tempi : we call that a "tempo jump".
The **Tempo jump penalty** parameter allows to set how frequent one wants these tempo jumps to be. The higher it is, the less frequent tempo jumps are going to be.

## Parameters

The "Parameters" button allows to edit global quantization parameters (for all segments). There are various categories of parameters :

#### Segment parameters

These parameters are segment-dependant and can be edited for each segment :

**Tempo**: it specifies the tempo or the range of tempos used to quantize the durations.- If tempo is
*nil*, the algorithm will try various values of tempo by cutting each segment into equal parts, considering each part as a beat, such that the tempo is between 40 and 200 bpm - You can specify a particular tempo value by setting the tempo value as a number. The input tempo value will be approximated so that the length of the segment is an integer multiple of the length of a beat.
- You can specify the range of tempos by setting the tempo value as a 2-element list (
*e.g*(30 300) )

**Schema**: it specifies which subdivisions or series of subdivisions are allowed in each beat. For more details, see further.**Nb. solutions**: it specifies the number of solutions computed and displayed in the k-best panel for each tempo.**Precision**: it specifies how the balance between*dist*and*comp*. If precision=0, the solutions are ranked only according to*comp*. If precision=1, the solutions are ranked only according to*dist*.**Grace pen.**: it specifies the penalty given to grace notes. When it is 0, grace notes are not penalized at all.**Merge tempi**: when this option is ticked, the solutions are ranked across all tempi. The best solution thus gives the best tempo. However, it is not possible to compute more solutions for one particular tempo. When this option is not ticked, the solutions are ranked for each tempo independently. There is no estimation os which tempo is best, but it is possible to compute more solutions for one particular tempo. Both cases are equivalent in terms of computation time.**Segments = beats**: Use this option when the input has been segmented beforehand in beats. The algorithm will consider each segment as a beat (or any number of beats), and will not run the tempo estimation phase.**Nb. beats/segment**: This option is used to specify how long each segment is supposed to last (in beats). This parameter is only used when "Segment = beats" is enabled.

To edit the quantization parameters for one segment only, double-click the corresponding segmentation mark. You can the re-run the algorithm on this segment only by select it and press "q", which can save a lot of computation time.

#### Segmentation parameters

This parameter concerns the segmentation algorithm.

**New-segment penalty**: Sets the penalty given to the creation of a new segment in the segmentation algorithm. The higher it is, the longer the segments are going to be.

#### Tempo smoothing parameters

These parameters concern the tempo-smoothing algorithm.

**Smoothing**: Sets the balance between the smoothness of the tempi and the quality of the solutions. When smoothing=1, the variations of tempo only are taken into account, the best sequence of tempi is chosen, regardless of the fitness of the solutions. When smoothing=0, the weight only is taken into account, the solutions chosen are all the best ones, regardless of their tempi.**Tempo jump penalty**: Sets the penalty given to a jump of tempo. When it is high, the algorithm will find the solutions such that an average tempo is kept over all the segments. When it is low, big tempo changes will be more frequent.

#### Display parameters

The **Edit/Render** switch allows to display the solution in two modes : Edit mode, where the display might not be ideal, but edition of the solution is permitted, and Render mode, where the display is prettier, but edition is not permitted.

The **Color mode** option displays with a color code (red : bad, green : good) the precision of the current transcription for the selected chord or group in the editor view.

The **Show pulses** option displays in the chord-seq view the quantization grid on which onsets are aligned.

The **Show solutions** switch displays the k-best panel on the right.

## How to make it faster

The quantification algorithm requires a lot of calculations, which can take a lot of time, especially with older computers. Fortunately, there are ways to make it faster.

**Do not compute less solutions**. What requires the most calculations is computing the best solution. Once it is computed, getting more solutions is much faster. For instance, it should take about the same time to compute 3 solutions instead of 10.

**Use a smaller schema**(cf section Subdivision Schema). Try to remove from the schema any kind of subdivision you already know you don't want.**Use a smaller tempo range**, or specify a tempo. That makes the tempo estimation step easier and saves a lot of computation.**Use the option "Segments = beats"**to skip the tempo estimation step.**Segment into smaller segments**. The longer the segments are, the more possible tempo values there will be. By cutting into smaller segments, you can limit the number of possible tempi and thus the computation time.

**Select a segment and press "q"**to re-run the algorithm on this segment only. It is useful when you have changed the parameters on one segment only and you don't want to recompute all the segments.

## Export functions

To export one or multiple transcriptions from the quant-system to OpenMusic objects, two functions are available :

: This function exports the chosen transcription into a`get-voice`

`voice`

: This function exports one or multiple transcriptions among those in the k-best panel. See internal documentation for more information.`get-k-best-list`

## Subdivision Schemas

The subdivision schema specifies which subdivision or series of subdivisions are allowed in each beat. It is given as a list.

*For example, the schema (2 2 3) specifies that each beat can be cut in 2, then each part can be cut in 2, then each part can be cut again in 3.
*

Each segment can be cut again, or left as is.

*For example, with the previous schema, the first half can be cut in 2 while the second half is left uncut. Again, the first quarter can be cut in 3, while the second is not.*

Nested lists can be used in order to describe alternate possibilities.

*For example, the schema (2 (2 3) 3) specifies that each beat can be cut in 2, then each part can be cut in 2 *or

*in 3, then each part can be cut again in 3.*

When a choice is left between various subdivisions, the choice is made in each segment independently of the choices in the other segments.

*For example, with the previous schema, the first half can be cut in 2 or 3, and the second half can be cut in 2 or 3 independently, the choices can be different in each part.*

Deeper nested lists can be used in order to describe alternate series of subdivisions. More generally speaking, when the current depth is odd, the list describes *successive* choices, and when it is even, the list *alternative* choices.

*For example, the schema (2 ( (2 3) (3 2) ) ) specifies that each beat can be cut in 2, then each part can be cut in 2 *then

*3,*or

*in 3*then

*2.*

*The schema ( ( ( (2 3) (3 2) 2) (5 (2 3) ) 7) ) is equivalent to leaving the choice between the following schemas : (2 3 2), (3 2 2), (5 2), (5 3), (7)*

More examples and further explanations can be found in the references given in the Further Reading section.

## Further reading

To know more about this transcription system :

- It is described in : Adrien Ycart, Florent Jacquemard, Jean Bresson, Slawek Staworko, Une approche interactive pour la transcription rythmique dans OpenMusic, JIM 2016
- and in Adrien Ycart, Florent Jacquemard, Jean Bresson, Slawek Staworko, A Supervised Approach for Rhythm Transcription based on Tree Series Enumeration, ICMC 2016.
- More in-depth description of the algorithm can be found here : Adrien Ycart, Quantification rythmique dans OpenMusic, Master thesis, IRCAM.
- The algorithm is based on the algorithms described in this article : Liang Huang, David Chiang, Better k-best parsing,
*Proceedings of the Ninth International Workshop on Parsing Technology (pp. 53-64)*. Association for Computational Linguistics.

- The segmentation algorithm is described in this article : A.C. Yang, Elaine Chew, Anja Volk, A dynamic programming approach to adaptive tatum assignment for rhythm transcription,
*Seventh IEEE International Symposium on Multimedia*. IEEE, 2005

*For any inquiries, please contact : adrien[dot]ycart[at]ircam.fr *