A new quantization system is currently under development. It proposes a new interactive approach, which breaks with the previous single-solution systems. It is run through a dedicated interface, that allows to visualize and edit the transcriptions of the input sequence. It is implemented in the newest version of the "rhythm" library.
To download the sources, go to : https://forge.ircam.fr/p/omlibraries/downloads/
Download the zip file, uncompress it, and move it to you library folder (either OM X.X/libraries , or the file specified in Preferences/Libraries)
NOTE : This library is compatible with OM 6.10 and further. Make sure you have a compatible version.
The new quantization system is implemented in the form of a new class :
rq. The class has two inputs :
chord-seq: to load a
marks: to load segmentation marks into the
By double-clicking the
quant-system, the quantization interface can be displayed.
The interface has three views :
- chord-seq view (top-left) : It shows the input
chord-seqalong with the segmentation marks.
- k-best view (right) : It shows the various transcriptions given by the algorithm, among which the user can choose.
- editor view (bottom-left) : It shows the chosen transcription, and allows to edit it.
The quantization algorithm is a recursive, dynamic-programming, lazy algorithm. It works by recursively subdividing segments into equal parts, and then aligning the input points to the closest segment boundary. The algorithm enumerates the subdivisions that give the best results, ranked according to a criterion that combines two quality measures :
- dist : it describes how close the output is to the input. It is given by the sum of the differences between each input point and the point to which it is aligned
- comp : it describes the complexity of the output, in other words, how easy it is to read. It takes into account the size of the output rhythm tree, the arities of the tree nodes, and the number of grace notes in the final notation.
See the Further Reading section for more details about its in-depth functioning.
A simple example patch can be found here : patch exemple quant-system.omp
To do so, plug the
chord-seq you want to quantify into the "chord-seq" entry of the
rq box, then evaluate it.
chord-seq should be monophonic. That means that the various
chord objects inside the
chord-seq should not overlap (if they do, the algorithm will remove the overlap). But the system will perform correctly when the
chord objects that contain more than one note. If the notes inside a chord are not all of same length, the algorithm will consider the
chord's length to be that of the longest note.
You can then either :
- Provide a list of segmentation marks as an input to the
quant-systembefore evaluating it
- Run an automatic segmentation algorithm by clicking the "Segmentation" button
- Place the segmentation marks directly in the chord-seq view (either by selecting a chord and hit "s", or with CMD+click)
Segmentation marks can be placed anywhere, not necessarily on a chord. The "Slur" field indicates if the last note of the previous segment overlaps with the current, and if so, displays the amount of overlap (in ms).
The segments should not be too long (about the size of a bar), and should correspond to constant-tempo regions for best results. Each segment will correspond to a bar in the final transcription.
Click "quantify" to run the algorithm on each segment. Computations can take a long time when the segments are long, or when the schema is big. See section How to make it faster.
Select in the k-best panel the transcriptions you want to keep. To display the right panel, tick "Show solutions". To navigate through the solutions you can either :
- Click on the k-best panel, and use the arrow keys (up-down to change solutions, left-right to change segments).
- Double-click a particular segment in the chord-seq panel or in the editor panel to display its transcriptions in the k-best panel and double-click a transcription in the k-best panel.
The chosen solution will be automatically updated in the editor view.
Double clicking a chord or a group will display a list of other possible transcriptions for this chord/group. You can then choose a solution from this list by double-clicking it. You can also compute more solutions by clicking "More".
You can also replace a subtree by any valid OM rhythm tree by selecting a chord or group, press "e" and type the subtree you want.
Be careful of the cursor mode you are in : chord to select a single chord, group to select a sub-tree.
Edition is only available when in "Edit mode".
To get the final transcription as an independent
voice object, use the function
get-voice, which takes as an input a
quant-system and outputs a
voice. It has an optional input to state if you want the transcription as displayed in Edition mode or in Render mode (default : Render).
Do not forget to block the
rq box before evaluating, otherwise you will lose your work !
A video of the system being used following the previous workflow can be found here : Video presentation.
Our system performs a tempo estimation on each segment independently, in order to find for each segment the tempo that gives the best solution. As a result, the tempo is likely to change tremendously from one measure to another, which is not satisfactory in many cases. In order to solve this problem, we added a tempo-smoothing algorithm, in order to find solutions that keep an approximately constant tempo over all segments, or at least over sequences of segments that are as long as possible.
The fitness of a sequence of tempi is assessed according to two criteria :
- The sum of absolute differences between successive tempi
- The sum of the weights (as defined in the k-best parsing algorithm) of the chosen solutions
The balance between these two criteria can be set with the Smoothing parameter.
For each segment, the algorithm determines if there is a tempo that is close to that of the previous segment. If, in the current segment, no tempo is close enough to the previous, the algorithm can change the tempo and start a new sequence of tempi : we call that a "tempo jump". The Tempo jump penalty parameter allows to set how frequent one wants these tempo jumps to be. The higher it is, the less frequent tempo jumps are going to be.
The "Parameters" button allows to edit global quantization parameters (for all segments). There are various categories of parameters :
These parameters are segment-dependant and can be edited for each segment :
- Tempo : it specifies the tempo or the range of tempos used to quantize the durations.
- If tempo is nil, the algorithm will try various values of tempo by cutting each segment into equal parts, considering each part as a beat, such that the tempo is between 40 and 200 bpm
- You can specify a particular tempo value by setting the tempo value as a number. The input tempo value will be approximated so that the length of the segment is an integer multiple of the length of a beat.
- You can specify the range of tempos by setting the tempo value as a 2-element list (e.g (30 300) )
- Schema : it specifies which subdivisions or series of subdivisions are allowed in each beat. For more details, see further.
- Nb. solutions : it specifies the number of solutions computed and displayed in the k-best panel for each tempo.
- Precision : it specifies how the balance between dist and comp. If precision=0, the solutions are ranked only according to comp. If precision=1, the solutions are ranked only according to dist.
- Grace pen. : it specifies the penalty given to grace notes. When it is 0, grace notes are not penalized at all.
- Merge tempi : when this option is ticked, the solutions are ranked across all tempi. The best solution thus gives the best tempo. However, it is not possible to compute more solutions for one particular tempo. When this option is not ticked, the solutions are ranked for each tempo independently. There is no estimation os which tempo is best, but it is possible to compute more solutions for one particular tempo. Both cases are equivalent in terms of computation time.
- Segments = beats : Use this option when the input has been segmented beforehand in beats. The algorithm will consider each segment as a beat (or any number of beats), and will not run the tempo estimation phase.
- Nb. beats/segment : This option is used to specify how long each segment is supposed to last (in beats). This parameter is only used when "Segment = beats" is enabled.
To edit the quantization parameters for one segment only, double-click the corresponding segmentation mark. You can the re-run the algorithm on this segment only by select it and press "q", which can save a lot of computation time.
This parameter concerns the segmentation algorithm.
- New-segment penalty : Sets the penalty given to the creation of a new segment in the segmentation algorithm. The higher it is, the longer the segments are going to be.
These parameters concern the tempo-smoothing algorithm.
- Smoothing : Sets the balance between the smoothness of the tempi and the quality of the solutions. When smoothing=1, the variations of tempo only are taken into account, the best sequence of tempi is chosen, regardless of the fitness of the solutions. When smoothing=0, the weight only is taken into account, the solutions chosen are all the best ones, regardless of their tempi.
- Tempo jump penalty : Sets the penalty given to a jump of tempo. When it is high, the algorithm will find the solutions such that an average tempo is kept over all the segments. When it is low, big tempo changes will be more frequent.
The Edit/Render switch allows to display the solution in two modes : Edit mode, where the display might not be ideal, but edition of the solution is permitted, and Render mode, where the display is prettier, but edition is not permitted.
The Color mode option displays with a color code (red : bad, green : good) the precision of the current transcription for the selected chord or group in the editor view.
The Show pulses option displays in the chord-seq view the quantization grid on which onsets are aligned.
The Show solutions switch displays the k-best panel on the right.
The quantification algorithm requires a lot of calculations, which can take a lot of time, especially with older computers. Fortunately, there are ways to make it faster.
- Do not compute less solutions. What requires the most calculations is computing the best solution. Once it is computed, getting more solutions is much faster. For instance, it should take about the same time to compute 3 solutions instead of 10.
- Use a smaller schema (cf section Subdivision Schema). Try to remove from the schema any kind of subdivision you already know you don't want.
- Use a smaller tempo range, or specify a tempo. That makes the tempo estimation step easier and saves a lot of computation.
- Use the option "Segments = beats" to skip the tempo estimation step.
- Segment into smaller segments. The longer the segments are, the more possible tempo values there will be. By cutting into smaller segments, you can limit the number of possible tempi and thus the computation time.
- Select a segment and press "q" to re-run the algorithm on this segment only. It is useful when you have changed the parameters on one segment only and you don't want to recompute all the segments.
To export one or multiple transcriptions from the quant-system to OpenMusic objects, two functions are available :
get-voice: This function exports the chosen transcription into a
get-k-best-list: This function exports one or multiple transcriptions among those in the k-best panel. See internal documentation for more information.
The subdivision schema specifies which subdivision or series of subdivisions are allowed in each beat. It is given as a list.
For example, the schema
(2 2 3) specifies that each beat can be cut in 2, then each part can be cut in 2, then each part can be cut again in 3.
Each segment can be cut again, or left as is.
For example, with the previous schema, the first half can be cut in 2 while the second half is left uncut. Again, the first quarter can be cut in 3, while the second is not.
Nested lists can be used in order to describe alternate possibilities.
For example, the schema
(2 (2 3) 3) specifies that each beat can be cut in 2, then each part can be cut in 2 or in 3, then each part can be cut again in 3.
When a choice is left between various subdivisions, the choice is made in each segment independently of the choices in the other segments.
For example, with the previous schema, the first half can be cut in 2 or 3, and the second half can be cut in 2 or 3 independently, the choices can be different in each part.
Deeper nested lists can be used in order to describe alternate series of subdivisions. More generally speaking, when the current depth is odd, the list describes successive choices, and when it is even, the list alternative choices.
For example, the schema
(2 ( (2 3) (3 2) ) ) specifies that each beat can be cut in 2, then each part can be cut in 2 then 3, or in 3 then 2.
( ( ( (2 3) (3 2) 2) (5 (2 3) ) 7) ) is equivalent to leaving the choice between the following schemas :
(2 3 2), (3 2 2), (5 2), (5 3), (7)
More examples and further explanations can be found in the references given in the Further Reading section.
To know more about this transcription system :
- It is described in : Adrien Ycart, Florent Jacquemard, Jean Bresson, Slawek Staworko, Une approche interactive pour la transcription rythmique dans OpenMusic, JIM 2016
- and in Adrien Ycart, Florent Jacquemard, Jean Bresson, Slawek Staworko, A Supervised Approach for Rhythm Transcription based on Tree Series Enumeration, ICMC 2016.
- More in-depth description of the algorithm can be found here : Adrien Ycart, Quantification rythmique dans OpenMusic, Master thesis, IRCAM.
- The algorithm is based on the algorithms described in this article : Liang Huang, David Chiang, Better k-best parsing, Proceedings of the Ninth International Workshop on Parsing Technology (pp. 53-64). Association for Computational Linguistics.
- The segmentation algorithm is described in this article : A.C. Yang, Elaine Chew, Anja Volk, A dynamic programming approach to adaptive tatum assignment for rhythm transcription, Seventh IEEE International Symposium on Multimedia. IEEE, 2005
For any inquiries, please contact : adrien[dot]ycart[at]ircam.fr