Musical Gesture Recognition Using Machine Learning and Audio Descriptors

Paul Best, Jean Bresson, Diemo Schwarz
Submitted poster at CBMI'18, International Conference on Content-Based Multimedia Indexing

This page gives the complete results of the experiments reported in the paper.

Training and testing with Mel-frequency cepstral coefficients and common descriptors

The table below presents the performance of XMM models trained with the 12 Mel-frequency cepstral coefficients (MFCCs) and the 9 IRCAM descriptors (Frequency, Energy, Periodicity, AC1, Loudness, Centroid, Spread, Skewness, and Kurtosis).

Number of hidden states  Relative regularization  Absolute regularization  Test set accuracy  Training set accuracy
20                       0.1                      0.05                     0.021052633        0.058803257
20                       0.2                      0.05                     0.07046784         0.073694654
20                       0.3                      0.05                     0.48687866         0.8526838
20                       0.4                      0.05                     0.49937865         0.84136385
20                       0.5                      0.05                     0.50423974         0.83066
30                       0.1                      0.05                     0.51217103         0.8752297
30                       0.2                      0.05                     0.53380847         0.88536966
30                       0.3                      0.05                     0.5706506          0.87174184
30                       0.4                      0.05                     0.55584795         0.8657477
30                       0.5                      0.05                     0.54444445         0.8692982
40                       0.1                      0.05                     0.533114           0.8823726
40                       0.2                      0.05                     0.57054097         0.9025585
40                       0.3                      0.05                     0.5715278          0.902005
40                       0.4                      0.05                     0.5763889          0.8949039
40                       0.5                      0.05                     0.5698465          0.881203
50                       0.1                      0.05                     0.5335161          0.88831455
50                       0.2                      0.05                     0.5601243          0.88062865
50                       0.3                      0.05                     0.5486111          0.8776838
50                       0.4                      0.05                     0.5541667          0.86996657
50                       0.5                      0.05                     0.5483187          0.86873436
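
As a reading aid, here is a minimal Python sketch of how rows of this table could be compared. It is not the code used for the experiments. The names `ROWS` and `parse_rows` are hypothetical, and only the five rows with 40 hidden states are copied in. It picks the configuration with the highest test set accuracy and reports the gap between training and test accuracy.

```python
# Rows copied from the table above (40 hidden states only):
# hidden states, relative regularization, absolute regularization,
# test set accuracy, training set accuracy.
ROWS = """
40 0.1 0.05 0.533114 0.8823726
40 0.2 0.05 0.57054097 0.9025585
40 0.3 0.05 0.5715278 0.902005
40 0.4 0.05 0.5763889 0.8949039
40 0.5 0.05 0.5698465 0.881203
"""


def parse_rows(text):
    """Turn whitespace-separated rows into a list of dictionaries."""
    results = []
    for line in text.strip().splitlines():
        states, relative, absolute, test, train = line.split()
        results.append({
            "states": int(states),
            "relative": float(relative),
            "absolute": float(absolute),
            "test": float(test),
            "train": float(train),
        })
    return results


results = parse_rows(ROWS)

# Configuration with the highest test set accuracy among the copied rows.
best = max(results, key=lambda row: row["test"])

# Difference between training and test accuracy for that configuration.
gap = best["train"] - best["test"]

print(best["states"], best["relative"], best["test"], round(gap, 4))
```

Among these rows the best test set accuracy is obtained with a relative regularization of 0.4, while the training set accuracy stays about 0.32 higher.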

Training and testing with Mel-frequency cepstral coefficients

The table below presents the performance of XMM models trained with the 12 Mel-frequency cepstral coefficients.

Number of hidden states  Relative regularization  Absolute regularization  Test set accuracy  Training set accuracy
10                       0.1                      0.05                     0.4624269          0.7771825
10                       0.15                     0.1                      0.44634503         0.79201127
10                       0.05                     0.01                     0.43483186         0.7683062
15                       0.1                      0.05                     0.49031433         0.8258876
15                       0.15                     0.1                      0.5386696          0.82710946
15                       0.05                     0.01                     0.44733188         0.82301587
20                       0.1                      0.05                     0.53837717         0.84613616
20                       0.15                     0.1                      0.5258772          0.85326857
20                       0.05                     0.01                     0.4857456          0.85562867
25                       0.1                      0.05                     0.5610015          0.85326857
25                       0.15                     0.1                      0.5659722          0.87166876
25                       0.05                     0.01                     0.55983186         0.8538429

Training and testing with common descriptors

The table below presents the performance of XMM models trained with the 9 common descriptors: Frequency, Energy, Periodicity, AC1, Loudness, Centroid, Spread, Skewness, and Kurtosis.

Number of hidden states  Relative regularization  Absolute regularization  Test set accuracy  Training set accuracy
10                       0.1                      0.05                     0.42372075         0.654198
10                       0.15                     0.1                      0.4355263          0.62569964
10                       0.05                     0.01                     0.43852338         0.66724104
15                       0.1                      0.05                     0.46611843         0.7314745
15                       0.15                     0.1                      0.48161548         0.7285192
15                       0.05                     0.01                     0.49828216         0.7605785
20                       0.1                      0.05                     0.5271564          0.79918546
20                       0.15                     0.1                      0.51048977         0.79144735
20                       0.05                     0.01                     0.51304824         0.8128655
25                       0.1                      0.05                     0.5066155          0.8170008
25                       0.15                     0.1                      0.4786184          0.8098684
25                       0.05                     0.01                     0.5282529          0.830117

Training and testing with 1 descriptor

The table below presents the performance of XMM models trained with only one descriptor.
The models were tested with 10 hidden states and (0.1 0.05) as regularization values.

Descriptor Test set accuracy Training set accuracy
Coeff. MFCC #1 0.20785819 0.29530075
Coeff. MFCC #2 0.24002193 0.28645572
Coeff. MFCC #3 0.243019 0.3143066
Coeff. MFCC #4 0.16776316 0.26617587
Coeff. MFCC #5 0.10829678 0.23765664
Coeff. MFCC #6 0.12408626 0.23354219
Coeff. MFCC #7 0.1941886 0.29765037
Coeff. MFCC #8 0.11962719 0.19603175
Coeff. MFCC #9 0.13461258 0.22757937
Coeff. MFCC #10 0.15080409 0.22455096
Coeff. MFCC #11 0.15157163 0.22398706
Coeff. MFCC #12 0.13442983 0.20447995
Frequency 0.22675438 0.2922619
Energy 0.0439693 0.05229741
Periodicity 0.064729534 0.12179407
AC1 1) 0.038121347 0.0641604
Loudness 0.23516082 0.30132625
Centroid 0.20946637 0.34698203
Spread 0.21717836 0.34344193
Skewness 0.1439693 0.22039473
Kurtosis 0.11140351 0.17880117

Training and testing with combinations of 2 descriptors

The table below presents the performance of XMM models trained with combinations of 2 audio descriptors.
The models were tested with 10 hidden states and (0.1 0.05) as regularization values.

Numeric notation used for descriptors:

  • [0-11]: MFCC coefficients
  • [12-20]: Frequency, Energy, Periodicity, AC1, Loudness, Centroid, Spread, Skewness, Kurtosis

Descriptors Test set accuracy Training set accuracy
0 1 0.4263889 0.64530075
0 2 0.31944445 0.5430138
0 3 0.4434576 0.6631683
0 4 0.2886696 0.52935464
0 5 0.4060307 0.59417296
0 6 0.33088452 0.55374897
0 7 0.39342105 0.5989662
0 8 0.39967105 0.6162072
0 9 0.42339182 0.6078634
0 10 0.41217107 0.5727966
0 11 0.39320177 0.5871136
0 12 0.39312866 0.6008041
0 13 0.28351608 0.41594613
0 14 0.39152047 0.5359545
0 15 0.31100145 0.44684628
0 16 0.4168494 0.5864453
0 17 0.43165204 0.60970134
0 18 0.3938231 0.6137949
0 19 0.41476607 0.5662281
0 20 0.3756579 0.5662281
1 2 0.3941155 0.6066625
1 3 0.40434942 0.63168335
1 4 0.39283624 0.6001984
1 5 0.40650585 0.59429825
1 6 0.40336257 0.6149332
1 7 0.36136696 0.6036863
1 8 0.35650584 0.60718465
1 9 0.43165204 0.5757936
1 10 0.35201022 0.57103175
1 11 0.38399124 0.6168233
1 12 0.47207603 0.6387218
1 13 0.28739035 0.4332498
1 14 0.43622077 0.5983187
1 15 0.31114766 0.49738932
1 16 0.47404972 0.63402254
1 17 0.38490498 0.5989244
1 18 0.46381578 0.64654345
1 19 0.36282894 0.57943815
1 20 0.36195177 0.5431391
2 3 0.3232456 0.5353592
2 4 0.2847222 0.5092523
2 5 0.32810673 0.5585526
2 6 0.29038742 0.5448935
2 7 0.29027778 0.51158107
2 8 0.27986112 0.53714496
2 9 0.32284358 0.52642024
2 10 0.26875 0.48665413
2 11 0.31103802 0.5442878
2 12 0.4119152 0.5954261
2 13 0.25314328 0.4057853
2 14 0.28501463 0.49788013
2 15 0.26980993 0.40582708
2 16 0.39312866 0.5906955
2 17 0.41787282 0.6560777
2 18 0.45489767 0.64714915
2 19 0.33665934 0.5330096
2 20 0.3096491 0.5056913
3 4 0.24539474 0.5057018
3 5 0.25782165 0.47225356
3 6 0.30745614 0.5234858
3 7 0.29206872 0.48433584
3 8 0.24312866 0.4961362
3 9 0.2699196 0.44088346
3 10 0.3030702 0.50393695
3 11 0.2690424 0.5133772
3 12 0.3030702 0.51280284
3 13 0.20003656 0.32439432
3 14 0.24214182 0.43970343
3 15 0.20657894 0.37192982
3 16 0.43293127 0.606746
3 17 0.36016083 0.5336571
3 18 0.4150585 0.62102134
3 19 0.2736111 0.49083126
3 20 0.22467105 0.4522243
4 5 0.24758773 0.48847118
4 6 0.28611112 0.46521512
4 7 0.21710527 0.48133877
4 8 0.18841374 0.43016917
4 9 0.25621346 0.4830514
4 10 0.23505117 0.445071
4 11 0.23230994 0.48011696
4 12 0.3449927 0.54486216
4 13 0.13062866 0.30956557
4 14 0.22306288 0.40111738
4 15 0.1622076 0.35050124
4 16 0.4057383 0.5602861
4 17 0.3349415 0.568066
4 18 0.39510235 0.5871867
4 19 0.2918494 0.47947994
4 20 0.2302266 0.40701753
5 6 0.34645468 0.50982666
5 7 0.23855995 0.5002297
5 8 0.22722954 0.4818609
5 9 0.24312866 0.4307853
5 10 0.20599416 0.4509712
5 11 0.23190789 0.45098162
5 12 0.29663742 0.5227757
5 13 0.12916667 0.31429616
5 14 0.16885965 0.39813074
5 15 0.21016082 0.3689745
5 16 0.40621346 0.59181285
5 17 0.33483186 0.49913326
5 18 0.35899124 0.5923977
5 19 0.21483918 0.44627193
5 20 0.23618421 0.4224833
6 7 0.21027047 0.49375522
6 8 0.2758772 0.4652047
6 9 0.31878656 0.4677005
6 10 0.2381579 0.50149334
6 11 0.24261697 0.500919
6 12 0.32284358 0.5460318
6 13 0.2424342 0.35944027
6 14 0.27569443 0.44911236
6 15 0.25453216 0.39570802
6 16 0.32452485 0.5062552
6 17 0.37050438 0.5698204
6 18 0.39184943 0.6281224
6 19 0.29762426 0.47361112
6 20 0.24144738 0.4034461
7 8 0.21980994 0.465236
7 9 0.19477339 0.44684628
7 10 0.20698099 0.45513785
7 11 0.23687865 0.42543858
7 12 0.26546052 0.49490392
7 13 0.16330409 0.28107768
7 14 0.17342836 0.38495195
7 15 0.1622076 0.31551796
7 16 0.36849415 0.57637847
7 17 0.29097223 0.5086988
7 18 0.36224416 0.5371658
7 19 0.23062866 0.4462406
7 20 0.20957603 0.39448622
8 9 0.18464913 0.395165
8 10 0.1941886 0.4254908
8 11 0.18424708 0.40580618
8 12 0.30716375 0.50075186
8 13 0.11396199 0.27444655
8 14 0.12975146 0.34225145
8 15 0.09689327 0.28816834
8 16 0.38399124 0.5924499
8 17 0.29923245 0.510401
8 18 0.3436038 0.5739557
8 19 0.2516813 0.3927736
8 20 0.19718567 0.3559315
9 10 0.19177632 0.41705304
9 11 0.200731 0.4473893
9 12 0.3249269 0.5085631
9 13 0.16330409 0.26504803
9 14 0.19517544 0.3588868
9 15 0.18336989 0.32086465
9 16 0.35796782 0.5859127
9 17 0.3116228 0.49920633
9 18 0.34546784 0.590048
9 19 0.21542397 0.41236424
9 20 0.16637427 0.35587928
10 11 0.24272661 0.44026732
10 12 0.29623538 0.47293234
10 13 0.17244153 0.32088554
10 14 0.2124269 0.36305347
10 15 0.20201023 0.37552214
10 16 0.37902048 0.56390977
10 17 0.32174706 0.5419695
10 18 0.35062134 0.55736214
10 19 0.1825658 0.41652048
10 20 0.15526316 0.354741
11 12 0.24809942 0.4913847
11 13 0.15548246 0.27572054
11 14 0.20639619 0.35948205
11 15 0.16688597 0.29947788
11 16 0.39210525 0.5561508
11 17 0.3311769 0.5544068
11 18 0.38358918 0.563868
11 19 0.22605995 0.39813074
11 20 0.21553363 0.38147452
12 13 0.21513158 0.38970342
12 14 0.26516813 0.43611112
12 15 0.2558114 0.3885756
12 16 0.45526317 0.65716374
12 17 0.35599417 0.54845447
12 18 0.45328948 0.62575186
12 19 0.32313597 0.5056391
12 20 0.27975145 0.46232247
13 14 0.2066886 0.24714913
13 15 0.080409356 0.13373016
13 16 0.3255117 0.4046366
13 17 0.26165935 0.3933062
13 18 0.33881578 0.48668545
13 19 0.17185673 0.27215958
13 20 0.07923976 0.19906016
14 15 0.16699562 0.25198412
14 16 0.388962 0.53189224
14 17 0.34945175 0.52765245
14 18 0.44119152 0.5645259
14 19 0.22595029 0.4105681
14 20 0.12719299 0.34281537
15 16 0.34159356 0.44505012
15 17 0.31260964 0.40579575
15 18 0.38190788 0.49975982
15 19 0.19320175 0.30777988
15 20 0.12251462 0.24415206
16 17 0.43680555 0.62981415
16 18 0.45427632 0.61908937
16 19 0.3963816 0.5751566
16 20 0.35939327 0.5556182
17 18 0.5193348 0.6892962
17 19 0.2939693 0.50273604
17 20 0.3058845 0.48013785
18 19 0.38172513 0.5858605
18 20 0.31600878 0.52052004
19 20 0.112682745 0.22402883
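
The numeric notation can be decoded mechanically. The following Python sketch is only an illustration. The names `NAMES`, `decode`, and `pairs` are hypothetical, and it assumes that index 0 corresponds to the coefficient listed as MFCC #1 in the single-descriptor table, index 1 to MFCC #2, and so on. It also checks how many pairs can be formed from the 21 descriptors.

```python
from itertools import combinations

# Assumption: indices 0-11 map to MFCC #1 to #12, in that order.
MFCC = [f"MFCC #{i}" for i in range(1, 13)]

# Indices 12-20, in the order given in the notation list above.
OTHERS = ["Frequency", "Energy", "Periodicity", "AC1", "Loudness",
          "Centroid", "Spread", "Skewness", "Kurtosis"]

NAMES = MFCC + OTHERS


def decode(indices):
    """Translate numeric descriptor indices into descriptor names."""
    return [NAMES[i] for i in indices]


# All unordered pairs of distinct descriptors, in the same order as the table.
pairs = list(combinations(range(len(NAMES)), 2))

print(len(pairs))        # 210
print(decode((17, 18)))  # ['Centroid', 'Spread']
```

The 210 pairs match the number of rows in the table, which runs from `0 1` to `19 20`.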

Training and testing with combinations of 3 descriptors

The table below presents the performance of XMM models trained with combinations of 3 audio descriptors selected from this set: (0 1 2 3 12 14 16 17 18).
The models were tested with 18 hidden states and (0.1 0.05) as regularization values.

Numeric notation used for descriptors:

  • [0-11]: MFCC coefficients
  • [12-20]: Frequency, Energy, Periodicity, AC1, Loudness, Centroid, Spread, Skewness, Kurtosis

Descriptors Test set accuracy Training set accuracy
0 1 2 0.53877926 0.75701756
0 1 3 0.5505848 0.77959484
0 1 12 0.5346857 0.77181495
0 1 14 0.48439327 0.71718884
0 1 16 0.4802997 0.727924
0 1 17 0.463231 0.7076963
0 1 18 0.5342105 0.74685675
0 2 3 0.46480262 0.7475042
0 2 12 0.47335526 0.7338659
0 2 14 0.41546053 0.6595865
0 2 16 0.4572734 0.68871135
0 2 17 0.51604534 0.7243943
0 2 18 0.46659356 0.7010965
0 3 12 0.50252194 0.7582289
0 3 14 0.5197003 0.7510756
0 3 16 0.50292397 0.7249478
0 3 17 0.50679827 0.7415727
0 3 18 0.5313962 0.7486216
0 12 14 0.46451023 0.7017335
0 12 16 0.5565424 0.7278822
0 12 17 0.47108918 0.71013994
0 12 18 0.51820177 0.7219716
0 14 16 0.50303364 0.674363
0 14 17 0.45 0.68392855
0 14 18 0.4811769 0.6981725
0 16 17 0.4652047 0.71359647
0 16 18 0.5410453 0.71424395
0 17 18 0.4743421 0.7403404
1 2 3 0.52032167 0.78437764
1 2 12 0.48804826 0.7242899
1 2 14 0.4621345 0.72081244
1 2 16 0.5322368 0.7456767
1 2 17 0.5137427 0.7635547
1 2 18 0.55105997 0.77901
1 3 12 0.49597952 0.71129907
1 3 14 0.54035086 0.75884504
1 3 16 0.5623903 0.7599833
1 3 17 0.52090645 0.77901
1 3 18 0.5468933 0.760495
1 12 14 0.4824927 0.70232875
1 12 16 0.5517544 0.78429407
1 12 17 0.4564693 0.72127194
1 12 18 0.5639985 0.75157685
1 14 16 0.51147664 0.73029447
1 14 17 0.48048246 0.7385547
1 14 18 0.54989034 0.73383457
1 16 17 0.45259503 0.6899018
1 16 18 0.53548974 0.7391395
1 17 18 0.5715278 0.7878655
2 3 12 0.4604532 0.7094507
2 3 14 0.35570174 0.6643588
2 3 16 0.4755117 0.6921888
2 3 17 0.47108918 0.77008147
2 3 18 0.56458337 0.7867481
2 12 14 0.41944444 0.663137
2 12 16 0.49597952 0.7249269
2 12 17 0.44495615 0.6862155
2 12 18 0.520614 0.74446536
2 14 16 0.46878654 0.68865914
2 14 17 0.44674706 0.7260756
2 14 18 0.5483918 0.7469716
2 16 17 0.49269006 0.7095342
2 16 18 0.5140351 0.7106412
2 17 18 0.5531798 0.7843463
3 12 14 0.363231 0.5911967
3 12 16 0.5022295 0.7309106
3 12 17 0.4125 0.66609234
3 12 18 0.4746345 0.7160192
3 14 16 0.52068717 0.73203844
3 14 17 0.40394738 0.6608187
3 14 18 0.4811769 0.73144317
3 16 17 0.50679827 0.7249373
3 16 18 0.49824563 0.72426904
3 17 18 0.5432383 0.77310986
12 14 16 0.50946635 0.71424395
12 14 17 0.41447368 0.6257728
12 14 18 0.4686769 0.6957811
12 16 17 0.46461988 0.7023601
12 16 18 0.5018275 0.7047097
12 17 18 0.52189327 0.7475355
14 16 17 0.44802633 0.6939432
14 16 18 0.4763158 0.7136487
14 17 18 0.56567985 0.7742795
16 17 18 0.47931287 0.7522661
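
The row labels of this table can be regenerated from the selected set. The Python sketch below is a hypothetical illustration, not the original experiment script; `SUBSET`, `triples`, and `labels` are names introduced here. It enumerates every combination of 3 descriptors from the 9 selected indices and formats them as they appear in the first column.

```python
from itertools import combinations

# The 9 descriptor indices selected for the 3-descriptor experiments.
SUBSET = (0, 1, 2, 3, 12, 14, 16, 17, 18)

# All unordered triples, generated in lexicographic order of the subset.
triples = list(combinations(SUBSET, 3))

# Labels formatted like the first column of the table.
labels = [" ".join(str(i) for i in triple) for triple in triples]

print(len(triples))           # 84
print(labels[0], "|", labels[-1])
```

Choosing 3 descriptors out of 9 gives 84 combinations, which is the number of rows in the table.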

Confusion matrix

This confusion matrix shows the accuracy of a model tested without markers (on each 100 ms frame). The model was trained with 49 hidden states, regularization values of (0.42, 0.045), and the following descriptors: Mel-frequency cepstral coefficients #1, #2, #3, #4, #6, #8, and #12, Frequency, Energy, Periodicity, AC1, and Loudness.

Z00000000000010/61091/12200011/1220
Q01/321/3221/323/320001/3203/32001/3200001/320
A0019/10049/1002/25001/20001/1000001/20001/1003/100
C01/7141/34201/2381/1191/7141/71429/357005/357002/3571/3575/714001/3570
B0011/1071/10775/107008/107000011/10700001/10700
E00050/2014/20180/20110/2017/67001/675/2011/672/2010001/675/675/201
F00017/94013/9416/470000000000011/9421/94
G03/340019/34003/850259/3401/1361/34013/6805/136003/851/680003/136
P0000010/113011/22662/11301/2261/226002/11311/2260039/22615/226
I0006/2900033/145000006/1450000076/145
R013/1921/57641/1921/2880015/320095/5761/641/5767/57600001/19213/288
H000124/673010/673059/67358/673020/673289/6732/67369/6736/6737/67302/67313/67314/673
J000007/171037/11400023/34270/17113/3427/1710005/34211/171
K0003/400003/161/160005/32091/1601/16000000
L0007/7100037/21300034/213017/7167/2131/21302/21300
M00041/12504/12507/1250013/12514/12500039/125001/1256/125
S000000025/4900024/4900000000
N000005/670000000000039/67023/67
O000007/710000001/71001/2130057/7117/213
T01/35000031/35003/1758/1753/350017/35001/35009/1750019/35017/25
Z Q A C B E F G P I R H J K L M S N O T
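
The entries of the matrix are written as exact fractions. Assuming that each row is normalized by the number of test frames of its class (an assumption, since the page does not say so), the entries of a row should sum to 1. The Python sketch below checks this for the non-zero fractions that can be read in rows S and N; the variable names are hypothetical.

```python
from fractions import Fraction

# Non-zero entries read from rows S and N of the matrix above.
row_s = [Fraction(25, 49), Fraction(24, 49)]
row_n = [Fraction(5, 67), Fraction(39, 67), Fraction(23, 67)]

# Under the row-normalization assumption, each row sums to exactly 1.
for name, row in (("S", row_s), ("N", row_n)):
    print(name, sum(row) == 1)
```

Exact fractions avoid the rounding error that floating-point sums would introduce in such a check.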
1) AC1: first-order autocorrelation coefficient
 


paco/cbmi18.txt · Last modified: 2018/06/05 12:28 by Paul Best