Practical 2 (continued)

SOM as a Classifier

Kohonen self-organizing maps can also be used as classifiers through supervised learning. Part of the data is used to train the network, adjusting the weights of each neuron so as to minimize the output error, and the remaining part is held out to check whether the training was effective.
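To make the classification step concrete, here is a minimal base R sketch of how a trained map can assign a class to a new sample: the sample goes to its best-matching unit and takes that unit's label. The codebook vectors, the labels and the helper classify_bmu are made up for illustration; they are not produced by the kohonen package.

```r
# Toy illustration only: three hypothetical codebook vectors (one per map unit)
# and a hypothetical class label attached to each unit after training.
codebook <- rbind(c(0, 0), c(1, 1), c(-1, 2))
unit_class <- c("A", "B", "C")

# Classify a new sample by the label of its best-matching unit, i.e. the
# unit whose codebook vector is closest in Euclidean distance.
classify_bmu <- function(x, codebook, unit_class) {
  d2 <- rowSums(sweep(codebook, 2, x)^2)  # squared distance to every unit
  unit_class[which.min(d2)]
}

classify_bmu(c(0.9, 1.2), codebook, unit_class)
```

The sample (0.9, 1.2) is closest to the second codebook vector, so it receives that unit's label.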

Example 1. Selecting a training set

Randomly choose a training set containing 70% of the data in som.wine (keeping the first column, which holds the winery of origin). From this split we create the scaled training set and the scaled test set.

For the comparison to be valid, the test set must be scaled using the centre and scale of the training set. From these we build two lists that pair each set of measurements with the attribute to be predicted, in our case the wineries (cultivars).

The cultivars are:

Cultivars (grapes) = (Nebbiolo, Barbera, Grignolino)

In the data, the three classes carry the labels Barbera, Barolo and Grignolino.

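library("kohonen")
data(wines)                             # loads wines and the class factor vintages
Swine<-scale(wines)                     # standardized copy, used here only for its row count
train.wine = sample(nrow(Swine), 125)   # 125 random row indices, about 70% of the data
train.wine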
  [1]  61  54  91  71  38 119  70 128  33 170
 [11]  60  56 173 112 105  30  81 103  58 154
 [21]  53 177  15 153  24 145  10 166 152 167
 [31] 169  75  69 101 146 159  45 114  64  57
 [41]  31 106  55  68  19  21  72  39 168  95
 [51] 108 123  99  43 157  97 160  87  22  28
 [61] 109 118  32   4  92 136  67   1  20 155
 [71]  89  37 135 172  12  93 139 176 142 175
 [81]  34  73 137 125  25  79  11   2  84 115
 [91] 174 111  36 127  85 156   5 150  23 120
[101]  63  46 102  90 116 148   3  17   6 140
[111] 129 158  49 130   8 138  82 132 121 144
[121] 124 163 151  14 133
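The split can be made reproducible, and its size derived from the 70% target, along the following lines. This is a sketch in base R: the seed value and the names n_train, train_idx and test_idx are illustrative assumptions, and n is set to 177, the highest row index that appears in the sample.

```r
set.seed(42)                      # arbitrary seed, chosen only for illustration
n <- 177                          # number of rows in wines (assumed, see lead-in)
n_train <- round(0.7 * n)         # size of a 70% training set
train_idx <- sample(n, n_train)   # row indices drawn without replacement
test_idx <- setdiff(seq_len(n), train_idx)  # remaining rows form the test set
c(train = length(train_idx), test = length(test_idx))
```

With these settings the training set holds 124 rows, one fewer than the 125 used in the cell above; both are roughly 70% of the data.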
# Standardize the training rows, then reuse their centre and scale for the test rows
Xtrain.wine = scale(wines[train.wine, ])
Xtest.wine = scale(wines[-train.wine, ], 
                   center = attr(Xtrain.wine, "scaled:center"), 
                   scale = attr(Xtrain.wine, "scaled:scale"))
head(Xtest.wine)
A matrix: 6 × 13 of type dbl
   alcohol  malic acid          ash  ash alkalinity   magnesium  tot. phenols  flavonoids  non-flav. phenols     proanth    col. int.    col. hue     OD ratio     proline
 1.3296123  -0.2385973   0.78284260     -0.67864943   1.4081430     0.5449968   0.5422879         -0.4872614  -0.5992395  -0.02575895   0.4626919   1.35932800   1.7616771
 1.0806026  -0.9436719  -0.51832324     -1.15556399  -0.1513374     1.1689387   1.1914430         -1.1982460   0.4961288   0.85534002   0.2563394   1.31847626   0.9668342
 2.1886959  -0.6087614  -0.05908824     -2.52669337  -0.6259619     1.3659730   1.7391676          0.4607180   2.2487182   0.11635379   1.2468316   0.20186206   1.3006682
 1.6284240  -0.4413062   1.20380802      0.03672242   1.3403395     0.8733873   1.1812999         -0.3292649   0.7152025   0.44118290   0.5039624   0.09292409   1.7139865
 1.4914687  -0.7321495   0.28533801     -1.00652819   0.5266975     1.6943634   1.9826007         -0.4082632   0.5143850   1.45627388   1.1642906   0.32441728   2.9857352
 0.4954298  -0.5735077   0.82111218     -1.12575683  -0.4903549     0.9554849   0.9784390         -0.2502666  -0.2341167  -0.12726805  -0.1150952   0.86910713   1.4437399
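The effect of scaling the test set with the training parameters can be checked on a small synthetic example. The sketch below uses made-up Gaussian data rather than the wine measurements, and the names x, idx, x_train and x_test are illustrative only.

```r
set.seed(1)                                          # arbitrary seed for the toy data
x <- matrix(rnorm(200, mean = 5, sd = 2), ncol = 2)  # made-up data, 100 rows by 2 columns
idx <- sample(nrow(x), 70)                           # 70% of the rows for training

x_train <- scale(x[idx, ])
x_test <- scale(x[-idx, ],
                center = attr(x_train, "scaled:center"),
                scale = attr(x_train, "scaled:scale"))

round(colMeans(x_train), 10)   # zero by construction
apply(x_train, 2, sd)          # one by construction
colMeans(x_test)               # close to zero, but generally not exactly zero
```

The training columns have mean 0 and standard deviation 1 exactly, while the test columns are only approximately standardized. That is the intended behaviour: the test data are expressed in the units learned from the training data.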
trainingdata = list(measurements = Xtrain.wine,
                    vintages = vintages[train.wine])
trainingdata$vintages
  [1] Grignolino Barolo     Grignolino Grignolino Barolo
  [6] Grignolino Grignolino Grignolino Barolo     Barbera
 [11] Grignolino Barolo     Barbera    Grignolino Grignolino
 [16] Barolo     Grignolino Grignolino Barolo     Barbera
 [21] Barolo     Barbera    Barolo     Barbera    Barolo
 [26] Barbera    Barolo     Barbera    Barbera    Barbera
 [31] Barbera    Grignolino Grignolino Grignolino Barbera
 [36] Barbera    Barolo     Grignolino Grignolino Barolo
 [41] Barolo     Grignolino Barolo     Grignolino Barolo
 [46] Barolo     Grignolino Barolo     Barbera    Grignolino
 [51] Grignolino Grignolino Grignolino Barolo     Barbera
 [56] Grignolino Barbera    Grignolino Barolo     Barolo
 [61] Grignolino Grignolino Barolo     Barolo     Grignolino
 [66] Barbera    Grignolino Barolo     Barolo     Barbera
 [71] Grignolino Barolo     Barbera    Barbera    Barolo
 [76] Grignolino Barbera    Barbera    Barbera    Barbera
 [81] Barolo     Grignolino Barbera    Grignolino Barolo
 [86] Grignolino Barolo     Barolo     Grignolino Grignolino
 [91] Barbera    Grignolino Barolo     Grignolino Grignolino
 [96] Barbera    Barolo     Barbera    Barolo     Grignolino
[101] Grignolino Barolo     Grignolino Grignolino Grignolino
[106] Barbera    Barolo     Barolo     Barolo     Barbera
[111] Grignolino Barbera    Barolo     Barbera    Barolo
[116] Barbera    Grignolino Barbera    Grignolino Barbera
[121] Grignolino Barbera    Barbera    Barolo     Barbera
Levels: Barbera Barolo Grignolino
testdata = list(measurements = Xtest.wine, vintages = vintages[-train.wine])

Exercise 1. Classification with the training set and error calculation

Use a SOM with the same size and grid as in the previous section, together with the supersom command from the kohonen package, to classify the testdata set.

Compute the confusion matrix and determine the classification error.

mygrid = somgrid(5, 5, "hexagonal")
som.wines = supersom(trainingdata, grid = mygrid)
som.prediction = predict(som.wines, newdata = testdata)
som.prediction$predictions
$measurements
A matrix: 52 × 13 of type dbl
          alcohol   malic acid          ash  ash alkalinity    magnesium  tot. phenols   flavonoids  non-flav. phenols      proanth    col. int.     col. hue    OD ratio       proline
 1   0.66351132  -0.61316816   0.93592094     -0.52961362   0.67360512     0.5778359   0.70795769        -0.02643809  -0.01504306  -0.26329024   0.93042438   0.4401639   0.940339424
 2   1.59418518  -0.65282860  -0.26000355     -1.34931053  -0.40560050     0.4218504   0.70711244        -0.68475716   0.46874464   0.06052378   0.64840923   0.6205924   1.566940577
 3   1.59418518  -0.65282860  -0.26000355     -1.34931053  -0.40560050     0.4218504   0.70711244        -0.68475716   0.46874464   0.06052378   0.64840923   0.6205924   1.566940577
 4   0.71598123  -0.51684992   0.91405260     -0.27412368   0.58481483     1.0985693   1.12623766        -0.46469050   0.81169926   0.46612514   0.69262763   0.4002848   1.584540670
 5   1.30649001  -0.55210366  -0.04268699     -1.01930269   0.50732512     1.5559703   1.50877547        -0.88225288   1.17682205   0.84721929   0.48627508   0.6317780   2.207697504
 6   0.27665692  -0.07743734  -0.09189074     -0.93413937  -0.06416148     0.4324058   0.64806541        -0.76939818   0.01364516  -0.44107617   0.45090036   1.1647959   0.558057838
 7   1.59418518  -0.65282860  -0.26000355     -1.34931053  -0.40560050     0.4218504   0.70711244        -0.68475716   0.46874464   0.06052378   0.64840923   0.6205924   1.566940577
 8   1.59418518  -0.65282860  -0.26000355     -1.34931053  -0.40560050     0.4218504   0.70711244        -0.68475716   0.46874464   0.06052378   0.64840923   0.6205924   1.566940577
 9   0.27665692  -0.07743734  -0.09189074     -0.93413937  -0.06416148     0.4324058   0.64806541        -0.76939818   0.01364516  -0.44107617   0.45090036   1.1647959   0.558057838
10   1.33850555   0.34938463  -0.11922615     -1.15556399   1.13692900     1.1079519   1.03350122        -1.07410586   0.49091281   0.14071597   0.04409105   1.1297801   0.623916250
11   0.27665692  -0.07743734  -0.09189074     -0.93413937  -0.06416148     0.4324058   0.64806541        -0.76939818   0.01364516  -0.44107617   0.45090036   1.1647959   0.558057838
12   1.33850555   0.34938463  -0.11922615     -1.15556399   1.13692900     1.1079519   1.03350122        -1.07410586   0.49091281   0.14071597   0.04409105   1.1297801   0.623916250
13   0.27665692  -0.07743734  -0.09189074     -0.93413937  -0.06416148     0.4324058   0.64806541        -0.76939818   0.01364516  -0.44107617   0.45090036   1.1647959   0.558057838
14   1.30649001  -0.55210366  -0.04268699     -1.01930269   0.50732512     1.5559703   1.50877547        -0.88225288   1.17682205   0.84721929   0.48627508   0.6317780   2.207697504
15   0.71598123  -0.51684992   0.91405260     -0.27412368   0.58481483     1.0985693   1.12623766        -0.46469050   0.81169926   0.46612514   0.69262763   0.4002848   1.584540670
16   1.30649001  -0.55210366  -0.04268699     -1.01930269   0.50732512     1.5559703   1.50877547        -0.88225288   1.17682205   0.84721929   0.48627508   0.6317780   2.207697504
17   1.59418518  -0.65282860  -0.26000355     -1.34931053  -0.40560050     0.4218504   0.70711244        -0.68475716   0.46874464   0.06052378   0.64840923   0.6205924   1.566940577
18   1.30649001  -0.55210366  -0.04268699     -1.01930269   0.50732512     1.5559703   1.50877547        -0.88225288   1.17682205   0.84721929   0.48627508   0.6317780   2.207697504
19  -0.72649655  -0.84546507  -1.37118824     -0.13786237  -0.83905856    -0.7028869  -0.21554266        -0.52111785  -0.48448665  -0.76300502   0.53933717   0.5364573  -0.799079640
20  -0.72649655  -0.84546507  -1.37118824     -0.13786237  -0.83905856    -0.7028869  -0.21554266        -0.52111785  -0.48448665  -0.76300502   0.53933717   0.5364573  -0.799079640
21  -0.74712879  -0.30910472  -0.96225041     -0.57134365  -0.91073654     0.7256116   0.70863389        -0.66105767   0.84664673  -0.80128845   1.18905290   0.5368463  -0.919804922
22  -0.74712879  -0.30910472  -0.96225041     -0.57134365  -0.91073654     0.7256116   0.70863389        -0.66105767   0.84664673  -0.80128845   1.18905290   0.5368463  -0.919804922
23  -1.61804031  -0.16147972   0.44798374     -0.03779548  -0.59206011     0.6681432   0.77304225        -0.88225288   1.49108845  -0.88351082  -0.50716504   1.0359184  -0.602980542
24  -0.72649655  -0.84546507  -1.37118824     -0.13786237  -0.83905856    -0.7028869  -0.21554266        -0.52111785  -0.48448665  -0.76300502   0.53933717   0.5364573  -0.799079640
25  -0.77950005  -0.38666293  -0.64078591     -0.10635195  -0.96497933    -1.3728035  -0.68907810         0.85570945  -0.25967533  -0.81752991  -0.03585582  -0.5416396  -0.659732325
26  -0.81187132  -0.93926517  -1.64727595     -0.58922795   3.81516703    -0.1528329  -0.20322611        -1.11924774   2.39476736  -0.96979356   1.12302009   0.3652690   0.275320863
27  -0.72649655  -0.84546507  -1.37118824     -0.13786237  -0.83905856    -0.7028869  -0.21554266        -0.52111785  -0.48448665  -0.76300502   0.53933717   0.5364573  -0.799079640
28  -0.77950005  -0.38666293  -0.64078591     -0.10635195  -0.96497933    -1.3728035  -0.68907810         0.85570945  -0.25967533  -0.81752991  -0.03585582  -0.5416396  -0.659732325
29  -1.55060018  -0.58085224   0.71905996      0.98061583  -0.58075953    -0.3991258  -0.07981902         0.94787412  -0.26758632  -0.96302628   1.61138778   0.1088109  -0.751616164
30  -1.04220532  -0.02487151  -0.16432959      0.78935322  -1.30399681    -0.2185110  -0.10940291         0.87545903  -0.51252286  -1.06724229  -0.14604808   0.6614441  -0.841433412
31  -0.74712879  -0.30910472  -0.96225041     -0.57134365  -0.91073654     0.7256116   0.70863389        -0.66105767   0.84664673  -0.80128845   1.18905290   0.5368463  -0.919804922
32  -0.81187132  -0.93926517  -1.64727595     -0.58922795   3.81516703    -0.1528329  -0.20322611        -1.11924774   2.39476736  -0.96979356   1.12302009   0.3652690   0.275320863
33  -0.74712879  -0.30910472  -0.96225041     -0.57134365  -0.91073654     0.7256116   0.70863389        -0.66105767   0.84664673  -0.80128845   1.18905290   0.5368463  -0.919804922
34  -0.72649655  -0.84546507  -1.37118824     -0.13786237  -0.83905856    -0.7028869  -0.21554266        -0.52111785  -0.48448665  -0.76300502   0.53933717   0.5364573  -0.799079640
35  -0.72649655  -0.84546507  -1.37118824     -0.13786237  -0.83905856    -0.7028869  -0.21554266        -0.52111785  -0.48448665  -0.76300502   0.53933717   0.5364573  -0.799079640
36  -0.77950005  -0.38666293  -0.64078591     -0.10635195  -0.96497933    -1.3728035  -0.68907810         0.85570945  -0.25967533  -0.81752991  -0.03585582  -0.5416396  -0.659732325
37  -1.61804031  -0.16147972   0.44798374     -0.03779548  -0.59206011     0.6681432   0.77304225        -0.88225288   1.49108845  -0.88351082  -0.50716504   1.0359184  -0.602980542
38  -1.55060018  -0.58085224   0.71905996      0.98061583  -0.58075953    -0.3991258  -0.07981902         0.94787412  -0.26758632  -0.96302628   1.61138778   0.1088109  -0.751616164
39  -0.99684997  -0.23104289  -1.04863033      0.24963071  -0.75188264     0.2142607   0.14091302        -0.11484094   0.02668526  -1.16440100  -0.26838566   0.5170040  -1.095669591
40  -0.14577029   2.12780949  -0.53745803      0.70738353  -0.45645312     0.7502408   0.63357534        -0.05277086   0.73345866  -1.02663865   0.04998684   0.7261260  -1.015504007
41  -0.99684997  -0.23104289  -1.04863033      0.24963071  -0.75188264     0.2142607   0.14091302        -0.11484094   0.02668526  -1.16440100  -0.26838566   0.5170040  -1.095669591
42  -0.25159943  -0.01532362  -0.17389699      0.38447263   0.57189987    -1.3377752  -0.80673745        -1.22457879  -1.29297283  -0.07312986  -0.88547804  -1.7272478  -0.458584082
43  -0.37921691   0.59096712  -0.37959600      0.20438770   0.17072920    -0.9738091  -1.29994315         0.95445731  -0.72703250   1.36593078  -1.28098709  -1.2398639   0.009048492
44  -0.07106738   0.65596619   0.26620322      0.38447263  -0.03833158    -1.1106384  -1.43560642         0.73721202  -1.33557048  -0.13741896  -0.82357228  -0.8421268  -0.381749268
45   0.22359414   2.22916397  -0.03995345      0.43415123  -0.68246477    -1.2091556  -1.47448810         1.36919833  -1.03434418   0.29568653  -1.02304641  -1.3981644  -0.397646126
46   0.22359414   2.22916397  -0.03995345      0.43415123  -0.68246477    -1.2091556  -1.47448810         1.36919833  -1.03434418   0.29568653  -1.02304641  -1.3981644  -0.397646126
47  -0.37921691   0.59096712  -0.37959600      0.20438770   0.17072920    -0.9738091  -1.29994315         0.95445731  -0.72703250   1.36593078  -1.28098709  -1.2398639   0.009048492
48  -0.07106738   0.65596619   0.26620322      0.38447263  -0.03833158    -1.1106384  -1.43560642         0.73721202  -1.33557048  -0.13741896  -0.82357228  -0.8421268  -0.381749268
49  -0.07106738   0.65596619   0.26620322      0.38447263  -0.03833158    -1.1106384  -1.43560642         0.73721202  -1.33557048  -0.13741896  -0.82357228  -0.8421268  -0.381749268
50   0.85649386   0.97765649   0.19604232      0.38447263  -0.37734905    -0.6974138  -1.22091190         0.97420689  -0.24020211   1.58823570  -1.42887308  -1.3278086  -0.355254505
51   0.22359414   2.22916397  -0.03995345      0.43415123  -0.68246477    -1.2091556  -1.47448810         1.36919833  -1.03434418   0.29568653  -1.02304641  -1.3981644  -0.397646126
52  -0.37921691   0.59096712  -0.37959600      0.20438770   0.17072920    -0.9738091  -1.29994315         0.95445731  -0.72703250   1.36593078  -1.28098709  -1.2398639   0.009048492
$vintages
 [1] Barolo     Barolo     Barolo     Barolo     Barolo
 [6] Barolo     Barolo     Barolo     Barolo     Barolo
[11] Barolo     Barolo     Barolo     Barolo     Barolo
[16] Barolo     Barolo     Barolo     Grignolino Grignolino
[21] Grignolino Grignolino Grignolino Grignolino Grignolino
[26] Grignolino Grignolino Grignolino Grignolino Grignolino
[31] Grignolino Grignolino Grignolino Grignolino Grignolino
[36] Grignolino Grignolino Grignolino Grignolino Grignolino
[41] Grignolino Barbera    Barbera    Barbera    Barbera
[46] Barbera    Barbera    Barbera    Barbera    Barbera
[51] Barbera    Barbera
Levels: Barbera Barolo Grignolino
confusion<-table(vintages[-train.wine],som.prediction$predictions[["vintages"]])
confusion
            
             Barbera Barolo Grignolino
  Barbera         11      0          0
  Barolo           0     18          0
  Grignolino       0      0         23
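The error rate is the share of test wines that fall off the diagonal of the confusion table. As a small check in base R, the table printed above can be re-entered by hand; the names conf, accuracy, error and per_class are illustrative only.

```r
# Confusion table copied from the output above (rows: true, columns: predicted).
classes <- c("Barbera", "Barolo", "Grignolino")
conf <- matrix(c(11,  0,  0,
                  0, 18,  0,
                  0,  0, 23),
               nrow = 3, byrow = TRUE,
               dimnames = list(true = classes, predicted = classes))

accuracy <- sum(diag(conf)) / sum(conf)   # fraction classified correctly
error <- 1 - accuracy                     # fraction misclassified
per_class <- diag(conf) / rowSums(conf)   # fraction recovered for each cultivar
error
```

All 52 test wines lie on the diagonal, so the error is 0 and every cultivar is fully recovered.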
error_rate <- 1 - sum(diag(confusion)) / sum(confusion)
error_rate
0
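
A zero error deserves a second look. The list testdata also contains the vintages layer, so the predict call above may use the true labels when mapping the test wines onto the map, which would make the result optimistic. The sketch below rebuilds the split with an arbitrary seed and maps the test wines on the measurements layer only, assuming the whatmap argument of predict as documented for kohonen version 3; the names pred, confusion2 and error2 are illustrative.

```r
library("kohonen")
data(wines)   # loads wines and vintages

set.seed(7)   # arbitrary seed, for illustration only
train.wine <- sample(nrow(wines), 125)
Xtrain.wine <- scale(wines[train.wine, ])
Xtest.wine <- scale(wines[-train.wine, ],
                    center = attr(Xtrain.wine, "scaled:center"),
                    scale = attr(Xtrain.wine, "scaled:scale"))
trainingdata <- list(measurements = Xtrain.wine, vintages = vintages[train.wine])
testdata <- list(measurements = Xtest.wine, vintages = vintages[-train.wine])

som.wines <- supersom(trainingdata, grid = somgrid(5, 5, "hexagonal"))

# Map the test wines using the measurements layer only, then read the
# predicted cultivar from the vintages layer of the winning unit.
pred <- predict(som.wines, newdata = testdata, whatmap = "measurements")
confusion2 <- table(vintages[-train.wine], pred$predictions[["vintages"]])
error2 <- mean(as.character(pred$predictions[["vintages"]]) !=
                 as.character(vintages[-train.wine]))
confusion2
error2
```

The resulting error depends on the random split and on the SOM initialization, so no particular value should be expected.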

Exercise

Repeat the practical (clustering and classification) using:

  • subsets of the wine data (hint: scatter matrix)

  • different SOM parameters (number of neurons, functions)

Compare the results obtained.
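
One possible starting point for this exercise is sketched below. It wraps the steps of Exercise 1 in a helper so that grid size, topology and the subset of variables can be varied. The helper som_test_error, the seed and the chosen settings are assumptions for illustration, not a prescribed solution, and the sketch relies on the whatmap argument of predict in kohonen version 3.

```r
library("kohonen")
data(wines)   # loads wines and vintages

set.seed(7)   # arbitrary seed, for illustration only
train.wine <- sample(nrow(wines), 125)

# Hypothetical helper: test error of a supervised SOM for a given grid size,
# topology and subset of measurement columns.
som_test_error <- function(xdim, ydim, topo = "hexagonal",
                           cols = seq_len(ncol(wines))) {
  Xtrain <- scale(wines[train.wine, cols, drop = FALSE])
  Xtest <- scale(wines[-train.wine, cols, drop = FALSE],
                 center = attr(Xtrain, "scaled:center"),
                 scale = attr(Xtrain, "scaled:scale"))
  fit <- supersom(list(measurements = Xtrain, vintages = vintages[train.wine]),
                  grid = somgrid(xdim, ydim, topo))
  # Map the test wines on the measurements only and compare predicted labels.
  pred <- predict(fit,
                  newdata = list(measurements = Xtest,
                                 vintages = vintages[-train.wine]),
                  whatmap = "measurements")
  mean(as.character(pred$predictions[["vintages"]]) !=
         as.character(vintages[-train.wine]))
}

errors <- c(grid_3x3 = som_test_error(3, 3),
            grid_5x5 = som_test_error(5, 5),
            grid_5x5_rect = som_test_error(5, 5, topo = "rectangular"),
            first_5_vars = som_test_error(5, 5, cols = 1:5))
errors
```

A scatter matrix such as pairs(wines[, 1:5], col = as.integer(vintages)) can help decide which variables to keep before calling the helper with a different cols argument.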