Self-organizing maps (SOM)

# pip install susi

Load the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import susi
from susi.SOMPlots import plot_nbh_dist_weight_matrix, plot_umatrix, plot_estimation_map

Load the data: the wine dataset

ColNames = ["Cultivars","Alcohol","Malic_acid","Ash","Alcalinity_of_ash",
"Magnesium","Total_phenols","Flavanoids","Nonflavanoid_phenols",
"Proanthocyanins","Color intensity","Hue","OD280/OD315","Proline"]
df = pd.read_csv('./data/wine.csv', header=None, names=ColNames)
df.head()
Cultivars Alcohol Malic_acid Ash Alcalinity_of_ash Magnesium Total_phenols Flavanoids Nonflavanoid_phenols Proanthocyanins Color intensity Hue OD280/OD315 Proline
0 1 14.23 1.71 2.43 15.6 127 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065
1 1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050
2 1 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185
3 1 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480
4 1 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735

Standardize the input data

This is done with the StandardScaler class from sklearn.preprocessing. If the dataset also contained categorical (non-continuous) attributes, those columns would be encoded as integers before standardization, using LabelEncoder from the same module.
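
As a rough illustration of what the scaler computes, the sketch below standardizes each column with NumPy alone: it subtracts the column mean and divides by the population standard deviation. The function name `standardize` and the toy array are assumptions made for this example; they are not part of the notebook or of scikit-learn.

```python
import numpy as np

def standardize(values):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)      # ddof=0, i.e. population standard deviation
    std[std == 0.0] = 1.0         # leave constant columns at zero instead of dividing by 0
    return (values - mean) / std

# Toy data (made up for illustration): 4 samples, 2 features on different scales
toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
toy_scaled = standardize(toy)
print(toy_scaled.mean(axis=0))    # close to 0 for every column
print(toy_scaled.std(axis=0))     # close to 1 for every column
```

After this transformation every feature contributes on a comparable scale to the Euclidean distances that a SOM relies on.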

from sklearn.preprocessing import StandardScaler

# Standardize every feature column (all columns except the "Cultivars" label)
feature_cols = ColNames[1:]
sc = StandardScaler()
df[feature_cols] = sc.fit_transform(df[feature_cols])
df.head()
Cultivars Alcohol Malic_acid Ash Alcalinity_of_ash Magnesium Total_phenols Flavanoids Nonflavanoid_phenols Proanthocyanins Color intensity Hue OD280/OD315 Proline
0 1 1.518613 -0.562250 0.232053 -1.169593 1.913905 0.808997 1.034819 -0.659563 1.224884 0.251717 0.362177 1.847920 1.013009
1 1 0.246290 -0.499413 -0.827996 -2.490847 0.018145 0.568648 0.733629 -0.820719 -0.544721 -0.293321 0.406051 1.113449 0.965242
2 1 0.196879 0.021231 1.109334 -0.268738 0.088358 0.808997 1.215533 -0.498407 2.135968 0.269020 0.318304 0.788587 1.395148
3 1 1.691550 -0.346811 0.487926 -0.809251 0.930918 2.491446 1.466525 -0.981875 1.032155 1.186068 -0.427544 1.184071 2.334574
4 1 0.295700 0.227694 1.840403 0.451946 1.281985 0.808997 0.663351 0.226796 0.401404 -0.319276 0.362177 0.449601 -0.037874
X = df.values[:,1:14]
y = df.values[:,0]
X.shape, y.shape
((178, 13), (178,))

Train the classifier

som = susi.SOMClassifier(
    n_rows=25,
    n_columns=25,
    n_iter_unsupervised=1000,
    n_iter_supervised=1000,
    random_state=0)
som.fit(X, y)
y_pred = som.predict(X)
print("Accuracy: {0:.1f} %".format(som.score(X, y)*100))
Accuracy: 91.0 %
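
Note that the accuracy above is computed on the same samples that were used for fitting. To estimate how the model behaves on unseen data, part of the dataset can be held out. The following is a minimal sketch of a stratified split written with NumPy only; the helper `stratified_split`, its parameters, and the toy labels are hypothetical and introduced just for this example.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling the test part class by class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the positions of this class and cut off the test share
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = max(1, int(round(test_fraction * len(idx))))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)

# Toy labels (made up): 10 samples of class 1, 6 of class 2, 4 of class 3
toy_y = np.array([1] * 10 + [2] * 6 + [3] * 4)
train_idx, test_idx = stratified_split(toy_y, test_fraction=0.3)
print(len(train_idx), len(test_idx))
```

With the arrays of this notebook, the index arrays could be used as som.fit(X[train_idx], y[train_idx]) followed by som.score(X[test_idx], y[test_idx]). scikit-learn's train_test_split with the stratify argument offers the same functionality.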

Plot the U-matrix

u_matrix = som.get_u_matrix()
plot_umatrix(u_matrix, 25, 25)
plt.show()
[Figure: U-matrix of the trained 25 × 25 SOM]
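
A U-matrix shows how far apart neighboring nodes are in feature space: large values mark boundaries between clusters, small values mark homogeneous regions. The sketch below is a simplified version that stores, for each node, the mean Euclidean distance to its 4-connected grid neighbors. It is an assumption-laden illustration; the output of susi's get_u_matrix may differ in layout and scaling, and `simple_u_matrix` is a hypothetical name.

```python
import numpy as np

def simple_u_matrix(weights):
    """Mean distance from each node's weight vector to its 4-connected neighbors.

    weights has shape (n_rows, n_cols, n_features); the grid is assumed to
    contain more than one node, so every node has at least one neighbor.
    """
    n_rows, n_cols, _ = weights.shape
    u = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            u[r, c] = np.mean(dists)
    return u

# Toy weights (made up): a 2 x 3 grid whose right-hand column lies far from the rest
toy_weights = np.zeros((2, 3, 1))
toy_weights[:, 2, 0] = 10.0
toy_u = simple_u_matrix(toy_weights)
print(toy_u)
```

In the toy grid the values rise toward the right-hand column, which is where the jump between the two groups of nodes sits.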

Plot the neighborhood distance weight matrix

plot_nbh_dist_weight_matrix(som)
plt.show()
[Figure: neighborhood distance weight matrix of the SOM]
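
The neighborhood weights determine how strongly each node is pulled toward a sample, depending on its grid distance to the best matching unit (BMU). The sketch below assumes a plain Gaussian neighborhood and shows one update step; the exact function used internally by susi may differ, and the names `gaussian_neighborhood` and `som_update` are hypothetical.

```python
import numpy as np

def gaussian_neighborhood(n_rows, n_cols, bmu_pos, sigma):
    """Weight in (0, 1] for every node, decaying with grid distance to the BMU."""
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    dist2 = (rows - bmu_pos[0]) ** 2 + (cols - bmu_pos[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def som_update(weights, x, learning_rate, sigma):
    """One SOM step: find the BMU, then move nodes toward x, scaled by the neighborhood."""
    dist = np.linalg.norm(weights - x, axis=-1)
    bmu_pos = np.unravel_index(np.argmin(dist), dist.shape)
    nbh = gaussian_neighborhood(weights.shape[0], weights.shape[1], bmu_pos, sigma)
    new_weights = weights + learning_rate * nbh[..., None] * (x - weights)
    return new_weights, bmu_pos

# Toy example (made up): a 5 x 5 grid with 2 features per node
toy_weights = np.zeros((5, 5, 2))
toy_weights[2, 2] = [0.9, 0.9]          # make the centre node the closest one to x
x = np.array([1.0, 1.0])
new_weights, bmu_pos = som_update(toy_weights, x, learning_rate=0.5, sigma=1.0)
print(bmu_pos)
print(new_weights[2, 2], new_weights[2, 3], new_weights[0, 0])
```

The BMU moves the most, its direct neighbors somewhat less, and distant corners of the grid hardly at all. During training, sigma and the learning rate are typically decreased so that the map first orders itself globally and then fine-tunes locally.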

Plot the estimation map

estimation_map = som.get_estimation_map().squeeze()
plot_estimation_map(estimation_map)
plt.show()
[Figure: estimation map of the SOM classifier]