Unsupervised Learning

  • Clustering
  • Dimensionality Reduction: Principal Component Analysis (PCA)

Applications of clustering

  • Market segmentation
  • Social network analysis
  • Organizing computing clusters
  • Astronomical data analysis

K-means algorithm

Two steps:

  1. cluster assignment
  2. move cluster centroid
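
As a rough illustration, the two steps might be sketched in plain Python as follows. The function names `assign_clusters` and `move_centroids` are hypothetical, squared Euclidean distance is assumed as the measure of closeness, and indices are 0-based rather than the 1-based indices used in these notes. Leaving an empty cluster's centroid where it is counts as one possible choice, not something the notes specify.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Step 1: for each point, return the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]


def move_centroids(points, assignments, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(points, assignments) if c == k]
        if not members:
            # Assumption: keep an empty cluster's centroid where it is.
            new_centroids.append(list(old))
            continue
        n = len(members)
        new_centroids.append([sum(coord) / n for coord in zip(*members)])
    return new_centroids


# Tiny hand-made example: two obvious groups of 2-D points.
points = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
assignments = assign_clusters(points, centroids)
centroids = move_centroids(points, assignments, centroids)
```

In this toy example the first two points are assigned to the first centroid and the last two to the second, so one pass moves the centroids to the means of those two groups. K-means alternates these two steps repeatedly.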

Input:

  • $K$ (number of clusters)
  • Training set $\{x^{(1)},x^{(2)},…,x^{(m)}\}$

$x^{(i)}\in\mathbb{R}^n$ (drop $x_0=1$ convention)

Randomly initialize $K$ cluster centroids $\mu_1,\mu_2,…,\mu_K\in\mathbb{R}^n$
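
One common way to carry out the random initialization is to pick $K$ distinct training examples and use them as the starting centroids. The sketch below assumes that strategy (the notes only say to initialize randomly); `init_centroids` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random


def init_centroids(points, K, seed=None):
    """Pick K distinct training examples as the initial centroids.

    Assumption: sampling from the training set is one common choice;
    it requires K <= m, where m is the number of training examples.
    """
    if K > len(points):
        raise ValueError("K must not exceed the number of training examples")
    rng = random.Random(seed)
    # Copy each chosen point so later centroid updates do not alter the data.
    return [list(x) for x in rng.sample(points, K)]


points = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
initial = init_centroids(points, K=2, seed=0)
```

Each returned centroid is a copy of a training example, so it lies in $\mathbb{R}^n$ like the data. Different seeds can give different starting centroids and therefore different final clusters.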

Repeat{

    for $i=1$ to $m$

        $c^{(i)}:=$ index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$

    for $k=1$ to $K$

        $\mu_k:=$ average (mean) of points assigned to cluster $k$