Cluster analysis is an umbrella term for a family of techniques and algorithms that group similar observations together. It can be an excellent tool for reducing the complexity of data as part of the data-mining process. There are numerous real-world applications of cluster analysis: marketers use it to identify distinct groups in their customer base to develop targeted programs, underwriters use it to group policyholders by claim cost, health-care organizations use it to identify high-risk patients, and police forces use it to pinpoint high-crime areas.
However, let’s focus on a more important reason for clustering: the National Football League. With the playoffs well underway, a retrospective look at how the teams shake out based on regular-season data might be of interest, focusing not on wins but on pure statistical performance. Having compiled team statistics from the web, let’s get started on the analysis using R.