Introducing distVis() – calculating distances and clustering numeric data

Version 1.2 also adds a new function, distVis(), for visualizing the distances in a matrix of numeric data. This figure was developed and contributed by Joseph Paulson and Florin Chelaru of the University of Maryland Center for Bioinformatics and Computational Biology (CBCB).

A distance matrix is generated from a matrix of numeric values by choosing a distance metric, which defines how “far” apart any two observations are. Once these distances are calculated, the data can be clustered by distance. The distVis() function supports six distance metrics (e.g. Euclidean, Manhattan) and seven clustering methods (e.g. average, centroid, Ward), so users can see how their data reorganize under different distance calculations and clusterings.
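To make the idea concrete outside the interactive figure, here is a minimal sketch using base R's dist() and hclust(). It is an assumption on our part that this mirrors what happens behind distVis(); the variable names are purely illustrative, and the sketch only shows how changing the metric or the clustering method changes the distances and the ordering of the rows.

```r
# Illustrative sketch with base R only; this is NOT the distVis() source.
testData <- as.matrix(mtcars)

# Pairwise distances between rows (cars) under two different metrics
dEuclid <- dist(testData, method = "euclidean")
dManhattan <- dist(testData, method = "manhattan")

# Hierarchical clustering of the same distances with two clustering methods
hcAverage <- hclust(dEuclid, method = "average")
hcCentroid <- hclust(dEuclid, method = "centroid")

# Row order implied by each clustering; compare them to see how the
# cars are rearranged when the method changes
orderAverage <- rownames(testData)[hcAverage$order]
orderCentroid <- rownames(testData)[hcCentroid$order]
print(orderAverage)
print(orderCentroid)
```

Swapping "euclidean" for "manhattan", or "average" for another method accepted by hclust(), is the static equivalent of switching options in the visualization.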

In this example, we take the mtcars dataset in R, convert it to a matrix using as.matrix(), and visualize it using distVis():

testData <- as.matrix(mtcars)
distVis(testData)

The result looks like this.
