Finding Outliers in a Data Set

Introduction

Outliers are data points that don't fit the pattern of the rest of the data. They are the unusually high or unusually low values in the data set.

A simple way to find outliers is to examine the numbers in the data set. Usually most of the numbers cluster within a certain range, while a few are much lower or much higher than the rest. Those few values are the outliers.

Another definition of an outlier

An outlier can also be described as a data point that is distinctly separate from the rest of the data. One common definition treats as an outlier any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile. The interquartile range (IQR) is the difference between the third quartile and the first quartile of the data set.
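The 1.5 × IQR rule can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function names quartiles and iqr_outliers and the multiplier parameter k are illustrative assumptions. The quartiles here use the median-of-halves method (when the count is odd, the overall median is left out of both halves); some software packages interpolate quartiles instead, which can shift the fences slightly.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves method.

    The sorted data is split into a lower and an upper half; when the
    number of values is odd, the median itself belongs to neither half.
    Needs at least two values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + len(values) % 2:]
    return median(lower), median(upper)


def iqr_outliers(data, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


print(iqr_outliers([0, 2, 5, 6, 9, 12, 35]))  # [35]
```

With the default k = 1.5 this matches the definition above; raising k (for example to 3) flags only more extreme values.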

Example 1

Find the outliers for the data 0, 2, 5, 6, 9, 12, 35.

Solution

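For the given data set, which is already sorted, we have the following five-number summary. The median is the middle value, 6, and the first and third quartiles are the medians of the lower half (0, 2, 5) and the upper half (9, 12, 35).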

minimum = 0

first quartile = 2

median = 6

third quartile = 12

maximum = 35
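The same summary can be computed programmatically. The sketch below assumes the median-of-halves quartile method, which reproduces the values above; five_number_summary is a hypothetical helper name, and other quartile conventions may give different first and third quartiles.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) of a data set.

    Quartiles use the median-of-halves method: for an odd number of
    values, the overall median is excluded from both halves.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + len(values) % 2:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


print(five_number_summary([0, 2, 5, 6, 9, 12, 35]))  # (0, 2, 6, 12, 35)
```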

IQR = 12 - 2 = 10, so 1.5 × IQR = 15.

To determine whether there are outliers, we look for numbers that lie more than 1.5 × IQR = 15 below the first quartile or above the third quartile.

first quartile - 1.5 × IQR = 2 - 15 = -13

third quartile + 1.5 × IQR = 12 + 15 = 27

Since 35 lies outside the interval from -13 to 27, 35 is the only outlier in this data set.

Example 2

Find the outliers in the given data set below.

28, 26, 29, 30, 81, 32, 37

Solution

Step 1:

The value that differs markedly from the other numbers in the set is 81. All of the other values lie between 26 and 37.

Step 2:

So the outlier for this data set is 81.
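As a check, the 1.5 × IQR definition from earlier can be applied to this data set. In this sketch (Python, median-of-halves quartiles, variable names chosen for illustration), the sorted data is 26, 28, 29, 30, 32, 37, 81, so Q1 = 28 and Q3 = 37, giving fences of 14.5 and 50.5. Only 81 falls outside them, which agrees with the answer above.

```python
from statistics import median

data = [28, 26, 29, 30, 81, 32, 37]

values = sorted(data)                           # [26, 28, 29, 30, 32, 37, 81]
half = len(values) // 2
q1 = median(values[:half])                      # median of [26, 28, 29] -> 28
q3 = median(values[half + len(values) % 2:])    # median of [32, 37, 81] -> 37
iqr = q3 - q1                                   # 9
low_fence = q1 - 1.5 * iqr                      # 14.5
high_fence = q3 + 1.5 * iqr                     # 50.5

# Keep every value that lies outside the two fences.
outliers = [x for x in data if x < low_fence or x > high_fence]
print(outliers)  # [81]
```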

Example 3
