3 Diagrammatic representations
3.1 The histogram
Notice that the data we are dealing with is continuous , a measurement can take any value. Values are not restricted to whole number (integer) values for example. When we are using continuous data we normally represent frequency distributions pictorially by means of a histogram .
The class intervals are plotted on the horizontal axis and the frequencies on the vertical axis. Strictly speaking, the areas of the blocks forming the histogram represent the frequencies since this gives the histogram the necessary flexibility to deal with frequency tables whose class intervals are not constant. In our case, the class intervals are constant and the heights of the blocks are made proportional to the frequencies.
Sometimes the approximate shape of the distribution of data is indicated by a frequency polygon which is formed by joining the mid-points of the tops of the blocks forming the histogram with straight lines. Not all histograms are presented along with frequency polygons.
The complete diagram is shown in Figure 1.
Figure 1
Task!
The following data are the heights (to the nearest tenth of a centimetre) of a second sample of 30 students studying engineering statistics. Another way of determining class intervals is as follows.
Class intervals may be taken as (for example)
Class Interval (cm) | Tally | Frequency |
145 - | ||
150 - | ||
155 - | ||
160 - | ||
165 - | ||
170 - | ||
175 - | ||
180 - | ||
185 - |
The intervals are read as ‘145 cm. up to but not including 150 cm’, then ‘150 cm. up to but not including 155 cm’ and so on. The class intervals are chosen in such a way as to cover the data but still give a reasonable number of classes.
Organise the data into classes using the above method of defining class intervals and draw up a frequency table of the data. Use your table to represent the data diagrammatically using a histogram.
Hint:- All the data values lie in the range 145-190.
155.3 | 177.3 | 146.2 | 163.1 | 161.8 | 146.3 | 167.9 | 165.4 | 172.3 | 188.2 |
178.8 | 151.1 | 189.4 | 164.9 | 174.8 | 160.2 | 187.1 | 163.2 | 147.1 | 182.2 |
178.2 | 172.8 | 164.4 | 177.8 | 154.6 | 154.9 | 176.3 | 148.5 | 161.8 | 178.4 |
Class Interval (cm) | Tally | Frequency |
145 - | 1111 | 4 |
150 - | 111 | 3 |
155 - | 1 | 1 |
160 - | 11111 11 | 7 |
165 - | 11 | 2 |
170 - | 111 | 3 |
175 - | 11111 1 | 6 |
180 - | 1 | 1 |
185 - | 111 | 3 |
The histogram is shown below.
3.2 The bar chart
The bar chart looks superficially like the histogram, indeed, many people confuse the two. However, there are important differences that you should be aware of. Firstly, the bar chart is usually used to represent discrete data or categorical data. Secondly, the length of a bar is directly proportional to the frequency it represents. Remember that in the case of the histogram, the area of a bar is directly proportional to the frequency it represents and that the histogram is normally used to represent continuous data. To be clear, discrete data is data that can only take specific values. An example would be the amount of money you have in your pocket. The amount can only take certain values, you cannot, for example, have 34.229 pence in your pocket. Categorical data is, as you might expect, data which is organized by category. Favourite foods (pies, chips, pizzas, cakes and fruit for example) or preferred colours for cars (red, blue, silver or black for example).
Absenteeism can be a problem for some engineering firms. The following discrete data represents the number of days off taken by 50 employees of a small engineering company. Note that in the context of this example, the term discrete means that the data can only take whole number values (number of days off), nothing in between.
6 | 4 | 4 | 5 | 0 | 4 | 3 | 6 | 1 | 3 |
8 | 3 | 6 | 1 | 0 | 6 | 11 | 5 | 10 | 8 |
2 | 4 | 6 | 6 | 6 | 6 | 5 | 13 | 11 | 6 |
4 | 8 | 4 | 7 | 7 | 6 | 8 | 3 | 3 | 6 |
3 | 2 | 3 | 6 | 2 | 2 | 3 | 2 | 4 | 0 |
In order to construct a bar chart we follow a simple set of instructions akin to those to form a frequency distribution.
- Find the range of values covered by the data (0 - 13 in this case).
- Tally the number of absentees corresponding to each number of days taken off work.
- Draw a diagram with the range (0 - 13) on one axis and the number of days corresponding to each value (number of days off) on the other. The length of each bar is proportional to the frequency (that is proportional to the number of staff taking that number of days off).
The results appear as follows:
Figure 2
It is perfectly possible and acceptable to draw the bar chart with the bars appearing as vertically instead of horizontally.
Task!
The following data give the number of rejects in fifty batches of engine components delivered to a motor manufacturer. Draw two bar charts representing the data, one with the bars vertical and one with the bars horizontal. Draw one chart manually and one using a suitable computer package.
2 | 3 | 5 | 6 | 8 | 1 | 2 | 0 | 3 | 4 |
8 | 3 | 6 | 1 | 0 | 6 | 11 | 5 | 10 | 8 |
3 | 5 | 9 | 12 | 3 | 8 | 5 | 11 | 15 | 3 |
4 | 8 | 4 | 7 | 7 | 6 | 8 | 3 | 3 | 6 |
3 | 2 | 3 | 6 | 2 | 2 | 3 | 2 | 4 | 0 |
3.3 The pie chart
One of the more common diagrams that you must have seen in magazines and newspapers is the pie chart , examples of which can be found in virtually any text book on descriptive statistics. A pie chart is simply a circular diagram whose sectors are proportional to the quantity represented. Put more accurately, the angle subtended at the centre of the pie by a sector of the circle is proportional to the size of the subset of the whole set represented by the sector. The whole set is, of course, represented by the whole circle.
Pie charts demonstrate percentages and proportions well and are suitable for representing categorical data. The following data represents the time spent weekly on a variety of activities by the full-time employees of a local engineering company.
Hours spent on: | Males | Females |
Travel to and from work | 10.5 | 8.4 |
Paid activities in employment | 47.0 | 37.0 |
Personal sport and leisure activities | 8.2 | 3.6 |
Personal development | 5.6 | 6.4 |
Family activities | 8.4 | 18.2 |
Sleep | 56.0 | 56.0 |
Other | 32.3 | 28.4 |
To construct a pie chart showing how the male employees spend their time we proceed as follows. Note that the total number of hours spent is 168 ( $7\times 24$ ).
- Express the time spent on any given activity as a proportion of the total time spent;
- Multiply the number obtained by 360 thus converting the proportion to an angle;
- Draw a chart consisting of (in this case) 6 sectors having the angles given in the chart below subtended at the centre of the circle.
Hours spent on: | Males | Proportion of Time | Sector Angle |
Travel to and from work | 10.5 | $\frac{10.5}{168}$ | $\frac{10.5}{168}\times 360=22.5$ |
Paid activities in employment | 47.0 | $\frac{47}{168}$ | $\frac{47}{168}\times 360=100.7$ |
Personal sport and leisure activities | 8.2 | $\frac{8.2}{168}$ | $\frac{8.2}{168}\times 360=17.6$ |
Personal development | 5.6 | $\frac{5.6}{168}$ | $\frac{5.6}{168}\times 360=12$ |
Family activities | 8.4 | $\frac{8.4}{168}$ | $\frac{8.4}{168}\times 360=18$ |
Sleep | 56.0 | $\frac{56}{168}$ | $\frac{56}{168}\times 360=120$ |
Other | 32.3 | $\frac{32.3}{168}$ | $\frac{32.3}{168}\times 360=69.2$ |
The pie chart obtained is illustrated below.
Figure 3
Task!
Construct a pie chart for the female employees of the company and use both it and the pie chart in Figure 3 to comment on any differences between male and female employees that are illustrated.
Hours spent on: | Feales | Proportion of Time | Sector Angle |
Travel to and from work | 10.5 | $\frac{8.4}{168}$ | $\frac{8.4}{168}\times 360=18$ |
Paid activities in employment | 47.0 | $\frac{37}{168}$ | $\frac{37}{168}\times 360=79.3$ |
Personal sport and leisure activities | 8.2 | $\frac{3.6}{168}$ | $\frac{3.6}{168}\times 360=7.7$ |
Personal development | 5.6 | $\frac{6.4}{168}$ | $\frac{6.4}{168}\times 360=13.7$ |
Family activities | 8.4 | $\frac{18.2}{168}$ | $\frac{18.2}{168}\times 360=39$ |
Sleep | 56.0 | $\frac{56}{168}$ | $\frac{56}{168}\times 360=120$ |
Other | 38.4 | $\frac{38.4}{168}$ | $\frac{38.4}{168}\times 360=82.3$ |
Comments : Proportionally less time spent travelling, more on family activities etc.
3.4 Quartiles and the ogive
Later in this Workbook we shall be looking at the statistics derived from data which are placed in rank order . Ranking data simply means that the data are placed in order from the highest to the lowest or from the lowest to the highest. Three important statistics can be derived from ranked data, these are the Median, the Lower Quartile and the Upper Quartile. As you will see the Ogive or Cumulative Frequency Curve enables us to find these statistics for large data sets. The definitions of the three statistics referred to are given below.
Key Point 1
The Median ; this is the central value of a distribution. It should be noted that if the data set contains an even number of values, the median is defined as the average of the middle pair.
The Lower Quartile ; this is the least number which has 25% of the distribution below it or equal to it.
The Upper Quartile , this is the greatest number which has 25% of the distribution above it or equal to it.
For the simple data set 1.2, 3.0, 2.5, 5.1, 3.5, 4.1, 3.1, 2.4 the process is illustrated by placing the members of the data set below in rank order:
5.1, 4.1, 3.5, 3.1, 3.0, 2.5, 2.4, 1.2
Here we have an even number of values and so the median is calculated as follows:
Median = the average of the two central values, $\frac{3.1+3.0}{2}=3.05$ The lower quartile and the upper quartile are easily read off using the definition given above:
Lower Quartile = 2.4 Upper Quartile = 4.1
It can be difficult to decide on realistic values when the distribution contains only a small number of values.
Task!
Find the median, lower quartile and upper quartile for the data set:
5.0, 4.1, 3.5, 3.1, 3.0, 2.5, 2.4, 1.2, 0.7
$\phantom{\rule{2em}{0ex}}\text{LowerQuartile}=\left(1.2+2.4\right)\u22152=1.8$
$\text{Median}=3.0$
$\phantom{\rule{2em}{0ex}}\text{UpperQuartile}=\left(4.1+3.5\right)\u22152=3.8$
Note Check the answer carefully when you have completed the exercise, finding the median is easy but deciding on the values of the upper and lower quartiles is more difficult.
In the case of larger distributions the quantities can be approximated by using a cumulative frequency curve or ogive .
The cumulative frequency distribution for the distribution of the heights of the 30 students given earlier is shown below. Notice that here, the class intervals are defined in such a way that the frequencies accumulate (hence the term cumulative frequency ) as the table is built up.
Height | Cumulative Frequency |
less than 149.5 | 0 |
less than 154.5 | 2 |
less than 159.5 | 2 |
less than 164.5 | 6 |
less than 169.5 | 14 |
less than 174.5 | 19 |
less than 179.5 | 26 |
less than 184.5 | 30 |
To plot the ogive or cumulative frequency curve, we plot the heights on the horizontal axis and the cumulative frequencies on the vertical axis. The corresponding ogive is shown below.
Figure 4
In general, ogives are ‘S’- shaped curves. The three statistics defined above can be read off the diagram as indicated. For the data set giving the heights of the 30 students, the three statistics are defined as shown below.
- The Median , this is the average of the 15th and 16th values (170.4) since we have an even number of data;
- The Lower Quartile , 25% of 30 = 7.5 and so we take the average of the 7th and 8th values (164.9) read off from the bottom of the distribution to have 25% of the distribution less than or equal to it;
- The Upper Quartile , again 75% of 30 = 22.5 and so we take the average of the 22nd and 23rd values (177) read off from the top of the distribution to have 25% of the distribution greater than or equal to it.