## What are Categorical data?

Categorical data are a type of data that record qualities or characteristics of an individual, such as eye color (black or blue), gender (male or female), or opinion on an issue (using categories such as agree, disagree, or no opinion).

When individuals are placed into groups, such as gender or political affiliation, categorical data are summarized using either the number of individuals in each group (the frequency) or the percentage of individuals in each group (the relative frequency).

## How to **Summarize Categorical Data**?

You can **summarize categorical data** by first sorting the values according to the categories of the variable, then placing the count, amount, or percentage of each category into a **summary table** or into one of several types of charts.

### What is a **Summary Table**?

A **summary table** is a two-column table in which the category names are listed in the first column and the count, amount, or percentage of values is listed in the second column. Sometimes there are additional columns that present the same data in more than one way (for example, as counts and percentages).
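The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `summary_table` helper and the `responses` list are hypothetical and invented for this example, and only the standard library is used.

```python
from collections import Counter


def summary_table(values):
    """Return one (category, count, percentage) row per category."""
    counts = Counter(values)          # frequency of each category
    total = sum(counts.values())      # number of observations
    return [(category, count, 100 * count / total)
            for category, count in counts.items()]


# Hypothetical survey responses, invented for illustration only.
responses = ["agree", "disagree", "agree", "no opinion", "agree",
             "disagree", "agree", "no opinion", "agree", "disagree"]

rows = summary_table(responses)
for category, count, percentage in rows:
    # Category name, then the count and the percentage columns.
    print(f"{category:<12}{count:>6}{percentage:>9.1f}%")
```

The count column is the frequency and the percentage column is the relative frequency, so the table shows the same data in two ways.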

If your data contain more than one categorical variable, use a contingency table. See *section 5* for more on contingency tables.

### Example

The responses of online shoppers who were asked about the specific issues that worried them when shopping online can be presented using a summary table:

Source: Data extracted and adapted from “The good, bad and ugly of online shopping,” MarkMonitor Online Barometer, Global Online Shopping Survey 2018, p. 6.

### Interpretation

Summary tables enable you to see the big picture about a set of data. In this example, you can conclude that 37% of respondents shop online mainly for better prices and that 29% shop online mainly to avoid holiday shopping.

## Which graphs are used for **Categorical data**?

The main purpose of any data display is to organize and present data clearly and correctly. In this article, I describe the most common data displays used to summarize categorical data, along with helpful tips for evaluating them.

### Bar Chart

A bar chart contains rectangles known as bars. The length of each bar represents the count, amount, or percentage of responses in one category.
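To make the idea of bar length concrete, here is a rough text-only sketch in Python. The `text_bar_chart` helper and the `percentages` values are hypothetical and invented for illustration; a plotting library would normally draw the chart, but the proportional scaling is the same.

```python
def text_bar_chart(percentages, width=40):
    """Return one text line per category; bar length is proportional to the value."""
    largest = max(percentages.values())
    lines = []
    for category, value in percentages.items():
        # Scale so that the largest category gets a bar of `width` characters.
        bar = "#" * round(width * value / largest)
        lines.append(f"{category:<12} {bar} {value}%")
    return lines


# Hypothetical percentages, invented for illustration only.
percentages = {"agree": 50, "disagree": 30, "no opinion": 20}

chart_lines = text_bar_chart(percentages)
for line in chart_lines:
    print(line)
```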

#### Example

This percentage bar chart presents the data of the summary table discussed in the previous example:

#### Interpretation

A bar chart is better than a summary table at making the point that "better prices" is the single largest category in this example.

For most people, scanning a bar chart is easier than scanning a column of numbers in which the numbers are unordered, as they are in the bill payment summary table.

#### Guidelines for evaluating bar graphs

### Pie Chart

A pie chart is a circle divided into wedge-shaped areas known as pie slices. Each pie slice represents the count, amount, or percentage of one category, and the entire circle (the pie) represents the total.
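Because the whole circle represents the total, each slice takes up the same share of the 360 degrees as its category takes of the data. A minimal Python sketch of that calculation follows; the `slice_angles` helper and the `counts` values are hypothetical and invented for illustration.

```python
def slice_angles(counts):
    """Map each category to the angle (in degrees) of its pie slice."""
    total = sum(counts.values())
    # A slice's angle is its share of the total, applied to the full circle.
    return {category: 360 * count / total for category, count in counts.items()}


# Hypothetical counts, invented for illustration only.
counts = {"agree": 5, "disagree": 3, "no opinion": 2}

angles = slice_angles(counts)
for category, angle in angles.items():
    print(f"{category:<12}{angle:>7.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that the slices account for the whole data set.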

#### Example

This pie chart presents the data of the summary table discussed in the preceding two examples:

#### Interpretation

The pie chart enables you to **see each category’s portion of the whole**. You can see that more young adults shopped online for better prices or to avoid holiday shopping, a small number shopped online for better selection, and that hardly anyone shopped online because of direct shipment.

#### Guidelines for evaluating a pie chart for statistical correctness

### Pareto Chart

A Pareto chart is a special type of bar chart that presents the counts, amounts, or percentages of each category in descending order from left to right, and also contains a superimposed plotted line that represents a running cumulative percentage.

#### Example

Causes of Incomplete ATM Transactions

Source: Data extracted from A. Bhalla, “Don't Misuse the Pareto Principle,” Six Sigma Forum Magazine, May 2009, pp. 15–18.

This Pareto chart uses the data of the table that immediately precedes it to highlight the causes of incomplete ATM transactions.

#### Interpretation

When you have many categories, a Pareto chart enables you to focus on the most important categories by visually separating the vital few from the trivial many categories.

For the incomplete ATM transactions data, the Pareto chart shows that two categories, warped card jammed and card unreadable, account for more than 80% of all defects, and that those two categories combined with the ATM malfunctions and ATM out of cash categories account for more than 90% of all defects.
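The two ingredients of a Pareto chart, the descending order and the running cumulative percentage, can be computed as in the following Python sketch. The `pareto_rows` helper is hypothetical, and the counts are invented placeholders; they are not the ATM data from the cited source.

```python
def pareto_rows(counts):
    """Return (category, count, percentage, cumulative percentage) rows,
    sorted from the largest category to the smallest."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count  # running total feeds the cumulative line
        rows.append((category, count, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical defect counts, invented for illustration only.
defects = {"cause A": 10, "cause B": 50, "cause C": 5, "cause D": 30, "cause E": 5}

pareto = pareto_rows(defects)
for category, count, percentage, cumulative in pareto:
    print(f"{category:<10}{count:>5}{percentage:>8.1f}%{cumulative:>8.1f}%")
```

Reading down the cumulative column shows how few categories are needed to reach a given share of the total, which is how the "vital few" are identified.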

To create a Pareto chart in Excel, refer to Excel Easy.

## Contingency Table - A Two-Way Cross-Classification Table

A **contingency table** (also called a crosstab or two-way table) is a multicolumn table that **presents the count or percentage of responses for two categorical variables**. In a two-way table, the categories of one of the variables form the rows of the table, while the categories of the second variable form the columns.

The “outside” of the table contains a special row and a special column that contain the totals. Cross-classification tables are also known as cross-tabulation tables.

### Example

*Downloads Cross-Classified by Type of Call-to-Action Button*

This two-way cross-classification table summarizes the results of a webpage design study that investigated whether a new call to action button would increase the number of downloads. Tables showing row percentages, column percentages, and overall total percentages follow.

Downloads | Original | New | Total |
---|---|---|---|
Yes | 351 | 451 | 802 |
No | 3291 | 3105 | 6396 |
Total | 3642 | 3556 | 7198 |

### Row Percentage Table

Downloads | Original | New | Total |
---|---|---|---|
Yes | $\frac{351}{802} \approx 44\%$ | $\frac{451}{802} \approx 56\%$ | $\frac{802}{802} = 100\%$ |
No | $\frac{3291}{6396} \approx 51\%$ | $\frac{3105}{6396} \approx 49\%$ | $\frac{6396}{6396} = 100\%$ |
Total | $\frac{3642}{7198} \approx 51\%$ | $\frac{3556}{7198} \approx 49\%$ | $\frac{7198}{7198} = 100\%$ |

### Column Percentage Table

Downloads | Original | New | Total |
---|---|---|---|
Yes | $\frac{351}{3642} \approx 10\%$ | $\frac{451}{3556} \approx 13\%$ | $\frac{802}{7198} \approx 11\%$ |
No | $\frac{3291}{3642} \approx 90\%$ | $\frac{3105}{3556} \approx 87\%$ | $\frac{6396}{7198} \approx 89\%$ |
Total | $\frac{3642}{3642} = 100\%$ | $\frac{3556}{3556} = 100\%$ | $\frac{7198}{7198} = 100\%$ |

### Overall Percentage Table

Downloads | Original | New | Total |
---|---|---|---|
Yes | $\frac{351}{7198} \approx 5\%$ | $\frac{451}{7198} \approx 6\%$ | $\frac{802}{7198} \approx 11\%$ |
No | $\frac{3291}{7198} \approx 46\%$ | $\frac{3105}{7198} \approx 43\%$ | $\frac{6396}{7198} \approx 89\%$ |
Total | $\frac{3642}{7198} \approx 51\%$ | $\frac{3556}{7198} \approx 49\%$ | $\frac{7198}{7198} = 100\%$ |
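The three kinds of percentages differ only in the denominator: the row total, the column total, or the overall total. The Python sketch below recomputes the inner cells from the counts in the downloads table; the variable names and the `pct` helper are my own and are only one possible way to organize the calculation.

```python
# Counts from the downloads table: rows are Downloads, columns are the button type.
counts = {
    "Yes": {"Original": 351, "New": 451},
    "No": {"Original": 3291, "New": 3105},
}
columns = ("Original", "New")

row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
column_totals = {col: sum(counts[row][col] for row in counts) for col in columns}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


# Same counts, three different denominators.
row_pct = {row: {col: pct(counts[row][col], row_totals[row]) for col in columns}
           for row in counts}
column_pct = {row: {col: pct(counts[row][col], column_totals[col]) for col in columns}
              for row in counts}
overall_pct = {row: {col: pct(counts[row][col], grand_total) for col in columns}
               for row in counts}

print("Row %:    ", row_pct)
print("Column %: ", column_pct)
print("Overall %:", overall_pct)
```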

### Interpretation

The simplest two-way table contains a row variable that has two categories and a column variable that has two categories. This creates a table that has two rows and two columns in its inner part. Each inner cell represents the count or percentage of a pairing, or cross-classifying, of categories from each variable.

 | First column category | Second column category | Total |
---|---|---|---|
First row category | Count or percentage for first row and first column | Count or percentage for first row and second column | Total for first row category |
Second row category | Count or percentage for second row and first column | Count or percentage for second row and second column | Total for second row category |
Total | Total for first column category | Total for second column category | Overall total |

Two-way tables can reveal the combination of values that occur most often in data. In this example, the tables reveal that the new call to action button is more likely to have downloads than the original call to action button.

Because the number of visitors to each webpage was unequal in this example, you can see this pattern best in the Column Percentages table. About 13% of the visitors who saw the new button downloaded (451 of 3,556), compared with about 10% of the visitors who saw the original button (351 of 3,642).

PivotTables create worksheet summary tables from sample data and are a good way of creating a two-way table from sample data.
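Outside a spreadsheet, the same kind of cross-classification can be sketched in Python by counting pairs of categories. The `two_way_table` helper and the `records` list below are hypothetical and invented for illustration; they are not the study data.

```python
from collections import Counter


def two_way_table(records):
    """Build a two-way table, with totals, from (row category, column category) pairs."""
    cell_counts = Counter(records)
    row_names = sorted({row for row, _ in cell_counts})
    column_names = sorted({column for _, column in cell_counts})

    table = {}
    for row in row_names:
        # A pairing that never occurs counts as 0.
        cells = {column: cell_counts[(row, column)] for column in column_names}
        cells["Total"] = sum(cells.values())
        table[row] = cells

    # The extra "Total" row holds the column totals and the overall total.
    table["Total"] = {
        column: sum(table[row][column] for row in row_names)
        for column in column_names + ["Total"]
    }
    return table


# Hypothetical raw records of (downloaded?, button shown), invented for illustration.
records = [("Yes", "New"), ("No", "Original"), ("No", "New"), ("Yes", "Original"),
           ("No", "Original"), ("Yes", "New"), ("No", "New"), ("No", "Original")]

table = two_way_table(records)
for row_name, cells in table.items():
    print(row_name, cells)
```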

## General **Guidelines to follow** when creating your own charts

