Copyright Accreditation Council for Graduate Medical Education

Likert-type scales are frequently used in medical education and medical education research. A sizable percentage of the educational research manuscripts submitted to the Journal of Graduate Medical Education employ a Likert scale for part or all of the outcome assessments. Thus, understanding the interpretation and analysis of data derived from Likert scales is imperative for those working in medical education and education research. The goal of this article is to provide readers who do not have an extensive statistics background with the basics needed to understand these concepts.

Developed in 1932 by Rensis Likert to measure attitudes, the typical Likert scale is a 5- or 7-point ordinal scale used by respondents to rate the degree to which they agree or disagree with a statement (Table). In an ordinal scale, responses can be rated or ranked, but the distance between responses is not measurable. Thus, the differences between “always,” “often,” and “sometimes” on a frequency response Likert scale are not necessarily equal. This is in contrast to interval data, in which the difference between responses can be calculated and the numbers do refer to a measurable “something.”
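To make the ordinal-versus-interval distinction concrete, the following minimal Python sketch uses hypothetical data (the two groups, their responses, and both numeric codings are illustrative assumptions, not taken from any study). It codes the same ordered categories in two different ways and checks whether group A scores higher than group B by mean and by median.

```python
from statistics import mean, median

# Hypothetical responses to one 5-point agreement item, recorded as
# category positions (1 = strongly disagree ... 5 = strongly agree).
group_a = [1, 4, 4, 4, 4]
group_b = [3, 3, 3, 3, 3]

# Two numeric codings of the same five categories. Both keep the categories
# in order, which is all an ordinal scale guarantees; they differ only in the
# spacing they assume between categories.
equal_spacing = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
wide_bottom = {1: -2, 2: 2, 3: 3, 4: 4, 5: 5}


def a_exceeds_b(coding):
    """Return (mean comparison, median comparison) of group A vs. group B."""
    a = [coding[x] for x in group_a]
    b = [coding[x] for x in group_b]
    return mean(a) > mean(b), median(a) > median(b)


print(a_exceeds_b(equal_spacing))  # (True, True)
print(a_exceeds_b(wide_bottom))    # (False, True)
```

Because both codings keep the categories in order, neither is more correct than the other, yet the mean comparison flips while the median comparison does not. With an odd number of responses the median is always one of the observed categories, so any order-preserving recoding leaves the median comparison intact.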

An example of interval data would be numbers of procedures done per resident: a score of 3 means the resident has conducted 3 procedures. Interestingly, with computer technology, survey designers can create continuous measure scales that do provide interval responses as an alternative to a Likert scale.

Figure 1: Continuous measure example

Table: Typical Likert scales

The Controversy

In the medical education literature, there has been a long-standing controversy regarding whether ordinal data, converted to numbers, can be treated as interval data. That is, can means, standard deviations, and parametric statistics, which depend upon data that are normally distributed (Figure 2), be used to analyze ordinal data?

Figure 2: A normal distribution

When conducting research, we measure data from a sample of the total population of interest, not from all members of the population. Parametric tests make assumptions about the underlying population from which the research data have been obtained, usually that these population data are normally distributed. Nonparametric tests do not make this assumption about the “shape” of the population from which the study data have been drawn.
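As an illustration of a test that uses only the order of responses, here is a minimal sketch of the Mann-Whitney U test, a standard rank-based nonparametric test, written in plain Python under the assumption that the two groups are independent. Responses are replaced by their ranks (tied responses share an average rank), so the result would be unchanged by any order-preserving recoding of the scale. The function names and sample data are hypothetical; in practice, a vetted implementation such as SciPy's `scipy.stats.mannwhitneyu` would be preferable.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) via the tie-corrected normal approximation.

    No continuity correction is applied, so p may differ slightly from
    statistical packages that apply one by default.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one response")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Likert data contain many ties; shrink the variance of U accordingly.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every response identical: no evidence of a difference
        return u_x, 1.0

    z = (u_x - n1 * n2 / 2) / variance ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Hypothetical 5-point responses from two independent groups.
residents = [4, 5, 4, 3, 5, 4, 4]
faculty = [3, 2, 3, 4, 2, 3, 3]
u, p = mann_whitney_u(residents, faculty)
print(u, round(p, 4))
```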

Descriptive statistics, such as means and standard deviations, have unclear meanings when applied to Likert scale responses. Furthermore, if responses are clustered at the high and low extremes, the mean may appear to be the neutral or middle response, but this may not fairly characterize the data. Other non-normal distributions of response data can similarly result in a mean score that is not a helpful measure of the data's central tendency.

Because of these observations, experts over the years have argued that the median should be used as the measure of central tendency for Likert scale data. Similarly, experts have contended that frequencies (percentages of responses in each category), contingency tables, χ² tests, the Spearman rho assessment, or the Mann-Whitney U test should be used for analysis instead of parametric tests, which, strictly speaking, require interval data (eg, t tests, analysis of variance, Pearson correlations, regression). However, other experts assert that if there is an adequate sample size (at least 5–10 observations per group) and if the data are normally distributed (or nearly normal), parametric tests can be used with Likert scale ordinal data.

One such expert provides compelling evidence, with actual examples using real and simulated data, that parametric tests not only can be used with ordinal data, such as data from Likert scales, but also that parametric tests are generally more robust than nonparametric tests. That is, parametric tests tend to give “the right answer” even when statistical assumptions, such as a normal distribution of data, are violated, even to an extreme degree.
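The point about responses clustered at the extremes can be checked with a few lines of Python. This sketch uses made-up, polarized responses to a single 5-point item (an assumption for illustration only) and reports the mean, the median, and the percentage of responses in each category.

```python
from collections import Counter
from statistics import mean, median

LABELS = {1: "Strongly disagree", 2: "Disagree", 3: "Neutral",
          4: "Agree", 5: "Strongly agree"}

# Hypothetical, polarized responses to one item: nobody chose "Neutral".
responses = [1, 1, 1, 1, 5, 5, 5, 5, 5]


def describe(values, labels=LABELS):
    """Summarize one Likert-type item by mean, median, and percentage per category."""
    counts = Counter(values)  # missing categories count as 0
    n = len(values)
    percent = {labels[code]: round(100 * counts[code] / n) for code in labels}
    return {"mean": mean(values), "median": median(values), "percent": percent}


summary = describe(responses)
print(round(summary["mean"], 1))  # 3.2
print(summary["median"])          # 5
print(summary["percent"])
```

Here the mean of about 3.2 lands near "Neutral" even though no one chose it, and the median of 5 hides the 44% who strongly disagreed; only the frequency distribution shows the split.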

Thus, parametric tests are sufficiently robust to yield largely unbiased answers that are acceptably close to “the truth” when analyzing Likert scale responses.

Researchers also commonly create several Likert-type items, group them into a “survey scale,” and then calculate a total score or mean score for the scale items. In these cases, experts suggest using the Cronbach alpha or kappa test or a factor analysis technique to provide evidence that the components of the scale are sufficiently intercorrelated and that the grouped items measure the underlying variable.

The Bottom Line

Now that many experts have weighed in on this debate, the conclusions are fairly clear: parametric tests can be used to analyze Likert scale responses. However, to describe the data, means are often of limited value unless the data follow a classic normal distribution; a frequency distribution of responses will likely be more helpful. Furthermore, because the numbers derived from Likert scales represent ordinal responses, presentation of a mean to the 100th decimal place is usually not helpful or enlightening to readers.

In Summary

We recommend that authors determine how they will describe and analyze their data as a first step in planning educational or research projects. Then they should discuss, in the methods section or in a cover letter if the explanation is too lengthy, why they have chosen to portray and analyze their data in a particular way.
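For the survey-scale case, where several Likert-type items are grouped and scored together, a minimal sketch of Cronbach's alpha using only Python's standard library follows. It implements the usual formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores); the item data are hypothetical, and the kappa test and factor analysis mentioned alongside alpha are not shown.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` holds one list per item; position r in every list is respondent r.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one response per respondent")
    # Each respondent's total score across all items.
    totals = [sum(item[r] for item in items) for r in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 4 items (rows) answered by 6 respondents (columns).
items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [5, 5, 2, 1, 4, 4],
    [3, 4, 3, 2, 4, 5],
]
print(round(cronbach_alpha(items), 2))
```

Using the sample variance for both the items and the totals keeps the ratio identical to what population variances would give, so the choice of denominator does not change alpha.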

2012 // Volume 50 // Number 2 // Tools of the Trade

Analyzing Likert Data

This article provides information for Extension professionals on the correct analysis of Likert data. The analyses of Likert-type and Likert scale data require unique data analysis procedures, and as a result, misuses and/or mistakes often occur. This article discusses the differences between Likert-type and Likert scale data and provides recommendations for descriptive statistics to be used during the analysis. Once a researcher understands the difference between Likert-type and Likert scale data, the decision on appropriate statistical procedures will be clear.

Keywords: analysis, Likert, Likert scale

Over the years, numerous methods have been used to measure character and personality traits (Likert, 1932). The difficulty of measuring attitudes, character, and personality traits lies in the procedure for transferring these qualities into a quantitative measure for data analysis purposes. The popularity of qualitative research techniques has relieved some of the burden associated with the dilemma; however, many social scientists still rely on quantitative measures of attitudes, character, and personality traits.

In response to the difficulty of measuring character and personality traits, Likert (1932) developed a procedure for measuring attitudinal scales. His analysis was based on the composite score from the series of questions that represented the attitudinal scale.

While Likert used a five-point scale, other variations of his response alternatives are appropriate, including deletion of the neutral response (Clason & Dormody, 1994). By the time of this article's publication, at least 12 articles published in the 2011 Journal of Extension had used some form of a Likert response. The articles published in 2011 included four-point Likert alternatives (Behnke & Kelly, 2011; Robinson & Shepard, 2011), five-point Likert alternatives (Diker, Walters, Cunningham-Sabo, & Baker, 2011; Elizer, 2011; Hines, Hansen, & …, 2011; Kalambokidia, 2011; Kroth & Peutz, 2011; Singletary, Emm, & Hill, 2011), six-point Likert alternatives (Allen, Varner, & Sallee, 2011; Beaudreault & Miller, 2011; Wyman et al., 2011), and a seven-point Likert alternative (Walker, Vaught, Walker, …). While variations of the Likert alternative have become common in Extension research, they have also created misuses or mistakes. Before we discuss the analysis of Likert data, let's review some basic concepts.

Likert-Type Versus Likert Scales

Clason and Dormody (1994) described the difference between Likert-type items and Likert scales. With Likert-type items, multiple questions may be used in a research instrument, but there is no attempt by the researcher to combine the responses from the items into a composite scale.

A Likert scale, on the other hand, is composed of a series of four or more Likert-type items that are combined into a single composite score/variable during the data analysis process. Table 2 provides an example of five questions designed to be combined into a Likert scale.

Table 2. Healthy eating Likert questions that create a “healthy eating” Likert scale
Response options: Strongly disagree | Disagree | Neutral | Agree | Strongly agree
1. A healthy diet is important to my …

Stevens' Scale of Measurement

Likert-type and Likert scale data require different data analysis procedures. Stevens' scale consists of four categories: nominal, ordinal, interval, and ratio. In the nominal scale, observations are assigned to categories based on equivalence. In the ordinal scale, numbers indicate only the order of observations. Interval scale data also use numbers to indicate order and reflect a meaningful relative distance on the scale. A ratio scale also uses numbers to indicate order and reflects a meaningful relative distance between points on the scale, and, unlike an interval scale, it has a true zero point. An example of a ratio measure is age.

Analyzing Likert Response Data

To properly analyze Likert data, one must understand the measurement scale represented by each type of data.
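Combining Likert-type items into a Likert scale amounts to computing one composite score per respondent. The sketch below assumes each respondent's answers are already coded as numbers (for example, 1 to 5) and that every item is worded in the same direction; the function name, the four-item check, and the sample "healthy eating" data are illustrative assumptions, not the contents of Table 2.

```python
from statistics import mean

MIN_ITEMS = 4  # a Likert scale combines four or more Likert-type items


def composite_scores(responses, method="mean"):
    """Collapse each respondent's item responses into one composite score.

    `responses` holds one list per respondent (item codes such as 1-5).
    `method` is "sum" or "mean", the two composites described in the text.
    """
    if method not in ("sum", "mean"):
        raise ValueError("method must be 'sum' or 'mean'")
    scores = []
    for row in responses:
        if len(row) < MIN_ITEMS:
            raise ValueError("a Likert scale needs at least four items")
        scores.append(sum(row) if method == "sum" else mean(row))
    return scores


# Hypothetical answers from three respondents to five "healthy eating" items.
healthy_eating = [
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [3, 3, 4, 3, 3],
]
print(composite_scores(healthy_eating, method="sum"))   # [22, 12, 16]
print(composite_scores(healthy_eating, method="mean"))  # [4.4, 2.4, 3.2]
```

The resulting composite, not the individual items, is what gets treated as interval data in the analysis.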

Likert-type items are ordinal scale data, so the median or mode is recommended for central tendency and frequencies for variability (Table 3). Additional analysis procedures appropriate for ordinal scale items include the chi-square measure of association, Kendall tau B, and Kendall tau C.

Likert scale data, on the other hand, are analyzed at the interval measurement scale. Likert scale items are created by calculating a composite score (sum or mean) from four or more Likert-type items; therefore, the composite score for Likert scales should be analyzed at the interval measurement scale. Descriptive statistics recommended for interval scale items include the mean for central tendency and standard deviations for variability. Additional data analysis procedures appropriate for interval scale items would include Pearson's r, the t-test, ANOVA, and regression procedures. Table 3 provides examples of data analysis procedures for Likert-type and Likert scale data.

Table 3. Suggested data analysis procedures for Likert-type and Likert scale data

                    Likert-type data      Likert scale data
Central tendency    Median or mode        Mean
Variability         Frequencies           Standard deviation
Associations        Kendall tau B or C    Pearson's r
Other statistics    Chi-square            ANOVA, t-test, regression

The data analysis decision for Likert data is usually made at the questionnaire development stage. Do you have a series of individual questions that have Likert response options for participants to answer, or do you have a series of questions that, when combined, describe a personality trait or attitude? If you have designed a series of questions that, when combined, measure a particular trait, you have created a Likert scale. If you feel a need to report the individual items that make up the scale, use only Likert-type statistical procedures for those items. Keep in mind that once the decision between Likert-type and Likert scale data has been made, the decision on the appropriate statistics is clear.
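For the "Associations" row of Table 3, Kendall's tau-b between two Likert-type items can be computed by counting concordant and discordant pairs of respondents and adjusting for ties. This is a minimal O(n²) sketch with hypothetical data; tau-c is not shown, and for real analyses SciPy's `scipy.stats.kendalltau` (which computes tau-b by default) would be a more practical choice.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two paired ordinal variables (O(n^2) pair count)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1  # includes pairs tied on both variables
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("one variable is constant; tau-b is undefined")
    return (concordant - discordant) / denom


# Hypothetical paired responses of four respondents to two Likert-type items.
item_1 = [1, 2, 2, 3]
item_2 = [1, 2, 3, 3]
print(kendall_tau_b(item_1, item_2))  # 0.8
```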

