
Martin, J. R. 1998. Evaluating faculty based on student opinions: Problems, implications and recommendations from Deming’s theory of management perspective. Issues in Accounting Education (November): 1079-1094.

Summary by James R. Martin, Ph.D., CMA
Professor Emeritus, University of South Florida


The purpose of this paper is to present the case against using student opinions to evaluate teaching effectiveness and to recommend an alternative method consistent with Deming’s theory of management.1 Briefly, the thrust of the argument is that student opinions should not be used as the basis for evaluating teaching effectiveness because these aggregated opinions are invalid measures of quality teaching, provide no empirical evidence in this regard, are incomparable across different courses and different faculty members, promote faculty gaming and competition, tend to distract all participants and observers from the learning mission of the university, and ensure the sub-optimization and further decline of the higher education system. Using student opinions to evaluate, compare and subsequently rank faculty members represents a severe form of a problem Deming referred to as a deadly disease of western-style management. The theme of the alternative approach is that learning on a program-wide basis should be the primary consideration in the evaluation of teaching effectiveness. Emphasis should shift from student opinion surveys to the development and assessment of program-wide learning outcomes. To achieve this shift in emphasis, the university performance measurement system needs to be redesigned to motivate faculty members to become part of an integrated learning development and assessment team, rather than a group of independent contractors competing for individual rewards.

This paper is organized into five sections. The first three sections add more specificity concerning the deficiencies of student opinion surveys as outlined above. The fourth section includes a discussion of the underlying causes of the misuse of student evaluations and the final section provides a set of recommendations concerning systemic changes in higher education derived from Deming’s theory of management perspective.

The Invalidity of Student Evaluations

The first step in selecting and validating a measurement is to develop an operational definition of the construct or phenomenon to be measured (Dipboye, Smith and Howell 1994, 59). This includes identifying the desired characteristics (e.g., inputs, processes and outputs) associated with the phenomenon involved. Subsequent steps include identifying and examining the relationships between possible measurements and these characteristics, as well as the relationships between the measurements and the characteristics of other related phenomena. The purpose of these steps is to develop a theory concerning the relationship between the phenomenon to be measured and the measurements. For example, the underlying theory of using student opinions to evaluate faculty teaching effectiveness is that the characteristics of high quality teaching cause high student ratings and that the characteristics of low quality teaching cause low student ratings. However, if some of the characteristics of high quality teaching tend to cause low student ratings and some of the characteristics of low quality teaching tend to cause high student ratings, the theory is flawed and the measurement is invalid. Examining student evaluations from the perspective outlined above helps reveal several problems related to the invalidity of these measurements.

First, although the American Assembly of Collegiate Schools of Business (AACSB 1994, 47) requires assessments of teaching that ensure the effectiveness of instruction, and a five-dimensional model of effective teaching has been developed by the Accounting Education Change Commission and modified by an AAA committee (Calderon, Gabbin and Green 1996), an explicit operational definition of high quality teaching does not exist. In other words, the very first step in the process of validation is missing. However, the use of student opinions to evaluate faculty implies that the mean student opinion represents the evaluator’s implicit definition of quality teaching. The underlying assumption is that, collectively and automatically, any group of students will accurately define and definitively measure quality teaching.

Second, even if an explicit comprehensive operational definition of high quality teaching can be developed, it cannot be fully represented on a student evaluation form because students are not qualified to judge many of the aspects involved.2 For example, what was the depth of student knowledge obtained in this class? Is that knowledge current, relevant, integrated and applicable to the solution of problems within the discipline studied? How long will the students retain this knowledge? Do the students understand the underlying theory and issues associated with the topics covered? Did the students enhance their critical thinking ability, creativity, oral and written communication skills, ability to work together and ability to learn? Did the instructor adequately conceptualize, organize and sequence the subject matter? Does the instructor keep up with the profession and relevant research in the areas associated with this class? Did the instructor integrate this course with related courses, disciplines and current research? Does the instructor share his or her teaching methods, techniques and successes with other instructors through publications and presentations at academic meetings?3 Obviously, students cannot answer these questions.

A third problem with student evaluations has been identified by a variety of researchers (e.g., Newton 1988; Wright et al. 1984; Crumbley 1995). As mentioned above, student evaluations are not valid if some characteristics of high quality teaching tend to cause low student ratings and some characteristics of low quality teaching tend to cause high student ratings. For example, eliminating the most challenging material from a course, or from the exams, and generously awarding partial credit for incorrect answers may tend to generate higher student ratings, but it is doubtful that many faculty members would classify these actions as indicative of high quality teaching.4 Similarly, providing entertaining lectures without much, if any, substance may produce high student ratings as in the Dr. Fox episode,5 but this cannot logically be defined as high quality teaching. In fact, entertaining lectures, with or without much substance, may produce high student ratings in any class, but the "knowing-all-telling-all" teacher-centered approach is inconsistent with the current academic movement to transform students from passive recipients into interactive constructors of their own knowledge (Christensen, Garvin and Sweet 1991).6 This new student-centered active learning model changes the orientation of a high quality teacher from the relatively narrow master of content and dynamic oration to an infinitely more complex master of content, organization, facilitation and guidance in the process of building the students’ capacity for self-discovery.7 Moving from the old teacher-centered lecture mode to the student-centered active learning mode requires experimentation, which is extremely risky in an environment where faculty are evaluated, compared and ranked on the basis of student opinions.

The Incomparability of Performance Evaluations

Now, to consider a different issue, assume that a valid measure of quality teaching could be obtained for each faculty member. Then it would be appropriate to evaluate, compare and subsequently rank the faculty on the basis of these measurements. Right? Wrong! The misconception that a group of faculty members (or any group of workers) can be meaningfully compared and ranked is based on two faulty assumptions. The first assumption is that each worker can control his or her actual performance. However, system variation (frequently referred to as random variation) is inevitably present in any process, operation or activity. Deming (1986, 315) estimated that 94 percent of the variation in any system is attributable to the system, not to the people working in the system. The second assumption is that any system variation will be equally distributed across workers. According to Deming (1986, 353), there is no basis for this assumption in real life experiences. The source of the confusion comes from statistical (probability) theory where random numbers are used to obtain samples from a known population. When random numbers are used in an experiment, there is only one source of variation, so the randomness tends to be equally distributed. This is because samples based on random numbers are not influenced by such things as the characteristics of the inputs and other real world phenomena. However, in real life experiences, there are many identifiable causes of variation, as well as a great many others that are unknown. The interaction of these forces will produce unbelievably large differences between people (Deming 1986, 110), and there is no logical basis for assuming that these differences will be equally distributed.8 Evaluating, comparing and subsequently ranking a group of workers is a common misapplication of the concept of random numbers. Deming taught that ranking a group of workers is merely an exercise in ranking the effects of the system on the workers.
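This point is easy to demonstrate with a small simulation in the spirit of the red bead experiment (see footnote 8). The sketch below is illustrative only and is not part of the original paper: every hypothetical instructor is given identical teaching ability, all of the variation comes from the system, and the resulting rankings still shuffle from year to year.

```python
# Illustrative sketch (not from the paper): ranking identical "instructors"
# when all variation comes from the system, in the spirit of Deming's
# red bead experiment.
import random

random.seed(42)

NUM_INSTRUCTORS = 10
NUM_SEMESTERS = 4
TRUE_ABILITY = 4.0   # every instructor teaches equally well (1-5 scale)
SYSTEM_NOISE = 0.5   # spread contributed by the system (students, rooms, courses, ...)

def semester_rating():
    """Mean student rating for one semester: identical ability plus system variation."""
    return max(1.0, min(5.0, random.gauss(TRUE_ABILITY, SYSTEM_NOISE)))

# Rank the same ten identical instructors in two consecutive "years".
for year in (1, 2):
    ratings = {f"Instructor {i + 1}": sum(semester_rating() for _ in range(NUM_SEMESTERS)) / NUM_SEMESTERS
               for i in range(NUM_INSTRUCTORS)}
    ranking = sorted(ratings, key=ratings.get, reverse=True)
    print(f"Year {year} ranking:", ", ".join(ranking))

# The rankings typically differ from year to year even though every
# instructor is identical: the ranking reflects the system, not the people.
```

In this invented setting the ranking conveys no information about the instructors; it merely ranks the noise the system handed each of them that year.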

Confounding Variables in an Academic System

Now, consider some of the variables in an academic system. Table 1 provides a variety of variables that are related to the environment and individual course. Although no attempt was made to produce an all-inclusive list of environmental and course related variables, twenty-six easily came to mind.

Many of the variables listed in Table 1 have been the subject of research, while many others have not been studied. For example, several researchers have attempted to measure the relationship between student evaluations and class size, class level and class time (e.g., Wright et al. 1984; Mulford and Schneider 1988). Others have attempted to measure the effect on student evaluations caused by different courses within the accounting major (Deberg and Wilson 1990).

Table 1: Confounding Factors in an Academic System
Environment and Course Related Variables
Environment Related
1. Campus, e.g., location, convenience, attractiveness.
2. Building, e.g., location, convenience, attractiveness, presence of snack bar, etc.
3. Room lighting, glare etc.
4. Type of seating, e.g., tables with chairs or desks, level of comfort, etc.
5. Seating arrangement, e.g., tiered levels, wide or shotgun style.
6. Acoustics.
7. Internal noise, e.g., from air handler.
8. External noise, e.g., from other classes, etc.
9. Overhead projector availability and quality.
10. Screen availability and quality.
11. Black or white board availability and quality.
12. Ability to adjust air flow and temperature.
13. Other equipment availability and quality, e.g., computers, software etc.
Course Related
14. Size of class (Nichols & Soper 1972 cited in Wright et al. 1984; Mulford & Schneider 1988).
15. Size of class relative to room size.
16. Required or elective (Mulford & Schneider 1988).
17. Level, i.e., freshman through graduate (Kau & Rubin 1976 cited in Wright et al. 1984).
18. Level of difficulty within the university.
19. Level of difficulty within the college.
20. Level of difficulty within the major.
21. Stated or implied importance within major.
22. Time of class, e.g., regular semester, summer, morning, afternoon, night (Nichols & Soper 1972 cited in Wright et al. 1984).
23. Length of class time.
24. Relationship with other classes, e.g., prerequisites and other interdependence (Deberg & Wilson 1990).
25. Relationship of course to a professional exam, e.g., CPA.
26. Relationship of course to local employment opportunities.

The variables listed in Table 1 are perhaps less influential than those related to the students and the teacher. Consider the representative list of student characteristics that appears in Table 2. Some studies have examined the effects of student grade expectations (e.g., Wright et al. 1984 cite four) and male versus female preferences (Wright et al. 1984 cite one), but it appears that most of the student characteristics listed in Table 2 have not been studied even though behavioral researchers have found cultural differences, such as masculinity and individualism, to be significant determinants of human behavior (e.g., Baker 1976; Hofstede 1986; McGregor 1960).9 In addition, student characteristics, such as knowledge of prerequisite material, learning styles and attitudes and biases toward the course and the instructor, are perhaps critical to the student ratings. These are system-related variables and the effects are virtually impossible to measure.

Table 2: Student Characteristics
1. Race.
2. Sex (Bledsoe 1971 cited in Wright et al. 1984).
3. Age.
4. Health.
5. Introvert or extrovert.
6. Cultural background, family values, ethics etc.
7. Cultural characteristics, e.g., masculinity, individualism etc. (Hofstede 1991).
8. Orientation toward McGregor’s (1960) Theory X or Theory Y.
9. Marital status.
10. Academic background.
11. Academic major.
12. Level of preparation during course.
13. Current employment status.
14. Campus resident or nonresident.
15. Distance nonresidents live from campus.
16. Military status.
17. Learning style.
18. GPA (Kau & Rubin 1976 cited in Wright et al. 1984).
19. GPA in major.
20. Grades in prerequisite courses.
21. Expected grade (Kau & Rubin 1976; Nichols & Soper 1972; Stewart & Malpass 1966; and Weaver 1960, all cited in Wright et al. 1984).
22. Amount learned in previous courses, particularly prerequisite courses.
23. Amount retained from previous courses, particularly prerequisite courses.
24. Attitude and bias toward the course obtained from other students or faculty.
25. Bias toward the instructor obtained from other students or faculty.
26. Proportion of classes attended.
27. Whether the student is present when student evaluations are given.
28. Characteristics in relation to those of the instructor.

Most of the research associated with student evaluations is directed toward teacher-related variables such as those presented in Tables 3 and 4 below. Many of the teacher-related characteristics listed in Table 3 are uncontrollable, or only partially controllable by the instructor. Most of these variables have not received much, if any, attention from researchers, although some have received considerable attention. For example, the personality of the instructor has been found to have a significant effect on student evaluations. Some researchers have suggested "that a lecturer’s authority, wit and personality can seduce students into the illusion of having learned, even when the educational content of the lecture is missing" (Wright, Whittington and Whittenburg 1984). Others have reported a connection between the instructor’s sex and student ratings (Bledsoe 1971 cited in Wright et al. 1984). Even though all of the characteristics listed in Table 3 may influence student ratings of faculty, most of these variables have not been researched.

Table 3: Teacher Related Characteristics
1. Race.
2. Sex (Bledsoe 1971 cited in Wright et al. 1984).
3. Age (Rayder 1968 cited in Wright et al. 1984; Mulford & Schneider 1988).
4. Attractiveness.
5. Health.
6. Voice, e.g., deep, powerful, weak, squeaky, accented, monotone.
7. Verbal fluency (Coffman 1954 cited in Wright et al. 1984).
8. Expressiveness (Meier & Feldhusen 1979 cited in Wright et al. 1984).
9. Weight.
10. Energy level.
11. Personality, e.g., serious, humorous, jovial, laid back, outgoing, dynamic, caring, sensitive (Wright & Wotruba 1978 and Naftulin, Ware & Donnelly 1973, both cited in Wright et al. 1984).
12. Number of years teaching.
13. Number of times teaching the course evaluated.
14. Innate intelligence and capacity for learning.
15. The number of different course preparations during a semester and over a longer term basis.
16. The number of different levels of courses taught during a semester and on a long-term basis.
17. Dress.
18. Self discipline.
19. Religious views.
20. Political views.
21. Enthusiasm for the course evaluated.
22. General knowledge, e.g., degrees, extent of continuous learning.
23. Extent of relevant practical experience.
24. Knowledge of course within subject area.
25. Current knowledge of subject area.
26. Knowledge of related areas.
27. Ability and willingness to bring knowledge of related areas into class.
28. Ability and willingness to keep the course (i.e., student knowledge) level constant when the teacher’s knowledge level is increasing.
29. Preparation for class (Mulford & Schneider 1988).
30. Ability and willingness to learn and use student’s names.
31. Acting ability.
32. Willingness to act, e.g., play roles etc.
33. The extent that the teacher is involved in service related work, e.g., committees etc.
34. Textbook authorship (McDaniel & Feldhusen 1971 cited in Wright et al. 1984).

A separate list of variables that tend to be controllable by the teacher appears in Table 4. The purpose of this table is not to display an all-inclusive list of practices, but to suggest that a very large number of possible practices are available that might significantly influence student ratings. Most of the practices listed in Table 4 are related to how students are evaluated and graded. Although the relationship between student evaluations and grades has received more attention in the literature than any other variable discussed, the emphasis in most research studies has been on expected grades, actual grades and grading leniency (Wright et al. 1984 cite six studies in this category), not on specific testing and grading practices. However, there are a wide variety of ways to be more lenient without appearing to be too lenient. For example, consider the effect of using noncumulative or noncomprehensive exams throughout a course rather than using a more challenging group of comprehensive exams. Also consider the effect of using exams patterned after practice exams that are given to students in advance, or generously awarding partial credit for incorrect answers.10

Table 4: Controllable Teacher Practices Related to Course and Students
1. Whether the objectives of the course are clearly stated.
2. Pedagogy used in the course, e.g., use of lecture, discussion format, Socratic method, cases, cooperative learning, team assignments, presentations, team presentations, papers, team papers, computer assignments, outside readings.
3. The extent that prerequisite course materials or topics from previous courses are reviewed during class time.
4. The extent that prerequisite course materials or topics from previous courses are reviewed outside of class, e.g., during office hours.
5. The extent to which the material used in the class is canned or developed by the teacher.
6. Whether the materials used in the course can be sold to the bookstore when the class ends.
7. The extent that related research of the teacher is brought into class.
8. Willingness to entertain, tell jokes, show cartoons, be dynamic or evangelical in class.
9. Time spent on entertainment or ratio of substance to puff and fluff.
10. Whether quizzes are used.
11. Relationship of the quizzes to the test.
12. Whether and how homework is counted.
13. Whether a class session is dedicated to reviewing material prior to each test.
14. Whether the tests are cumulative.
15. The ratio of difficult to easy questions on the tests.
16. The ratio of conceptual to quantitative or mechanical questions on the test.
17. The extent that critical thinking is tested, e.g., comparing and evaluating concepts, unstructured case situation questions without definitive answers, questions that require the student to choose a position and defend it.
18. Time spent teaching the test, i.e., teaching what appears on the test rather than general course content.
19. Willingness to omit the more difficult material from the tests.
20. Willingness to give out practice tests.
21. Relationship of practice tests to the actual tests, e.g., random sample of material or exact replicas.
22. The type of tests given, e.g., objective, short answer, essay.
23. Whether the tests are completed in class or taken home.
24. The number of tests given.
25. Whether the sequence of tests becomes more or less difficult, e.g., whether the first test is designed to weed out low achievers or students with weak backgrounds.
26. The time pressure placed on students during the test.
27. Whether partial credit is given for incorrect answers.
28. Whether the test grades are scaled in some way.
29. The relationship between the level of difficulty taught and tested, e.g., teach at a higher level than tested.
30. Whether class participation is graded.
31. Whether credit is given for effort and improvement.
32. Whether students are allowed to keep their tests.
33. Whether retests are given.
34. The extent that old tests are available to students, e.g., placed on reserve in the library.
35. How the various assignments are graded and weighted in obtaining the student’s final grade.
36. Whether extra credit projects are provided.
37. Whether the teacher’s style is directed to the high achievers or the low achievers in the class.
38. Willingness and ability to critically evaluate students in various soft areas such as speaking and writing skills.
39. Tolerance, e.g., for student tardiness, absences, dishonesty, excuses, lack of background and lack of preparation.
40. Time spent with students during office hours.
41. Willingness to be friendly.
42. Willingness to socialize with students.
43. Time spent socializing with students.
44. Willingness to give students home phone number and return their calls.
45. Willingness to give "I" grades to students who fail the course and then let them retake the course (free) the following semester.

Gaming to Improve Student Ratings

The term "gaming" as used here refers to attempts by faculty to influence the student ratings in their favor by using practices that distract from learning rather than enhance learning. There are two issues related to faculty gaming. The first issue is that although gaming techniques represent low quality teaching, faculty use of the techniques may improve the student opinion ratings. This potential inverse relationship between good teaching and good student ratings is the invalidity issue discussed above. The second issue is that using student opinion surveys as the basis for evaluating teaching effectiveness causes the very problem that a good system’s design would prevent, i.e., a decline in the quality of teaching.

Many authors have observed that evaluating teaching effectiveness based on student opinions tends to motivate faculty gaming to improve the student ratings (e.g., Newton 1988; Wright et al. 1984; Crumbley 1995). Perhaps many of the practices listed in Table 4 fall into the gaming category, but it is unlikely that a very large group of faculty could reach a consensus on very many of the items listed. Although most faculty would probably agree that frequently dismissing class early to take one’s students on a bar hopping escapade is gaming of the worst kind (Crumbley 1995), it might be more difficult to achieve a consensus on giving noncomprehensive exams, "I" grades and retesting students when they fail a test, or keeping grade expectations high until the student surveys are obtained and then "hosing" the students with a hard final exam (Crumbley 1995). Even so, as more faculty use a particular technique, even more are encouraged by students to follow suit. Many students use what one faculty member referred to as "a disgruntled consumer approach" (Wiesenfeld 1996). Student comments such as "My other teachers let me take exams early, or take retests, or drop the low exam grade, or give 'I' grades for failing, or give practice exams or take home exams, or no final exam" are fairly common. The point of this discussion is simply that evaluating teaching effectiveness based on student opinions encourages game playing, which becomes more and more competitive and compelling as more gaming techniques are used, many to the detriment of quality education. Using student opinions to evaluate faculty promotes a system where students move forward unprepared for the next level. This promotes even more gaming behavior at the higher levels. Unfortunately, this behavior is inadvertently driven by an indeterminable number of university administrators who pretend to evaluate teaching effectiveness with little or no empirical evidence.

It is important to note that opinions do not qualify as empirical evidence.11 In fact, opinions, particularly anonymous opinions, cannot even be criticized because the basis for the opinion is unknown. Opinions do not require a theory as a justification and may be based on a gut feeling, whim, bias, infatuation, revenge, or psychic phenomenon. The realization that opinion is unreliable evidence was frequently reinforced by Deming who said, "In God we trust. All others require data" (Walton 1986, 96). Deming’s "management by data" concept means that accurate and timely data concerning the output of a process is needed to evaluate and subsequently improve the process.

Underlying Causes of the Misuse of Student Evaluations

The misuse of student evaluations is a symptom of a much larger and far more serious problem. The higher education system is based on a whole set of flawed assumptions that include the view that concentrating on the performances of individuals is the key to optimizing the performance of the system,12 that each individual is responsible for any variations in his or her performance, that individual performances are independent and can be measured separately (Scholtes 1998), and that these measurements can be compared to form a ranking of the individuals within the system. Additional misconceptions include the view that people mainly work for money and that individual incentive (zero-sum) pay plans improve the performance of individuals (Pfeffer 1998). These faulty assumptions and misconceptions are the antithesis of Deming’s theory of management and "systems thinking" in general (Senge 1990, 78).

Recommendations

So what is the solution? How should teaching effectiveness be measured so that the evaluation method promotes the optimization and continuous improvement of the higher education system? Unfortunately, there are no quick or easy solutions. Before a solution can be developed and implemented, the underlying flaws in the system must be recognized and eliminated. Optimizing the higher education system (like any other system) depends on the concepts of interdependence and synergy, not on the concept of individualism.

The following recommendations are based on Deming’s theory of management. First, the university should conduct a series of seminars on Deming’s theory of management and the issues addressed in this paper. Most universities have some faculty who are familiar with these concepts, but for some reason these faculty have no credibility within their own institution. Therefore, outside help may be needed.13 These seminars should emphasize what teamwork is, why teamwork is important and why the university performance evaluation system must be changed to promote teamwork.

Second, and a prerequisite for all that follows, the university performance measurement system must be changed dramatically to conform to Deming’s theory. This means that administrators should stop evaluating, comparing and ranking the faculty because ranking promotes competition, destroys cooperation, and accomplishes nothing positive. Judging and ranking people is not leadership. It does not provide a method for improvement. Leadership involves helping people to improve, and providing an environment where people see themselves as part of a system and are motivated to help each other to optimize the system (Deming 1993, 128-131).

Third, the faculty and administrators need to stop treating each course, each faculty member and each functional area as if they were self-contained, private14 stovepipes.15

Fourth, faculty members must take collective responsibility for the overall educational outcome (Bailey 1994, 7; AACSB 1994, 23 & 48). This should involve frequent faculty collaboration to improve the content, pedagogy and delivery within each course and across courses. Student opinion surveys can be incorporated into the collaboration and review process if these instruments are designed to identify potential improvements and do not include a summary question concerning the overall rating of the faculty member.16 The faculty collaboration process should definitely not include faculty committees that are used to produce an annual evaluation and subsequent ranking of each faculty member for merit pay or other salary adjustments.17 To do so is to commit one of Deming’s seven deadly diseases (see Appendix). Faculty members should not judge each other. Instead they should frequently interact, share, counsel, facilitate, teach and coach each other for the improvement of the system.

Fifth, cross-functional teams of faculty members (including other business and perhaps nonbusiness faculty) should identify as many of the interdependencies between courses as possible so that these relationships can be considered in the continuous improvement process.

Sixth, ranking students within each course is counterproductive and should be stopped. An alternative with considerable precedent from a wide variety of professional exams is simply to use a pass-fail system where the pass line is drawn at a demanding but achievable level. Students should be able to demonstrate that they know and can perform whatever the collective faculty determines is required. With these changes, emphasis can be placed on maximizing learning in each class with a view toward the student’s overall program, not on each student’s position within the class ranking.18 Then instructors can and should stop pushing unprepared students to the next level, and perhaps students will learn to cooperate with each other, rather than competing for a better position in the ranking.19

Seventh, the cross-functional faculty teams need to operationally define what students should know and be able to do at each stage of their program and at the end of their university program. A student’s acceptance into upper-level courses should require the satisfactory completion of an entrance exam on the foundation material. Defining requirements across courses may require that faculty members sit in on other courses and team-teach some courses so that they have a better grasp of the entire major.

Eighth, the cross-functional teams should develop comprehensive oral and written field exams and administer them at the end of the student’s program.20 This step is time-consuming, but it focuses attention on the student’s entire program rather than the individual parts, and it forces students to think of their education holistically, as a cumulative building process rather than a series of isolated hurdles. This step also shifts the emphasis from a time-based program (120 or 150 hours) to a competency-based program21 and provides a way to determine whether changes in the education process actually represent improvements in the learning outcomes (i.e., "management by data").
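As an illustration of the "management by data" idea (this sketch and its numbers are hypothetical, not from the paper), a Shewhart-style p-chart on program field exam pass rates can show whether a change in the education process actually represents an improvement in learning outcomes or merely reflects common-cause variation.

```python
# Illustrative sketch (hypothetical data): a p-chart on program field exam
# pass rates, used to separate common-cause variation from a real change.
import math

# (semester label, students examined, students passing) -- invented numbers.
results = [
    ("F95", 120, 86), ("S96", 115, 82), ("F96", 124, 90), ("S97", 118, 83),
    ("F97", 122, 88), ("S98", 119, 106),  # curriculum revised before S98
]

total_n = sum(n for _, n, _ in results)
total_pass = sum(passed for _, _, passed in results)
p_bar = total_pass / total_n  # overall average pass rate (the center line)

for label, n, passed in results:
    p = passed / n
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)   # standard error for this sample size
    ucl, lcl = p_bar + 3 * sigma, p_bar - 3 * sigma
    signal = "special cause?" if (p > ucl or p < lcl) else "common cause"
    print(f"{label}: pass rate {p:.2f}  limits [{lcl:.2f}, {ucl:.2f}]  -> {signal}")
```

In this invented example only the post-revision semester falls outside the control limits; the other semester-to-semester differences are the kind of common-cause variation that should not trigger praise, blame, or rankings.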

Finally, the cooperative efforts of administrators, faculty, students, and all other stakeholders will be needed in these efforts to change the higher education system from a competitive individualistic system to a system that promotes cooperation and teamwork. From Deming’s perspective, the current system produces approximately three percent of the potential of a system redesigned around the concepts he recommended. In Deming’s system, everyone wins and the potential benefits are quite extraordinary.

Footnotes

1 Deming explained his theory of management in two books (1986 and 1993). The first book places emphasis on fourteen points that management should implement along with seven deadly diseases and some other obstacles that management should eliminate. In the second book, Deming consolidated his theory into four types of knowledge needed to manage a system. A summary of Deming’s theory of management as it relates to a higher education system is provided in an appendix to this article. For more, see the Deming 93 Summary.

2 Apparently 75 to 80 percent of academics would agree with this statement (Bures, DeRidder and Tong 1990; Reckers 1996) but many, if not most, administrators act as if they disagree. Many researchers routinely support the statement (e.g., see Cashin 1983; Seldin 1984; Newton 1988; DeMong, Lindgren Jr. and Perry 1994), although there are some who appear to disagree (Hooper and Page 1986).

3 The last four questions were paraphrased from an AAA committee report indicating characteristics of effective teaching. See Calderon, Gabbin and Green, (1996) Table 1.

4 Several papers refer to the faculty belief that lenient grading produces higher student ratings and that faculty act accordingly (Winsor 1977; Powell 1977; Worthington and Wong 1979; Stumpf and Freedman 1979; Yunker and Marlin 1984). In one faculty survey, at least 1/3 of the respondents indicated that they had substantially decreased the level of difficulty and grading standards for their courses (Ryan, Anderson and Birchler 1980).

5 A number of authors comment on the "Dr. Fox Effect," where a professional actor received the highest ratings for delivering a very entertaining but senseless lecture (e.g., Wright et al. 1984).

6 Some researchers have suggested that the lecture is more effective than many critics believe (Burke and Day 1986).

7 Although the current interest in promoting active learning is relatively new, Whitehead (1929) advocated similar ideas many years ago. Some of these ideas are referred to by Christensen, Garvin and Sweet (1991).

8 Deming was critical of probability theory and the central limit theorem in his seminars. For example, in response to a comment from the audience that the mean number of defects in the red bead experiment would be ten based on these concepts, Deming said "I think it is necessary to think and not to assume what you don’t know" (Walton 1986, 49). See Martin, J. R. Not dated. What is the red bead experiment? Management And Accounting Web. (Summary).

9 For example, Baker (1976) found that U.S. accounting students placed higher values on being clean and responsible and lower values on being imaginative than other students.

10 A physics professor at Georgia Tech noted that "a physics major could obtain a degree without ever answering a written exam question completely" simply by obtaining partial and extra credit (Wiesenfeld 1996).

11 Empirical evidence is derived from observation or experiment. Although someone’s opinion may be based on empirical evidence, the opinion itself does not represent empirical evidence. Many researchers incorrectly refer to opinion surveys as empirical research. For clarification and an excellent description of the various types of research methodology, see Buckley, Buckley and Chiang (1976). (Summary).

12 Apparently, this idea evolved from Adam Smith’s (1937) concept of the "invisible hand" that, presumably, guides individuals who automatically seek their own self interest to act in the best interests of the whole system.

13 The W. Edwards Deming Institute is perhaps the best place to seek help.

14 Bailey (1994) argues that "A focus on learning demands that faculty must eliminate the privacy normally associated with what goes on in the classroom."

15 Perhaps Elliott (1992) first popularized the term "stovepipe" in reference to organizations where each responsibility center, from divisions of companies to individual workers, is viewed as a separate entity to be optimized. (Summary).

16 The AAA Teaching and Curriculum Section committee report includes an example of what the committee referred to as a "well-designed student evaluation instrument" (Calderon, Gabbin, and Green 1996, Table 3). It is commendable that this instrument does not include a summary question of the student’s overall opinion of the instructor.

17 When faculty committees are used to produce annual evaluations of the faculty, the members of the committee tend to use student evaluations as the basis for the committee’s recommendations. This practice makes the problem worse because the implication is that the faculty have been evaluated by the students, the faculty committee and the chairman of the department when student opinions are the basis of the whole charade.

18 What other industry or organization rates its products from A through F and worries about grade inflation (e.g., Addy and Herring 1996)?

19 One accounting practitioner recently noted that higher education is the only industry where 100 percent of the finished products need rework. Steve Albrecht (AAA President) relayed this idea to the audience during the Western AAA meeting plenary session, May 1, 1998.

20 Using comprehensive program or field exams is not a new idea. Prior to 1960 both field exams and thesis requirements were common in bachelor's and master's degree programs. In addition, DeMong, Lindgren Jr. and Perry (1994) recommend the use of standardized exams as part of an assessment program, but point out that students may not take them seriously if the exams do not affect their grades, transcript, or ability to graduate.

21 Moore (1997) briefly discusses the development of competency-based models by corporate universities and some traditional business schools. Some readers will question how students can be compared for graduate school and employment in a competency-based pass-fail system. The results of the program field exams will provide useful information for selecting graduate students. This information can be supplemented with other information, e.g., faculty recommendations and GMAT results. However, screening people for employment should not be a goal of a university system. In the time-based system that we have today, the high achievers are slowed down and the low achievers are speeded up, ignoring the fact that people learn in different ways and at different speeds. We pick up the slack with grades. The variability in the quality of our graduates is very large, and the mean quality level is low. A competency-based pass-fail system would reduce the variability and improve the mean quality of the university’s product. This is what changing to a continuous improvement system means.


Appendix: Summary of Deming’s Theory of Management*

The four types of knowledge needed to manage a system are listed below, along with their relation to the 14 points and to the 7 deadly diseases.

Knowledge of the system. A leader must understand the system he or she is attempting to manage. Without this understanding the system cannot be managed or improved. A system cannot understand itself or manage itself. Optimization of the parts does not optimize the whole. System optimization requires coordination and cooperation of the parts, which requires leadership.
Relation to the 14 points: A knowledge of the system promotes points 1. create constancy of purpose, which means to constantly attempt to optimize the system (everybody in the system needs to understand how their effort or output fits into the system, and each person is viewed in terms of how they contribute toward optimizing the system), 2. adopt the new philosophy, 5. improve constantly, 6. institute training, 7. institute leadership, 8. drive out fear, 9. break down barriers between departments, 11. eliminate numerical quotas, 12. remove barriers to pride in workmanship, and 13. institute education and training in teamwork and statistical methods.
Relation to the 7 deadly diseases: A lack of knowledge of the system promotes diseases 1. lack of constancy of purpose, 2. emphasis on short-term performance, 3. evaluation by annual performance reviews and merit ratings, 4. management (and faculty) mobility, and 5. running the organization on visible figures alone. These diseases all distract from the purpose of higher education and prevent optimization of the system.

Knowledge of variation. Deming is referring to Shewhart’s concept of separating common or system causes of variation from assignable or special causes of variation. This relates to blaming people for variation caused by the system.
Relation to the 14 points: A knowledge of variation promotes points 5. improve constantly, 6. institute training, 7. institute leadership, and 11. eliminate work standards, quotas and management by objective. A knowledge of variation helps one understand the system so that it can be managed and improved.
Relation to the 7 deadly diseases: A lack of knowledge of variation promotes disease 3. annual reviews and ranking of faculty members. A manager who understands variation would not rank people because he or she would understand that ranking people merely ranks the effect of the system on the people.

Knowledge of the theory of knowledge. Knowledge is derived from theory. Information is not knowledge. Experience teaches nothing without theory. Copying examples does not lead to knowledge.
Relation to the 14 points: An understanding of the theory of knowledge promotes points 5. improve constantly, 6. institute training, and 7. institute leadership. The emphasis is on teaching people how to think on a continuous basis and not to assume any two problems are identical.
Relation to the 7 deadly diseases: A lack of understanding of the theory of knowledge promotes seeking examples to follow rather than thinking. Theory leads to questions, which lead to answers, which lead to knowledge and subsequent improvement.

Knowledge of psychology. Leaders must understand human behavior to motivate, coordinate and manage people to optimize the system.
Relation to the 14 points: An understanding of psychology promotes points 7. institute leadership (helping people do a better job rather than ranking them), 8. drive out fear, 9. break down barriers between departments (so that they cooperate rather than compete), and 12. remove barriers to pride in workmanship.
Relation to the 7 deadly diseases: A lack of knowledge of psychology promotes disease 3. evaluation with annual reviews, merit ratings and ranking of people. People need a method to improve, not objectives, quotas and rankings.

* Developed by the author based on the works of W. Edwards Deming (1986 and 1993).

____________________________________________

References and summaries

American Assembly of Collegiate Schools of Business. 1994. Achieving Quality and Continuous Improvement through Self-Evaluation and Peer Review.

Addy, N., and C. Herring. 1996. Grade inflation effects of administrative policies. Issues in Accounting Education (Spring): 1-13.

Bailey, A. R. 1994. Accounting education: Gradual transition or paradigm shift. Issues in Accounting Education (Spring): 1-10.

Baker, C. R. 1976. An investigation of differences in values: Accounting majors vs. non-accounting majors. The Accounting Review (October): 886-893.

Dipboye, R. L., C. S. Smith, and W. C. Howell. 1994. Understanding Industrial And Organizational Psychology. Fort Worth, TX: Harcourt Brace College Publishers.

Bledsoe, J. C. 1971. Factors related to pupil observation and attitudes toward their teacher. Journal of Educational Research (November): 119-126.

Buckley, J. W., M. H. Buckley, and H. Chiang. 1976. Research Methodology & Business Decisions. National Association of Accountants. (Summary).

Burke, M. J., and R. R. Day. 1986. A cumulative study of the effectiveness of managerial training. Journal of Applied Psychology: 232-245.

Bures, A. L., J. J. DeRidder, and H. M. Tong. 1990. An empirical study of accounting faculty evaluation systems. The Accounting Educators’ Journal (Summer): 68-76.

Calderon, T., A. L. Gabbin, and B. P. Green. 1996. Summary of promoting and evaluating effective teaching. The Accounting Educator. The Newsletter of the Teaching and Curriculum Section, American Accounting Association. Supplement.

Cashin, W. E. 1983. Concerns about using student ratings in community colleges. In A. Smith (Ed.), Evaluating Faculty and Staff: New Directions For Community Colleges. San Francisco, CA: Jossey-Bass.

Christensen, C. R., D. A. Garvin, and A. Sweet. (Eds.) 1991. Education for Judgment. Boston, MA: Harvard Business School Press.

Coffman, W. E. 1954. Determining students' concepts of effective teaching and their ratings of instructors. Journal of Educational Psychology: 277-286.

Crumbley, D. L. 1995. The dysfunctional atmosphere of higher education: Games professors play. Accounting Perspectives.

Deberg, C. L., and J. R. Wilson. 1990. An empirical investigation of the potential confounding variables in student evaluation of teaching. Journal of Accounting Education (Spring): 37-62.

Deming, W. E. 1986. Out Of The Crisis. Cambridge, MA: Massachusetts Institute of Technology Center for Advanced Engineering Study.

Deming, W. E. 1993. The New Economics For Industry, Government, Education. Cambridge, MA: Massachusetts Institute of Technology Center for Advanced Engineering Study. (Summary).

DeMong, R. F., J. H. Lindgren Jr., and S. E. Perry. 1994. Designing an assessment program for accounting. Issues in Accounting Education (Spring): 11-27.

Elliott, R. K. 1992. The third wave breaks on the shores of accounting. Accounting Horizons (June): 61-85. (Summary).

Hofstede, G. 1986. The cultural context of accounting. Accounting and Culture. Annual meeting plenary session papers. American Accounting Association: 1-11. (Summary).

Hooper, P., and J. Page. 1986. Measuring teaching effectiveness by student evaluation. Issues in Accounting Education (Spring): 56-64.

Kau, J. B., and P. H. Rubin. 1976. Measurement techniques, grades and ratings of instructors. Journal of Economic Education: 59-62.

Martin, J. R. Not dated. What is the red bead experiment? Management And Accounting Web. (Summary).

McDaniel, E., and J. F. Feldhusen. 1971. College teaching effectiveness. Today’s Education (March): 27.

McGregor, D. 1960. The Human Side of Enterprise. New York, NY: McGraw-Hill. (1957 Article Summary).

Meier, R. S., and J. F. Feldhusen. 1979. Another look at Dr. Fox: Effect of stated purpose for evaluation, lecture expressiveness, and density of lecture content on student ratings. Journal of Educational Psychology (June): 339-345.

Moore, T. E. 1997. The corporate university: Transforming management education. Accounting Horizons (March): 77-85.

Mulford, C. W., and A. Schneider. 1988. An empirical study of structural and controllable factors affecting faculty evaluations. Advances in Accounting: 205-215.

Naftulin, D. H., J. E. Ware, and F. A. Donnelly. 1973. The Dr. Fox lecture: A paradigm of educational seduction. Journal of Medical Education: 630-635.

Newton, J. D. 1988. Using student evaluation of teaching in administrative control: The validity problem. Journal of Accounting Education: 1-14.

Nichols, A., and J. C. Soper. 1972. Economic man in the classroom. Journal of Political Economy: 1069-1073.

Pfeffer, J. 1998. Six dangerous myths about pay. Harvard Business Review (May-June): 109-119.

Powell, R. W. 1977. Grades, learning, and student evaluation of instruction. Research in Higher Education 7: 193-205.

Rayder, N. F. 1968. College student ratings of instruction. Journal of Experimental Education (Winter): 76-81.

Reckers, M. P. 1996. Know thy customer. Journal of Accounting Education (Summer): 179-185.

Ryan, J. J., J. A. Anderson, and A. B. Birchler. 1980. Student evaluations: The faculty responds. Research in Higher Education 12(4): 317-333.

Scholtes, P. R. 1998. Total quality or performance appraisal: Choose one. www.iqpic.org/per.htm.

Seldin, P. 1984. Changing Practices in Faculty Evaluation. San Francisco, CA: Jossey-Bass.

Senge, P. M. 1990. The Fifth Discipline: The Art & Practice of The Learning Organization. New York, NY: Doubleday Dell Publishing Group Incorporated. (Note).

Smith, A. 1937. An Inquiry into the Nature and Causes of the Wealth of Nations, edited by E. Cannan. New York: Random House, Modern Library edition.

Stewart, C. T., and L. F. Malpass. 1966. Estimates of achievement and ratings of instructors. Journal of Educational Research (April): 347-350.

Stumpf, S. A., and R. D. Freedman. 1979. Expected grade covariation with student ratings of instructors. Journal of Educational Psychology 71: 273-302.

Walton, M. 1986. The Deming Management Method. New York, NY: The Putnam Publishing Company.

Weaver, C. H. 1960. Instructor ratings by college students. Journal of Educational Psychology: 21-45.

Whitehead, A. N. 1929. The Aims of Education and Other Essays. New York, NY: Free Press.

Wiesenfeld, K. 1996. Making the grade. Newsweek (June 17): 16.

Winsor, J. L. 1977. A's, B's, but not C's: A comment. Contemporary Education 48: 82-84.

Worthington, A. G., and P. T. P. Wong. 1979. Effects of earned and assigned grades on student evaluations of an instructor. Journal of Educational Psychology 71: 764-775.

Wright, P., R. Whittington, and G. E. Whittenburg. 1984. Student ratings of teaching effectiveness: What the research reveals. Journal of Accounting Education (Fall): 5-30.

Wright, P. W., and T. Wotruba. 1978. Models of student evaluations. Paper presented to A.A.C.T.E., Chicago, February.

Yunker, J. A., and J. W. Marlin. 1984. Performance evaluation of college and university faculty: An economic perspective. Educational Administration Quarterly (Winter): 9-37.