ABSTRACT
The purpose of this study was to estimate multiple sources of variation and score dependability in an Economics essay test using generalizability theory. The study adopted a fully crossed research design. The population comprised 1,835 SSII Economics students in the 47 public secondary schools in Aguata Education Zone of Anambra State. A sample of 328 SSII Economics students, selected by purposive sampling, was used for the study. Data were collected with the Economics Essay Test (EET), which was validated by experts in Social Science Education (Economics Unit) and Science Education (Measurement and Evaluation), Faculty of Education, University of Nigeria, Nsukka. The EET was trial tested on 25 SSII Economics students in Nsukka Education Zone to determine its reliability, and a coefficient of 0.85 was obtained using Kendall's coefficient of concordance. The data were analyzed with EduG version 6.1-e, a computer program designed specifically for generalizability analysis, to answer the research questions. The findings revealed that: the largest variance component was the residual (that is, the SIR interaction), which accounted for 77.6 % of the total variance in the G-study, followed by the student component [σ²(S)], which accounted for 15.1 %; the third largest component was that of SI, followed by the rater (R) component; the variance components for item [σ²(I)], student-by-rater [σ²(SR)] and item-by-rater [σ²(IR)] were zero (0.0); increasing the number of items produced higher generalizability and dependability coefficients than increasing the number of raters; and the relative and absolute error variances associated with the facets and their interactions were 0.3590 and 0.3642 respectively. Based on these findings, it was recommended, among other things, that test item writers should increase the number of items in their tests to obtain higher reliability and generalizability coefficients.
CHAPTER ONE
INTRODUCTION
Background of the Study
Economics is one of the subjects offered to students at senior secondary school level in Nigeria. Economics is the science that deals with the production, exchange and consumption of commodities in economic systems. It shows how scarce resources can be used to increase wealth and human welfare. According to Mankiw (2001), Economics is the study of how society manages its scarce resources. Hence, the central focus of Economics is the scarcity of resources and the choice among their alternative uses. Similarly, Egunjobi and Egwuakhide (2010) see Economics as the study of human endeavours in respect of production, distribution, exchange and consumption. According to Okafor (2008), Economics as a subject helps individuals to be relevant in everyday life and could prepare students for an entrepreneurial career in the future. The study of Economics enables leaders as well as citizens to understand basic economic concepts and principles, and to understand, appreciate and seek to improve the economic situation for their own social good (Obemeata, 1991). These definitions and explanations indicate that Economics is an important subject for both students and society at large, because it cuts across all spheres of human endeavour.
The main goals and objectives of studying Economics in senior secondary school are to enable students to: (i) understand basic economic principles and concepts as tools for sound economic analysis; (ii) contribute intelligently to discourse on economic reforms and development as they affect or would affect the generality of Nigerians; (iii) understand the structure and functioning of economic institutions; (iv) appreciate the role of public policies in the national economy; (v) develop the skills to appreciate the basis for national economic decisions; (vi) become sensitized to participate actively in national economic advancement through entrepreneurship, the capital market and so on; (vii) understand the role and status of Nigeria and other African countries in international economic relationships; and (viii) appreciate the problems encountered by developing countries in their efforts towards economic advancement (Nigerian Educational Research and Development Council (NERDC), 2008).
These objectives form the bedrock on which all efforts to implement the programme goals properly and adequately are based. In the classroom, for example, the teacher's efforts are directed at achieving the set goals or objectives of the lesson content. The extent to which the teacher achieves the goals of any lesson reflects the degree to which the expected change(s) in learners' behaviour occur after instruction (Ugwuja & Igbokwe, 2009). These expected changes in students' behaviour are measured using different assessment tools or procedures.
Assessment is the process of gathering information about students' abilities or behaviours for the purpose of making decisions about the students (Elliot, Kratochwill, Cook & Travers, 2000). Teachers use different assessment tools, such as objective tests, essay tests, checklists and sociometric techniques, depending on the objectives of the measurement. The essay test is one of the tools used in assessing students' academic achievement in a given instruction, especially when teachers want students to originate, organize, express and integrate ideas on a given problem. According to Onunkwo (2002), an essay test is a test in which students are required to provide answers to questions, and it therefore offers students the opportunity to organize and express their ideas in writing. Hence, it is particularly useful in evaluating objectives that deal with the selection, arrangement, organization and expression of ideas. In the light of this, essay tests tend to provoke critical thinking and provide room for authentic experience, originality and ingenuity (Reiner, Bothell, Sudweeks & Wood, 2002).
An essay test consists of items in any subject to which students are required to produce either short or extended responses. The amount of freedom given determines whether the test is of the short-response or the extended-response type (Nwana, 2007). The short-response essay type requires students to provide written answers a few lines long to brief questions (Onunkwo, 2002). The extended-response type, according to Nwana (2007), requires the student to provide a long, comprehensive written answer of two or more pages to a question. Both types challenge students to create responses rather than simply select one (Reiner et al., 2002); that is, they require the student to compose an answer rather than choose from given options. Onunkwo (2002) opined that teachers use essay tests because they have the potential to reveal students' abilities to reason, create, analyze, synthesize and evaluate. This implies that essay tests enable the teacher to assess students' understanding of, and ability to think about, subject matter content. Because the essay type requires students to demonstrate their reasoning and thinking skills, it gives teachers the opportunity to detect problems students may have with their reasoning processes.
However, assessments that require test-takers to provide extended constructed responses, such as the Economics essay test, often lack score reliability across tasks because they depend on raters' subjective judgements in scoring students' responses (Cumming, Kantor, Powers, Santos, & Taylor, 2000; Miller & Linn, 2000). Fulcher (2003) opined that the major problem to be faced in essay testing is controlling the reliability (dependability) of scores across different rating or grading systems. The reliability of scores from such observations is therefore one of the most important issues requiring attention.
Reliability is the confidence that a measurement reflects a real or stable trait; that is, it is the degree of consistency with which a test measures whatever it is measuring. It addresses the question of whether assessment results will vary when the assessment is repeated under the same conditions. Ezeh (2003) opined that if an instrument yields a consistent measure each time it is scored, it is said to be reliable. Kim and Wilson (2009) described the reliability of behavioural measures as the accuracy of generalizing from a person's observed score on a measure or test to the average score that the person would have received over all possible conditions.
Reliability issues in psychology and education have been addressed principally using Classical Test Theory (CTT), which postulates that an observed score (X) can be decomposed into a "true" score (T) and a single undifferentiated random error term (E) (Brennan, 2001). Within classical test theory, various coefficients are available, each investigating a single source of error variance, so only one source of measurement error can be examined at any given time. This poses a serious problem for researchers, since in reality several types of measurement error can exist concurrently (Erkuş, 2003); classical test theory offers no way of considering multiple sources of error variance within one analysis (Crowley, Thompson & Worchol, 1994). Thus, stability (test-retest) measures of reliability regard occasion as the source of error, parallel-forms measures regard the form as the source of error, and internal consistency measures regard items as the source of error (Eason, 1989; Webb, Rowley & Shavelson, 1988). Each type of reliability estimate can therefore be used to determine the degree to which observed scores deviate from true scores, but each estimates only one source of error.
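The classical model can be summarised in the standard notation below, assuming (as CTT does) that true scores and errors are uncorrelated. The reliability ratio is the usual CTT definition, shown for illustration rather than taken from this study.

```latex
X = T + E, \qquad
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

Because E is a single undifferentiated term, nothing in these expressions indicates how much of the error variance comes from items, raters or occasions.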
The problem, however, is that classical test theory cannot examine inconsistencies due to test forms, raters, items or occasions simultaneously (Brennan, 2001). With CTT, all sources of inconsistency are lumped together rather than differentiated. Its inability to analyze more than one source of error variance at a time, to separate the interaction effects among sources of error, and to determine which source contributes most to the inconsistency limits CTT as a psychometric technique for analyzing Economics essay tests (Yelboğa & Tavşancıl, 2010). This is because an Economics essay test involves more than one major facet. According to Hecker (2013), a facet is a set of conditions that contributes unwanted variation (measurement error) to observations in a study, that is, a potential source of error such as test items, raters or subjects; the levels of a facet are called conditions (Cook & Beckman, 2006). The major facets in an essay test include, at least, tasks and raters as major sources of score variability. Essay tests therefore call for a multifaceted analysis, known as generalizability theory, that can analyze more than one measurement facet simultaneously in addition to the object of measurement (that is, the examinees) (Cronbach, Gleser, Nanda, & Rajaratnam, 1972).
Generalizability theory (G theory), according to Brennan (2001), is a conceptual and statistical framework for analyzing more than one facet in investigating measurement error and score reliability. G theory decomposes observed-score variance into parts attributable to the object of measurement, to each facet and to their interactions; these parts are termed variance components. Unlike CTT, G theory provides a flexible alternative that allows multiple sources of error to be estimated separately, and it allows the impact of different sources of error on the reliability of measurements to be examined within a unified framework (Shavelson, Webb & Rowley, 1989). For instance, a student's performance on an Economics essay test can be viewed as a sample of achievement drawn from a complex universe defined by all possible combinations of tasks (items) and raters (judges). The task facet can be viewed as representing the content of the Economics essay test, that is, the subject matter domain, while the rater facet includes all possible Economics teachers who could be trained to score Economics essay tests reliably. According to Brennan (2001), generalizability theory is a powerful analytical tool that shows how much variation is explained by different facets and how score reliability changes if rating designs are altered by increasing the number of conditions in each facet.
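For the fully crossed students × items × raters (S × I × R) design examined here, the usual random-effects model of G theory (see Brennan, 2001) writes a student's score on one item from one rater as a sum of effects, and its variance as the sum of seven variance components. The effect symbols ν are notation introduced here for illustration. The last component, σ²(sir,e), confounds the three-way interaction with all other unexplained error, which is why it is called the residual.

```latex
\begin{aligned}
X_{sir} &= \mu + \nu_s + \nu_i + \nu_r + \nu_{si} + \nu_{sr} + \nu_{ir} + \nu_{sir,e} \\
\sigma^2(X_{sir}) &= \sigma^2(s) + \sigma^2(i) + \sigma^2(r) + \sigma^2(si) + \sigma^2(sr) + \sigma^2(ir) + \sigma^2(sir,e)
\end{aligned}
```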
The choice and number of facets in a G-study may vary according to the interests of the researcher. In this study, the researcher focuses on items (I) and raters (R), because of the subjectivity involved in scoring Economics essay tests, as well as on the object of measurement, the student (S). The aim is to estimate each of these variance components and its contribution to the total observed variance (Brennan, 2011). The researcher is also interested in estimating the variance components for the interactions among these facets and in comparing the relative sizes of the estimated components. This is possible because G theory, unlike CTT, makes it possible to estimate not only the effect of each source of error but also the effects of the interactions among those sources (Yin & Shavelson, 2008). Hence, with G theory, the researcher can determine which sources of variation are most troublesome and which, if any, need to be addressed to reduce unwanted inconsistencies in the ratings.
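Variance components of this kind are commonly estimated from a random-effects ANOVA by equating observed mean squares to their expected values. The sketch below does this for a balanced, fully crossed S × I × R design with one score per cell, and also expresses each component as a percentage of the total. It is a minimal illustration, not the EduG program used in this study; the function name `g_study` is hypothetical, and negative estimates are simply truncated at zero, a common convention whose exact handling may differ from EduG's.

```python
import numpy as np


def g_study(scores):
    """Estimate G-study variance components for a balanced, fully crossed
    students x items x raters (S x I x R) random-effects design.

    scores: array-like of shape (n_s, n_i, n_r), one score per cell,
            with at least two levels of every facet.
    Returns (components, percentages): two dicts keyed by
    's', 'i', 'r', 'si', 'sr', 'ir', 'sir'.
    """
    x = np.asarray(scores, dtype=float)
    n_s, n_i, n_r = x.shape
    m = x.mean()

    # Marginal means for the main effects and the two-way cells
    m_s, m_i, m_r = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_si, m_sr, m_ir = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Residual (three-way interaction confounded with error) for every cell
    resid = (x - m_si[:, :, None] - m_sr[:, None, :] - m_ir[None, :, :]
             + m_s[:, None, None] + m_i[None, :, None] + m_r[None, None, :] - m)

    # Sums of squares and degrees of freedom for each effect
    ss = {
        "s": n_i * n_r * np.sum((m_s - m) ** 2),
        "i": n_s * n_r * np.sum((m_i - m) ** 2),
        "r": n_s * n_i * np.sum((m_r - m) ** 2),
        "si": n_r * np.sum((m_si - m_s[:, None] - m_i[None, :] + m) ** 2),
        "sr": n_i * np.sum((m_sr - m_s[:, None] - m_r[None, :] + m) ** 2),
        "ir": n_s * np.sum((m_ir - m_i[:, None] - m_r[None, :] + m) ** 2),
        "sir": np.sum(resid ** 2),
    }
    df = {
        "s": n_s - 1, "i": n_i - 1, "r": n_r - 1,
        "si": (n_s - 1) * (n_i - 1), "sr": (n_s - 1) * (n_r - 1),
        "ir": (n_i - 1) * (n_r - 1),
        "sir": (n_s - 1) * (n_i - 1) * (n_r - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations of the random-effects model
    vc = {
        "sir": ms["sir"],
        "si": (ms["si"] - ms["sir"]) / n_r,
        "sr": (ms["sr"] - ms["sir"]) / n_i,
        "ir": (ms["ir"] - ms["sir"]) / n_s,
        "s": (ms["s"] - ms["si"] - ms["sr"] + ms["sir"]) / (n_i * n_r),
        "i": (ms["i"] - ms["si"] - ms["ir"] + ms["sir"]) / (n_s * n_r),
        "r": (ms["r"] - ms["sr"] - ms["ir"] + ms["sir"]) / (n_s * n_i),
    }
    # Truncate negative estimates at zero
    vc = {k: max(float(v), 0.0) for k, v in vc.items()}

    # Each component's share of the total estimated variance
    total = sum(vc.values())
    pct = {k: (100.0 * v / total if total > 0 else 0.0) for k, v in vc.items()}
    return vc, pct
```

In use, `scores` would hold each student's mark on each item from each rater; the returned percentages are the kind of "share of total variance" figures used to rank the components by size.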
In G theory, two kinds of study are distinguished: a generalizability study (G-study) and a decision study (D-study). A G-study is designed to isolate and estimate the variation due to the object of measurement and to as many facets of measurement error as it is reasonably and economically feasible to examine. That is, the variances associated with the different facets of measurement, including the object of measurement (usually examinees), are estimated and evaluated in terms of their relative importance in contributing to total score variance, given a universe of admissible observations (Brennan, 2001). A decision (D) study uses the information provided by the G-study to design the best possible application of the measurement for a particular purpose. The effect on score reliability of changes in the measurement design, such as changing the number of tasks or raters, is investigated for the universe of generalization of interest (Brennan, 2001).
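For the S × I × R design with random facets, the standard D-study error variances (Brennan, 2001) show how the G-study components are divided by the numbers of conditions in the planned design. Here n'_i and n'_r are notation introduced for the numbers of items and raters in a D-study. The relative error σ²(δ) contains only the interactions with students, which affect rank order; the absolute error σ²(Δ) adds the item, rater and item-by-rater components.

```latex
\begin{aligned}
\sigma^2(\delta) &= \frac{\sigma^2(si)}{n'_i} + \frac{\sigma^2(sr)}{n'_r} + \frac{\sigma^2(sir,e)}{n'_i\,n'_r} \\
\sigma^2(\Delta) &= \sigma^2(\delta) + \frac{\sigma^2(i)}{n'_i} + \frac{\sigma^2(r)}{n'_r} + \frac{\sigma^2(ir)}{n'_i\,n'_r}
\end{aligned}
```

Increasing n'_i shrinks every term involving items, and increasing n'_r shrinks every term involving raters, which is why a D-study can compare the payoff of adding items against adding raters.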
Unlike CTT, G theory allows two different types of reliability coefficient to be computed in a D-study: the generalizability coefficient (Eρ² or G) for relative decisions and the dependability index (Φ or Phi) for absolute decisions, which has no counterpart in CTT. That is, one coefficient serves norm-referenced and the other criterion-referenced score interpretations. A generalizability coefficient addresses reliability in the context of relative decisions: it reflects the degree to which the objects of measurement (students) maintain their rank order across the conditions of the facets, regardless of possible changes in raw scores (Webb, Shavelson & Haertel, 2006). It is analogous to a reliability coefficient (e.g., coefficient alpha) in classical test theory (Brennan, 2001), although a classical reliability coefficient implies a single undifferentiated source of measurement error. Here, the researcher can determine whether the scores obtained by Economics students are higher or lower by comparing them with those obtained by the norm group. According to Shavelson and Webb (1991), generalizability coefficients are useful in testing situations where the purpose of measurement is to make relative decisions about examinees (e.g., selecting students for a particular purpose) based on their standing relative to other Economics students in the same group or to the group average. In contrast, a dependability index (Φ or Phi) uses the absolute error variance [σ²(Δ)] as its error variance and is more appropriate for domain-referenced or criterion-referenced situations (Brennan, 2001). The dependability index, also known as the absolute G coefficient, is most useful when the actual values of the obtained scores in the Economics essay test are important or meaningful to the investigator. This typically involves performance measurements with a cut-off value that is deemed particularly meaningful, where the aim is to ascertain whether or not each student has mastered what has been taught; that is, each student's performance is interpreted in relation to a set standard or criterion of proficiency. Accordingly, when the measurement objective is to make absolute decisions about whether Economics students have mastered pre-specified Economics content or attained a criterion level of performance, it is more appropriate to use the coefficient (Φ) that takes into account systematic differences related to test forms, tasks and raters (Lee, 2005). In planning the D-study, the researcher defines a universe of generalization, that is, the set of facets and the levels to which the researcher wishes to generalize, and specifies the proposed interpretation of the measurement. The researcher then uses the information from the G-study to evaluate the effectiveness of alternative designs for minimizing error and maximizing the reliability of the Economics essay test.
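The D-study comparison of alternative designs can be sketched in Python under the standard formulas for a fully crossed S × I × R design with random facets: σ²(δ) collects the interactions with students, σ²(Δ) adds the item, rater and item-by-rater components, and Eρ² and Φ divide the universe-score variance σ²(s) by itself plus the respective error. The function name `d_study` and the values in `example_vc` are hypothetical and purely illustrative; they are not this study's estimates.

```python
def d_study(vc, n_i, n_r):
    """Error variances and coefficients for a D-study with n_i items and
    n_r raters in a fully crossed S x I x R design with random facets.

    vc: dict of G-study variance components keyed by
        's', 'i', 'r', 'si', 'sr', 'ir', 'sir'.
    Returns (relative_error, absolute_error, G, Phi).
    """
    # Relative error: only effects that interact with students
    rel_err = vc["si"] / n_i + vc["sr"] / n_r + vc["sir"] / (n_i * n_r)
    # Absolute error: adds item, rater and item-by-rater main effects
    abs_err = rel_err + vc["i"] / n_i + vc["r"] / n_r + vc["ir"] / (n_i * n_r)
    g = vc["s"] / (vc["s"] + rel_err)    # generalizability coefficient, E rho^2
    phi = vc["s"] / (vc["s"] + abs_err)  # dependability index, Phi
    return rel_err, abs_err, g, phi


# Hypothetical variance components, for illustration only
example_vc = {"s": 0.4, "i": 0.0, "r": 0.1, "si": 0.2,
              "sr": 0.0, "ir": 0.0, "sir": 0.8}

if __name__ == "__main__":
    # Compare doubling the items with doubling the raters
    for n_i, n_r in [(2, 2), (4, 2), (2, 4)]:
        rel, ab, g, phi = d_study(example_vc, n_i, n_r)
        print(f"items={n_i} raters={n_r}  G={g:.3f}  Phi={phi:.3f}")
```

With these made-up components σ²(si) exceeds σ²(sr), so doubling the items raises G more than doubling the raters; which facet pays off more in practice depends entirely on the components estimated in the G-study.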
The discussion above indicates that classical test theory approaches to determining score reliability cannot identify and untangle the profusion of error, because they account for only one source of error at a time (Atilla, 2012). This poses a serious problem for researchers, since several potential errors can exist simultaneously in any measurement. Moreover, essay testing involves more than one major random facet, including at least tasks and raters as major sources of score variability. There is therefore a need to employ generalizability theory, which liberalizes classical test theory by allowing a researcher to disentangle the multiple sources of error that make up the undifferentiated error of classical theory. These observations underscore the need to use generalizability theory to estimate the measurement error and score reliability associated with such facets (items and raters), to determine their contributions and interaction effects in Economics students' observed scores, and to determine the best way of achieving score reliability. This will go a long way towards giving a clear picture of students' true performance in any given assessment, because the more reliable the scores are, the more confidence score users will have in using them to make important decisions about students, such as selection, promotion and classification.
Statement of the Problem
It has been observed that the Economics essay test, which challenges students to create responses rather than simply select them and requires them to demonstrate their reasoning and thinking skills (thereby giving teachers the opportunity to detect problems in students' reasoning processes), often lacks score reliability. Moreover, multiple sources of error can affect the score reliability of the Economics essay test, and classical test theory cannot estimate them simultaneously because it addresses only one source of measurement error at a time. The scores obtained by students, who are the object of measurement in a given examination, contain several potential sources of error, notably task and rater variability, the two major sources of measurement error in an Economics essay test, together with their interactions; these cannot be estimated simultaneously with CTT. It is therefore appropriate to employ generalizability (G) theory, since it provides models and methods that allow researchers to disentangle the multiple sources of variation that contribute to error (E) and to determine which source contributes most to the inconsistency, which CTT cannot do.
With G theory, sources of error variance and the interactions among them can be considered simultaneously in a single analysis. Thus, the power of G theory lies in its ability to examine multiple sources of variation and their unique interaction effects simultaneously. Hence, it is important to examine the contributions of these facets (items and raters) to the measurement procedure for the Economics essay test, with a view to minimizing error and maximizing its score dependability.
Purpose of the Study
The purpose of this study is to estimate multiple sources of score variation and dependability in an Economics essay test using generalizability theory. Specifically, the study is designed to determine the:
1. magnitude of error variance due to students, tasks, raters and interactions among facets.
2. relative and absolute error variance associated with the facets and their interactions in making relative and absolute decisions about the students.
3. differences in the generalizability coefficient of the Economics essay test when the number of conditions in each facet is increased.
4. differences in the reliability coefficient of the Economics essay test when the number of conditions in each facet is increased.
Significance of the Study
The theoretical significance of this study is anchored on generalizability (G) theory, propounded by Cronbach, Gleser, Nanda and Rajaratnam in 1972. G theory is a conceptual and statistical framework and methodology that enables a researcher to disentangle multiple sources of error in a measurement procedure. To a large extent, G theory provides a more unified approach to assessing the reliability of examination scores than classical test theory does. This study is therefore an attempt to demonstrate the theoretical attributes of the theory empirically by moving from theory to practice, estimating various sources of error in a single analysis, which classical test theory cannot do.
In terms of practical significance, the findings of this study will be beneficial to examination bodies, test item writers, teachers, research students and research users. Examination bodies such as the West African Examinations Council (WAEC) and the National Examinations Council (NECO) will benefit because the study will provide them with specific information about measurement error and about how to design their examinations by estimating multiple sources of error beyond internal consistency. Generalizability theory allows examination bodies to estimate sources of measurement error that may otherwise not have been taken care of and, through its D-study, to design the measurement approach that best increases reliability.
The study will help test item writers to construct items suitable for specific purposes with minimum measurement error, so that the items can be used to make relative or absolute decisions about students. Through the findings of this study, classroom teachers will become aware of the potential sources of error that can mar the outcome of their measurement and of possible ways of maximizing reliability. This knowledge will help teachers to construct items (that is, teacher-made tests) that yield higher reliability, for example by increasing the number of items in the instrument.
Research students and research users will equally benefit from the present study, because it will serve as a source of information and a bank of knowledge for other researchers who may wish to embark on related research in this field. This work is expected to provide them with direction and guidelines for their own studies in generalizability theory.
Scope of the Study
This study focuses on estimating multiple sources of variation and score dependability in an Economics essay test using generalizability theory. Specifically, the study determines the contributions of students (the object of measurement), items, raters and their interactions to measurement error and score reliability. The study is limited to Economics teachers and SSII Economics students in Anambra State. The content scope covers demand and supply, the concept of money, agriculture, distributive trade and production, drawn from the SS1 and SS2 syllabus. These contents were chosen because considerable variation was noticed in them during the researcher's preliminary analysis of the relationship between the assistant examiners' and team leaders' scores from the General Certificate Examination conducted by WAEC in 2004 (see Appendix E).
Research Questions
The following research questions were posed to guide the conduct of this study:
1. What are the differences in the magnitude of error variance due to students, items, raters and interactions among them?
2. What are the relative and absolute error variances associated with the facets and their interactions in making relative and absolute decisions about the students?
3. What are the differences in the generalizability coefficient of the Economics essay test when the number of conditions in each facet is increased?
4. What are the differences in the reliability coefficient of the Economics essay test when the number of conditions in each facet is increased?
ESTIMATING MULTIPLE SOURCES OF VARIATION AND SCORE DEPENDABILITY IN ECONOMICS ESSAY TEST USING GENERALIZABILITY THEORY