ESTIMATING MULTIPLE SOURCES OF VARIATION AND SCORE DEPENDABILITY IN ECONOMICS ESSAY TEST USING GENERALIZABILITY THEORY

ABSTRACT

The purpose of this study was to estimate multiple sources of variation and score dependability in an Economics essay test using generalizability theory. The study adopted a fully crossed research design. The population comprised one thousand eight hundred and thirty-five (1,835) SSII Economics students in the 47 public secondary schools in Aguata education zone of Anambra State. A sample of 328 SSII Economics students, obtained using a purposive sampling technique, was used for the study. The Economics Essay Test (EET) was used to collect data. The instrument was validated by experts in Social Science Education (Economics Unit) and Science Education (Measurement and Evaluation), Faculty of Education, University of Nigeria, Nsukka. The instrument was trial-tested on a sample of twenty-five (25) SSII Economics students in Nsukka education zone to determine its internal consistency, and a reliability coefficient of 0.85 was obtained using Kendall’s coefficient of concordance. The data were analyzed using a computer program designed specifically for generalizability theory (EduG version 6.1-e) to answer the research questions. The findings of the study revealed that: the largest variance component was the residual (that is, the SIR interaction) component, which accounted for 77.6% of the total variance in the G-study, followed by the student [σ²(S)] component, which accounted for 15.1% of the total variance. The third largest variance component was that of SR, followed by the rater (R). Variances for item [σ²(I)], student-by-rater [σ²(SR)] and the item-by-rater interaction effect [σ²(IR)] were found to be zero (0.0); increasing the number of items produces a larger gain in the generalizability and reliability coefficients than increasing the number of raters; and the relative and absolute error variances associated with the facets and their interactions were found to be 0.3590 and 0.3642 respectively. Based on the findings, it was recommended, among others, that test item writers should endeavour to increase the number of items in their tests to produce better reliability and generalizability coefficients.

CHAPTER ONE

INTRODUCTION

Background of the Study

Economics is one of the subjects taken by students at the senior secondary school level in Nigeria. Economics is the science that deals with the production, exchange and consumption of various commodities in economic systems. It shows how scarce resources can be used to increase wealth and human welfare. According to Mankiw (2001), Economics is the study of how society manages its scarce resources. Hence, the central focus of Economics is on the scarcity of resources and choices among alternative uses. Similarly, Egunjobi and Egwuakhide (2010) see Economics as the study of human endeavours in respect of production, distribution, exchange and consumption. According to Okafor (2008), Economics as a subject helps the individual to be relevant in everyday life and could prepare students for an entrepreneurial career in the future. The study of Economics enables leaders as well as citizens to understand basic economic concepts and principles, and to understand, appreciate and seek to improve the economic situation for their own social good (Obemeata, 1991). These definitions and explanations indicate that Economics is an important subject to both students and society at large, because it cuts across all spheres of human endeavour.

The main goals and objectives of studying Economics in senior secondary school are to enable the students to: (i) understand basic economic principles and concepts as tools for sound economic analysis; (ii) contribute intelligently to discourse on economic reforms and development as they affect or would affect the generality of Nigerians; (iii) understand the structure and functioning of economic institutions; (iv) appreciate the role of public policies in the national economy; (v) develop the skills for, and appreciate the basis of, national economic decisions; (vi) become sensitized to participate actively in national economic advancement through entrepreneurship, the capital market and so on; (vii) understand the role and status of Nigeria and other African countries in international economic relationships; and (viii) appreciate the problems encountered by developing countries in their efforts towards economic advancement (Nigerian Educational Research and Development Council (NERDC), 2008).

These objectives of Economics form the bedrock on which all efforts to ensure proper and adequate implementation of the programme goals are based. In a classroom setting, for example, the teacher’s efforts are directed at ensuring adequate achievement of the set goals or objectives of the lesson content. The extent to which the teacher achieves the goals of any lesson content reflects the degree to which the expected change(s) in the behaviours of the learners have been attained after instruction (Ugwuja & Igbokwe, 2009). These expected changes in the behaviours of the students are measured using different assessment tools or procedures.

Assessment is the process of gathering information about students’ abilities or behaviours for the purpose of making decisions about the students (Elliot, Kratochmill, Cook & Travers, 2000). Different assessment tools such as objective tests, essay tests, checklists and socio-metric techniques, among others, are utilized by the teacher depending on the objectives of the measurement. The essay test is one of the assessment tools used in assessing students’ academic achievement following instruction, especially when teachers want students to originate, organize, express and integrate ideas on a given problem. An essay test, according to Onunkwo (2002), is a test in which students are required to provide written answers to questions, and it therefore offers students the opportunity to organize and express their ideas in writing. Hence, it is particularly useful in evaluating objectives which deal with the selection, arrangement, organization and expression of ideas. In the light of this, essay tests usually provoke critical thinking and call for authentic experience, originality and ingenuity (Reiner, Bothell, Sudweeks & Wood, 2002).

An essay test consists of items in any subject to which students are required to produce either short or extended responses. The amount of freedom given determines whether the test is the short-response type or the extended-response type (Nwana, 2007). The short-response type is an essay test in which students are required to provide written answers of a few lines in length to brief questions (Onunkwo, 2002). The extended-response type, according to Nwana (2007), requires the student to provide a long, comprehensive written answer of two or more pages to a question. Both types challenge students to compose a response rather than simply select one (Reiner et al., 2002). Onunkwo (2002) opined that teachers use essay tests because they have the potential to reveal students’ abilities to reason, create, analyze, synthesize and evaluate. This implies that essay tests enable the teacher to assess students’ understanding of, and ability to think about, subject matter content. The essay type requires students to demonstrate their reasoning and thinking skills, which gives teachers the opportunity to detect problems students may have with their reasoning processes.

However, assessments that require test-takers to provide extended constructed responses, such as the Economics essay test, often lack score reliability across tasks because they depend on subjective raters for scoring the students’ responses (Cumming, Kantor, Powers, Santos, & Taylor, 2000; Miller & Linn, 2000). Fulcher (2003) opined that the major problem that must be faced in essay testing is controlling the reliability (dependability) of scores across different rating or grading systems. Therefore, the reliability of the scores from observations is one of the most important issues needing attention.

Reliability is the confidence that a measurement reflects a real or stable trait; that is, it is the degree of consistency with which a test measures whatever it is measuring. It concerns whether assessment results will vary when the assessment is repeated under the same conditions. Ezeh (2003) opined that if an instrument yields consistent measures each time it is scored, it is said to be reliable. Kim and Wilson (2009) referred to the reliability of behavioural measures as the accuracy of generalizing from a person’s observed score on a measure or test to the score that the person would have received averaged over all possible conditions.

Reliability issues in psychology and education have been addressed principally using Classical Test Theory (CTT), which postulates that an observed score (X) can be decomposed into a “true” score (T) and a single undifferentiated random error term (E) (Brennan, 2001). Within classical test theory, various coefficients are available, each investigating a single source of error variance. Consequently, it is possible to examine only one source of measurement error at any given time. This poses a serious problem for researchers, since in reality several types of measurement error can exist concurrently (Erkuş, 2003), and classical test theory cannot consider multiple sources of error variance within one analysis (Crowley, Thompson & Worchol, 1994). Under classical test theory, stability (test-retest) measures of reliability regard occasion as the source of error; parallel-forms measures regard the form as the source of error; and internal consistency measures regard items as the source of error (Eason, 1989; Webb, Rowley & Shavelson, 1988). Hence, each type of reliability estimate can be used to determine the degree to which observed scores deviate from true scores, but each accounts for only one source of error.
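The classical decomposition described above can be written compactly. The expressions below follow the standard textbook formulation rather than notation taken from this study; the symbols for the variances and for the reliability coefficient are introduced here for illustration, and T and E are assumed to be uncorrelated.

```latex
X = T + E, \qquad
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

Because E is a single term, any one coefficient of this form attributes all inconsistency to whichever source (occasion, form or items) the design happens to vary.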

The problem, however, is that classical test theory is unable to examine inconsistencies in test forms, raters, items or occasions simultaneously (Brennan, 2001). This implies that with CTT, all sources of inconsistency are considered as a whole and not differentiated. Classical test theory cannot analyze more than one source of error variance at a time, cannot separate the interaction effects among sources of measurement error, and cannot determine which source contributes the most to the inconsistency. These limitations restrict its usefulness as a psychometric technique for analyzing the Economics essay test (Yelboğa & Tavşancıl, 2010), because the Economics essay test involves more than one major facet. A “facet”, according to Hecker (2013), is a set of conditions that contributes unwanted variation (measurement error) to observations in a study; it is a potential source of error such as test item variance, rater variance or subject variance, and the levels of a facet are called conditions (Cook & Beckman, 2006). The major facets in an essay test include, at least, tasks and raters as major sources of score variability. An essay test therefore requires a multifaceted analysis, known as generalizability theory, that can analyze more than one measurement facet simultaneously, in addition to the object of measurement (that is, the examinees) (Cronbach, Gleser, Nanda, & Rajaratnam, 1972).

Generalizability theory (G-theory), according to Brennan (2001), is a conceptual and statistical framework for analyzing more than one facet in investigating measurement error and score reliability. G-theory decomposes measurement error into its separate sources and estimates each of them as a variance component. Unlike CTT, G-theory provides a flexible alternative that allows multiple sources of error to be estimated separately and allows the impact of different sources of error on the reliability of measurements to be examined within a unified framework (Shavelson, Webb & Rowley, 1989). For instance, Economics essay testing can be viewed as a sample of student achievement drawn from a complex universe defined by the combination of all possible tasks (items) and raters (judges). The task facet can be viewed as representative of the content of the Economics essay test, that is, the subject matter domain, while the rater facet includes all possible Economics teachers who could be trained to score Economics essay tests reliably. According to Brennan (2001), generalizability theory is a powerful analytical tool that provides information about how much variation is explicable by different facets and how score reliability changes if rating designs are altered by increasing the number of conditions in each facet.

The choice and number of facets in a G-study may vary according to the interests of the researcher. In this study, the researcher focuses on the items (I) and the raters (R), because of the subjectivity in scoring Economics essay tests, and on the object of measurement, which is the student (S). The aim is to estimate each of these variance components and its contribution to the total observed variance (Brennan, 2011). The researcher is also interested in estimating variance components for the interactions among these facets and in comparing the relative sizes of the estimated variance components. This is possible because G-theory, unlike CTT, makes it possible to estimate not only the effect of each source of error but also the effects of the interactions among those sources (Yin & Shavelson, 2008). Hence, with G-theory, the researcher can determine which sources of variation are most troublesome and which, if any, need to be addressed in an attempt to reduce unwanted inconsistencies in the ratings.
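To make the decomposition concrete, the following is a minimal sketch of how the seven variance components of a fully crossed student-by-item-by-rater design could be estimated from mean squares under a random-effects model. It is not the EduG procedure used in this study; the function name `g_study_components`, the effect labels, and the simulated scores are all assumptions introduced for illustration, and the truncation of negative estimates to zero is one common convention rather than the only option.

```python
import numpy as np


def g_study_components(scores):
    """Estimate variance components for a fully crossed s x i x r design.

    `scores` has shape (n_students, n_items, n_raters), one score per cell.
    Returns (components, percentages) as dictionaries keyed by effect name.
    """
    x = np.asarray(scores, dtype=float)
    n_s, n_i, n_r = x.shape
    grand = x.mean()

    # Marginal means, kept broadcastable against the full score array.
    m_s = x.mean(axis=(1, 2), keepdims=True)
    m_i = x.mean(axis=(0, 2), keepdims=True)
    m_r = x.mean(axis=(0, 1), keepdims=True)
    m_si = x.mean(axis=2, keepdims=True)
    m_sr = x.mean(axis=1, keepdims=True)
    m_ir = x.mean(axis=0, keepdims=True)

    # Mean squares: sums of squares divided by their degrees of freedom.
    ms_s = n_i * n_r * np.sum((m_s - grand) ** 2) / (n_s - 1)
    ms_i = n_s * n_r * np.sum((m_i - grand) ** 2) / (n_i - 1)
    ms_r = n_s * n_i * np.sum((m_r - grand) ** 2) / (n_r - 1)
    ms_si = n_r * np.sum((m_si - m_s - m_i + grand) ** 2) / ((n_s - 1) * (n_i - 1))
    ms_sr = n_i * np.sum((m_sr - m_s - m_r + grand) ** 2) / ((n_s - 1) * (n_r - 1))
    ms_ir = n_s * np.sum((m_ir - m_i - m_r + grand) ** 2) / ((n_i - 1) * (n_r - 1))
    resid = x - m_si - m_sr - m_ir + m_s + m_i + m_r - grand
    ms_sir = np.sum(resid ** 2) / ((n_s - 1) * (n_i - 1) * (n_r - 1))

    # Solve the expected-mean-square equations of the random-effects model.
    raw = {
        "S": (ms_s - ms_si - ms_sr + ms_sir) / (n_i * n_r),
        "I": (ms_i - ms_si - ms_ir + ms_sir) / (n_s * n_r),
        "R": (ms_r - ms_sr - ms_ir + ms_sir) / (n_s * n_i),
        "SI": (ms_si - ms_sir) / n_r,
        "SR": (ms_sr - ms_sir) / n_i,
        "IR": (ms_ir - ms_sir) / n_s,
        "SIR,e": ms_sir,
    }
    # Negative estimates are truncated to zero (a common convention).
    comps = {k: max(float(v), 0.0) for k, v in raw.items()}
    total = sum(comps.values())
    pct = {k: (100.0 * v / total if total > 0 else 0.0) for k, v in comps.items()}
    return comps, pct


# Hypothetical data: 30 students x 5 items x 3 raters, simulated for illustration.
rng = np.random.default_rng(0)
demo = (rng.normal(0, 1.0, (30, 1, 1))     # student effect
        + rng.normal(0, 0.3, (1, 5, 1))    # item effect
        + rng.normal(0, 0.2, (1, 1, 3))    # rater effect
        + rng.normal(0, 1.5, (30, 5, 3)))  # residual (SIR interaction plus error)
components, percentages = g_study_components(demo)
for effect, value in components.items():
    print(f"{effect:6s} {value:8.4f} {percentages[effect]:6.1f}%")
```

The percentage column corresponds to the kind of comparison described above: the component with the largest share of total variance is the source of variation most in need of attention.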

In G-theory, two kinds of study are distinguished: a generalizability study (G-study) and a decision study (D-study). A G-study is designed to isolate and estimate variation due to the object of measurement and as many facets of measurement error as it is reasonably and economically feasible to examine. That is, the variances associated with the different facets of measurement, including the object of measurement (usually examinees), are estimated and evaluated in terms of their relative contribution to the total score variance, given a universe of admissible observations (Brennan, 2001). A D-study uses the information provided by the G-study to design the best possible application of the measurement for a particular purpose. It investigates how changing the measurement design, for example by varying the number of tasks or raters, affects score reliability for the universe of generalization of interest (Brennan, 2001).

Unlike CTT, G-theory allows two different types of reliability coefficient to be computed in a D-study: the generalizability coefficient (Eρ² or G), which applies to relative (norm-referenced) decisions, and the dependability index (Φ or Phi), which applies to absolute (criterion-referenced) decisions and is not available in CTT. A generalizability coefficient addresses reliability in the context of relative decisions. It reflects the degree to which the objects of measurement (students) maintain their rank order across facets, regardless of possible changes in raw score (Webb, Shavelson & Haertel, 2006). It is analogous to a reliability coefficient (for example, coefficient alpha) in classical test theory (Brennan, 2001), although a classical reliability coefficient usually implies a single undifferentiated source of measurement error. Here, the researcher can determine whether the scores obtained by Economics students are higher or lower by comparing them with those obtained by the norm group. According to Shavelson and Webb (1991), generalizability coefficients are useful in testing situations where the purpose of measurement is to make relative decisions about examinees (for example, selection of students for a particular purpose) based on their standing relative to other Economics students in the same group or to a group average in test scores.

In contrast, a dependability index (Φ or Phi) uses the absolute error variance [σ²(Δ)] as its error variance and is more appropriate for domain-referenced or criterion-referenced situations (Brennan, 2001). The dependability index, also known as the absolute G coefficient, is most useful when the actual values of the scores obtained in the Economics essay test are important or meaningful to the investigator. These situations typically involve performance measurements where there is a cut-off value that is deemed particularly meaningful. The index supports decisions about whether or not each student has mastered what has been taught; that is, each student’s performance is interpreted in relation to a set standard or criterion of proficiency. Thus, when the measurement objective is to make absolute decisions about whether Economics students have attained a pre-specified criterion level of performance, it is more appropriate to use Φ, which takes into account systematic differences related to test forms, tasks and raters (Lee, 2005).

In planning the D-study, the researcher defines a universe of generalization, that is, the set of facets and their levels to which the researcher wishes to generalize, and specifies the proposed interpretation of the measurement. The researcher then uses the information from the G-study to evaluate the effectiveness of alternative designs for minimizing error and maximizing the reliability of the Economics essay test.
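The comparison of alternative designs can be sketched as follows, assuming a fully crossed student-by-item-by-rater random design and the standard G-theory formulas for relative and absolute error variance. The function name `d_study`, the dictionary keys, and the variance components in `example` are hypothetical values chosen for illustration; they are not the estimates reported in this study.

```python
def d_study(var, n_items, n_raters):
    """Error variances and coefficients for a crossed s x i x r D-study.

    `var` maps effect names ("S", "I", "R", "SI", "SR", "IR", "SIR,e")
    to G-study variance component estimates.
    """
    n_ir = n_items * n_raters
    # Relative error: only interactions involving students change rank order.
    rel_error = var["SI"] / n_items + var["SR"] / n_raters + var["SIR,e"] / n_ir
    # Absolute error: adds the item, rater and item-by-rater effects.
    abs_error = (rel_error + var["I"] / n_items + var["R"] / n_raters
                 + var["IR"] / n_ir)
    return {
        "rel_error": rel_error,
        "abs_error": abs_error,
        "g_coef": var["S"] / (var["S"] + rel_error),  # generalizability coefficient
        "phi": var["S"] / (var["S"] + abs_error),     # dependability index
    }


# Hypothetical variance components, for illustration only.
example = {"S": 0.5, "I": 0.1, "R": 0.05, "SI": 0.2, "SR": 0.05,
           "IR": 0.0, "SIR,e": 1.0}

# Compare a baseline design with designs that add items or add raters.
for n_items, n_raters in [(5, 2), (10, 2), (5, 4)]:
    out = d_study(example, n_items, n_raters)
    print(n_items, n_raters, round(out["g_coef"], 3), round(out["phi"], 3))
```

Since the absolute error variance contains every term of the relative error variance plus further non-negative terms, Φ can never exceed the generalizability coefficient for the same design. Whether adding items or adding raters improves the coefficients more depends on which variance components dominate.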

The discussion above indicates that classical test theory approaches to determining score reliability are not capable of identifying and untangling the multiple sources of error, because they account for only one source of error at a time (Atilla, 2012). This poses a serious problem to researchers, since several potential errors can exist simultaneously in any measurement. Moreover, essay testing involves more than one major random facet, including at least tasks and raters as major sources of score variability. There is a need, therefore, to employ generalizability theory, which liberalizes classical test theory by allowing a researcher to disentangle the multiple sources of error that contribute to the undifferentiated error in classical theory. These observations underscore the need to estimate the measurement error and score reliability associated with such facets (items and raters) using generalizability theory, in order to determine their contributions and interaction effects in Economics students’ observed scores and to determine the best way to achieve score reliability. This will go a long way towards giving a clear picture of students’ true performance in any given assessment, because the more reliable the scores are, the more confidence score users will have in using them to make important decisions about students, such as selection, promotion and classification.

Statement of the Problem

It is observed that the Economics essay test often lacks score reliability, even though it challenges students to create responses rather than simply select them, requires students to demonstrate their reasoning and thinking skills, and gives teachers the opportunity to detect problems students may have with their reasoning processes. Moreover, multiple sources of error can affect the score reliability of the Economics essay test, and classical test theory cannot estimate them simultaneously, since CTT addresses only one source of measurement error at a time. The scores obtained by students, who are the object of measurement in a given examination, contain a number of potential sources of error, such as task and rater variability, which are the two major sources of measurement error in Economics essays. These sources and their interactions cannot be estimated simultaneously with CTT. It is therefore appropriate to employ generalizability (G) theory, since it provides models and methods that allow researchers to disentangle the multiple sources of variation that contribute to error (E) and to determine which source contributes the most to the inconsistency, which CTT cannot do.

Sources of error variance and the interactions among them can thus be considered simultaneously in a single generalizability analysis. The power of G-theory lies in its ability to examine multiple sources of variation and their unique interaction effects simultaneously. Hence, it is important to examine the contributions of these facets (items and raters) to the measurement procedure in the Economics essay test, with a view to minimizing error and maximizing the score dependability of the test.

Purpose of the Study

The purpose of this study is to estimate multiple sources of score variation and dependability in an Economics essay test using generalizability theory. Specifically, the study is designed to determine the:

1. magnitude of error variance due to students, tasks, raters and interactions among facets.

2. relative and absolute error variance associated with the facets and their interactions in making relative and absolute decisions about the students.

3. differences in the generalizability coefficient of the Economics essay test when the number of conditions in each facet is increased.

4. differences in the reliability coefficient of the Economics essay test when the number of conditions in each facet is increased.

Significance of the Study

The theoretical significance of this study is anchored on generalizability (G) theory, propounded by Cronbach, Gleser, Nanda and Rajaratnam in 1972. G-theory is a conceptual and statistical framework and methodology that enables a researcher to disentangle multiple sources of error in a measurement procedure. To a large extent, G-theory provides a more unified approach to assessing the reliability of examination scores than classical test theory. This study is therefore an attempt to demonstrate empirically the theoretical attributes of G-theory by moving from theory to practice, estimating various sources of error in a single analysis, which classical test theory cannot do.

In terms of practical significance, the findings of this study will be beneficial to examination bodies, test item writers, teachers, research students and research users. Examination bodies such as the West African Examinations Council (WAEC) and the National Examinations Council (NECO), among others, will benefit from this study because it will provide them with specific information about measurement error and about how to design their examinations by estimating multiple sources of error beyond internal consistency. This is because generalizability theory allows examination bodies to estimate measurement errors that may not otherwise have been accounted for, and to design the best measurement approach through a D-study in order to increase reliability.

The study will help test item writers to construct items that are suitable for specific purposes with minimum measurement error, such that the test items can be used to make relative or absolute decisions concerning the students. Through the findings of this study, classroom teachers will become aware that there are potential sources of error that can mar the outcome of their measurement, and of possible ways of maximizing reliability. Knowledge of the various sources of error and of possible ways of maximizing reliability will help teachers to construct teacher-made tests that yield higher reliability by increasing the number of items in the instrument.

Research students and research users will equally benefit from the present study. This study would serve as a source of information and a bank of knowledge for other researchers who may wish to embark on research from a related perspective in this field. The work will provide them with direction and guidance in exploring generalizability theory in their own studies.

Scope of the Study

This study focuses on estimating multiple sources of variation and score dependability in an Economics essay test using generalizability theory. The study specifically determines the contributions of the facets (students, items and raters) and their interactive effects on measurement error and score reliability. The study is limited to Economics teachers and SSII Economics students in Anambra State. The content scope includes Demand and Supply, Concept of Money, Agriculture, Distributive Trade, and Production, which were drawn from the SS1 and SS2 syllabus. These content areas were chosen because much variation was noticed in them during the researcher’s preliminary analysis of the relationship between the assistant examiners’ and team leaders’ scores obtained from the General Certificate Examination conducted by WAEC in 2004 (see Appendix E).

Research Questions

The following research questions were posed to guide the conduct of this study:

1. What are the differences in the magnitude of error variance due to students, items, raters and the interactions among them?

2. What are the relative and absolute error variances associated with the facets and their interactions in making relative and absolute decisions about the students?

3. What are the differences in the generalizability coefficient of the Economics essay test obtained by increasing the number of conditions in each facet?

4. What are the differences in the reliability coefficient of the Economics essay test obtained by increasing the number of conditions in each facet?

