Compliance hearing exhibits: "Assessment Processes (Examples)"

ASSESSMENT PROCESSES (EXAMPLES)

1. Memorandum to elementary and junior high principals, Nov. 16, 1998, on schedule for picking up SAT9 testing materials
2. Memorandum to elementary school principals, Dec. 14, 1998, on procedures for upcoming administration of the criterion-referenced tests in reading and mathematics
3. Memorandum to elementary and junior high principals, Jan. 5, 1999, on the testing procedures for grades 4 and 8 ACTAAP Benchmark examinations
4. Memorandum to elementary and junior high principals and counselors, Jan. 26, 1999, on inservice schedule for test coordinators for the ACTAAP Benchmarks for grades 4 and 8
5. Memorandum to selected administrators on Data Quality with attached paper written by Dr. Glynn Ligon
6. Memorandum to elementary principals, Aug. 17, 1999, relating to use of released items from Smart Start assessments
7. E-mail to curriculum staff, Aug. 23, 1999, relating to use of released items from Smart Start assessments
8. E-mail to elementary and middle school principals, Sept. 17, 1999, inviting them to an overview session on the new pre- and post-test Achievement Level Tests developed by Northwest Evaluation Association
9. Memorandum in Sept. 22, 1999, Learning Links to principals identifying training needs to administer the Observation Survey and Developmental Reading Assessment
10. Memorandum to principals and K-2 teachers in March 15, 2000, Learning Links setting up an assessment training review for the Developmental Reading Assessment and Observation Survey
11. E-mail to Bonnie Lesley on Mar. 17, 2000, suggesting a resource on how to assess technology knowledge
12. Memorandum in Apr. 5, 2000, Learning Links to elementary and middle school principals and test coordinators on new information relating to ACTAAP Benchmark examinations in grades 4 and 8 and the field testing in grade 6
13. Document entitled Description of the Assessment System prepared in April 2000 in response to a request from the National Science Foundation relating to the assessment of mathematics and science
14. Document entitled Procedures for Providing Data Analysis/Interpretation to Decision Makers prepared in April 2000 in response to a request from the National Science Foundation relating to the assessment of mathematics and science
15. Document entitled Orientation to the Analysis and Interpretation of Test Results prepared in April 2000 in response to a request from the National Science Foundation relating to the assessment of mathematics and science
16. E-mail to Kathy Lease, May 23, 2000, providing feedback to proposed survey of middle school students and teachers
17. E-mail to principals, Aug. 25, 2000, providing information on upcoming administration of the Achievement Level Tests in September
18. E-mail to Bonnie Lesley, Aug. 31, 2000, providing information on new middle school report card
19. E-mail to Bonnie Lesley, Aug. 31, 2000, providing copy of new middle school report card
20. Memorandum from Linda Austin to Marian Lacey providing Middle School Report Card Update
21. E-mail to middle school principals, Jan. 3, 2000, setting up training for teachers on how to administer the State Benchmark examinations
22. Memorandum to Division of Instruction, Feb. 1, 2000, setting agenda for Feb. 2 meeting; includes information on the District Assessment Plan
23. E-mail to elementary principals, Feb. 1, 2000, providing information on the use of calculators on Benchmark examinations
24. E-mail to principals, Feb. 3, 2000, providing copy of assessment schedule/matrix to distribute to teachers
25. Document prepared in fall 1999 by PRE on Achievement Level Tests, Assessments that Make a Difference
26. Memorandum to all principals and test coordinators, Mar. 17, 2000, establishing training sessions for the administration of the Benchmark and end-of-course examinations
27. Memorandum in Apr. 5, 2000, Learning Links to high school principals and test coordinators providing new information from ADE on the end-of-course literacy examination
28. E-mail to Kathy Lease and Les Carnine, Apr. 7, 2000, providing rationale for adding science assessments to the Achievement Level Tests
29. Memorandum in Aug. 30, 2000, Learning Links to elementary principals and K-2 teachers including pre-testing instructions for the Observation Survey and Developmental Reading Assessment
30. Memorandum in Aug. 30, 2000, Learning Links to all principals and test coordinators establishing inservice schedule for administration of the SAT9 and ALTs
31. Memorandum in Sept. 8, 2000, Learning Links to elementary principals relating to K-2 assessment and the importance of the language arts instructional block
32. Memorandum in Sept. 27, 2000, Learning Links to elementary and middle school principals relating to the administration of the end-of-module tests in mathematics and the end-of-unit tests in science
33. Memorandum in Sept. 26, 2000, Learning Links to elementary principals relating to instructions to complete the Observation Survey and Developmental Reading Assessment
34. Memorandum to principals, Oct. 13, 2000, requesting feedback through a survey for consideration by the Assessment Focus Group; copy of survey attached
35. Memorandum to principals, Feb. 13, 2001, with information on the administration of the climate surveys for parents, teachers, students, and administrators
36. E-mail, Feb. 26, 2001, relating to administration of surveys for the Extended Year Education school evaluation
37. E-mail to curriculum directors, Feb. 27, 2001, relating to discussion of the potential purchase of an electronic curriculum/assessment management system
38. E-mail to principals and selected others on Mar. 1, 2001, relating to an information session on ALT online testing
39. E-mail to principals, Mar. 1, 2001, providing spring testing schedule for elementary, middle, and high schools
40. E-mail to Les Carnine, Mar. 8, 2001, providing outline of PRE responsibilities for Dr. James, incoming superintendent
41. Memorandum to elementary principals, Mar. 14, 2001, providing information on end-of-module mathematics criterion-referenced tests
42. E-mail between various staff, Mar. 14-15, 2001, relating to analysis of results of mathematics and science criterion-referenced tests
43. Document entitled Mathematics, Reading, and Language Achievement Tests: Administration Guide prepared by PRE for use in training sessions for the ALTs, 2000-01

Planning, Research, & Evaluation - Instructional Resource Center
3001 S. Pulaski
Little Rock, AR 72206

TO: Elementary and Junior High Principals
FROM: Dr. Kathy Lease, Asst. Supt.
DATE: November 16, 1998
SUBJECT: Pick up of Stanford Achievement Test, Ninth Edition, Reports for Grades 3 & 8

The Stanford Achievement Test, Ninth Edition (Stanford 9), reports for grades 3 and 8 are available for you or your designee to pick up in the Instructional Resource Center (IRC), Room 12. If you have not picked up your Stanford 9 reports for grades 5, 7, and 10, please do so immediately. If you have any questions regarding these reports, please contact us at 2121, 2123, or 2125.

cc:
Elementary and Junior High Counselors

LITTLE ROCK SCHOOL DISTRICT
Planning, Research, and Evaluation

TO: Elementary School Principals
FROM: Dr. Kathy Lease, Asst. Supt., Planning, Research, and Evaluation
SUBJECT: Criterion Referenced Test (CRT) Administration
DATE: December 14, 1998

The purpose of this memorandum is to inform you of the procedures for the upcoming administration of the CRT. The grade levels to be tested are 3, 4, 5 and 6. The subject areas that will be tested are mathematics and reading. The CRT is scheduled to be administered to one half of the schools on January 6-7, 1999;
the remaining schools will test on January 7-8. Each school will be assigned testing dates. If you have a conflict with your assigned testing dates, please contact PRE and we will try our best to accommodate your schedule. See attached list for your testing dates.

As you are aware, the Little Rock School District (LRSD) Board of Education approved the administration of the CRT to measure student progress with the curriculum. It is for this purpose that we are conducting the second CRT in January of 1999 that will assess progress on the grade level benchmarks. Unlike the first CRT that was scored by individual teachers, this test will be machine scored by PRE in order to assist teachers with the grading. Therefore, it is necessary that the instructions and procedures for administering be followed exactly. Please remember that the results of this test must be incorporated into the second nine weeks grade. The value of the test and how it is incorporated may be decided at the building level.

The Reading and Mathematics Departments, along with input from teachers, established ten (10) benchmarks per subject area representing skills and objectives that students should know and be able to do. There will be four (4) questions per benchmark, totaling forty questions for each section of the test. Students must answer 3 of 4 questions correctly to obtain mastery.

Please share the relevant information from this memo with your teachers and share the instructions on the following pages with your elementary teachers, grades 3 through 6. We have attached a list of Teacher ID Numbers for you to assign to individual classroom teachers. The teachers will use these numbers when completing the scoring sheets. The following pages will provide additional information related to the administration of the CRT. If you have any questions regarding the testing instructions or procedures, please contact the PRE staff at 2120, 2123, or 2125. Thank you for your cooperation and dedication to providing students in the LRSD a quality and equitable education.

CRITERION REFERENCED TESTS: ASSIGNED TESTING DATES

January 6-7, 1999: Badgett, Bale, Baseline, Booker, Brady, Carver Magnet, Chicot, Cloverdale, Dodd, Fair Park, Forest Park, Franklin, Fulbright, Garland, Geyer Springs, Gibbs Magnet, Jefferson

January 7-8, 1999: King Magnet, Mabelvale, McDermott, Meadowcliff, Mitchell, Otter Creek, Pulaski Heights Elementary, Rightsell, Rockefeller, Romine, Terry, Wakefield, Washington, Watson, Western Hills, Williams Magnet, Wilson, Woodruff

CRITERION REFERENCED TEST: DIRECTIONS FOR ADMINISTERING (READING AND MATHEMATICS)

PROCEDURES PRIOR TO AND AFTER TEST ADMINISTRATION

Delivery and Return of Scoring Sheets from PRE

CRTs and score sheets will be delivered to schools. The score sheets must be returned by the principal or his/her designee promptly at the completion of the second day of testing and returned to PRE (Room 12). Each grade level should be identified by teacher and returned separately in the envelope provided. Teachers may keep the classroom sets of CRTs for reference. Please make sure that the scoring sheets are turned in the same direction prior to placing them in the return envelope.

IMPORTANT:
> PLEASE MAKE SURE THAT THE SCORING SHEETS ARE TURNED IN THE SAME DIRECTION PRIOR TO PLACING IN THE RETURN ENVELOPE.
> DO NOT PAPER CLIP OR RUBBER BAND THE SCORING SHEETS.
> PLEASE REMOVE THE PERFORATED STRIPS ON EACH SIDE OF THE SCORING SHEETS BEFORE RETURNING TO PRE.
PROCEDURES DURING TEST ADMINISTRATION

The test administration should be conducted in an optimal testing environment that will promote a successful experience for all students. Unlike the SAT-9 (a timed, norm-referenced test), the CRT is not timed. Therefore, students should be given ample time to complete each section of the test. It is recommended that the reading and mathematics sections of the test be administered on separate days. Determination of the sequence for administering the mathematics and reading tests will be your decision to make. However, the first forty items on the scoring sheets must be reserved for the Reading Test and the following forty items must be reserved for the Mathematics Test. The teacher should read the directions provided at the beginning of each test to the students. Sample test items will be provided for the third grade Reading Test only.

Completing the Scoring Sheets

To complete the scoring sheets you must:
> Use #2 pencils only.
> Disregard true and false selections. This is a multiple-choice test with four selections (a, b, c or d).
> Have students print the following: Name, Subject (CRT-2). DO NOT COMPLETE HOUR AND DATE.
> Print the appropriate teacher number in the Test ID Number section (see your principal for the correct teacher number). Then mark the corresponding circle below each box. Begin the teacher number with the first numeral aligned from the left. NOTE: The last three boxes will be blank.
> Print the student social security number in the Student/Teacher ID No. section. Then mark the corresponding circles. Begin the student social security number with the first numeral aligned from the left. NOTE: The last box will be blank.
> Follow the instructions provided on the score sheets for correct marking of bubbles.

PLEASE NOTE: The first forty (40) items on the scoring sheet are designated for the Reading Test (1-40). The following forty (40) items are designated for the Mathematics Test (41-80). Attached is an example of a completed simulated copy of the scoring sheet.

IMPORTANT: Please make sure that students do not [...] when marking their answers. If you have any questions with regard to the CRT, please do not hesitate to contact the PRE staff.

[Attachment: simulated example of a completed Little Rock School District combination answer sheet, showing the marking instructions, Test ID Number and Student/Teacher ID No. grids, and the multiple-choice answer bubbles.]
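The scoring rule laid out above (items 1-40 for reading, items 41-80 for mathematics, ten benchmarks per subject with four questions each, and mastery at 3 of 4 correct) can be illustrated with a short sketch. This is only a hypothetical illustration of the machine-scoring logic described in the memorandum, not the actual PRE scoring program; the function and variable names are invented for the example.

```python
# Hypothetical sketch of the CRT scoring rule described in the memorandum:
# 40 reading items (1-40) and 40 mathematics items (41-80),
# 10 benchmarks per subject, 4 questions per benchmark,
# mastery of a benchmark = at least 3 of 4 questions answered correctly.

def score_subject(responses, key):
    """Return per-benchmark mastery for one subject's 40 items.

    responses, key: lists of 40 answer choices ('a'-'d'); a blank or
    mismatched response simply counts as incorrect.
    """
    mastered = {}
    for benchmark in range(10):                      # benchmarks 1-10
        items = range(benchmark * 4, benchmark * 4 + 4)
        correct = sum(1 for i in items if responses[i] == key[i])
        mastered[benchmark + 1] = correct >= 3       # 3-of-4 mastery rule
    return mastered

def score_answer_sheet(responses, reading_key, math_key):
    """Split an 80-item answer sheet into reading (1-40) and math (41-80)."""
    return {
        "reading": score_subject(responses[:40], reading_key),
        "mathematics": score_subject(responses[40:80], math_key),
    }
```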
Planning, Research, & Evaluation - Instructional Resource Center
3001 S. Pulaski
Little Rock, AR 72206

To: Elementary and Junior High Principals
From: Kathy Lease, Asst. Supt., PRE
Date: January 5, 1999
Re: ACTAP for 4th and 8th graders

ACTAP will be administered February 1-4. You will receive more specific information on times and testing procedures later. Through school mail you will receive copies of the ACTAP Parent Notification Pamphlet, which must be distributed to the parents of 4th and 8th graders prior to January 15. This pamphlet provides parents with an overview of the test and some strategies for helping their students prepare for these tests. The back of these pamphlets has a place for you to fill in the Little Rock School District name, the name of your school, and the dates of the test. Principals can decide who they want to fill in the information (students, teachers, office staff) and how (handwritten, using a stamp, etc.). Please call PRE if you have questions (324-2121).
Planning, Research, and Evaluation

TO: Elementary and Junior High Principals and Counselors
FROM: Dr. Kathy Lease, Assistant Superintendent, PRE; Yvette Dillingham, Evaluation Specialist, PRE
DATE: January 26, 1999
SUBJECT: ACTAP Benchmarks for Grades 4 and 8 In-service Rescheduled

The ACTAP in-service has been rescheduled for January 28, 1999, in the district office Boardroom from 2:00 - 4:00 p.m. We do realize this will allow school test coordinators only one day to in-service test administrators. Please allow the time to provide this in-service on Friday, January 29, 1999.

Test materials will be available for pick-up on Wednesday, January 27, at 1:00 p.m. Please DO NOT open secured test materials until the first day of testing, February 1, 1999;
however, grade 4 test administrators may prepare manipulatives as soon as they receive them. Test materials must be placed in a secured location at all times except during the time of testing. Please review and become very familiar with the Test Security Guidelines. Your patience and understanding are greatly appreciated as we prepare, not under the most optimal conditions, for the administration of the grades 4 and 8 (pilot) Benchmark Examinations. If you have further questions, please contact PRE at 2120, 2123 or 2125.

LITTLE ROCK SCHOOL DISTRICT
INSTRUCTIONAL RESOURCE CENTER
3001 PULASKI STREET
LITTLE ROCK, AR 72206
(501) 324-2131

June 10, 1999

TO: Les Carnine, John Ruffin, Kathy Lease, Brady Gadberry, Junious Babbs, Ed Williams
FROM: Dr. Bonnie Lesley, Associate Superintendent for Instruction
SUBJECT:
Data Quality

I think you'll find the attached paper by Glynn Ligon on data quality interesting and helpful as we search for ways to improve.

Attachments
BAL/rcm

Data Quality: Earning the Confidence of Decision Makers
Glynn D. Ligon
Evaluation Software Publishing, Incorporated
Paper Presented at the Annual Meeting of the American Educational Research Association, April 1996, New York, New York

Data quality is more than accuracy and reliability. High levels of data quality are achieved when information is valid for the use to which it is applied, and when decision makers have confidence in the data and rely upon them.

Summary

Professionals responsible for educational research, evaluation, and statistics have sought to provide timely and useful information to decision makers. Regardless of the evaluation model, research design, or statistical methodology employed, informing the decision making process with quality, reliable data is a basic goal. The definition of quality for education data has not been adequately addressed in the literature of educational research and evaluation. In the publications describing quality related to general information systems, the concept is narrowly interpreted to mean accurately and reliably processed data. This paper ties together the foundations of data quality from the formal information systems literature with the practical aspects of data quality in the arena of public education decision making. A hierarchy of data quality is described to assist both the understanding of quality and the requirements for achieving quality. The hierarchy ranges from the availability of dysfunctional, bad data to the quality level of data-based decisions made with confidence. For practitioners, a checklist is provided for use in determining the quality of their data sources.

Readers of this paper are requested to provide the author with ideas on the topic of data quality. Comments specific to this paper, anecdotes illustrating points, or further thinking related to the pursuit of data quality are all solicited. Please communicate your reactions to: Internet: gligon@evalsoft.com; Voice: 512-458-8364; Fax: 512-371-0520; Mail: 1510 W. 34th Street, Suite 200, Austin, Texas 78703.

Background

Data quality is essential to successful research, evaluation, and statistical efforts in public schools. As statewide accountability systems that rely upon large data bases grow, concern follows about the data quality within those emerging state-level data bases. As states and the Federal government move toward establishing data warehouses to make information available electronically to anyone, questions are raised about the quality of the data collected and stored. What is not universally sought is Federally imposed standards for data and information systems. There is broad support for voluntary standards which states and local school districts can adopt. What is needed first is a way to know when quality data are available and when caution should be exercised. (New Developments in Technology: Implications for Collecting, Storing, Retrieving, and Disseminating National Data for Education, G. Ligon, Paper Prepared for MPR Associates and the National Center for Education Statistics, November, 1995.) Decision makers at all levels are relying upon data to inform, justify, and defend their positions on important issues.
What are the key criteria on which to determine data quality? Is there a logical sequence to the processes for ensuring quality in information systems? The concern for data quality is somewhat different than the slowly emerging interest in education data that has grown for decades. The concern for data quality is a sign of maturity in the field, an increasing sophistication by the audiences who use education data. In other words, first we asked "Are our students learning?" Then we had to ask "What are the education indicators that we should be monitoring?" Finally, we are asking "Now that we have some indicators, do we trust them?" (WhatDow-Jones Can Teach Us: Standardized Education Statistics and Indicators, G. Ligon, Presented at the American Educational Research Association Annual Meeting, 1993
A Dow Jones ! Index for Educators , G. Ligon, The School Administrator, December,-1993.) Nn easy point in time to mark is the release of the "Nation at Risk" report-. Much reform in education followed, including expansion of accountability http://www.evalsoft.com/esp/htmbT3ody_dataqual.html dataqual Page 3 of 17 systems within states. The search heated up for the true, reliable indicators of quality in education. A major event was the passage of the 1988 Hawkins Stafford Education Amendments that called for improving the quality of the nation's education data. From that legislation, the National Forum for Education Statistics was begun, and from that group has followed a continuing focus on data quality issues. The Forum is made up mainly of state education agency representatives, who at times include local education agency staff in their work groups. I have combined notes and observations from two decades of research and evaluation in public schools with the experiences from five years of reviewing and designing information systems for state and national education agencies. Often the question has been asked as to the definition of data quality and how to achieve it. The deliberations of the work groups responsible for the development of the Standards for Educational Data Collection and Reporting (SEDCAR), the ANSI ASC X-12 EDI standards for the electronic exchange of student records (SPEEDE/ExPRESS), and the national definition of dropout rates for the Common Core of Data collected by the National Center for Education Statistics have provided a unique opportunity to observe how quality is sought and defined from various perspectives. (Getting to the Point and Counter Point of Dropout Reporting Issues , G. Ligon, Presented at the American Educational Research Association Annual Meeting, April, 1994.) My membership on the U.S. Department of Education Evaluation Review Panel and Texas' Commissioner's Advisory Committee for Research and Evaluation has presented opportunities to relate the definitions and processes for quality data to on-going activities. One overarching observation from these experiences is that there are multiple perspectives that determine the reality of data quality. These are generally represented by: Decision Makers (parents, teachers, counselors, principals, school board members, tax payers, etc.) Program Managers (principals, directors, supervisors, etc.) General Audiences (parents, taxpayers, businesses, etc.) Data Collectors and Providers (clerks, teachers, counselors, program managers, etc.) Evaluators, Researchers, Analysts Individuals may occupy more than one of these groups simultaneously. At the risk of over simplifying, the primary perspective of each group may be described as: I Decision Makers: "Do I have confidence in the data and trust in the person providing them?" Program Managers: "Do the data fairly represent what we have accomplished?" - General Audiences: "Did I learn something that appears to be true and useful, or at least interesting?" Data Collectors and Providers
"Did the data get collected and reported completely and in a timely manner?" Evaluators, Researchers, Analysts: "Are the data adequate to . support the analyses and the results from them?" http://www.evalsoft.coin/esp/html/body_dataqual.html 6/8/99 dataqual Page 4 of 17 In this view, the burden for data quality falls to the data collectors and providers, and the evaluators, researchers, and analysts. Who else would be in a better position to monitor and judge data quality? However, in the end, the audiences (e.g., program managers, decision makers, and general audiences) give the ultimate judgment when they use, ignore, or disregard the data. Which ties in well to this paper's conclusion that the highest level of data quality is achieved when information is valid for the use to which it is applied and when decision makers have confidence in the data and rely upon them. The Pursuit of a Definition of Data Quality Four years ago, Robert Friedman, formerly the director of the Florida Information Resource Network (FIRN) and now in a similar position for Arkansas, called and asked for references related to data quality. The issue had arisen as the new statewide education information system for Arkansas was being developed. There were few references available, none satisfactory. I began documenting anecdotes, experiences, and insights provided by individuals within the educational research, evaluation, and information systems areas to search for "truths. The resultant hierarchy is one representation of what was found. This paper describes some of these anecdotes and experiences to illustrate the thinking of national, state, and local professionals. Several ideas were consistently referenced by individuals concerned with data quality. 1. Accuracy Technical staff mention reliability and accuracy. This is consistent with the published literature in the information systems area. Accuracy, accuracy, accuracy-defined as do exactly what we are told, over and over. Not all information specialists limit themselves to the mechanical aspects of accuracy
however, because they may not be content or process specialists in the areas they serve, their focus is rightfully on delivering exactly what was requested. After all, that is what the computer does for the Quality data in, quality data out. 2. Validity However, programmatic staff point out that data must be consistent with the construct being described (i.e., validity). If their program is aimed at delivering counseling support, then a more direct measure of outcomes than an achievement assessment is desired. ) Valid data are quality data. http://www.evalsoft.com/esp/html/body_dataqual.html 6/8/99 dataqual Page 5 of 17 1 3. Investment A key element frequently cited as basic for achieving quality is the reliance upon and use of the data by the persons responsible for collecting and reporting them. School clerks who never receive feedback or see reports using the discipline data they enter into a computer screen have little investment in the data. School clerks who enter purchasing information into an automated system that tracks accounts and balances have a double investment. They save time when the numbers add up, and they receive praise or complaints if they do not. Whoever is responsible for collecting, entering, or reporting data needs to have a natural accountability relationship with those data. The data persons should experience the consequences of the quality of the data they provide. This may be the most important truism in this paper: The user of data is the best recorder of data. 4. Certification Typically, organizations have a set of "official" statistics that are used, regardless of their quality, for determining decisions such as funds allocation or tracking changes over time. These official statistics are needed to provide some base for planning, and the decision makers are challenged to guess how close they are. Organizations should certify a set of official statistics. 5. Publication Public reporting or widespread review is a common action cited in the evolution of an information system toward quality. In every state that has instituted a statewide accountability system, there are stories of the poor quality of the data in the first year. Depending upon the complexity of the system and the sanctions imposed (either money or reputation), subsequent improvements in data quality were seen. The most practical and easily achieved action for impacting data quality is: Publish the data. I 6. Trust Decision makers refer to the trust and confidence they must have in both the data and the individuals providing the data. http://www.evalsofl.com/esp/html/body_dataqual.htinl 6/8/99 dataqual Page 6 of 17 1 Trust is a critical component of the working relationship between decision makers and staff within an organization. That trust must be present for data to be convincing. Consultants are used at times to provide that trust and confidence. Decision makers often do not have the time nor the expertise to analyze data. They rely upon someone else's recommendation. Data should be presented by an individual in whom the decision makers have confidence and trust. Trust the messenger. These six statements faithfully summarize the insights of professionals who have struggled with data quality within their inforrhation systems. They address processes that contribute toward achieving data qualitythe dynamics influencing quality within an information system. They do not yet clearly indicate how successful the organization has been in achieving quality. 
To make that connection, the following hierarchy was developed. A Hierarchy of Data Quality A hierarchy of data quality has been designed to describe how quality develops and can be achieved. The paper details the components and levels within this hierarchy. This schema is to be regarded as fluid within an organization. Some areas of information, such as student demographics, may be more advanced than others, such as performance assessments. Some performance assessments may be more advanced than others. The highest level of quality is achieved when data-based decisions are made with confidence. Therefore, several components of quality must be present, i.e., available data, decisions based upon those data, and confidence by the decision maker. Ultimately, quality data serve their intended purpose when the decision maker has the trust to use them with confidence. The traditional virtues of quality (e.g., reliability and validity) form the basis for that trust, but do not ensure it. Accuracy is the traditional characteristic defined within formal information systems architecture. Accuracy begs the question of whether or not the data are worthy of use. From the observations of organizational quests for quality information systems, the concept of official data has been described. Data are official if they are designated as the data to be used for official purposes--e.g., reporting or calculation of formulas such as for funding schools and programs. At the earliest stages of information systems, the characteristic of being available is the only claim to quality that some data have. The level at the base of the hierarchy is characterized by no data being available. Attachment A illustrates the hierarchy. ) Bad Data http://www.evalsoft.com/esp/html/body_dataqual.html 6/8/99 dataqual Page 7 of 17 -1.1 Invalid Bad data can be worse than no data at all. At least with no data, decision makers rely upon other insights or opinions they trust. With bad data, decision makers can be misled. Bad data can be right or wrong, so the actual impact on a decision's outcome may not always be negative. Bad data can result from someone's not understanding why two numbers should not be compared or from errors and inconsistencies throughout the reporting process. The definition of bad data is that they are either
Poorly standardized in their definition or collection to the extent that they should be considered unusable, or Inaccurate, incorrect, unreliable. An example of bad data occurred when a local high school failed to note that the achievement test booklets being used were in two forms. The instructions were to ensure that each student received the same form of the exam for each subtest. However, the booklets were randomly distributed each day of the testing, resulting in a mixture of subtest scores that were either accurate (if the student took the form indicated on the answer document) or chance level if the form and answer document codes were mismatched. This high school was impacted at the time by cross-town bussing that created a very diverse student population of high and low achievers. From our previous analyses, we also knew that an individual students scores across subtests could validly range plus or minus 45 percentile points. Simple solutions to interpreting the results were not available. {Empty Bubbles
What Test Form Did They Take? D. Doss and G. Ligon, Presented at the American Educational Research Association Annual Meeting, 1985.) Carolyn Folke, Information Systems Director for the Wisconsin Department of Education, contributed the notion that the hierarchy needed to reflect the negative influence of bad data. In her experience, decision makers who want to use data or want to support a decision they need to make are vulnerable to grasping for any and all available data-without full knowledge of their quality. The message here is look into data quality rather than assume that any available data are better than none. None O.OUnavailable ) Before "A Nation at Risk," before automated scheduling and grade reporting systems, and before the availability of high-speed computers, often there were no data at all related to a decision. So, this is really the starting point for the hierarchy. * http://www.evalsoft.com/esp/html^ody_dataqual.html 6/8/99 dataqual Page 8 of 17 When a local school district began reporting failure rates for secondary students under the Texas No Pass/No Play Law, one school board member asked for the same data for elementary students. The board member was surprised to hear that, because elementary grade reporting was not automated, there were no data available. (After a long and painful process to collect elementary grade data, the board member was not pleased to learn that very few elementary students ever receive a failing grade and that fewer fail in the lower achieving schools than fail in the higher achieving schools.) (No Pass - No Play: Impact on Failures, Dropouts, and Course Enrollments, G. Ligon, Presented at the American Educational Research Association Annual Meeting, 1988.) When no data are available, the options are typically obvious-collect some or go ahead and make a decision based upon opinion or previous experience. However, there is another option used by agencies involved in very large-scale data collections. The Bureau of the Census and the National Center for Education Statistics both employ decision rules to impute data in the absence of reported numbers. Missing cells in tables can be filled with imputed numbers using trends, averages, or more sophisticated prediction analyses. Decision makers may perform their own informal imputations in the absence of data. Available 1.1 Inconsistent Forms of Measurement Poor data come from inconsistencies in the ways in which outcomes or processes are measured. These inconsistencies arise from use of nonparallel forms, lack of standardized procedures, or basic differences in definitions. The result is data that are not comparable. In 1991, we studied student mobility and discovered that not only did districts across the nation define mobility differently, but they also calculated their rates using different formulas. From 93 responses to our survey, we documented their rates and formulas, then applied them to the student demographics of Austin. Austin's "mobility" rate ranged from 8% to 45%, our "turbulence rate ranged from 10% to 117%, and our "stability rate ranged from 64% to 85%. The nation was not ready to begin comparing published mobility rates across school districts. (Student Mobility Rates: A Moving Target, G. Ligon and V. Paredes, Presented at the American Educational Research Association Annual Meeting, 1992.) A future example of this level of data quality may come from changes in the legislation specifying the nature of evaluation for Title I Programs. 
For years, every program reported achievement gains in normal curve equivalent units. Current legislation requires each state to establish an accountability measure and reporting system. How will performance be aggregated across states? How will gains be verified by the U.S. Department of Education as mandated? Full time equivalents and head counts, duplicated and unduplicated counts, average daily attendance and average daily membership are all examples of how state accountability systems must align the way schools maintain their records. Who is not familiar with the "problem" of whether to count parents in a PTA meeting as one attendee each or as two if they have two students in the school?

1.2 Data Collected by Some at Some Times

Incomplete data are difficult to interpret. In 1994, the Austin American Statesman published an article about the use of medications for ADD/ADHD students in the public schools. The headline and point of the story was that usage was much lower than had been previously reported. The person quoted was not a school district employee, and the nature of some of the statistics caused further curiosity. So, I called the reporter, who said he had not talked to the District's Health Supervisor and that the facts came from a graduate student's paper. Checking with the Health Supervisor showed that only about half the schools had participated in the survey, some of those with the highest levels of use did not participate, the reporter used the entire District's membership as the denominator, and the actual usage rate was probably at least twice what had been reported. The reporter's response: "I just reported what she told me."

1.3 Data Combined, Aggregated, Analyzed, Summarized

The highest level of "available data" is achieved when data are summarized in some fashion that creates interesting and useful information. At this point in the hierarchy, the data begin to take on a usefulness that can contribute to a cycle of improved quality. At this point, audiences are able to start the process of asking follow-up questions. The quality of the data becomes an issue when someone begins to use summary statistics. One of the most dramatic responses to data I recall was when we first calculated and released the numbers and percentages of overage students, those whose age was at least one year over that of their classmates. Schools have always had students' ages in the records. Reality was that no one knew that by the time students reached grade 5 in Austin, one out of three was overage. In at least one elementary school over 60% of the fifth graders were old enough to be in middle school. (The number of elementary retentions began to fall until the rate in the 90's was about one fifth of the rate in the 80's.) (Do We Fail Those We Fail?, N. Schuyler and G. Ligon, Presented at the American Educational Research Association Annual Meeting, 1984;
Promotion or Retention, Southwest Educational Research Association Monograph, G. Ligon, Editor, 1991.)

When relatively unreliable data are combined, aggregated, analyzed, and summarized, a major transformation can begin. Decision makers can now apply common sense to the information. Data providers now can see consequences from the data they report. This is an important threshold for data quality. In countless conversations with information systems managers and public school evaluators, a consistent theme is that when people start to see their data reported in public and made available for decision making, they begin to focus energies on what those data mean for them and their school/program.

Texas schools began reporting financial data through PEIMS (Public Education Information Management System) in the 1980s. The first data submissions were published as tables, and for the first time it was simple to compare expenditures in specific areas across schools and districts. Immediately, a multi-year process began to bring districts more in line with the State's accounting standards and to ensure better consistency in the matching of expenditures to those categories. When districts reported no expenditures in some required categories and others reported unrealistically high amounts, the lack of data quality was evident.

DATA BECOME INFORMATION. Around this point in the hierarchy, data become information. The individual data elements are inherently less useful to decision makers than are aggregated and summarized statistics. From this point on in the hierarchy, basic data elements are joined by calculated elements that function as indicators of performance.

Official

2.1 Periodicity Established for Collection and Reporting

Periodicity is the regularly occurring interval for the collection and reporting of data. An established periodicity is essential for longitudinal comparisons. For valid comparisons across schools, districts, and states, the same period of time must be represented in everyone's data. The National Center for Education Statistics (NCES) has established an annual periodicity set around October 1 as the official date for states to report their student membership. Reality is that each state has its own funding formulas and laws that determine exactly when membership is counted, and most do not conduct another count around October 1 for Federal reporting. I was called on the carpet by the superintendent once because a school board member had used different dropout rates than he was using in speeches during a bond election. He explained very directly that "Every organization has a periodicity for their official statistics." That of course is how they avoid simultaneous speeches using different statistics. After working hard with the staff to publish a calendar of our official statistics, I discovered that very few districts at the time had such a schedule. (Periodicity of Collecting and Reporting AISD's Official Statistics, G. Ligon et al., Austin ISD Publication Number 92.M02, November, 1992.)

2.2 Official Designation of Data for Decision Making

Finally, official statistics make their way into the hierarchy. The key here is that "official" does not necessarily guarantee quality. Official means that everyone agrees that these are the statistics that they will use. This is a key milestone, because this designation contributes to the priority and attention devoted to these official statistics.
This in turn can contribute to on-going or future quality.

Every year, our Management Information Department's Office of Student Records issued its student enrollment projection. The preliminary projection was ready in January for review, and a final projection for budgeting was ready by March. Here is another example of how the presence of a bond election can influence the behavior of superintendents and school board members. The superintendent gave a speech to the Chamber of Commerce using the preliminary projection. Then our office sent him the final projection. He was not happy with the increase of about 500 in the projection. He believed that created a credibility gap between the figures used in campaigning for the bonds and the budgeting process. So, the preliminary projection, for the first time in history, became the final, "official" projection. The bonds passed, the next year's enrollment was only a few students off of the "official" projection, the School Board was impressed with the accuracy of the projection, and Austin began a series of four years when all the projection formulas were useless during the oil and real estate bust of the late 80's. The next time the "official" projection was close was when a member of the school board insisted that the district cut 600 students from its projection in order to avoid having to budget resources to serve them.

THE RIGHT DATA MUST BE USED. At this point, the qualities of accuracy and reliability are required. Moreover, the best data are not quality data if they are not the right data for the job.

2.3 Accuracy Required for Use in Decision Making

With the official designation of statistics, either by default or intent, their use increases. Now the feedback loop takes over to motivate increased accuracy. The decision makers and the persons held accountable for the numbers now require that the data be accurate. When we began publishing six-week dropout statistics for our secondary schools, the principals started to pay attention to the numbers. They had requested such frequent status reports so the end-of-the-year numbers would not be a surprise, and so they could react if necessary before the school year was too far along. Quickly, they requested to know the names of the students that we were counting as dropouts, so verification that they had actually dropped out could be made. Having frequent reports tied directly to individual student names improved the quality of the dropout data across the schools.

THE RIGHT ANALYSES MUST BE RUN. The quality of data is high at this point, and the decision maker is relying upon analyses conducted using those data. The analyses must be appropriate to the question being addressed. A caution to data providers and audiences: There are times when data quality is questioned, but the confusing nature of the data comes from explainable anomalies rather than errors. We should not be too quick to assume errors when strange results arise. A district's overall average test score can decline even when all subgroup averages rise;
students can make real gains on performance measures while falling farther behind grade level;
schools can fail to gain on a state's assessment, but be improving. (Anomalies in Achievement Test Scores: What Goes Up Also Goes Down, G. Ligon, Presented at the American Educational Research Association Annual Meeting, 1987.)

Valid

3.1 Accurate Data Consistent with Definitions

Trained researchers are taught early to define operationally all terms as a control in any experiment. Every organization should establish a standard data dictionary for all of its data files. The data dictionary provides a definition, formulas for calculations, code sets, field characteristics, the periodicity for collection and reporting, and other important descriptions. Using a common data dictionary provides the organization the benefits of efficiency by avoiding redundancy in the collection of data elements. Another important benefit is the ability to share data across departmental data files. (Periodicity User Guide, Evaluation Software Publishing, Austin, Texas, 1996.)

The classic example of careless attention to definitions and formulas is Parade Magazine's proclamation that an Orangeburg, South Carolina, high school reduced its dropout rate from 40% to less than 2% annually. Those of us who had been evaluating dropout-prevention programs and calculating dropout rates for a number of years became very suspicious. When newspapers around the nation printed the story that the dropout rate in West Virginia fell 30% in one year after the passage of a law denying driver's licenses to dropouts, we were again skeptical. Both these claims had a basis in real numbers, but each is an example of bad data. The Parade Magazine reporter compared a four-year, longitudinal rate to a single-year rate for the Orangeburg high school. The newspaper reporter compared West Virginia's preliminary dropout count to the previous year's final dropout count. (The West Virginia state education agency later reported a change from 17.4% to about 16%.) (Making Dropout Rates Comparable: An Analysis of Definitions and Formulas, G. Ligon, D. Wilkinson, and B. Stewart, Presented at The American Educational Research Association Annual Meeting, 1990.)

3.2 Reliable Data Independent of the Collector

Reliability is achieved if the data would be the same regardless of who collected them. What better example is available than the bias in teacher evaluations? When Texas implemented a career ladder for teachers, we had to certify those eligible based upon their annual evaluations. The school board determined that they were going to spend only the money provided by the State for career ladder bonuses, so that set the maximum number of teachers who could be placed on the career ladder. Our task was to rank all the eligible teachers and select the "best." Knowing there was likely to be rater bias, we calculated a Z score for each teacher based upon all the ratings given by each evaluator. Then the Z scores were ranked across the entire district. The adjustments based upon rater bias were so large that near perfect ratings given by a very easy evaluator could be ranked below much lower ratings given by a very tough evaluator. The control was that the teachers' rankings within each rater's group were the same.

Everything was fine until a school board member got a call from his child's teacher. She was her school's teacher-of-the-year candidate but was ranked by her principal in the bottom half of her school, and thus left off the career ladder. The end of the story is that the school board approved enough local money to fund career ladder status for every teacher who met the minimum state requirements, and we were scorned for ever having thought we could or should adjust for the bias in the ratings. (Adjusting for Rater Bias in Teacher Evaluations: Political and Technical Realities, G. Ligon and J. Ellis, Presented at the American Educational Research Association Annual Meeting, 1986.)

3.3 Valid Data Consistent with the Construct Being Measured

The test of validity is often whether a reasonable person accountable for an outcome agrees that the data being collected represent a true measure of that outcome. Validity is the word for which every trained researcher looks. Validity assumes both accuracy and reliability. Critically, valid data are consistent with the construct being described. Another perspective on this is that valid data are those that are actually related to the decision being made. The local school board in discussing secondary class sizes looked at the ratio of students to teachers in grades 7 through 12 and concluded that they were fairly even. Later they remembered that junior high teachers had been given a second planning period during the day, so their actual class sizes were much higher. Then they moved on to focus on the large discrepancies between class sizes within subject areas to discover that basic required English and mathematics classes can be efficiently scheduled and are large compared to electives and higher level courses. In the end, the school board members became more understanding of which data are valid for use dependent upon the questions they are asking.

Quality

4.1 Comparable Data: Interpretable Beyond the Local Context

Quality is defined here beyond the psychometric and statistical concepts of reliability and validity. Quality is defined by use. Quality data are those that function to inform decision making. For this function, the first criterion is: Quality data must be interpretable beyond the local context. There must be a broad base of comparable data that can be used to judge the relative status of local data. We can recognize that there are some decisions that do not necessitate comparisons, but in most instances a larger context is helpful. Each time I read this criterion, I argue with it. However, it is still in the hierarchy because decisions made within the broadest context are the best informed decisions. Knowing what others are doing, how other districts are performing, does not have to determine our decisions, but such knowledge ensures that we are aware of other options and other experiences. AERA's Division H sponsors an annual publications award competition to showcase the best of the nation's evaluation reports. Each year, these can be seen in the Annual Meeting exhibit area. Educational Research Service and PDK's CEDR both disseminate these reports. The annual award recipients represent excellent examples of evaluation studies that typically provide analyses and interpretations useful beyond their local context.
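The rater-bias adjustment described under 3.2 (standardizing each teacher's rating against the distribution of ratings given by that teacher's own evaluator, then ranking the standardized scores district-wide) might be sketched as follows. This is an illustration of the general technique only, not the actual procedure or data from the Texas career ladder study; the record layout and names are assumed for the example.

```python
# Illustrative sketch of a per-rater Z-score adjustment: each rating is
# standardized against the mean and standard deviation of the ratings
# given by the same evaluator, then ranked across the whole district.
from collections import defaultdict
from statistics import mean, pstdev

def rank_with_rater_adjustment(ratings):
    """ratings: list of (teacher, evaluator, score) tuples."""
    by_evaluator = defaultdict(list)
    for _, evaluator, score in ratings:
        by_evaluator[evaluator].append(score)

    z_scores = []
    for teacher, evaluator, score in ratings:
        scores = by_evaluator[evaluator]
        sd = pstdev(scores) or 1.0          # guard against a zero spread
        z_scores.append((teacher, (score - mean(scores)) / sd))

    # Highest adjusted score first; within any one rater's group the
    # teachers keep their original order, as the paper notes.
    return sorted(z_scores, key=lambda item: item[1], reverse=True)
```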
Most states and districts have struggled with defining and reporting their dropout rates. Despite the lofty goal often embraced of having 100% of our students graduate, there is still the need for comparison data to help interpret current levels of attrition. When we compared Austin's dropout rate to published rates across the nation, we found that the various formulas used by others produced a range of rates for Austin from 11 % to 32%. Our best comparisons were across time, within Austin, where we had control over the process used to calculate comparable rates. (Making Dropout Rates Comparable
An Analysis of Definitions and Formulas, G. Ligon, D. Wilkinson, and B. Stewart, Presented at The American Educational Research Association Annual Meeting, 1990.) 4.2Data-Based Decisions Made with Confidence The second criterion is: Data-based decisions must be made with confidence, at least confidence in the data. This is the ultimate criterion upon which to judge the quality of data-'do the decision makers who rely upon the data have confidence in them. Assuming all the lower levels of quality criteria have been met, then the final one that makes sense is that the data are actually used with confidence. This is a good time to remind us all that confidence alone is not sufficient. One reason the construct of a hierarchy is useful is that each subsequent level depends upon earlier levels. A local district's discipline reporting system had been used for years to provide indicators of the number of students and the types of incidents in which they were involved. The reports were so clear and consistent that confidence was high. /\s part of a program evaluation, an evaluator went to a campus to get more details and discovered that only about 60% of all discipline incidents were routinely entered into the computer file. The others were dealt with quickly or came at a busy time. No one had ever audited a school's discipline data. On the other hand, the dropout and college-bound entries into a similar file were found to be very accurate and up-to-date. My biases are evident in the descriptions of the ievels of this hierarchy: 1. Accurate and reiiable data should be a given in any information system. 2. Knowing the question being asked or the decision to be made is critical to ensuring that the right data are used and the appropriate analyses are conducted. 3. Beyond these more mechanical levels of quality, use is the goal. A claim of true quality cannot be made unless the data are useful, usable,' and used. Information systems professionals can be understood for ending their http://www.evalsoft.com/esp/htinl/body_dataqual.html 6/8/99 dataqual Page 15 of 17 I treatment of data quality somewhere in the middle of this hierarchy. For those who work at the decision-making level of an organization, more is required. Applying the Hierarchy to a Local School District To illustrate whether or not the hierarchy has any relationship to a real information system, I thought back three years to our data in Austin. Attachment B is a summary of my ratings of several of the information systems from that time. These ratings range from -1.1 for the misleading data available on the computers in each school, to 4.2 for the reliable and relied upon data available on lunch and transportation programs. Yes, I rated those two areas as higher quality than assessment, in which I had invested almost 20 years. Our assessment data were excellent, but we never achieved that highest level of trust and confidence afforded lunch and transportation data. Some of that might be part of the nature of school board members' uneasiness with complex-looking test scores, or the constant tirades of detractors giving individual accounts of how test scores mislabeled their students. Assessment data will always be more challenging to control than the basic counts of who eats and who rides. But take nothing away from the lunch and bus people. They used their data, depended upon them, and ensured their quality. What Can an Organization Do? A self-assessment of data quality can be conducted in each area. 
This can be very formal with a team approach, or very informal with a checklist kept handy for reference whenever quality issues arise. Attachment C is a sample checklist that contains the key criteria that were identified through the development of the hierarchy. The highest level of data quality would be illustrated by a positive response to each question in the checklist. The format recognizes that data quality will vary across areas and even across sub-areas within an area. The answers to the questions on the checklist may not be known or may be different depending upon an individual's role within the organization. Sections A. Statistics and B. Data Elements match with levels 1.3 through 3.1 of the hierarchy. Positive ratings in these sections indicate a foundation for best practice in creating reliable, quality data files. Section C. Results and Interpretation matches levels 2.2 through 3.3. Positive ratings in this section indicate that the data are being analyzed and reported for use. Section E. Investment fits into the hierarchy around levels 2.2 and 2.3, where the attention focused upon the data and the use of the data by the providers are key. Section D. Confidence represents level 4.2, where use is made of the data, with confidence.

Dealing with Error

When I read this paper just before its printing, there was a sense that the higher-level nature of the hierarchy did not deal well with some of the nitty-gritty issues of data quality that are usually fretted over by information systems managers and data providers. Many of these fall into the general category of error. Error can be mistakes that result in bad data or those pesky probability statistics that keep us from ever being 100% confident in our data. I have always been uncomfortable calling some of these problems errors when the reality is that they represent at times conscious decisions or merely differences in how data are recorded from place to place. Error factors are divided below into two general categories.

1. Measurement Errors

Measurement errors are those imprecisions that result from our inability to be absolutely perfect in our measurements. One is the reliability of an instrument, test, or performance task (illustrated by a test-retest difference). Measurement errors can also be "intentional," as occurs when we round numbers or put values in ranges rather than use a more precise value. Sampling error limits the probability of reliable data. Measurement error is adequately dealt with in textbooks. Measurement error is less often adequately dealt with in practice. At times, we lose precision by translating our data from one format to another. For example, a student's course history from one high school must be translated into the standards of another high school when the student transfers. Not only might the course content and levels not match, but the credits awarded and grading system may differ. When a California school that uses three dozen ethnicity codes for its students reports to the Office for Civil Rights, those codes are crosswalked to five categories.

2. Mistakes

These errors occur, and the challenge is to notice them, so they can be corrected if possible. Calculation errors, data entry errors, programming errors, and other human mistakes are best addressed with adequate training, monitoring, and redundancy. Some useful techniques for detecting errors accompany the emergence of automated information systems.
We now have the ability to run edit checks on databases to determine the reasonableness of the data. Check sums can be calculated and compared to benchmark totals. Ranges of values, valid codes, and field characteristics (e.g., alphabetic, numeric, date, etc.) can be verified by the computer. Professionals always have available one of the best techniques: the use of estimating. Individuals who are good estimators are those who are good at detecting potential errors. Use of trend data and comparable group data, when available, is helpful to judge the reasonableness of data. A perspective that has become almost universal among professionals dealing with data quality issues is that when information systems became distributed throughout organizations rather than being centralized, the potential for errors was also distributed. The design of a distributed information system must account for data quality checks and establish responsibility for quality. The traditional notion that data processing's responsibility for accuracy begins and ends at the computer room door changed when that "door" was distributed to multiple locations through the magic of networks. Now the organization as a whole, each department that uses information as well as each department that collects information, must take responsibility for data quality. The bottom line on statistical error is that other references have been dealing with the details of this issue for a long time. The probability issues basic to sampling and measurement error are permanent and calculable. The mistake issues have management solutions that should be employed within every organization.

Conclusion

The hierarchy was a convenient way to think through what makes for quality data. Reality is that our information systems will not fall neatly into one of the levels of the hierarchy. In fact, they may not often evolve sequentially through each level. At any point in time, their levels may shift up or down. What is useful here is that the hierarchy describes the characteristics of relatively low and relatively high levels of data quality. With the checklist and the hierarchy, an organization can begin to examine quality issues and plan improvements as needed.
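The automated edit checks described in the paper above (valid-code checks, range checks, and check sums compared against benchmark totals) can be illustrated with a minimal sketch. This is an illustration only, not the district's actual system; the file layout, the column names (student_id, ethnicity, rit_score), the code set, and the benchmark total are all hypothetical.

import csv

# All names and values below are hypothetical, for illustration only.
VALID_ETHNICITY_CODES = {"1", "2", "3", "4", "5"}  # hypothetical code set
ENROLLMENT_BENCHMARK = 7711                        # hypothetical control total

def audit_student_file(path):
    errors, record_count = [], 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            record_count += 1
            # Valid-code check: ethnicity must be one of the approved codes.
            if row["ethnicity"] not in VALID_ETHNICITY_CODES:
                errors.append((row["student_id"], "invalid ethnicity code"))
            # Range check: flag scale scores outside a reasonable range.
            try:
                score = float(row["rit_score"])
                if not 160 <= score <= 250:
                    errors.append((row["student_id"], f"score {score} out of range"))
            except ValueError:
                errors.append((row["student_id"], "non-numeric score"))
    # Check sum: compare the record count to an independent benchmark total.
    if record_count != ENROLLMENT_BENCHMARK:
        errors.append(("FILE", f"{record_count} records vs. benchmark {ENROLLMENT_BENCHMARK}"))
    return errors

Each flagged record would then be routed back to the campus that entered it, which is the kind of distributed responsibility for quality the paper describes.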
6

Planning, Research, & Evaluation
Instructional Resource Center
3001 S. Pulaski
Little Rock, AR 72206

To: Elementary Principals
From: Kathy Lease, Asst. Supt., PRE
Date: August 17, 1999
Re: Smart Start Assessments

A committee of folks from the Instructional Division, along with the PRE staff, met Friday, August 27, 1999 and reviewed the released items from the Smart Start disk. It was the decision of the Assessment Planning Committee that schools could use all the assessments on the disk. The group decided that schools could benefit greatly from utilizing those assessments, and that our curriculum staff could design second and third quarter CRTs. For schools that need to concentrate on improving their scores on the CRTs and the Benchmark exams, these released items will be a big asset. We are going to have to invest some time in teaching teachers who need training in how to score papers using a scoring guide (rubric). What we do know is that students will perform better on assessments if they are familiar with the format of the questions. These released items give teachers an excellent opportunity to provide quality practice for the Benchmark exams. The other thing that we know about these released items is that they assess the curriculum that we are required by the state to teach. Therefore, how well the students perform on these assessments gives principals an opportunity to informally assess the effectiveness of the delivery of the curriculum. If you have questions about how to use the released items, please contact one of the curriculum folks from that area of expertise (literacy or math) or PRE, and we'll get someone to help you. If you do not have a disk, contact Marion Woods.

Cc:
Dr. Bonnie Lesley, Associate Superintendent
Sadie Mitchell, Associate Superintendent
Frances Cawthon, Assistant Superintendent

7

LESLEY, BONNIE
From: LEASE, KATHY R.
Sent: Monday, August 23, 1999 12:53 PM
To: PRICE, PATRICIA; HUFFMAN, KRIS; TEETER, JUDY; MILAM, JUDY; KILLINGSWORTH, PATRICIA; FREEMAN, ANN; CLEAVER, VANESSA; GLASGOW, DENNIS
Cc: LESLEY, BONNIE; TRUETT, IRMA
Subject: Smart Start Assessments
Importance: High

Dear Folks, We would like to initiate some conversation about the released items on the Smart Start disk and how we could use them in our CRTs. I would also like Susie Davis and Pat Busbea to join us, but they don't have email, so Pat Price and Ann, when you see them, please invite them. We would like to meet on Friday morning at 8:30 in room 18. I sent a memo to principals asking if they had folks who would be interested in working with us. We have had several responses. After this initial brainstorming session, we will add teachers, parents, and principals to a larger planning group. If you can't join us, please let me or Irma (2121) know. Thanks, KL

Dr. Kathy Lease, Asst. Supt.
Planning, Research, and Evaluation
Little Rock School District
3001 S. Pulaski St.
Little Rock, AR 72206
Phone: 501-324-2122
Fax: 501-324-2126
Email: krlease@lrsdadm.lrsd.k12.ar.us

8

LESLEY, BONNIE
From: LEASE, KATHY R.
Sent: Friday, September 17, 1999 1:43 PM
To: ANDERSON, BARBARA
ASHLEY. VIRGINIA
BEARD. SUSAN
BRANCH. SAMUEL
CARSON. CHERYL
CARTER. LILLIE
CHEATHAM. MARY
COURTNEY. THERESA
COX. ELEANOR
SMITH. DARIAN
MITCHELL. DEBORAH
DUNBAR. ETHEL B.
DONOVAN. FAITH
FIELDS. FREDERICK
GOLSTON. MARY
HALL. DONNA
HARKEY. JANE
HARRIS. HENRY
HOBBS. FELICIA L.
JONES. BEVERLY
KEOWN. ADA
SCULL. LILLIE
MANGAN. ANN
BARKSDALE. MARY D.
MENKING. MARY
MORGAN. SCOTT
ACRE. NANCY
OLIVER. MICHAEL
PHILLIPS. TABITHA
BROOKS. SHARON A.
SMITH. DARIAN
TUCKER. JANIS A.
WARD. LIONEL
WILSON. JANICE M.
WORM. JERRY
ZEIGLER. GWEN S.
CARTER. JODIE
HOWARD, RUDOLPH
BROWN, LINDA
NORMAN. CASSANDRA R.
SMITH. VERNON
BERRY. DEBORAH
FULLERTON. JAMES
HUDSON. ELOUISE
JAMES. BRENDA
BUCK, LARRY
MOSBY. JIMMY
PATTERSON. DAVID
ROUSSEAU. NANCY MITCHELL. SADIE
CAWTHON. FRANCES H.
LACEY. MARIAN G.
BRADFORD, GAYLE
Subject: NWEA Overview

Darian, I don't know which of these addresses is right for you! Hope you get this. It came back the first time! KL

Dear Principals, I would like to invite you to an overview session on the new pre- and post-test assessments that we will be developing for grades 3-11 with Northwest Evaluation Association. Our consultant will be here from Portland to give us a presentation on what we can expect from our new assessments and the process that will take place to design those assessments. The session will be on Sept. 23rd at 3:00 in room 19 of the IRC. I scheduled it at this time so that there would be a minimum amount of interference with your school day, should you decide to attend. I think you would find the session informative and helpful as we look at the whole assessment package that is facing us. If you cannot come at this time, there will be a morning session for curriculum and district-level folks at 9:00 in room 19 at the IRC. You are welcome at either session. Please call me if you have questions (2122). Thanks, Kathy. PS - Please call your fellow principals who may not be reading their email!!

Dr. Kathy Lease, Asst. Supt.
Planning, Research, and Evaluation
Little Rock School District
3001 S. Pulaski St.
Little Rock, AR 72206
Phone: 501-324-2122
Fax: 501-324-2126
Email: krlease@lrsdadm.lrsd.k12.ar.us

9

EARLY CHILDHOOD/LITERACY
LITTLE ROCK SCHOOL DISTRICT
INSTRUCTIONAL RESOURCE CENTER
3001 PULASKI STREET
LITTLE ROCK, AR 72206

September 17, 1999
TO: Principals
FROM:
Pat Busbea and Ann Freeman
THROUGH: Pat Price, Early Childhood/Literacy
SUBJECT: Assessment Training

During our recent assessment trainings we found that several teachers have not received all the training necessary to fulfill the District's testing requirements. We need your help in identifying these teachers so we can provide the needed training. Please provide a list of any new K-2 teachers who began working in your building after the trainings were held on August 16th and 17th. Also, if you are a Success For All school, please list any teachers who did not attend the ELLA trainings that were held on August 16th and 17th. Both lists need to include the teacher's name, grade level, and assessment trainings needed. Please send your list(s) to us by Monday, September 27, or you may e-mail this information to Pat Price. Thank you for your cooperation. Listed below is a list of the assessments for each grade level:

Kindergarten teachers need to be trained in the following assessments from Marie Clay's Observation Survey: Letter identification, Word test, Concepts about print, Writing vocabulary, Hearing and recording sounds in words (dictation sentence).

First Grade teachers need to be trained in the following assessments from Marie Clay's Observation Survey: Letter identification, Word test, Concepts about print, Writing vocabulary, Hearing and recording sounds in words (dictation sentence). First Grade teachers also need training in administering the Developmental Reading Assessments by Joetta Beavers.

Second Grade teachers need to be trained in the following assessments from Marie Clay's Observation Survey: Word test, Writing vocabulary, Hearing and recording sounds in words (dictation sentence). Second Grade teachers also need training in administering the Developmental Reading Assessment by Joetta Beavers and the Gentry Spelling Test by Gentry and Gillett.

10

Memorandum
To: Principals and K-2 Teachers
From: Pat Busbea and Ann Freeman
Through: Pat Price, Director of Early Childhood/Literacy
Date: 03/09/00
Re: Assessment Training

There will be an assessment training review for anyone who feels they need to go over the assessments before post-testing begins in April. We will review running records, the Developmental Reading Assessment, and the sub-tests in the Observation Survey. The training will be Monday, March 20, 2000, from 3:30 p.m.-6:30 p.m. at King Elementary in the multipurpose room. Please call Sandra in the Literacy Department at 324-0526 to register for the training. We will have to limit the session to 50 participants due to space, so please notify us as soon as possible.

11

LESLEY, BONNIE
From: NEAL, LUCY
Sent: Friday, March 17, 2000 3:38 PM
To: LESLEY, BONNIE
Subject: great book

Last night I had one of those wake-up-in-the-middle-of-the-night-and-worry sessions. Thinking I would read myself back to sleep I picked up How Teachers Learn Technology Best by Jamie McKenzie. It is super. He covers so many issues we are dealing with, from how adults learn, how we assess technology knowledge, approaches to integrating technology and not just buying "stuff"... and on and on. I would lend it to you but I'm afraid you might not give it back and I need this book. Anyway - I had to share it. If I buy a few more copies, would you like one? One site he lists for assessing technology is http://www.bham.wednet.edu/assess2.htm. Take a look at the assessment for staff.
If we could survey our staff for baseline data and then give it again each year, we would have some good stuff that would help us figure out a plan. See what you think. Enough. I think I'll go home.

Lucy M. Neal, Director, Technology and Media Services
Little Rock School District
3001 S. Pulaski Street
Little Rock, Arkansas 72206
501.324.0577 (voice)
501.324.0504 (fax)

12

LITTLE ROCK SCHOOL DISTRICT
Planning, Research, and Evaluation
3001 South Pulaski
Little Rock, Arkansas 72206

March 4, 2000

TO:
Elementary, Middle School Principals and Test Coordinators
FROM: Yvette Dillingham, Testing and Evaluation Specialist
THROUGH: Kathy Lease, Assistant Superintendent, PRE
SUBJECT: ACTAAP Benchmark Exam (Grades 4 & 8) and Field Testing (Grade 6)

We have received the following important and new information from the ADE regarding the above exams:
Please make note that Intermediate Level (Grade 6) students in the LRSD will take the Writing portion of the Benchmark Examination (Field Test). LRSD was not selected to participate in the End of Course Algebra I and Geometry Field Tests. Please send home with students, the week of April 10-14, the enclosed Benchmark Parent Notification Pamphlet, which explains the importance of ACTAAP and provides information on the Benchmark Exams (Grades 4 & 8) and Field Test (Grade 6). Test Administrators need to be very familiar with the content of the enclosed Test Administrator's Manuals; therefore, please disseminate the manuals immediately upon receipt. Secured test materials for Grades 4 & 8 will be delivered to schools on April 19. Secured test materials for Grade 6 will be delivered to schools no later than April 24. If you have any questions or need additional Parent Notification Pamphlets and Test Administrator's Manuals, please call me at 2123 or fax your request to 324-2126.

13

Description of the Assessment System

System Overview

Assessment of student achievement is partitioned as follows:
1. Comparison with national norms
2. Mastery of State standards and benchmarks
3. Mastery of District standards and benchmarks

The District trend has been to reduce broad utilization of norm referenced measures, such as the Stanford Achievement Test, 9th Edition, and to target select grades where comparison with national norms may be most useful. Concurrently, a broad expansion of criterion referenced measurement has been instituted in a sequential and cumulative process of administration at targeted grade levels.

Norm Referenced Measures of Math & Science Achievement

Relative to mathematics and science, the following measures are administered annually at targeted grade levels:

SAT-9     Stanford Achievement Test, 9th Edition        Grades 5, 7, 10
ACT       American College Testing                      Grades 11, 12
EXPLORE   American College Testing                      Grade 8
PLAN      American College Testing                      Grade 10
AP        The College Board Advanced Placement Test     Grades 10, 11, 12

Widely known and utilized in school systems, no description of these measures is provided here.

Criterion Referenced Measures of Math & Science Achievement

Relative to mathematics and science, the following three measures are to be administered annually at targeted grade levels:

ALT       Achievement Level Test           Grades 2-11
ACTAAP    State Benchmark Examination      Grades 4, 8
CRT       District Benchmark Examination   Grades 3-11
ACTAAP    End of Course Tests: Algebra I, Geometry, Biology I

A brief overview of these lesser known measures is provided later in this document.

Measures for Testing Mathematics Achievement

The tables below were designed to clarify the assessment system.

Measures for Testing Mathematics Achievement

Grade Level     Number of Tests   Type of Tests          Name of Tests
K               0
1               0
2               2                 Criterion              ALT, CRT
3               2                 Criterion              ALT, CRT
4               3                 Criterion              ALT, CRT, ACTAAP
5               3                 Criterion & Norm       ALT, CRT, SAT-9
6               2                 Criterion              ALT, CRT
7               3                 Criterion & Norm       ALT, CRT, SAT-9
8               4                 Criterion & Norm       ALT, CRT, EXPLORE, ACTAAP
9               2                 Criterion              ALT, CRT
10              5                 Criterion & Norm       ALT, CRT, SAT-9, PLAN, AP
11              4                 Criterion & Norm       ALT, CRT, ACT, AP
12              2                 Norm                   AP, ACT
End of Course   2                 Criterion              Algebra I, Geometry
Total           34                11 criterion, 6 norm

Measures for Testing Science Achievement

Grade Level     Number of Tests   Type of Tests          Name of Tests
K               0
1               0
2               0
3               1                 Criterion              ALT
4               1                 Criterion              ALT
5               2                 Criterion & Norm       ALT, SAT-9
6               1                 Criterion              ALT
7               2                 Criterion & Norm       ALT, SAT-9
8               2                 Criterion & Norm       ALT, EXPLORE
9               1                 Criterion              ALT
10              4                 Criterion & Norm       ALT, SAT-9, PLAN, AP
11              3                 Criterion & Norm       ALT, ACT, AP
12              3                 Criterion & Norm       ALT, ACT, AP
End of Course   1                 Criterion              Biology I
Total           21                11 Criterion, 6 Norm

Orientation to Criterion Referenced Assessment Measures

Criterion referenced testing is an increasingly significant approach to assessment in the Little Rock School District and in the Arkansas State Department of Education in an effort to measure achievement in relation to standards and their attending benchmarks. Summaries of lesser known measures are provided below.

State Mandated ACTAAP Benchmark Examination, Grades 4 & 8

The State is in the process of implementing its Arkansas Comprehensive Testing, Assessment & Accountability Program (ACTAAP), which includes a Benchmark Examination containing a measure of mathematics achievement. The intent and purpose of this component is to identify students in need of additional instruction in mathematics. This examination process is being developed, piloted, and implemented in a sequential and cumulative process beginning with 4th grade in SY 1997-98 and including 8th grade in SY 1998-99.
SY 2000-01 will incorporate the math measure for 6th grade, currently being piloted in other schools across Arkansas. Also, end-of-course measures for Algebra I, Geometry, and Biology I are currently in the item development phase. The comprehensive mathematics component contains multiple-choice and open-response questions based on The Arkansas Mathematics, Reading, and English/Language Arts Curriculum Frameworks. Items are developed with the assistance and approval of the Arkansas Department of Mathematics Content Advisory Committee, composed of active Arkansas educators with expertise in mathematics. The committee develops and reviews both multiple-choice and open-response items to ensure they reflect the Arkansas Curriculum Frameworks and are grade-appropriate. While multiple-choice questions are scored by machine to determine if the student chose the correct answer from four options, responses to open-response mathematics questions are scored by trained readers using a pre-established set of scoring criteria. Students can receive a test score of one through four, with four representing Advanced, followed by Proficient, Basic, and Below Basic.

Achievement Level Test (ALT), Grades 2-11

The recently implemented Achievement Level Test (ALT) includes a series of mathematics achievement measures that increase in difficulty across eight levels. This type of measurement is designed to document growth by assessing students at the cutting edge of their individual achievement level. Fall and spring administration across grades 2-11 permits measurement of growth within and across school years, expressed in two kinds of scores: percentile scores and scale or RIT (Rasch Interval Scale) scores. Percentile scores can be used to compare students to the large group of test takers using the ALT developed by the Northwest Evaluation Association. It is important to note that this is a comparative group currently involving 104 school districts and 500,000 students and growing 4 to 13 points annually. This is not a norm group configured to represent public school populations. More importantly, demonstration of growth within and across an individual's matriculation in grades 2-11 is documented using the RIT score, designed to make direct comparisons to a criterion performance level along a scale from 160 to 250. Students typically start at a RIT score of about 170-190 in the fall of the 3rd grade and progress to the 230-260 range by high school. Students at 235 have reached a readiness level for Algebra I. It is very important to note that along the Rasch Interval Scale, scores have the same meaning regardless of the individual student's grade level. This type of measurement allows some students to start at a higher RIT level and some low-achieving students to never reach the top level. The design provides an accurate measure of each student's achievement, where the typical standardized test, by its nature, provides inadequate measures for many students, especially those at the high and low ends of the scale. Also important is the fact that tests are aligned with The Arkansas Mathematics, Reading, and English/Language Arts Curriculum Frameworks, thus enabling the District to determine impact and effectiveness of its instructional programs. The pool of test questions, developed by the Northwest Evaluation Association, has been extensively field tested to ensure items of the highest quality and fairness.
A balanced group of math teachers and curriculum specialists (balanced by race, gender, and grade level) matched the pool of questions to the standards and their attending benchmarks included in the aforementioned Frameworks. During test development activities, questions were calibrated for difficulty and assigned to a level (e.g., Math levels 1-8). For example:
An appropriate expectation of a Level 1 student is to multiply whole numbers, while a Level 6 student should be able to multiply fractions. This calibration makes it possible to calculate the RIT score, which is tied directly to the curriculum. ALTs are administered during the 1st and 3rd quarters and measure achievement in elementary grade and middle grade math, Algebra I, Algebra II, and Geometry. The tests consist of multiple-choice questions, and while there is no time limit, each test takes approximately 90 minutes to complete. A variety of reports are available, and the NWEA software disaggregates the data by school, teacher, grade level, subject, gender, ethnicity, and special codes (e.g., special education and ESL). Classroom reports provide student data on RIT scores, percentile scores, and performance in relation to standards and benchmarks. District/school-wide reports provide comparison data among schools. Parent reports provide RIT, percentile, and benchmark performance scores as well as an explanation of how to interpret test scores. Future developments include the addition of a Science Component to the ALT in the late Spring of 2000. Using the aforementioned process, select science teachers and curriculum specialists will draw items from a software pool provided by the Northwest Evaluation Association. This process, which can be repeated to revise the test periodically, will enable the District to ensure congruence of the science measure with the aforementioned Frameworks. First administration of the science measure is planned for Fall 2000.

District Mandated CRT Benchmark Examination, Grades 3-12

Both the aforementioned ACTAAP Benchmark Examination and the District's CRT Benchmark Examination are designed to measure a student's proficiency in The Arkansas Mathematics, Reading, and English/Language Arts Curriculum Frameworks. In contrast to the sequential and cumulative implementation of the ACTAAP (4th and 8th grades to date, with 6th grade in 2001), the CRT Benchmark Examination measures mathematics achievement across grades 3-12. The District's Department of Planning, Research and Evaluation conducted an extensive and thorough reliability and validity study using data from students taking the CRT Benchmark Examination during Spring 1999. Study results produced by Ph.D.-level research personnel document the CRT Benchmark Examination as a reliable measurement with concurrent validity. In addition, very strong evidence documents that over time this measure will get consistent results (reliability). Pearson product-moment correlation results were obtained using test scores from the 4th grade ACTAAP and CRT Benchmark Examinations. A test of internal consistency (i.e., reliability) was performed on the CRT Benchmark Examination. Results indicate the ACTAAP and CRT Benchmark Examinations have a positive and significant correlation (Math, .663, p < .01; Reading, .690, p < .01). Alpha levels, an indicator of reliability, were .924 for the CRT Benchmark Examination's measure of mathematics achievement and .899 for the measure of reading achievement. The effectiveness of this powerful and sensitive measure has been enhanced by development of district and class summaries designed to display results in an item analysis format that connects test results for individual questions to benchmarks and their respective standards.
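The reliability and validity statistics reported above (a Pearson product-moment correlation between ACTAAP and CRT scores, and an alpha coefficient of internal consistency) could be reproduced from student-level files with a short script. The sketch below is illustrative only: the score arrays are made up, and the district's reported values (r = .663 for math, alpha = .924) came from its own data, not from this example.

import numpy as np

def pearson_r(x, y):
    # Pearson product-moment correlation between two paired score arrays.
    return np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]

def cronbach_alpha(item_scores):
    # item_scores: a students-by-items matrix of item-level scores.
    items = np.asarray(item_scores, float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical paired scores for a handful of students (illustration only).
actaap_math = [210, 185, 240, 199, 225]
crt_math = [52, 41, 60, 45, 57]
print(round(pearson_r(actaap_math, crt_math), 3))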
Examination of the class report reveals how easy it is to identify visually which benchmarks an individual student or group of students needs further instruction on, and on which they perform at the proficient or advanced level.

14

Procedures for Providing Data Analysis/Interpretation to Decision Makers

System Overview

Decision makers work at all levels of math and science programming. Parents, teachers, curriculum consultants, principals, program directors, various levels of system administrators, and community partners all make decisions relevant to CPMSA programming. Each cohort needs data collected and recorded for the project. The format varies from comprehensive documents laden with figures for annual reports or site visits from the funding agent to targeted fact sheets or briefs prepared for program participants such as lead teachers participating in academic support initiatives or ancillary staff coordinating academic enrichment programs.

Formats for Disseminating Data to Facilitate Direct Programming

A variable format, from the brief to the comprehensive, is needed for direct program providers. At the time of this report the academic support format involves a series of tables with a fact sheet used to record and report progress on a mathematics initiative encompassing 4th and 8th grades. This working format, designed to collect and disseminate data in a circular fashion that promotes program evaluation and program implementation in a feedback loop, is continually adapted to meet emerging needs and is a particular area of expertise for the NSF Program Evaluator. This format works particularly well with direct service providers and will continue to be used. A version of this loop format is currently being utilized to facilitate expansion of the After School Science Club. Activities include elementary and middle school teacher surveys to capture their input in decision making related to prioritizing science club content for the upcoming year to be congruent with standards and benchmarks addressed in classroom-based activities. The materials used in these two endeavors are designed to collect and disseminate data across various participants in a decision-making network related to a specific program. Table/fact sheet materials are in place to collect and disseminate data for direct providers to use in their own decision making or in collaborative decision making with classroom teachers or community-based site participants. Formats vary for facilitating direct programming, but typically they involve mechanisms that can visually convey just-in-time data to personnel applying the information directly. They are designed to promote effective functioning and to capture all critical variables with ease in recording, transmitting, processing (analyzing and interpreting), and returning processed data to participants in a timely manner to keep the loop functioning to the benefit of all.

Formats for Disseminating Data to Facilitate Long Range Planning

Various groups need access to the Program Evaluation Record, the repository where current data are housed on the following topics:
1. District-Wide Student Demographics & Select Statistics
2. Enrollment Information for Gate-Keeping Math and Science Courses
3. Achievement Data for Math and Science
4. Academic Support Initiatives
5. Academic Enrichment Programs
6. Professional Development and Certification Across Teachers of Math and Science
7. Community Engagement
8. Resource Allocation

Such groups include the following types of personnel: CPMSA program/project administrators, Cabinet members, Board of Education, community media, principals, school facilities, counselors, assistant principals, the CPMSA Governing Board, and other grant-funded program/project administrators. Easy access to this repository is needed for direct (1) fiscal, (2) programmatic, and (3) community-based initiatives related to NSF. Additionally, these types of personnel have frequently contacted the NSF Program Evaluator for specific data to extend and enrich CPMSA planning. Personnel representing other District programs also request data they know is warehoused in the Program Evaluation Record for a variety of District needs. At the current time, a primary copy of the Program Evaluation Record is housed in the NSF Program Evaluator's office and the Program Director's office. These two individuals have planned for some time to utilize an existing District approach to transmit a copy of the current edition to select personnel from the list above. The approach is to distribute copies in ring binder format with periodic Information Updates forwarded by in-house mail. The Information Update can take memo form, identifying its purpose and intent and where to insert it into the ring binder. This is a mechanism all District personnel are familiar with and have utilized for similar types of projects. Plans are underway to introduce the Information Update process and to distribute initial materials at the next NSF Governing Board meeting. For example, the recently completed first-year evaluation of the After School Science Club would be an excellent Information Update, as would the 4th and 8th Grade Mathematics Initiative. To extend the utility of this approach, a Project Circular, in the form of a fact sheet, will be disseminated electronically using the extensive District e-mail network. Updates at periodic intervals will be emailed to personnel selected by the Co-Principal Investigators and the Project Director. It is important to note that in the fall meeting of the NSF Governing Board this type of electronic information dissemination was enthusiastically endorsed. The District's email directory installed in each computer will facilitate this. The electronic network is there and can be easily utilized for this purpose. Additionally, a feedback loop can be implemented here to enable personnel to request data and to submit data and in this way enhance planning and decision making.

Formats for Disseminating Data to Facilitate the Understanding of Data

Frequently, reports related to test results are distributed to counselors, principals, and other busy personnel with little time or expertise in decoding these dense and technical manuscripts. District-wide ACT, Plan, Explore, and AP reports have been purchased and routed to the NSF Program Evaluator's office, where they are processed and transformed into useful, practical formats with direct application to program decision making. Currently these reconstituted test reports are housed in the Program Evaluation Record.
These data can easily be disseminated in small, relevant segments using a fact sheet form that is compact, visually accessible, and highly useful for busy personnel who need to apply the information directly. The Information Update or Project Circular format, in hard copy or electronically or both, can be used to distribute this very useful information to District personnel. Marketing research has long demonstrated the value of the touch-touch method of information dissemination in contrast to the one-time "bury them with data" approach. Recent brain research corroborates these findings. This concept will be embedded in the approach of disseminating the analysis and interpretation of test results in small, relevant segments as close to the time of administration as possible. This process will be monitored to identify modifications that will make it more effective and efficient.

Formats for Disseminating Data to Specific CPMSA Program Components

The NSF Program Evaluator will meet with the Project Team to identify the most effective and useful approach for processing data. Once data have been processed, they will be presented to the Project Team so they have access to program evaluation data prior to other District personnel. Evaluation reports, both formative and summative, will be included, as well as a preview of Information Updates such as the aforementioned After School Science Club report or the 4th and 8th Grade Mathematics Initiative. In addition, CPMSA program personnel have related data needs that can be met through this mechanism. For example, Dr. Bonnie Lesley recently needed to access raw data on students enrolled in Pre-AP and AP courses for the past three years. She analyzed and interpreted the data by school and by specific course. Her intent and purpose was to (1) report the efforts that had been made to increase percentages of African American students enrolled in these courses and (2) identify areas needing improvement by school and subject area. The resulting information was disseminated in writing and discussion with the Division of Instruction. Plans were made to enhance curriculum and teacher training for the Pre-AP courses. A goal was established to align activities with ACT objectives and the AP syllabus and to begin reviewing materials. This is an excellent example of data driving decision making that produces a feedback loop with direct impact on programming.

Formats for Disseminating Data to Meet Policy Reporting Requirements

A May/June program evaluation presentation has been scheduled to inform District board members of the status and activities of CPMSA program evaluation. While this formal activity will serve to meet policy reporting requirements, it will be an appropriate time to initiate an informal reporting process utilizing the aforementioned Information Updates and Project Circulars.

Summary

Principal investigators, the program director, and the NSF program evaluator have considerable combined expertise in presenting information to stimulate its direct application in educational environments. In addition, the NSF Program Evaluator has provided technical assistance to programs nationwide related to the dissemination of data in accessible formats that promote immediate application. The combined energy and expertise will bring the initiatives described herein to a high level of quality and utility designed to promote excellence in decision making.

15

I.
Orientation to the Analysis & Interpretation of Test Results

District-wide Orientation

A major transition is underway related to the type and number of assessment measures used to document the mathematics and science achievement of students. Two major initiatives (ALT, CRT) designed to obtain highly detailed information directly related to standards and benchmarks for all students, from primary to secondary, are so new that the results are still being processed. At the time of this report, race/ethnicity disaggregation is not yet available. These measures have great breadth and depth and are designed to locate students on a scale and then measure growth. These pieces of the picture will be accurate and comprehensive, based on reliable and valid measures, but these pieces of the picture will arrive later. See the addendum to this report, sent separately to Julio Lopez, NSF project officer, with our first figures to document CPMSA programmatic impact at the fourth grade cohort - both the current group and those who are now in 5th grade. Our first real sign of impact is powerful: the 4th and 5th grades had gains of 1.5% and 18.8% respectively between the first administration of the CRT Benchmark Exam in 1998 and the second administration in 1999. This is a grade level cohort at which considerable resources have been targeted, not the least of which are the wonderful lead teachers. At the time of this report the data were not yet disaggregated by race, so we cannot yet identify how these increases are spread across the 6 race/ethnicity cohorts. Implications for us center around the fact that the 3rd and 6th grades had a decrease of 3.4% and 4% respectively. Other criterion-referenced measures (ACTAAP) are being implemented sequentially and cumulatively, one grade at a time, so although scores and race/ethnicity disaggregation are available, they only cover two grades. While the information they provide is excellent and has great promise for expanding the achievement picture, it is a work in progress. Test results for the well-established norm referenced measures are available, particularly with race/ethnicity disaggregation. It is important not to be distracted by their availability. They have serious flaws, and that is why the ongoing transition is occurring to support their results with criterion referenced data. Targeted grade levels for administering this particular measure do not capture growth and may be producing an incomplete picture of student achievement. Other norm referenced measures such as Advanced Placement are measures students self-select to take. For this measure and for the Explore, Plan, and ACT tests from American College Testing, teacher concern centers around the fact that students and their parents, particularly minority students, are not well informed about the intent and purpose of such measures. Students take these tests with little or no planned introduction to promote motivation and enhanced test performance. It is also important to note the impact on math and science during this transition. The change initiative first impacted mathematics and only now is addressing science assessment. For example, work is underway to implement criterion referenced measures, particularly the ALT. At this time the only available data in this report related to science achievement come from norm referenced tests: the SAT-9, Advanced Placement, Explore, Plan, and ACT. A review of this document will create the impression of an incomplete picture.
That impression will be accurate and serve to underscore the transitional status of LRCPMSA activities to document mathematics and science achievement. A positive influence on this process has been created by the National Science Foundation. Not only has the initiation timetable been speeded up, but (1) measures that capture growth for each student and (2) measures that document achievement in relation to standards and benchmarks have become a major focus. The achievement picture is expanding in tandem with CPMSA. At the same time grant-sponsored programs are stimulating achievement, the Assessment System is better able to measure the growth. It is important to note that the Core Data Elements report required by NSF focuses on mathematics and science assessment results after three years of CPMSA programming. The LRCPMSA will be implementing its third year at the same time the Assessment System transition activities are coming together. In the meantime, this report has been partitioned to aid the reader in getting a clearer picture related to mathematics and science achievement at the elementary, middle, and high school levels. Across measures, the most complete picture possible at this time is presented. Particularly in the case of the new criterion referenced measures, preliminary data are all that is currently available. This information is presented as it affords a valuable preview of upcoming comprehensive data.

II. Analysis & Interpretation of Mathematics Test Results

Elementary Student Achievement in Math

Overview

Elementary student achievement in mathematics is measured by one norm referenced measure: the Stanford Achievement Test, 9th Edition. Three less familiar criterion referenced measures are also utilized: the Achievement Level Test (ALT);
the Criterion Referenced Test (CRT)
and the ACTAAP Benchmark Examination. In preparation for understanding the results of these less familiar measures, please refer to the Description of the Assessment System document contained in another section of this report.

Mathematics Achievement Measured by Criterion Referenced Tests

Interpretation of Achievement Level Test (ALT) Results

The most recent documentation of elementary mathematics achievement is the set of results from the recently administered Achievement Level Test (ALT). Administered in March, the test involved students in Grades 2-5. District-wide summaries are available at the time of this report. However, data related to race/ethnicity are not yet available to identify the configuration of the population. Nevertheless, the available mean, median, and standard deviation document the achievement of each grade level cohort. More importantly, the first administration of this type of measure locates each of the 7711 LRSD students in grades 2-5 on the Rasch Interval (RIT) Scale, so each subsequent fall and spring administration of the test can measure growth. Across grades 2-5, 7145 test takers (92.6% of the total elementary population of 7711) comprised the following cohorts:

Grade 2    23.5%    1,686 students
Grade 3    26.0%    1,864 students
Grade 4    25.9%    1,852 students
Grade 5    24.3%    1,743 students

The ALT utilizes the Rasch Interval Scale, in which test takers typically start at a score of 170-190 in the fall of the third grade. Mean RIT scores displayed below are consistent with expectations for their grade. Although the range of RIT scores is not yet available for each grade, the median and standard deviation indicate little dispersion or variability of scores. For grades 2-5, scores are clustered around the mean and not widely scattered.

Grade    Mean    Median    Standard Deviation
2        182     183       13.46
3        194     194       13.49
4        202     202       13.26
5        208     208       13.60

Once race/ethnicity data are available, it will be possible to determine how much of the variability or dispersion from the mean is associated with these variables. These scores should be regarded as the starting location on the RIT scale from which growth will be measured twice annually until each student completes grade 11.

Interpretation of ACTAAP Benchmark Test Results

As the ACTAAP overview in Description of the Assessment System indicates, the Benchmark Examination for 4th grade has only been administered for two school years. Across the two first administrations of the Benchmark Examination, an average of 1,701 fourth graders (1,643 in SY 1997-98 and 1,760 in SY 1998-99) took the math component of this test. Complete data are available for both Baseline and Year One. Although this measure is only administered at the 4th grade level, 88.8% of the population at that grade level took the test. Available race/ethnicity data make it possible to describe the configuration of the population in detail. The majority of 4th grade test takers performed at the basic or below basic quartiles: 74% (1208 students) in the Baseline year and
78% (1366 students) in Year One. Distribution across quartiles was very stable (less than 10% change). The greatest change in the number of test takers was a 9.0 increase in those performing at the below basic quartile in Year One. Quartile 1997-98 1999-98 Difference Advanced Proficient Basic Below Basic Test Takers 10% 16% 24% 50% 1,639 9% 13% 19% 59% 1,754 -1 -3 -5 9 115 The majority of test takers were Black/not Hispanic. From Baseline to Year One the mean number of test takers in this cohort was 680 or 39.9% of the test taking population. As figures in the following table indicate, a significant majority (89.0%) of this cohort performed at the two lowest quartiles. Compared to other cohorts. Black test takers had a significant minority performing at the Advanced and Proficient quartiles. This cohort had the smallest percentage of test takers performing at the Advanced quartile. Quartiles Indian Asian Black White Hispanic 1 2- 3 4 12.5 7.0 20.5 60.0 30.5 23.5 16.5 29.5 2.5 8.0 20.0 69.0 24.5 27.5 26.5 23.5 9.0 27.5 16.5 47.5 The second largest group of test takers were White/not Hispanic with a mean of 501 (29.4%) test takers from Baseline to Year One. As in the following table indicate, this cohort was very evenly distributed across each of the four achievement levels with a slightly larger (6%) at the Proficient and Basic quartiles. The Hispanic cohort had a mean of 24 (14.1%) test takers from Baseline to Year One. A bifurcated distribution characterized this cohort: 47.5% at Basic/Below Basic and 27.5% at Proficient. The Asian cohort had a mean of 30 (17.6%) students. The majority of students (54.0%) performed at the Advanced and Proficient while 46.0% were at Basic/Below Basic. The small American Indian cohort had a mean of 14 (8.5%) across Baseline and Year One with 80.5 % at 4at Basic/Below Basic. Mathematics Achievement Measured by Norm Referenced Test Results Interpretation of SAT-9 Test Results A single norm referenced measure is administered at the elementary level - the Stanford Achievement Test - 9th Edition. Complete data are available for Baseline through Year Two. Although this measure is only administered to 5th graders, available math achievement data make it possible to describe configuration of the population in detail. Across the three administrations of the SAT-9, an average of 1575 fifth graders (1,635 in SY 1997-98
1530 in SY 1998-99
and 1560 in SY 1999-2000) took the math component of this test. Complete data is available for Baseline through Year One. Although this measure is only administered at the 5th grade level, 82.9% of the population at that grade level took the test. Available race/ethnicity data make it possible to describe configuration of the population in detail. Quartile 1997-98 1999-98 Difference 1999-20 Difference Fourth Third Second Top Test Takers 15% 23% 23% 39% 13% 20% 25% 42% -2.0 -3.0 2.0 3.0 12% 19% 23% 46% -1 -1 -2 -4 1,635 1,530 105 1560 75 The majority of test takers were Black/not Hispanic. Across Baseline and Year One the mean number of test takers in this cohort was 989 or 63.2% of the test taking population. As figures in the following table indicate, a significant majority (80.5%) of this cohort performed at the two lowest quartiles. Compared to other cohorts. Black test takers had a significant minority performing at the top two quartiles. This cohort had the smallest percentage of test takers performing in Quartile 4. Quartiles Indian Asian Black White Hispanic Other 1234 41.6 8.3 16.6 33.3 11.5 12.6 34.6 41.0 57.3 23.2 13.2 6.1 21.3 22.3 28.2 28.0 40.5 28.2 15.1 18.5 23.8 26.1 23.8 26.1 5 The second largest group of test takers were White/not Hispanic with a mean of 414 (26.4%) students. As figures in the following table indicate, this cohort was very evenly distributed across each of the four quartiles with a slightly larger (6%) number in the top two quartiles. The Hispanic cohort had a mean of 127 (8.1%) test takers across Baseline and Year One. The majority (68.7%) performed in e two lowest quartiles. The remaining 33.6% were rather evenly dispersed across the top two quartiles. Configuration of this cohort is similar to the dispersion of Black test takers across the four quartiles. The Asian cohort had a mean of 18 (1.5%) students. Configuration of this cohort is the opposite of that for Hispanic test takers. The majority of students (75.6%) performed in Quartiles 3 and 4 with a rather evenly distributed 24.3% in Quartiles 1 and 2. The Other cohort of students for whom race/ethnicity is not identified had a mean of 14 (.08%) students. Configuration of this cohort is similar to the rather even dispersion of White test takers across the four quartiles. A bifurcated distribution characterized the American Indian cohort: 41.6% in Quartile 1 and 49.9% in Quartiles 3 and 4. This small group had a mean of 12 (.006%) from Baseline to Year Two. Summary of Mathematics Achievement at the Elementary Level The ACTAAP and SAT-9 document stability in the number of test takers across race/ethnic cohorts. Additionally, these two measures identify the ongoing position of Black and Hispanic test takers in the lower two quartiles in contrast to White and Asian cohorts distributed at higher quartiles and more evenly across the four quartiles. Both measures locate 67 - 78% of elementary test takers in the two quartiles signifying the lowest performance level. Criterion referenced data currently available locate elementary students on an interval scale that will be utilized twice annually in upcoming school years to identity growth in mathematics achievement. Current mean, median, and standard deviation analysis indicate scores have small amounts of variance and that student scores are consistent with expectations for their grade. 
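The grade-level summaries above (mean, median, and standard deviation of RIT scores by grade) are straightforward to reproduce once student-level files are available. The sketch below is illustrative only; it runs on made-up (grade, RIT score) records rather than the district's actual files.

from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (grade, RIT score) records standing in for a student-level file.
records = [(2, 181), (2, 186), (2, 178), (3, 195), (3, 192), (3, 197),
           (4, 203), (4, 199), (4, 206), (5, 207), (5, 211), (5, 205)]

scores_by_grade = defaultdict(list)
for grade, rit in records:
    scores_by_grade[grade].append(rit)

# Mean, median, and standard deviation of RIT scores for each grade cohort.
for grade in sorted(scores_by_grade):
    scores = scores_by_grade[grade]
    print(grade, round(mean(scores), 1), median(scores), round(stdev(scores), 2))

Once race/ethnicity fields are attached to the same records, the identical grouping step can be repeated by cohort to examine how much of the dispersion is associated with those variables.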
Middle School Student Achievement in Math

Overview

Middle school student achievement in mathematics is measured by two norm referenced measures: the Stanford Achievement Test, 9th Edition;
and Explore. Three, less familiar, criterion referenced measures are also utilized: the Achievement Level Test (ALT)
the Criterion Referenced Test (CRT)
and the ACTAAP Benchmark Examination. In preparation for 6understanding the results of these less familiar please refer to the Description of the Assessment document contained in another section of this report. Mathematics Achievement Measured by Criterion Referenced Tests Interpretation of Achievement Level Test (ALT) Results As with the elementary population, the most recent documentation of middle school mathematics achievement are results of the recently administered Achievement Level Test (ALT). Across grades 6-8,4765 test takers (88.4% of the total elementary population of 5386) comprised the following cohorts: Grade 6 Grade 7 Grade 8 Algebra I Algebra II Geometry 33.6% 32.9% 26.1% 6.9% .04% .02 % 1604 students 1568 students 1246 students 330 students 30 students 14 students One 8*^ grader, a special education student, performed at an elementary level with a score of 189. This type of student would perform at the 1 quartile on the SAT-9 with no possibility of measuring growth. The ALT is able to document small increments of growth over time. Although the range of RIT scores is not yet available for each grade, the median and standard deviation indicate little dispersion or variability of scores, For grades 6 - 8 scores are clustered around the mean and not widely scattered although the variability for grade 6 and particularly for grade 7 are larger than for the 8* grade cohort and for cohorts in the elementary grades. Mean RIT scores displayed below are consistent with expectations for their grade. Grade Mean Median Standard Deviation 207 213 213 206 212 212 14.08 16.65 13.49 6 7 8 The performance of middle school students in Algebra I, Algebra II, and Geometry is displayed below: Algebra I Mean Median Standard Deviation Grade 7 Grade 8 260 252 259 252 12.95 9.09 7Algebra II Grade 8 259 259 12.20 Geometry Grade 8 264 264 9.96 Once race/ethnicity data are available, it will be possible to determine how much of the variability or dispersion from the mean is associated with these variables. These scores should be regarded as the starting location on the RIT scale from which growth will be measured twice annually until each student completes grade 11. Interpretation of ACTAAP Benchmark Test Results As the ACTAAP overview in Description of the Assessment System indicates, the Benchmark Examination for 8* grade was administered for the first time in SY 1998-99. Complete data is available for the 1497 eighth graders (85.0 of the 1772 eighth grade population) who took the math component of this test. Available race/ethnicity data make it possible to describe configuration of the population in detail. The majority of 8th grade test takers (89.0%) performed at the basic or below basic quartiles. Quartile Advanced Proficient Basic Below Basic Test Takers 1998-99 2% 9% 24% 65% 1,639 The majority of test takers were Black/not Hispanic. The 1011 test takers in this cohort were (67.5%) of the test taking population. As figures in the following table indicate, a significant majority (97.5%) of this cohort performed at the Basic/Below Basic quartiles. Compared to other cohorts. Black test takers had a significant minority performing at the Advanced and Proficient quartiles. Such a bottom-loaded distribution is typical for this cohort. Quartiles Indian Asian Black White Hispanic 1 2 3 4 0 0 17.6 82.3 9.0 27.2 33.3 30.3 1.8 2.0 17.0 80.5 5.0 25.6 41.8 26.4 0 5.7 17.1 77.1 8The second largest group of test takers were White/not Hispanic with a mean of 401 (26.7%) students. 
The majority of test takers (68.2%) are bottom-loaded in quartiles 3 and 4. This is a very different distribution than the 4* grade ACTAAP test takers who were evenly distributed across the four quartiles. The Hispanic cohort had a mean of 127 (8.1%) test takers across Baseline and Year One. The majority (94.2%) performed at the Basic/Below Basic quartiles. The 33 member Asian cohort (2.2% of the test taking population) was top-loaded with 63.6% at the Advanced and Proficient quartiles as is characteristic of this cohort. A similar distribution characterized the American Indian cohort
99.9% in the Advanced and Proficient Quartiles. This small group contained 17 members or 1.1% of the test taking population. Mathematics Achievement Measured by Norm Referenced Test Results Interpretation of SAT-9 Test Results Complete Stanford Achievement Test - 9th Edition data are available for Baseline through Year Two. Although this measure is only administered to 7th graders, 85.5% of the population at that grade level took the test. Available race/ethnicity make it possible to describe configuration of the population in detail. Across the three administrations of the SAT-9, an average of 1545 seventh graders (1592 in SY 1997-98
1615 in SY 1998-99
andl428 in SY 1999-2000) took the math component of this test. Complete data is available for Baseline through Year Two. The majority of test takers were Black/not Hispanic. From Baseline to Year Two the mean number of test takers in this cohort was 1013 or 65.5% of the test taking population. As figures ill the following table indicate, a significant majority (76.7%) of this cohort performed at the two lowest quartiles. Compared to other cohorts. Black test takers had a significant minority performing at the top two quartiles. This cohort had the smallest percentage of test takers performing in Quartile 4. Quartile 1997-98 1999-98 Difference 1999-20 Difference Fourth Third Second Top 17% 18% 21% 44% 16% 19% 23% 40% -1.0 1.0 2.0 -4.0 18% 18% 22% 41% 2.0 -1.0 -1.0 1.0 Test Takers 1,592 1,615 -23 1428 -187 9 The second largest group of test takers were White/not Hispanic with a mean 443 (28.6%) students. The majority (64.2% of this cohort performed in quartiles 3 and 4. Quartiles Indian Asian Black White Hispanic Other 1 2 3 4 44.4 22.2 11.1 22.2 18.7 20.3 25.0 35.9 53.5 23.2 14.2 7.3 46.8 18.9 26.6 37.6 16.8 18.8 26.6 37.6 33.3 27.5 13.7 25.4 The Hispanic cohort had a mean of 26 (1.6%) test takers from Baseline to Year Two. The Other cohort of students for whom race/ethnicity is not identified had a mean of 16 (1.0%) students. The Asian cohort had a mean of 42 (1.1%) students. The majority of students (60.9%) performed in Quartiles 3 and 4. A bifurcated distribution characterized the American Indian cohort: 44.4% in Quartile 2 and 33.3% in Quartiles 3 and 4. This small group had a mean 3 (.01%) across Baseline and Year Two. Interpretation of Explore Test Results Complete Explore data are available for Baseline through Year One. Although this American College Testing measure is only administered to 8th graders, 96.7% of the population at that grade level took the test. Across the two administrations of Explore, an average of 1715 eighth graders (1842 in SY 1997-98 and 1589 in SY 1998-99) took the mathematics, Pre-Algebra, Algebra/Geometry component of this test. Available race/ethnicity data make it possible to describe configuration of the population in detail. The 1997-98 average score of 11.9 for the District was 2.7 points lower than the national average of 14.3. In 1998-99, the average District score was 14.6 compared to the national average of 14.3. Thus in Year One, a gain of 3 points was made in the District's average score. The comparison of math scores across race/ethnic groups clearly illustrate that the White and Asian test takers performed close to the national average in the Baseline Year and well above it in Year One. Positively, each cohort demonstrated growth. Black Indian White Mexican Hispanic Asian Puerto Rican Hispanic 1997 11.7 1998 13.0 Dif 1.3 7.0 11.0 3.0 14.4 16.6 2.2 9.0 12.0 3.0 13.7 15.8 2.1 9.0 12.0 3.0 10Summary of Mathematics Achievement at the Middle School Level As with the elementary population, the ACTAAP and SAT-9 document stability in the number of test takers across race/ethnic cohorts. Additionally, the Explore measure identifies the ongoing position of Black and Hispanic test takers in the lower two levels of test performance in contrast to White and Asian cohorts distributed at higher levels of performance and more evenly across levels. At this time, these measures provide the only race/ethnicity documentation of achievement in mathematics. Criterion referenced data currently available locate middle school students on the RIT scale for middle math. 
An added benefit of the ALT measure is the ability to document the growth of special education students. These students would otherwise be lodged permanently in SAT-9 Quartile 1, whereas the ALT can document small increments of achievement in mathematics.

High School Student Achievement in Math

Overview

High school student achievement in mathematics is measured by three norm referenced measures: the Stanford Achievement Test - 9th Edition,
the ACT,
and Plan. One less familiar, criterion referenced measure has currently been implemented from the master plan for the Assessment System:
the Achievement Level Test (ALT), soon to be followed by the Criterion Referenced Test (CRT) for end-of-course documentation of Algebra, Geometry, Concept Geometry, and Trigonometry. In preparation for understanding the results of these less familiar measures, please refer to the Description of the Assessment document contained in another section of this report.

Mathematics Achievement Measured by Criterion Referenced Tests

Interpretation of Achievement Level Test (ALT) Results

As with the elementary and middle school populations, the most recent documentation of high school mathematics achievement is provided by the results of the recently administered Achievement Level Test (ALT). Across grades 9-11, 3,853 test takers (54.2% of the total high school population of 7,106) comprised the following cohorts:

Algebra I      1,203 students   16.9%
Algebra II     1,171 students   16.4%
Geometry       1,479 students   20.8%

Twenty-eight special education students performed at an elementary level and 53 at the middle math level. This type of student would perform at the 1st quartile on the SAT-9 with no possibility of measuring growth; the ALT is able to document small increments of growth over time. Although the range of RIT scores is not yet available for each grade, the median and standard deviation indicate little dispersion or variability of scores. Scores are clustered more closely around the mean and less widely scattered for these specific subjects than for elementary or middle math.

Student mean RIT scores displayed below for Algebra I are consistent with expectations for their grade.

Grade   Mean   Median   Standard Deviation
9       241    240      7.08
10      239    239      6.37
11      238    237      6.14

Student mean RIT scores displayed below for Algebra II are consistent with expectations for their grade.

Grade   Mean   Median   Standard Deviation
10      254    253      7.80
11      250    250      5.80
12      249    248      4.55

Student mean RIT scores displayed below for Geometry are consistent with expectations for their grade.

Grade   Mean   Median   Standard Deviation
10      248    247      6.60
11      246    245      5.78
12      246    245      6.88

Once race and gender data are available, it will be possible to determine how much of the variability or dispersion from the mean is associated with these variables.
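To clarify what the mean, median, and standard deviation in the tables above convey about dispersion, the following short sketch computes those statistics for a small set of scores. It is illustrative only: the RIT values are invented, and the code (Python, using the standard library's statistics module) is not part of the District's scoring or reporting software.

    # Illustration only: mean, median, and standard deviation as measures of
    # how tightly a set of RIT scores clusters around its center.
    # The scores below are invented, not actual student data.
    import statistics

    rit_scores = [233, 238, 240, 241, 244, 247, 249]   # hypothetical Algebra I scores

    mean_score = statistics.mean(rit_scores)
    median_score = statistics.median(rit_scores)
    std_dev = statistics.pstdev(rit_scores)            # smaller value = less dispersion

    print(f"Mean: {mean_score:.1f}  Median: {median_score:.1f}  Standard deviation: {std_dev:.2f}")

A standard deviation near 6 or 7 RIT points, as reported for these courses, indicates that roughly two-thirds of students score within about seven points of their grade's mean, assuming an approximately normal distribution.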
Mathematics Achievement Measured by Norm Referenced Test Results

Interpretation of SAT-9 Test Results

Complete Stanford Achievement Test - 9th Edition data are available for Baseline through Year Two. Although this measure is administered only to 10th graders, 80.7% of the population at that grade level took the test. Available race/ethnicity data make it possible to describe the configuration of the population in detail. Across the three administrations of the SAT-9, an average of 1,571 tenth graders (1,645 in SY 1997-98, 1,576 in SY 1998-99,
and 1,493 in SY 1999-2000) took the math component of this test. Complete data are available for Baseline through Year Two.

Quartile        1997-98   1998-99   Difference   1999-2000   Difference
Fourth            15%       34%        -19           13%        -21
Third             27%       26%        -1.0          24%        -2.0
Second            28%       24%        -4.0          27%        -3.0
Top               31%       34%        -3.0          36%         2.0
Test Takers      1,646     1,576       -70          1,493       -83

The majority of test takers were Black/not Hispanic. From Baseline to Year Two, the mean number of test takers in this cohort was 1,070, or 55.0% of the test-taking population. As figures in the following table indicate, a significant