Sunday, May 24, 2009

Type Token Ratios and Success on the Effective Writing Test

I was just reading about the components of lexical richness in John Read’s book Assessing Vocabulary (Read, 2000) when I thought I’d write about one aspect of lexical richness in relation to the performance of non-native English speaking (NNES) students on the Effective Writing Test at the University of Calgary. One of my key findings is that while about 70% of native English speaking (NS) students pass this test on their first attempt, only 23% of NNES students whose first language is of East Asian origin (Chinese, Japanese, Korean, Vietnamese, Laotian) pass on their first attempt. Clearly, something is affecting the scores of these NNES students. One of the factors that has led me to analyse the vocabulary use of novice undergraduate writers at university is Roessingh’s (2008) work with NNES high school students, which looked at vocabulary as an underlying variable in determining success on the written response component of the Grade 12 English 30-1 Diploma examination.

One aspect of my analysis is the type-token ratio of NS versus NNES students’ essays on the test. A type-token ratio compares the number of unique words a student uses (types) with the total number of words the student writes (tokens). In his book, Read (2000) explains that students with a high type-token ratio use a wide variety of words in their writing, while students with a lower ratio repeatedly use a limited set of words. Through low-frequency words, synonyms, hyponyms, and specific vocabulary items, good writers are able to draw on their larger vocabulary knowledge to convey a more precise meaning. What the type-token ratio measures is also known as lexical variation.
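To make the calculation concrete, here is a minimal sketch of a type-token ratio in Python. It assumes a deliberately simple tokenizer (lowercased runs of letters and apostrophes), which is my own simplification and not necessarily how the tools used in this study split words; type_token_ratio is a hypothetical helper name.

```python
import re

def type_token_ratio(text: str) -> float:
    """Unique word forms (types) divided by total running words (tokens)."""
    # Simplifying assumption: a token is any run of letters or apostrophes,
    # compared case-insensitively. Real profiling tools may tokenize differently.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    types = set(tokens)
    return len(types) / len(tokens)

sample = "The test is hard, and the test is long."
print(round(type_token_ratio(sample), 3))  # 6 types / 9 tokens -> 0.667
```

In the sample sentence, "the", "test" and "is" each appear twice, so nine tokens contain only six types; a writer who varied those words would push the ratio closer to 1.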

Looking at my own research into lexical variation (please see the chart accompanying this blog), I found that NS students wrote significantly shorter essays than NNES students, yet the number of unique words used by the two groups was not significantly different. As a result, NS students had a higher type-token ratio than their NNES counterparts. In other words, the NS students used a greater range of expression in their essays than the NNES students, and this significant difference may be one of the factors contributing to the low success rate of NNES students on the Effective Writing Test.
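The arithmetic behind this result can be shown with a small sketch. The essay sizes below are invented purely for illustration and are not figures from my corpus; the point is only that when two essays contain the same number of unique words, the shorter essay necessarily has the higher ratio.

```python
# Hypothetical essay sizes, invented for illustration only; these are NOT
# figures from the Effective Writing Corpus.
SHARED_TYPES = 200          # both essays use the same number of unique words
SHORTER_ESSAY_TOKENS = 400  # total words in the shorter essay
LONGER_ESSAY_TOKENS = 500   # total words in the longer essay

def ttr(types: int, tokens: int) -> float:
    """Type-token ratio: unique words divided by total running words."""
    return types / tokens

shorter_ttr = ttr(SHARED_TYPES, SHORTER_ESSAY_TOKENS)
longer_ttr = ttr(SHARED_TYPES, LONGER_ESSAY_TOKENS)

print(f"shorter essay TTR: {shorter_ttr:.2f}")  # 0.50
print(f"longer essay TTR:  {longer_ttr:.2f}")   # 0.40
```

Holding the type count fixed, every extra token lowers the ratio, which is the pattern described above for the longer NNES essays.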

It was interesting to see that the NS students wrote shorter essays with more variation than the NNES students. Besides repetition, another reason the NNES students wrote longer essays is the need for circumlocution when a precise term wasn’t available to them: a writer who lacks the right word needs more words to express an idea. The NS students, by contrast, typically have access to a much larger lexicon, which lets them achieve greater lexical variation in their writing.

On the Effective Writing Test rubric, repetitious diction is one of the aspects of word use that markers evaluate. Circumlocution isn’t marked explicitly in the rubric, but the use of “too many words” is a key component of the word use category. This is what led me to consider lower type-token ratios as a possible underlying factor in the overall quality of undergraduate compositions.

These findings seem to mirror those of Cheryl Engber (1995), who examined the relationship between lexical proficiency and reader judgments of the overall quality of timed essays. Engber found that reader judgments of the overall quality of these compositions did reflect lexical variation. This leads me to suspect that a similar effect may be at work in the Effective Writing Test, with the less lexically varied essays of NNES students meeting with less success than the more lexically varied essays of their NS counterparts.

Some References:

Engber, C. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4(2), 139-155.

Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.

Roessingh, H. (2008). Variability in ESL outcomes: The influence of age on arrival and length of residence on achievement in high school. TESL Canada Journal, 26(1), 87-107.

Wednesday, May 13, 2009

Creating a corpus of first year university academic writing

Preparing my corpus

This past Friday I presented at the British Columbia Teachers of English as an Additional Language (BC TEAL) conference. The topic was “Comparing non-native and native English undergraduate vocabulary in writing”. The first part of my talk dealt with the creation of the corpus that I used for my analyses.

The focus of my presentation was on what lexical frequency-based analyses reveal about the breadth of active vocabulary knowledge in novice native English speaking (NS) and non-native English speaking (NNES) undergraduate writing. By novice, I mean first-year students at the University of Calgary who have not yet passed the Effective Writing Proficiency Requirement.

To investigate this question, I’ve gathered a corpus of writing, which I’m calling the Effective Writing Corpus. The writing samples in the corpus come from the Alberta Universities’ Writing Competence Test, also called the Effective Writing Test (EWT). The EWT is administered to first-year university students at the University of Calgary, the University of Lethbridge and Athabasca University. The test is designed to assess university-level writing competence, and it is administered to all students who enter university with a score below 75% on the English Language Arts 30-1 Diploma exam, or with a blended grade below 80% (the diploma exam and the class score weighted 50-50). Students who enter university with higher scores than these are exempt from the EWT. Students can also be exempted in other ways, such as achieving a grade of B- in a first-year English course. In total, the test is sat approximately 2250 times each year, with some of those sittings being repeat attempts by the same students.

The EWT itself takes the form of a persuasive or expository essay answering one of four questions. These questions draw on general knowledge, and no specialized knowledge is needed to answer them. An example of an EWT question might be along the lines of “Should the Government of Alberta institute mandatory physical education courses from kindergarten to Grade 12?” The student’s essay should be around 400 words long, and markers are looking for university-level writing competence. Key points markers pay attention to include logical argument, clear organization, well-developed paragraphs, well-constructed sentences, accurate word use, and correct grammar, spelling and punctuation. English language dictionaries are permitted during the test, and students have two and a half hours to complete their essays.

The corpus I am building focuses on the 2003/2004 academic year. Of the approximately 2250 tests written that year, 561 NS students and 184 NNES students gave permission for their tests to be used for research purposes, about 33% of all tests written that year. The NNES papers represented 40 different first languages, of which Chinese, Arabic, Spanish, and Punjabi were by far the most common. Chinese speakers formed the largest group of all NNES students.

Grouping the students by first language reveals some interesting results in terms of performance on the EWT. Of all NS students who write the EWT, 70% pass on their first attempt. Among NNES students whose first language is not of East Asian origin, 47% pass on their first attempt. Finally, among students whose first language originated in East Asia, only 23% pass on their first attempt. It is also interesting to note that at the end of the academic year, about 700 students still have not completed the Effective Writing Requirement. Of those 700 students, approximately 90% (630) are NNES. If approximately 75% of NNES students are of East Asian origin, then about 470 NNES students of East Asian origin are still struggling to complete the Effective Writing Requirement by the end of the school year, and face being blocked from registering in their second-year classes.
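The estimate of roughly 470 students follows from two multiplications. A short sketch of that arithmetic, using only the figures given in this paragraph (the 75% East Asian share is the working assumption stated above, not a separately measured value):

```python
# Figures from the paragraph above; the 75% East Asian share is a stated
# working assumption, not a measured value.
incomplete_students = 700
nnes_percent = 90
east_asian_percent_of_nnes = 75

nnes_incomplete = incomplete_students * nnes_percent / 100
east_asian_incomplete = nnes_incomplete * east_asian_percent_of_nnes / 100

print(f"NNES students still incomplete: {nnes_incomplete:.0f}")
print(f"East Asian L1 students still incomplete: about {east_asian_incomplete:.1f}")
```

The second line works out to 472.5, which is rounded to "about 470" in the text.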

Because of the difficulties that NNES students whose first language is of East Asian origin face in passing the EWT, I have decided to focus on this group for my study. As a result, 79% of the papers in the NNES sub-corpus were written by students with Chinese (Cantonese and Mandarin) as their first language; the rest of the NNES sub-corpus is made up of Korean, Vietnamese, Japanese and Laotian speakers. The NNES students have varying lengths of residence in Canada and fall into five cohorts: 14+ years, 10-13 years, 7-9 years, 4-6 years, and less than 3 years. Each of these cohorts contains between 11 and 20 students.

The two sub-corpora (NS and NNES) also revealed some differences in faculty enrolment and topic choice between the two groups of students. The top three faculties of enrolment for NS students at the time of writing were Communication and Culture, Science, and Social Science. The top three faculties of enrolment for NNES students at the time of writing were Social Science, Science and Engineering. The top three topics for NS students were Physical Education, Computers, and Urban Growth. The top three topics for NNES students were Physical Education, Computers, and Being Ready for the Workforce.

Before I could begin my analysis, I had to prepare the raw data. The EWT is handwritten in official University of Calgary exam booklets, so all the tests were typed and converted into text files for computer storage and analysis. As the papers were typed, spelling errors were corrected and noted on the original handwritten papers. Proper nouns, such as the names of people and places, were recategorized into the first 1,000 most frequent words of English. Semantic and derivational errors were also recategorized into the first 1,000 most frequent words. This prepared the data for linguistic analysis using various tools found on the Compleat Lexical Tutor website (Cobb, 2009).
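To show why the proper-noun step matters for a frequency-based analysis, here is a minimal sketch of a frequency-band profile. It is not the Compleat Lexical Tutor’s implementation: the word lists are tiny hypothetical placeholders (real profilers use full frequency lists, typically organized by word family), frequency_profile is a hypothetical helper name, and only the proper-noun recategorization is modelled, not the handling of semantic or derivational errors.

```python
import re
from collections import Counter

# Hypothetical placeholder word lists for illustration only. Real profilers use
# full frequency lists; these tiny sets list individual word forms.
K1_WORDS = {"the", "students", "should", "take", "a", "class", "in", "school"}
K2_WORDS = {"physical"}

def frequency_profile(text, proper_nouns=frozenset()):
    """Return the share of tokens in the K1, K2 and off-list bands.

    Tokens listed in proper_nouns are counted as K1, mirroring the
    recategorization of proper nouns into the first 1,000 words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in K1_WORDS or token in proper_nouns:
            counts["K1"] += 1
        elif token in K2_WORDS:
            counts["K2"] += 1
        else:
            counts["off-list"] += 1
    total = len(tokens)
    if total == 0:
        return {}
    return {band: counts[band] / total for band in ("K1", "K2", "off-list")}

essay = "Students in Alberta should take a physical education class."
print(frequency_profile(essay, proper_nouns={"alberta"}))
print(frequency_profile(essay))
```

With "alberta" recategorized, seven of the nine tokens fall in K1; without that step, the place name lands in the off-list band and makes the essay look as though it uses more rare vocabulary than it really does.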

Monday, May 11, 2009

The U of C Graduate Conference

This is the poster I'm presenting at the U of C Graduate Conference this May. I'm basically looking at the breadth of productive vocabulary in the writing of native English speaking and non-native English speaking first year university students. For the conference, I've had to weave in the themes of innovation and sustainability as well.