A collaborative approach to corpus building

In order to ensure that things ran smoothly during the collaborative exercise, it was necessary first to establish a number of guidelines or ground rules, and after some discussion an initial strategy was worked out. The strategy was then refined based on our experience over the course of the semester. This refined strategy is described below, and it addresses the following issues: (1) coordination, (2) number of texts contributed by each student per corpus, (3) quality of texts, (4) time frame, and (5) file format.

Coordination: In my continuing role as facilitator, I was determined not to end up being the sole person responsible for coordinating the compilation of each corpus. Therefore, it was decided that for each corpus, two students would act as the coordinators. The coordinator duties would rotate so that different students would get to act in this role over the course of the semester. Furthermore, it was decided that when students were acting as coordinators, they did not have to contribute texts to the corpus (but they still had to do the actual translation homework). Essentially, the coordinators were to act as a sort of clearinghouse. Students in the class would e-mail their texts to the coordinators, who would check for and eliminate duplications (i.e., cases where the same text had been submitted by multiple students).[3] The remaining texts would then be collated into a single corpus that would be posted on the class Web site, where it could be accessed by all the students.

Corpus building in the translation classroom 201

Number of texts contributed by each student per corpus: It was agreed that each student in the class (with the exception of the coordinators) would try to identify three relevant texts that would make a good addition to the corpus. This number was considered to be a reasonable goal; however, it was not an absolute. If a student could only identify two suitable texts, these would still be welcome; likewise, if a student located
four or five relevant texts, they could all be submitted.

At the time of this experiment, there were 22 students in the class, so the reasoning went as follows: if two students were to act as coordinators, that left 20 students to contribute to the corpus. If each student attempted to submit three texts, it was hoped that we would end up with a corpus of reasonable size. Even allowing for some duplication, we felt that we would likely end up with a corpus that was bigger than those corpora that individual students had previously compiled.

Quality of texts: The students agreed to put some time and care into selecting their three texts. It was noted that if everyone were to simply submit the texts corresponding to the first three hits that came up using a Web search engine, then there would be a lot of duplication and the texts might not be pertinent, which would limit the value of the corpus. This point is explored in more detail in the Discussion section below.

Time frame: It was noted that in order for the process to run smoothly, a reasonable amount of time had to be given for both the contributions and the coordination. It was agreed that each target text would be distributed three weeks in advance. Students would have one week to identify suitable texts and e-mail them to the coordinators. The coordinators would have one week to check for duplication, to amalgamate the texts into a corpus, and to post this corpus on the class Web site. All the students would then have one week to consult the corpus. Note that the students were free to begin working on their translation prior to being able to access the corpus, but once available, the corpus could be used to conduct further research or to verify or revise previous decisions.

202 Lynne Bowker

File format: It was agreed that students would e-mail their contributions to the coordinators as attachments in plain text (ASCII) format. This decision was made for a number of reasons. First, it simplified the job of the coordinators as it meant that they did
not have to worry about having access to different types of computers or software packages and they did not have to manipulate different file formats. Second, it ensured that the corpus would be in a format that could be manipulated by the corpus analysis software to which the students had access (i.e., WordSmith Tools). Finally, it also reduced the chances of spreading viruses.

Results of the collaborative corpus building exercise

Like the preceding experiments, the collaborative corpus-building exercise was conducted as part of a fourth-year technical translation course. In order to give some coherence to the course, the theme of computer security was selected and seven different source texts, each of a different text type and each focusing on a different subject relating to computer security, were chosen. Table 1 summarizes the corresponding comparable corpora that were compiled as part of the exercise.

Table 1. A brief description of the corpora produced as part of the collaborative corpus building exercise.

| Subject            | Text type                    | Texts submitted | Texts rejected | Texts in corpus | Words in corpus |
|--------------------|------------------------------|-----------------|----------------|-----------------|-----------------|
| Passwords          | FAQ Web page                 | 58              | 35             | 23              | 40,600          |
| Antivirus programs | Instructional                | 78              | 22             | 56              | 170,919         |
| Encryption         | Informative/popularized      | 74              | 19             | 55              | 216,522         |
| Firewalls          | Buyer's guide                | 63              | 18             | 45              | 136,017         |
| Steganography      | Product description          | 35              | 21             | 14              | 7,401           |
| Biometrics         | Research article             | 29              | 17             | 12              | 69,651          |
| Cookies            | Technical encyclopedia entry | 41              | 19             | 22              | 11,754          |

Discussion

This section will outline strategies used by the students in selecting the texts and compiling the corpora; difficulties that were encountered and solutions used to overcome them will also be discussed. In addition, some general comments will be made on the suitability of the World Wide Web as a resource for building comparable corpora. As previously mentioned, specific details about techniques used to extract translation-related information from the corpora have already been detailed
elsewhere in the literature (see Bowker 2000; Bowker and Pearson 2002) and so will not be repeated here.

The first corpus to be constructed dealt with passwords, and the text type was a FAQ, which is a list of Frequently Asked Questions (and answers) about a given subject. FAQs are becoming an extremely popular and important feature of the Internet. Although they originated as a way to help new users learn the rules for using newsgroups, etc., one can now find FAQs on a wide range of popular topics. In total, the students submitted 58 texts for possible inclusion in the corpus; however, there was a high degree of duplication and the final corpus ended up containing only 23 texts
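
The coordinators' task of checking for and eliminating duplicate submissions can be illustrated with a short sketch. The text does not say how the check was actually carried out, so the Python code below is only one possible way to automate it. The function names `normalize` and `deduplicate`, the (student, text) input format, and the decision to treat two texts as duplicates when they match after whitespace and case normalization are all assumptions made for illustration.

```python
import hashlib


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text.

    Assumption: copies of the same Web page saved by different students
    differ at most in spacing and letter case.
    """
    return " ".join(text.split()).lower()


def deduplicate(submissions):
    """Split (student, text) pairs into kept texts and rejected duplicates.

    The first submission of each distinct text is kept; any later
    submission with the same normalized content is rejected.
    """
    seen = set()
    kept, rejected = [], []
    for student, text in submissions:
        # Hash the normalized text so long documents compare cheaply.
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            rejected.append((student, text))
        else:
            seen.add(digest)
            kept.append((student, text))
    return kept, rejected


if __name__ == "__main__":
    # Hypothetical submissions, invented purely for this example.
    sample = [
        ("student_a", "What is a password?"),
        ("student_b", "what is a   password?"),
        ("student_c", "Choose a strong password."),
    ]
    kept, rejected = deduplicate(sample)
    print(len(kept), "kept;", len(rejected), "rejected")
```

A check of this kind only catches copies that are identical after normalization. Texts that overlap in part, or that are the same page saved with different surrounding navigation text, would still need to be reviewed by hand.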