date: 2025-01-17
A dictionary-based account
The use of dictionaries in productivity studies: A controversy
Neuhaus (1971, 1973) proposes that the degree of productivity of a rule should be measured by counting the number of derivatives in a given period, which is most conveniently done by checking historical dictionaries like the OED. The use of dictionaries in the determination of the productivity of a process is, however, problematic.
For example, Baayen and Renouf (1996:69) object to the use of dictionaries in productivity studies and claim, "Dictionaries, unfortunately, are not a reliable source for studying morphological productivity". They back up this position by some general considerations on the nature of dictionaries and by pointing out the obvious failure of some dictionary-based studies to provide correct results. In particular, they cite the findings presented by Cannon (1987), and contrast them with their own corpus-derived results. Through this comparison it becomes obvious that Cannon's dictionary-based account is thoroughly inadequate. The question remains, however, whether the refutation of Cannon's analysis can be used as an argument against the use of dictionaries as such. Let us dwell on this question a bit.
Three main arguments against dictionaries as databases for productivity studies can be put forward. First, for commercial and practical reasons, dictionaries usually do not aim at the comprehensive documentation of productively formed, transparent words, but rather cover the more frequent and idiosyncratic items. This is quite sensible, since dictionary users need not look up words whose meaning is entirely predictable from their elements, which by definition is the case with productive formations. However, this argument against dictionaries may hold for learner dictionaries or even large desk dictionaries (such as RHCD, Webster's Third, or SOED), but certainly not for dictionaries like the OED, which aims at complete coverage (e.g. Berg 1991:4), and whose virtues and versatility lie exactly in this fact. The OED (at least before its release on compact disc) is not meant to provide quick and easy reference, but thorough and complete information on individual words and the development of the English vocabulary.
Hence, it is not surprising that the OED's coverage is better than that of current innovational dictionaries. For example, according to Cannon (1987), The Barnhart Dictionary of New English since 1963, The Second Barnhart Dictionary of New English, and Merriam's Addenda Section to Webster's Third New International Dictionary of the English Language contain only 19, 12, and 15 -ize neologisms, respectively, whereas the OED lists 284¹ for the period from 1900 to 1985,² including almost all of the derivatives listed in the three other dictionaries.
But, and this is the second argument against dictionaries, even if lexicographers aim at complete coverage, this intention may not prevent them from overlooking new, regular formations just because they are regular. We already saw in Productivity: Definitions and measurements that productive formations tend to go unnoticed by language users (e.g. Schultink 1961). Therefore, even the OED lexicographers fall victim to the unavoidable tendency to include the more salient idiosyncratic forms and neglect the listing of regular derivatives. That this is not pure speculation can be illustrated by the almost equal numbers of 20th century neologisms listed for nominal -ness and verbal -ize (279 vs. 284, respectively), although no one would seriously doubt that -ness is far more productive (with respect to P, P*, and V) than -ize. It seems that -ness forms do not strike the lexicographers' eyes to the same degree as do -ize words, for whatever reason. As we will see later, the vast majority of -ize neologisms are completely regular, and therefore reasons other than idiosyncrasy must be responsible for the saliency of -ize verbs. One might venture the hypothesis that, other things being equal, new verbs tend to be more salient than new nouns. The number of nouns in English is extremely large whereas the number of verbs is comparatively small, which makes new verbs potentially more noticeable as such than new nouns. The CELEX lexical database (Baayen et al. 1993), for instance, which is based on the Cobuild corpus, lists 6582 simplex nouns and 2727 complex nouns involving the most common noun-deriving suffixes (-ee, -er, -ation, -ment, -al, -ness, -ian, -ity, -ism), but only 2581 simplex verbs and 400 complex verbs involving not only the suffixes -ize and -ify, but also the prefixes de-, re-, be- and en-.
But even if there are discrepancies in the comprehensiveness of the OED's listing of different types of derivatives, this fact does not make the OED-based measure useless. In the case of -ize and -ness, the number of neologisms is so high that it is uncontroversial to state that both suffixes have been productive in the twentieth century. In contrast, the verbal suffix -en, for example, is listed with only two neologisms for that century, which is in line with the opinion unanimously expressed in the literature that -en is dead.
From all this one can conclude that the number of neologisms in the OED can reliably be used to tell productive processes from unproductive ones, but that the measure is less reliable in ranking productive processes, i.e. in determining their respective degrees of productivity. Interestingly, the OED-based measure shares this property with Baayen's measure P, the 'productivity in the narrow sense', about which Baayen says that its "primary use ... [is] to distinguish between unproductive and productive processes as such", whereas the global productivity P* is "especially suited to ranking productive processes" (1993:194).
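The text-based measures referred to here can be made concrete with a small sketch. It assumes the definitions standardly associated with Baayen's work, which this section does not spell out: V is the number of types with the affix, P = n1/N (hapaxes with the affix divided by all tokens with the affix), and P* = n1/h (hapaxes with the affix divided by all hapaxes in the corpus). The function name, the naive suffix matching, and the toy tokens are invented for illustration; a real study would need manual cleaning of the affixed forms.

```python
from collections import Counter

def productivity_measures(tokens, suffix):
    """Compute V, N, n1, P and P* for words ending in `suffix`.

    Assumes P = n1 / N and P* = n1 / h, where h is the number of
    hapaxes in the whole corpus. Suffix matching by string ending
    is a crude stand-in for proper morphological analysis.
    """
    freq = Counter(tokens)
    affixed = {w: c for w, c in freq.items() if w.endswith(suffix)}
    V = len(affixed)                                   # types with the affix
    N = sum(affixed.values())                          # tokens with the affix
    n1 = sum(1 for c in affixed.values() if c == 1)    # hapaxes with the affix
    h = sum(1 for c in freq.values() if c == 1)        # all hapaxes in the corpus
    P = n1 / N if N else 0.0
    P_star = n1 / h if h else 0.0
    return {"V": V, "N": N, "n1": n1, "P": P, "P_star": P_star}

# Invented toy corpus, not real data.
tokens = ["alphaness", "alphaness", "alphaness", "betaness", "gammaness",
          "deltaize", "deltaize", "the", "the", "of", "epsilon"]

ness = productivity_measures(tokens, "ness")
ize = productivity_measures(tokens, "ize")
```

In this toy corpus the -ness forms have hapaxes while the single -ize type does not, so P separates the two patterns, whereas P* relates the -ness hapaxes to the hapaxes of the corpus as a whole.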
The third argument against dictionaries is that they drag along a whole range of old complex forms that may distort the analysis because they are residues of morphological processes that have long ceased to be productive or because they are unanalyzed or reanalyzed borrowings.³ Again, this argument is not valid for a historical dictionary like the OED, because the search can be restricted to a given period, so that only the neologisms of that stretch of time enter the analysis.
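The period-restricted count described here can be sketched as follows. This is only an illustration under stated assumptions: the entries are invented (headword, year of first attestation) pairs, not OED data, the function name is hypothetical, and matching by string ending ignores the manual checking a real dictionary search requires.

```python
def count_neologisms(entries, suffix, start, end):
    """Count headwords ending in `suffix` first attested in [start, end]."""
    return sum(
        1
        for word, first_year in entries
        if word.endswith(suffix) and start <= first_year <= end
    )

# Invented toy entries: (headword, year of first attestation).
entries = [
    ("alphaize", 1925),
    ("betaize", 1961),
    ("gammaize", 1850),   # older form, excluded by the period restriction
    ("deltaness", 1930),
    ("epsilonen", 1700),  # residue of an older process, excluded as well
]

ize_count = count_neologisms(entries, "ize", 1900, 1985)
ness_count = count_neologisms(entries, "ness", 1900, 1985)
en_count = count_neologisms(entries, "en", 1900, 1985)
```

Because older forms such as the invented "gammaize" and "epsilonen" fall outside the period, they do not enter the count, which is the point of restricting the search.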
To summarize, the main disadvantage of a dictionary like the OED is that the sample from which the dictionary entries are collected is left unspecified and that there is some inconsistency in the sampling of individual forms. However, these shortcomings do not make the OED unsuitable for productivity studies. As we have seen, if very few neologisms are attested in a rather long period of time, this is a strong indication of unproductivity. Hence, the dictionary-based measure is a reliable instrument for distinguishing between productive and unproductive processes. In conclusion, comprehensive historical dictionaries like the OED should be able to yield significant results in productivity studies, contrary to the claims by Baayen and Renouf (1996). The comparison of dictionary-based and text-based measures below will substantiate this claim.
After the review of arguments against dictionary-based productivity accounts, we may now briefly examine three points where a dictionary-based measure appears superior to text-based measures. First, lexicographers of extremely large dictionaries like the OED scrutinize larger quantities of texts than hitherto available electronic corpora can provide, which may lead to a more comprehensive picture of the derivational patterns. For example, the number of Cobuild hapaxes involving verbal affixes is much smaller than the number of neologisms in the OED.⁴
Another advantage of the use of a dictionary lies in the circumvention of complex statistical calculations. The problem is of course not the calculation of the different measures itself, which is a fairly easy task if one has a modern calculator at hand, but lies in the determination of the tokens and types upon which the calculation should be based. Whereas with the dictionary data, the decision to count a form as belonging to the given derivational category only means the addition of one item, a few types more or less in the text-based calculation may result in significant changes in the overall number of tokens of a category.
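The asymmetry can be illustrated with invented numbers. The sketch assumes that the text-based measure in question is P = n1/N (hapaxes with the affix divided by tokens with the affix), a definition not given in this section; the word forms and frequencies are made up.

```python
def narrow_productivity(type_freqs):
    """Return P = n1 / N for a mapping from types to token frequencies."""
    n1 = sum(1 for c in type_freqs.values() if c == 1)  # hapaxes
    N = sum(type_freqs.values())                        # all tokens
    return n1 / N

# Invented frequencies for four types that clearly belong to the category.
base = {"alphaize": 1, "betaize": 1, "gammaize": 3, "deltaize": 5}

# The same data plus one doubtful, very frequent type.
with_doubtful = dict(base, epsilonize=90)

p_base = narrow_productivity(base)               # 2 hapaxes / 10 tokens
p_doubtful = narrow_productivity(with_doubtful)  # 2 hapaxes / 100 tokens

# A dictionary-style count of types changes by exactly one item.
type_count_change = len(with_doubtful) - len(base)
```

Including the single doubtful type lowers P tenfold in this toy case, while the type count rises only from four to five.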
Perhaps the most important positive trait of the OED-based measure is that the productivity of zero-derived items can be determined, which is impossible to do with the computer programs used for the analysis of text corpora. Thus, under a strict text-based approach the degree of productivity of conversion cannot be calculated at all. In view of the importance of conversion as a verb-deriving process, this is the most serious drawback of the text-based measures.
1 This figure excludes the 62 neologisms in -ize that are only listed as participles (-izing/-ized). See the discussion below for details.
2 Even if the period of sampling is perhaps longest for the OED, this does not suffice as an explanation of these discrepancies.
3 Note that this is also a serious problem for text-based approaches, as will become clear below.
4 The hapaxes are significant, because the proportion of neologisms is highest among the types with the lowest frequency, especially hapaxes.