Distribution of verbal overgeneralizations in the Serbian Corpus of Early Child Language

The study explores the occurrence of innovative verb forms recorded in the early spontaneous production of children acquiring Serbian, a language with rich inflectional and derivational morphology. The overgeneralized verbs were retrieved from a corpus of eight children’s production, longitudinally recorded from 1;6 to 4;0, and the developmental patterns of their distribution across age were explored. The analysis shows that the overgeneralizations *hoćem ‘want’ 1.sg.pres., *nećem ‘want.neg’ 1.sg.pres., *možem ‘can’ 1.sg.pres. and *bidem ‘be’ 1.sg.pres. are the typical representatives of early overgeneralizations in Serbian. They are typically recorded in all children’s speech samples with relatively high frequency. The overgeneralized *hoćem and *nećem developmentally precede all other overgeneralized verbal forms, but disappear earlier than the others. The overgeneralizations of contentive verbs, usually hapaxes, spread later and persist long afterwards. The findings are discussed in comparison with previous findings on overgeneralizations, with particular attention to the developmental patterns they exhibit.

PSIHOLOŠKA ISTRAŽIVANJA VOL.XX 2

The aim of the present study is to describe the occurrence and developmental patterns of verbal morphological overgeneralizations in the course of acquisition of a morphologically rich language. In linguistics, overgeneralizations are typically defined as the application of a grammatical rule in cases where it does not apply (Nordquist, 2017). In contrast to analytic languages, such as English, Serbian is a fusional language with a highly developed inflectional and derivational morphology. Since both are among the most important sources of structural information and meaning, it is worthwhile to explore a sample of children's innovative word forms at the moment of their composition. Overgeneralizations in early language production reveal that a child recognizes a meaning as belonging to a certain class of words, has some knowledge of the structural regularities of his/her first language, and uses these regularities in his/her attempts at new word formation. Overgeneralizations in spontaneous children's production uncover the connection between form and function at the moment of its emergence, which reveals both the regularities known to a child and the aspects of mother tongue grammar that are still to be acquired.
Closely related to the question of negative evidence and the so-called "logical problem of language acquisition" (LPLA) (Baker, 1979; Baker & McCarthy, 1981; Pinker, 1984), overgeneralizations have become a battlefield of significant theoretical dispute in recent decades. In generative linguistics, children's productivity is considered to be the result of abstract knowledge that moulds newly acquired words according to the rules of the input language. In efforts to explain the retreat of overgeneralizations in the course of development, the thesis of direct blocking of an overgeneralization by the corresponding irregular form was offered and supported in the field (Anderson, 1977; Baker, 1979; Baker & McCarthy, 1981; Clark, 1983; Pinker, 1984). However, it was opposed by the more general Competition Model (MacWhinney, 1988; 1989; 1993), in which overgeneralizations are subject to three types of pressure: the analogic pressure that gives rise to overgeneralizations, the auditory representations from episodic support that rein them back, and the competition between the two. The role of analogy was also suggested in the single-route model (Bybee & Moder, 1983) and supported in connectionist accounts (Albright & Hayes, 2003; Ambridge, 2010; Rumelhart & McClelland, 1986; Skousen, 2001), which argued that rules are superfluous if parallel distributed processing is assumed.
The debate referred primarily to English grammatical forms, so the models offered for one language are not necessarily applicable to others. Important insights into verbal overgeneralizations in a morphologically complex language were provided by a study of early Spanish verbal overgeneralization errors, which showed that overgeneralizations are possible in both the stem/root and the suffixes (Clahsen, Aveledo, & Roca, 2002). Children overapplied regular forms to irregular verbs, but not vice versa, with the onset linked to the appearance of obligatory finiteness marking. Low-frequency irregular verb forms yielded more errors than high-frequency ones. Similarly, Aguirre (2003) reported that one particular class of verbs in Spanish (the most salient one) became preferred once the studied child entered the proto-morphology stage.
The significance of language typology was emphasized in the framework of Natural Morphology (Dressler, 1985; Dressler, 2005), which proposes that grammatical modules are not innate and that children gradually learn the grammar from the input. Children acquiring a system with rich morphology are more tuned to morphology than children acquiring languages with simple morphology. The typological properties of a particular language are essential for the course of development because they are acquired already in the proto-morphological phase. Productivity and transparent morphology facilitate an early start in the use of language-specific structural properties (Bittner, Dressler, & Kilani-Schoch, 2003; Dressler, 1985; Dressler, 2005; Hržica, 2011; 2012; Katičić, 2003; Radisavljević, 2013).
The studies of morphologically rich languages cited above varied from spontaneous production to narratives, and were conducted on a very small number of children (1 to 3), except for Clahsen et al. (2002). In addition, it is quite difficult to compare the first occurrence and the developmental patterns of overgeneralizations across age levels, given that the studies differed in age span, e.g. 1;7-1;10 (Aguirre, 2003); 1;10-2;2 (Anđel et al., 2000); 1;2-3;2 (Hržica, 2012). The phenomenon of overgeneralization in early child language is yet to be explored in Serbian, which makes the description of its first occurrence and distribution in a larger number of children highly significant. The Serbian Corpus of Early Child Language (SCECL) enables a longitudinal exploration of the spontaneous production of a relatively large number of children (8) over a relatively wide age span (16 age levels between 1;6 and 4;0).

Verbal morphology in Serbian
Before we present our study, a simplified overview of Serbian verbal morphology is necessary. Verbs in Serbian are generally inflected for person (1st, 2nd, 3rd), number, and gender (marked in participles only), as well as tense and mood. They consist of a stem and an inflection, where the stem comprises a root and a thematic vowel. Thematic vowels and consonants intervene between the root and the inflection (e.g. pev-a-ti 'to sing', misl-i-ti 'to think', trep-nu-ti 'to blink'). They determine the conjugation pattern to which a given verb belongs (Radisavljević, 2013).
Abbreviations used in the glosses include: dim - diminutive, refl - reflexive, neg - negative, imperf - imperfective aspect, perf - perfective aspect.
In order to make a verbal form in Serbian, the inflectional ending is added to the present or the infinitival stem. The present stem is formed by omitting the inflectional ending -mo from the 1.pl.pres. (e.g. pev-a-mo 'sing' 1.pl.pres. > peva-).
There are two ways of making the infinitival stem: by omitting the infinitival marker -ti in verbs ending in -ti preceded by a vowel (e.g. pev-a-ti 'sing' inf. > peva-), or by omitting the inflectional ending -oh from the 1.sg.aorist in verbs whose infinitive ends in -ti preceded by a consonant (tresti 'shake' inf. > tres-oh 1.sg.aorist > tres-) or in -ći (seći 'cut' > sek-oh 1.sg.aorist > sek-) (Stanojčić & Popović, 2000: 108). Among the verbal forms relevant for this paper, the present and the imperative are built from the present stem, whereas the past participle (used in the perfect) and the synthetic future are built from the infinitival stem. The examples of forms which emerge in children's production are given below, exemplified by the verb pevati 'sing'.
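The two stem-formation rules just described can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text, not a morphological analyzer: the function names are hypothetical, the vowel inventory is deliberately simplified (syllabic r is ignored), and the 1.sg.aorist form has to be supplied by the caller.

```python
from typing import Optional

# Simplified vowel inventory (assumption: syllabic r is not treated as a vowel).
VOWELS = set("aeiou")


def present_stem(first_pl_present: str) -> str:
    """Drop the 1.pl.pres. ending -mo, e.g. pevamo > peva-."""
    if not first_pl_present.endswith("mo"):
        raise ValueError("expected a 1.pl.pres. form ending in -mo")
    return first_pl_present[:-2]


def infinitival_stem(infinitive: str, first_sg_aorist: Optional[str] = None) -> str:
    """Derive the infinitival stem following the two routes in the text."""
    # Route 1: infinitive in -ti preceded by a vowel, e.g. pevati > peva-.
    if infinitive.endswith("ti") and len(infinitive) > 2 and infinitive[-3] in VOWELS:
        return infinitive[:-2]
    # Route 2: infinitive in -ti after a consonant, or in -ći:
    # drop -oh from the 1.sg.aorist, e.g. tresoh > tres-, sekoh > sek-.
    if first_sg_aorist is None or not first_sg_aorist.endswith("oh"):
        raise ValueError("a 1.sg.aorist form ending in -oh is required")
    return first_sg_aorist[:-2]


if __name__ == "__main__":
    print(present_stem("pevamo"))
    print(infinitival_stem("pevati"))
    print(infinitival_stem("tresti", "tresoh"))
    print(infinitival_stem("seći", "sekoh"))
```

The sketch makes explicit why the second route needs an extra form: for verbs such as tresti and seći the stem cannot be read off the infinitive alone.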
According to the traditional Serbian grammars, there are seven conjugational classes, differing in morphological complexity (Table 1).

Table 1. Columns: conjugational class, present stem ending, infinitival stem ending, example.
A different approach to the inflectional verbal classes in Serbian is taken by Radisavljević (2013). Following the model of Natural Morphology proposed for the classification of verbs in Croatian (Dressler, Dziubalska-Kołaczyk, & Katičić, 1996), the author describes the most productive verbal microclasses in Serbian in terms of inflectional productivity.
Apart from its rich inflectional morphology, Serbian has a productive system of derivational morphology, which in the case of verbs serves to mark aspect. Traditionally, Serbian verbs are divided into two classes: perfective and imperfective. According to Arsenijević (2006: 202): "the stem verb is normally imperfective [...]. Adding a prefix to a stem verb contributes a lexical meaning (often even causing a shift in the lexical meaning of a verb), and it makes the verb perfective. [...] Adding a suffix to a perfective verb (even to a perfective stem verb) makes the verb imperfective. The suffix does not contribute any lexical meaning". For example, pevati imperf 'sing', zapevati perf 'start singing', otpevati perf 'finish singing'.

Aims
Having in mind the findings of previous studies in different languages (e.g. Albright & Hayes, 2003; Anđel et al., 2000; Bittner et al., 2003; Bybee & Moder, 1983; Hržica, 2012; MacWhinney, 1988; 1989; 1993; Rumelhart & McClelland, 1986; Skousen, 2001), the following distributional regularities in Serbian overgeneralizations are to be expected:
a. The first verbal overgeneralizations are expected at the earliest age level of the Serbian corpus (1;6). Since they announce the beginning of grammatical development, their onset may fall at different age levels in individual children.
b. The most prevalent form of overgeneralization is the overregularization of irregular forms (cf. Clahsen et al., 2002; Hržica, 2012).
c. Analogy is expected to play a role in overgeneralizations (e.g. MacWhinney, 1988; 1989; 1993), in terms of using inflections or thematic vowels analogous to other forms or verb classes evident from the input (cf. Hržica, 2012; Katičić, 2003).
d. Overgeneralizations are a temporary phenomenon in the acquisition of verbal morphology, and it is possible to track the onset, prevalence and end point of their occurrence in the children's production.
The appearance of overgeneralizations and neologisms in the early production of a synthetic language reveals the developmental changes in the perception of words from the language input, and uncovers the developmental beginning of word parsing. An exploration of the formal characteristics provides insight into the parsing of words (root vs. derivation vs. inflection), which are generally perceived as units. Due to the scope of the paper, which is restricted to the distributional properties of early overgeneralizations in Serbian, formal characteristics are discussed only briefly, while a detailed linguistic analysis will be provided in a prospective article (Authors, in preparation).
The corpus consists of approximately 1,000,000 words in total, of which more than 235,000 were produced by children. An automatic lemmatization of words was conducted on the basis of the Frequency Dictionary of Contemporary Serbian Language (Kostić, 1999), a procedure that produced an error rate of only 4% in the case of adult language (Ilić & Kostić, 2003a, 2003b). When a sample of children's production is targeted, the phonological deviations from the conventional form of words require additional manual work and annotation, especially in the samples from the youngest ages. The CLAN programs FREQ and KWAL of the CHILDES Project were used to retrieve the children's verbs and to check their linguistic and situational context. The analysis of the properties of the verbal forms revealed overgeneralizations as innovative categories, which will be concisely described. The distribution of their usage was inspected across age levels and discussed from the developmental point of view.
The sample for the corpus consisted of eight monolingual Serbian children (4 boys and 4 girls), whose spontaneous speech production was longitudinally recorded from 18 to 48 months of age. The recordings were made at 16 age levels, at approximately 2-month intervals, and lasted 90 minutes each. The children came from middle-SES families living in Belgrade and Banja Luka, equally distributed. Three children had an older sibling; five were only children. In four families, at least one parent was highly educated (university or college), while all others had a high school degree.
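The recording design can be checked with a short calculation. The sketch below assumes exactly 2-month spacing and no missed or shortened sessions, neither of which is stated in the text (the intervals are described as approximate), so the total should be read as a planned upper bound rather than the actual amount of recorded material. All names are illustrative.

```python
def age_label(months: int) -> str:
    """Convert an age in months to the years;months notation, e.g. 18 -> '1;6'."""
    return f"{months // 12};{months % 12}"


# Assumption: sessions fall exactly every 2 months from 18 to 48 months.
AGE_LEVELS_MONTHS = list(range(18, 49, 2))
AGE_LEVELS = [age_label(m) for m in AGE_LEVELS_MONTHS]

N_CHILDREN = 8
MINUTES_PER_SESSION = 90

# Planned recording time if every child was recorded at every age level.
planned_hours = N_CHILDREN * len(AGE_LEVELS) * MINUTES_PER_SESSION / 60

if __name__ == "__main__":
    print(len(AGE_LEVELS), AGE_LEVELS[0], AGE_LEVELS[-1])
    print(planned_hours)
```

Under these assumptions, the 2-month spacing between 1;6 and 4;0 yields exactly the 16 age levels reported for the corpus.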
Table 2 presents the basic measures of language development in the sample: a. the Mean Length of Utterance calculated in words, MLUw (in order to exclude numerous exclamations, non-linguistic expressions, and simple nominations, only utterances containing verbs were included); b. vocabulary size estimated by the average number of types (lexical entries) at all ages; c. vocabulary size (N of types) at the earliest age level (1;6).

and where is the eye
*JEL: nećem@z:nv !
%mor: v:neo|nećem !
%eng: 'I do not want to.'
%act: uzima slikovnicu sa stola
%eng: taking the picture book from the table
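To illustrate how marked forms such as the one in the transcript excerpt above could be pulled out of CHAT-style text, here is a minimal Python sketch. It is not a reimplementation of the CLAN programs FREQ or KWAL, and it rests on an assumption not confirmed in the text: that the suffix `@z:` on a main-tier word flags an innovative form in this corpus. The function name and the regular expressions are illustrative only.

```python
import re
from collections import Counter

# The excerpt from the text, laid out one tier per line (tab after the tier label).
EXCERPT = """\
*JEL:\tnećem@z:nv !
%mor:\tv:neo|nećem !
%eng:\t'I do not want to.'
%act:\tuzima slikovnicu sa stola
%eng:\ttaking the picture book from the table
"""

# Main (speaker) tiers start with '*' followed by a speaker code.
MAIN_TIER = re.compile(r"^\*(?P<speaker>[A-Z0-9]+):\s*(?P<utterance>.*)$")
# Assumption: 'word@z:tag' marks an innovative form; the tag meaning is not documented here.
MARKED = re.compile(r"(?P<form>\w+)@z:(?P<tag>\w+)")


def marked_forms(chat_text: str) -> Counter:
    """Count (speaker, form, tag) triples for '@z:'-marked words on main tiers."""
    counts: Counter = Counter()
    for line in chat_text.splitlines():
        tier = MAIN_TIER.match(line)
        if tier is None:
            continue  # skip dependent tiers such as %mor, %eng, %act
        for token in MARKED.finditer(tier.group("utterance")):
            counts[(tier.group("speaker"), token.group("form"), token.group("tag"))] += 1
    return counts


if __name__ == "__main__":
    for key, n in marked_forms(EXCERPT).items():
        print(key, n)
```

Restricting the search to main tiers avoids counting the same form twice when it is repeated on the %mor tier.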

Prevalence of verbal overgeneralizations in the SCECL
An approximate prevalence of children's overgeneralizations estimated on the automatically lemmatized corpus is presented in Table 3. A manual itemized analysis has been performed on verbs, while other parts of speech are to be explored in detail in the future. The main reason is that verb overgeneralizations are an extensive and quite productive group of overgeneralization errors, which effectively reflect the complexity of the grammar. Overgeneralizations retrieved from the corpus belong to the following categories of verbs: modals (e.g. moći 'can'), auxiliaries (e.g. biti 'be') and contentive (lexical) verbs (e.g. doneti 'bring').
The exact number of overgeneralized verbs (calculated in types) in the child language of the SCECL is 61, with an overall frequency of 126. In view of the children's overall production, this is a rare phenomenon in Serbian, just as it is in other languages. Nevertheless, the fact that about every twentieth verb type (4.9%) in early child language is overgeneralized illustrates the importance of the analysis that follows.
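The figures in this paragraph can be related to one another with simple arithmetic. The sketch below treats the "overall frequency" of 126 as a token count, which is an assumption, and derives two quantities that are not reported in the text: the mean number of occurrences per overgeneralized type and the approximate number of verb types implied by the 4.9% share. Since 4.9% is a rounded figure, the implied total is only an estimate.

```python
# Reported in the text.
overgen_types = 61            # overgeneralized verb types
overgen_tokens = 126          # overall frequency (assumed to be a token count)
share_of_verb_types = 0.049   # 4.9% of verb types, rounded in the source

# Derived here, not reported in the text.
tokens_per_type = overgen_tokens / overgen_types
implied_verb_types = overgen_types / share_of_verb_types

if __name__ == "__main__":
    print(f"mean occurrences per overgeneralized type: {tokens_per_type:.2f}")
    print(f"implied number of verb types (estimate): {implied_verb_types:.0f}")
```

A mean of roughly two occurrences per type is consistent with the later observation that most overgeneralized contentive verbs are hapaxes, while a small group of forms accounts for most of the tokens.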

Morphological properties of early verbal overgeneralizations
The inspection of the inflectional and derivational properties of overgeneralized verbs revealed four relatively homogeneous categories of overgeneralization errors recorded in the spontaneous production of Serbian children: stem overgeneralization, inflectional overgeneralization, changes in derivation and compounding, and idiosyncratic verbs (neologisms). Given the lack of studies reporting on the differences between types of overgeneralization errors, the categories reported here were differentiated mainly inductively. The primary criterion was the formative part of the verb at which the error was made: the inflection, the stem, the prefix or the suffix.
a. Stem overgeneralizations. This category contains two subcategories. In the first, the stem of another verbal form of the same verb is overgeneralized, while the inflectional ending of the new form remains typical for the given form. The most prominent examples in this group are *bidem < budem ('be' 1.sg.pres.), *bideš < budeš ('be' 2.sg.pres.), and *bide < bude ('be' 3.sg.pres.). The root of the stem was produced by analogy with the infinitival stem bi- in the infinitive bi-ti and/or the past participle bi-o, bi-la, bi-lo, etc. Apart from the auxiliary biti 'be', numerous contentive verbs also exhibit this type of stem overgeneralization: *donesti < doneti ('bring' inf.), *odnesećemo < odnećemo ('take away' 1.pl.fut.), *popnićeš < popećeš ('climb' 2.sg.fut.), *kažila < kazala ('tell' f.sg.ppart.), etc., the last of which also involves a consonant mutation. The typical pattern observed within this category is the use of the present stem instead of the infinitival stem of the same verb, and vice versa.
The second type of stem overgeneralization has been registered in verbs in which the thematic vowel is substituted by a vowel typical of some other conjugational class: *donesam (analogous to class V A pevam) < donesem ('bring' 1.sg.pres.), *izmišljem (analogous to class I tresem) < izmišljam ('think up' 1.sg.pres.), *namazam (analogous to class V A pevam) < namažem ('spread on' 1.sg.pres.). It is this type of overgeneralization that is typically studied in morphologically complex languages (cf. Anđel et al., 2000; Hržica, 2012).
b. Inflectional overgeneralizations. This category includes cases of an inappropriate inflection added to a verb root by analogy with other inflected verb forms. Typical examples are the forms *hoć-e-m, *hoć-a-m, or *hoć-i-m 'want' 1.sg.pres. used instead of the target form hoć-u. The same holds for *neć-e-m 'want.neg' 1.sg.pres. < neć-u, and *mož-e-m or *mog-a-m 'can' 1.sg.pres. < mog-u. These are modal/auxiliary verbs which deviate from the Serbian verb paradigm in the 1.sg.pres., given that these three verbs are the only ones to bear the ending -u instead of -m in that form. This irregularity probably causes the children's inclination to use the more frequent inflection -m instead of the low-frequency -u.
The analysis has also revealed certain patterns of analogy: e.g. with conjugational class I (tresem) in the case of *hoćem and *možem, with conjugational class VI (nosim) in the case of *hoćim, or with class V A (pevam) in the case of *hoćam and *mogam. Therefore, these examples are cases not only of inflectional but also of stem overgeneralization. As will be shown later, this small group of verbs makes up the most frequent kind of overgeneralization in Serbian children's production.
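The analogical pattern described for the 1.sg.pres. forms can be made concrete with a toy model. The sketch below is a descriptive illustration only, not a claim about the children's underlying mechanism or a model proposed in this paper: it assembles a form as root + thematic vowel + -m, with the thematic vowel borrowed from the model verbs named in the text (tresem, nosim, pevam). The dictionary and function names are hypothetical.

```python
# Target (adult) 1.sg.pres. forms of the three verbs that take -u instead of -m.
TARGET_1SG = {"hoć": "hoću", "neć": "neću", "mog": "mogu"}

# Thematic vowels of the model classes mentioned in the text.
THEMATIC_VOWEL = {
    "I": "e",     # as in tres-e-m
    "VI": "i",    # as in nos-i-m
    "V A": "a",   # as in pev-a-m
}


def overgeneralized_1sg(root: str, model_class: str) -> str:
    """Build a 1.sg.pres. form by analogy: root + thematic vowel + regular -m."""
    return root + THEMATIC_VOWEL[model_class] + "m"


if __name__ == "__main__":
    # Attested overgeneralizations reported in the text.
    attested = [("hoć", "I"), ("hoć", "VI"), ("hoć", "V A"),
                ("neć", "I"), ("mož", "I"), ("mog", "V A")]
    for root, cls in attested:
        print(f"*{overgeneralized_1sg(root, cls)} (by analogy with class {cls})")
```

Note that *možem is built on the present-stem alternant mož- rather than on mog-, which is why the text treats these forms as involving the stem as well as the inflection.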
A more detailed analysis of the formal properties of all categories of overgeneralizations described above is to be provided in a future study (Authors, in preparation).

Developmental patterns
Although the overall number and frequency of overgeneralizations does not allow us to draw conclusions on the underlying language acquisition mechanisms, the data provide an opportunity to search for the patterns in usage and point out the relevant developmental regularities.
All eight children in the corpus used overgeneralized forms; however, some children appear to be more productive in terms of the number and frequency of overgeneralized forms and the number of different categories (cf. Table 4). Even though the earliest instances of overgeneralization are found already at the youngest age in the sample (1;6), this holds only for the two most talkative children: ANA and LUK have the largest vocabulary size in the sample and are among the three children that produce the longest utterances (Table 2). The others start to overgeneralize later (JEL 1;8, ANE 1;10, LAZ 2;2, NIK 2;2, DAC 2;10, MIL 2;10), which shows that overgeneralizations do not appear at the stage of children's earliest utterances, but probably only with the onset of grammatical marking, which will be explored in more detail in the future.
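The onset ages listed above can be summarized with a short script. The onset values are taken from the text; the conversion helper and the summary statistics (range and median in months) are additions for illustration and are not reported in the paper. With only eight children, such summaries are descriptive at best.

```python
from statistics import median

# Age of first recorded overgeneralization per child, as given in the text.
ONSET = {
    "ANA": "1;6", "LUK": "1;6", "JEL": "1;8", "ANE": "1;10",
    "LAZ": "2;2", "NIK": "2;2", "DAC": "2;10", "MIL": "2;10",
}


def to_months(age: str) -> int:
    """Convert years;months notation to months, e.g. '1;10' -> 22."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


onset_months = {child: to_months(age) for child, age in ONSET.items()}
earliest = min(onset_months.values())
latest = max(onset_months.values())
median_onset = median(onset_months.values())

if __name__ == "__main__":
    print(f"earliest onset: {earliest} months")
    print(f"latest onset: {latest} months")
    print(f"median onset: {median_onset} months")
```

The 16-month spread between the earliest and latest onset illustrates the individual variation that the first expectation in the Aims section anticipates.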
All other overgeneralizations, affecting contentive verbs and the creation of neologisms, are recorded with very low frequency or as hapaxes. They form an open group of verbs from different conjugational classes, which shows that the children are searching for the proper conjugation and for morphological regularities. Their occurrence stretches towards the latest age levels, probably for as long as the vocabulary grows.

Discussion
The results presented above seem to support the findings from the previous studies and improve our knowledge on overgeneralizations in fusional languages.
The overall rate of verbal overgeneralizations in early Serbian (4.9%) is in accordance with the rates obtained for English and German (cf. Marcus et al., 1992 for English and Clahsen & Rothweiler, 1993 for German), and slightly higher than the rate of overregularization errors in Spanish (2%), although we must bear in mind that these figures refer to instances of overregularization as one type of overgeneralization error.
With regard to the categories of overgeneralization errors, the following tendencies are observed.
In the domain of inflectional morphology, children tend to regularize irregular forms. Some verbal forms seem particularly difficult for children, such as hoću 'want' 1.sg.pres., neću 'want' 1.sg.pres.neg. and mogu 'can' 1.sg.pres., given that the vast majority of errors involve these verbs. The most plausible explanation for this finding is that these three verbs bear a different inflection (-u) in the 1.sg.pres. from all other verbs in Serbian, which bear -m. Therefore, the errors represent a tendency to conjugate the verbs by analogy with other inflected forms. Exactly the same error (*hoćem < hoću) was reported in Croatian (Hržica, 2012; Katičić, 2003). The issues of both overregularization and frequency have already been pointed out in previous research (Ambridge, 2010; Brown, 1973; Clahsen et al., 2002; Hržica, 2012; Katičić, 2003; Kuczaj, 1977; MacWhinney, 1976; MacWhinney, 1993; Marcus et al., 1992; Slobin, 1971, 1973; Perek & Goldberg, 2015). Regarding frequency, it is noteworthy that hteti, ne hteti and moći are high-token-frequency verbs in Serbian. However, the inflectional ending -u in the 1.sg.pres. is found in these three verbs only, which makes the pattern ending in -u infrequent in terms of types and non-productive.
The categories of overgeneralizations reported here match the categories recorded in early Croatian, at least to a certain extent: in the domain of analogy errors reported in the use of the forms *hoćem < hoću (Anđel et al., 2000; Hržica, 2012; Katičić, 2000), and in stem overgeneralization, where a child opts for a wrong stem, e.g. *pis-a-m < piš-e-m (Hržica, 2012). These findings suggest that typologically similar languages exhibit similar patterns of overgeneralization. However, unlike the studies on Croatian, which mainly focused on one type of error (i.e. overregularization or class shift), the data from the SCECL revealed two other types of overgeneralization errors: changes in derivation and compounding, and neologisms. All types of overgeneralizations deserve a more detailed analysis in the future, especially in relation to the frequency of the overgeneralized inflections, thematic vowels, prefixes and suffixes on the verb forms in the input. An additional analysis of stem overgeneralizations (and possibly a re-examination of the verbal classes in Serbian) should be conducted in order to provide data comparable to the studies conducted on other morphologically rich languages and to test the model of Natural Morphology (Bittner et al., 2003) on Serbian.
It is important to emphasize that our findings differentiate between hteti 'want' and ne hteti 'want.neg.' on the one hand, and all other overgeneralizations on the other. It seems that hteti 'want' and ne hteti 'want.neg.' are the first targets of children's generalization of the grammar perceived in the input, and the first carriers of morphological regularities among Serbian verbs. Moći 'can' and biti 'be', even though they belong to the same group of highly frequent modals/auxiliaries, are affected by overgeneralization only later, and therefore do not have such a prominent function in the course of language acquisition. From the developmental point of view, they instead seem to resemble contentive verbs, which appear somewhat later in large numbers and with low frequency, and remain affected by overgeneralization long afterwards. This layout reveals the link between the gradual growth of vocabulary and the acquisition of conjugational rules.
Regarding the age span, the study showed that certain types of overgeneralizations continue to exist even at later ages (at least until the age of 4;0, as covered by the SCECL).Therefore, it is of high importance to include a wide age span when studying this phenomenon.
The analysis of the morphological properties of overgeneralizations brought to light the following processes in new word formation:
a) Classification. Children recognize quite a complex range of Serbian verbal forms and conjugational classes early. This supports the findings on an early sensitivity to the specific structural properties of morphologically rich languages (Dressler, 1985; 2005; Hržica, 2011).
b) Regularization. The recognized patterns are used in the production of grammatical forms, which causes a regularization of irregular forms.
c) Analogy. The main source of overgeneralization is analogy. This supports the previous findings on its role (Albright & Hayes, 2003; Bybee & Moder, 1983; Katičić, 2003; MacWhinney, 1988; 1989; 1993; Rumelhart & McClelland, 1986; Skousen, 2001). The newly coined verbs are inevitably guided by the language-specific patterns of verb formation, which exert the pressure from which overgeneralizations arise, and all parts of a verb are potentially affected: the inflectional morpheme, the stem or the thematic vowel.
d) Semantic differentiation. If semantic differentiation is needed in a particular context, Serbian children use all the morphological tools available in the input (prefixation, suffixation, stem alternation), which provide the basis for the construction of overgeneralizations of any part of a word.
e) Lexicalization of unconventional meanings. In the flow of communication, children need an appropriate word to best represent their thoughts, and if they do not find such a word in the input, they tend to create neologisms of their own, assigning them regular grammatical morphemes.
The results of our analysis reveal the significant role of language input, and are in accordance with the findings of previous research, which show that these processes are universally present in early children's production in different languages.Furthermore, the process of semantic differentiation between verbal expressions seems to be a notable property of the morphologically rich languages such as Serbian and other Slavic languages, where derivation is a productive tool of new word formation.

Conclusion
The exploration of verbs in Serbian child language has revealed several patterns of verbal overgeneralization that affect both inflectional and derivational morphology. It has been shown that any part of a verb can be subject to overgeneralization: the stem, prefix, suffix, thematic vowel or inflection. The analysis has revealed the typical forms of Serbian overgeneralizations, represented by *hoćem 'want' 1.sg.pres., *nećem 'want' 1.sg.pres.neg., *možem 'can' 1.sg.pres. and *bidem 'be' 1.sg.pres. (inflectional and stem overgeneralizations). *Hoćem and *nećem have been identified as the developmental forerunners among them, establishing overgeneralization as a resource for constructing the language-specific verbal morphology.
A wide variety of facets has been found in the low-frequency and hapax overgeneralizations recorded primarily in the sample of contentive verbs. Even though solitary and unrepeated, their morphological properties exhibit the constraints of word formation in the Serbian language. The innovative solutions chosen by a child display both the regularities known to him/her and the features of grammar that are still missing from the system at the moment of word composition. Since their occurrence extends to the oldest age levels in the sample, the low-frequency and hapax overgeneralizations reveal their significance in the growth of the vocabulary and in the adjustment of new lexical items to the conjugational classes of the mother tongue.
The findings show that, although a statistically rare phenomenon, verbal overgeneralizations seem to be highly representative of the process of Serbian language acquisition, since they announce the onset of inflectional and derivational morphology in child language. Therefore, in monitoring individual language development, overgeneralizations should not be treated as errors in production, but rather as typical and highly creative parts of children's language construction.

Table 2. Indications of language development in individual children

Table 4. Number and frequency of overgeneralizations