MAXIMAL VOWEL SPACE METHOD IN ANALYSIS OF VOWELS IN PRELINGUAL SPEECH PHASE

The main problems in the analysis of vowels that occur in the prelingual speech phase are the centralization of utterances and the unknown dimensions of the vocal tract. Most research in this field is based on the analysis of maximal vowel space (MVS), because discrimination of vowels is very difficult in this early period. MVS analysis includes the estimation of the physical dimensions of the vocal tract (VT). The aim of this research was to estimate and define changes in vowel pronunciation during the prelingual speech phase. The voice of one child was recorded and analyzed from the age of two months until he turned one. The recording was performed in 42 sessions, on average 4 sessions every month. Sound segments that resemble vowel pronunciation were extracted from the recordings and used for formant frequency estimation with the PRAAT software. The Burg method was used for formant frequency estimation. The results showed that MVS can be used in diagnostic procedures from a child's earliest age. MVS analysis is appropriate for the earliest age because the child only needs to pronounce individual phonemes and does not need to respond to speech stimuli. These results need to be confirmed on a larger sample, and an extended analysis should define criteria for discriminating typical from atypical formant frequencies.


INTRODUCTION
An intact organic basis for language development is necessary for a child during the period of language acquisition. That refers to intact auditory perception and discrimination, visual perception, motor control of the speech organs, normal intelligence, and the ability to focus and maintain attention, as well as intact social interaction and stimulation. Language is acquired by stimulation, imitation and practice. A child acquires the basic elements of language depending on its physical and mental abilities and needs.
Between six and nine months of age, in the prelingual period, cackles become speech sounds. They appear in syllable groups and are more similar to the speech sounds of adults. However, they do not have a phoneme function; they represent conditional audio-articulation practice. When a child uses the first word, these sounds become the sounds of the mother tongue: phonemes that are the basis for words with meaning (Kostić et al., 1995).
A child first acquires vowels, stop consonants and nasals. The vowels (/a/, /e/, /i/, /o/ and /u/) of a small child stay undefined for a long time. Open and then closed vowels are the first central vowels to develop. The qualities of speech sounds change constantly during development. Sovilj (2002) gave a very precise description of the phases of the prelingual period, based on research in our language area.
The first phase of cooing appears during the second half of a child's first month. When its physiological needs are satisfied, the child is in homeostasis and starts to express satisfaction by singing the nuclei of future phonemes, the vowels /a/, /e/ and /u/ (at the beginning they are very open). /a/ is a middle, /e/ a front and /u/ a back vowel; with regard to tongue position, /a/ is low, /e/ is middle and /u/ is high. The type of vowels and the order of their appearance, when compared with tongue movements, show that these movements are based on the acts of swallowing, suckling, and opening and closing the mouth.
The second phase of cooing, at the beginning of the second month, is the vocalization of certain vowels. It proceeds with continuous production of a neutral voice, as well as plenty of variable sounds (without articulatory movement: screaming and shrieking), with high intensity and frequency of vocal vibration. In this phase, cooing is an expression of the child's feelings, emotions and need for communication. It can be accepted as a medium for communication with other people.
The third phase is at the beginning of the third month. During interaction with the environment, the child develops more precise forms of the vowels /a/, /e/ and /u/ using the resonant area. The characteristics of cooing in this period are the appearance of diphthongs, the shortening of vowel duration in sequences, and the linkage of vowels in certain patterns:
- three different vowels, where the first and the second are short and the third is long: /e-a-u/;
- three vowels, where the first and the third are the same, with the difference that the first and the second are short and the third is long: /e-a-e/.
It is interesting to consider the earliest phase at which atypical speech can be detected. In that phase there is no speech or communication in the classical sense. A child reacts to sound, visual and other stimuli from birth, which can be regarded as a form of communication, but not a classical one. This period may be defined as "prelingual". Experience of working with children at the youngest age (6-7 months), as well as the use of diagnostic procedures aimed at monitoring babies from normal and risk pregnancies (Dobrijević, 2013), has shown that atypical speech and language development can be detected in the early developmental period. During the prelingual period a child pronounces different sounds that resemble vowels. Discrimination of the pronounced sounds is unreliable and should therefore be avoided (Schwartz et al., 2005). It is better to analyze the range over which the formant frequencies of voiced sounds vary, which defines the range of variation of the geometrical VT shape and thus the degree of movement of the speech apparatus. Therefore, the range of formant frequency variation is analyzed, rather than the quality of the pronounced vowels. The extent of formant frequency variation reflects the extent of VT shape variation, and thus the extent of speech apparatus movement.
Maximal vowel space (MVS) is an established method of global voice analysis (Boë et al., 1989; Ménard et al., 2001). The first three formant frequencies are usually analyzed via F1-F2 and F2-F3 charts. Perceptual discrimination of vowels is not primary in MVS analysis, only their formant frequencies. It is possible to obtain an estimated MVS from a person's real speech. On the other hand, if the geometrical shape of the VT is known, its acoustical model can be built, sound propagation through that acoustical structure can be simulated, and a simulated (theoretical) MVS can be obtained. Comparing these two MVS can be useful for discriminating atypical speech during the development of the speech mechanism. This is the basic idea of this paper: to examine the possibility of using MVS as a diagnostic method during the prelingual stage of a child's development.
The first problem is the insufficient amount of information about the anatomical and morphological VT structure of one year old children. Classical methods for generating a VT acoustical model (Fant, 1970; Flanagan, 1972) include X-ray imaging of the VT during the pronunciation of steady vowels. More recently, X-ray imaging has been replaced with MRI (Story et al., 1996; Soquet et al., 2002). These methods are not appropriate for a one year old child, so another method must be found for estimating the VT shape. The second problem is the MVS simulation itself, based on a known VT acoustical model. MVS estimation means estimating the resonant frequencies of all possible VT configurations. If accurate acoustical models of the VT are used, the number of possible configurations is enormous and not all of them can be processed. A solution to this problem is to choose a subset of the possible configurations that is representative of all the others. Configurations that are not achievable need to be rejected from the procedure of MVS estimation.
One idea is to transform the VT shape of an adult into that of a one year old child, taking into account anatomical, morphological, articulatory and other differences between an adult and a child. In engineering practice, the theory of analogy is used in modelling the pronunciation of a given phoneme. The acoustic model of the VT is converted into an equivalent electrical model. Further analyses are performed using standard methods of electrical circuit theory (Fant, 1970; Flanagan, 1972).
The acoustic model of the VT takes the form of short cylindrical tubes (connected in cascade), each with a defined cross-sectional area.
Therefore, the only parameter that must be known in order to build the model is the VT cross-sectional area as a function of distance from the glottis. Figure 4 shows the acoustical and the equivalent electrical four-tube VT model. The general problems of generating a VT acoustical model of a one year old child are considered first. A procedure for estimating the VT shape is then proposed. As a final result the acoustical model is shown, i.e. the dependence of the VT cross-section on distance from the glottis is given.
The problem of MVS simulation is first addressed for an adult male; after this process was elaborated and defined, it was applied to a one year old child. Finally, the real MVS was estimated for one child, whose voice was recorded from two to twelve months of age. This paper considers in detail the emergence and establishment of vowels during the prelingual period as a hallmark of speech and language development.

Participants
One infant and his mother participated voluntarily in the recording. They are Serbian. The child is a boy, born and raised in Belgrade, Serbia.
Procedure
The child's pronunciation was recorded in the participant's home, i.e. in the living room, without any kind of sound absorption treatment. An H4n Handy Recorder (ZOOM Corporation) was used, with built-in stereo capacitor microphones arranged in an XY pickup pattern. The recorder was held by the mother or placed on a microphone stand during recording. Because of the child's movement, the distance between the recorder and the speaker varied. The recording was done during daily life activities, with no particular tasks, over a ten month period (from November 2011 to August 2012). The recording was done in 42 sessions, 4 recordings each month on average. The child's voice was recorded in a stereo file with a sampling frequency of 44100 Hz. No forms of speech signal improvement were used, such as automatic volume control, noise reduction, intelligibility enhancement, etc.
In the pre-processing phase, all signals were cleaned of unwanted noises, converted to mono and re-sampled to 22050 Hz. In the next phase, the segments in which the child pronounces vowels, or sounds that resemble vowels, were extracted from the recordings. For these extracted segments, formant frequencies were estimated with the PRAAT software (Boersma, Weenink, 1992-2005). The Burg method was used for formant frequency estimation (Anderson, 1978). This method of formant frequency estimation is a classical procedure in speech signal analysis. The accuracy of formant frequency estimation depends on many factors and is not always ideal. Because of that, after the software estimation, a crosscheck against a wideband spectrogram was done in order to obtain the highest possible confidence in the estimated formant frequencies. All estimated formant frequencies that did not match the wideband spectrogram or that were in the wrong order were rejected. This additional correction of the formant frequency estimates was necessary because the estimation error increases when the fundamental frequency is high (van der Stelt et al., 2005).
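The ordering part of the rejection step can be illustrated with a minimal sketch. It assumes each analysis frame is available as a tuple (F1, F2, F3) in Hz; the function name `keep_ordered_formants` and the sample values are hypothetical, and the spectrogram crosscheck described above is a manual step that is not modelled here.

```python
def keep_ordered_formants(frames):
    """Keep only frames whose formants are positive and strictly increasing.

    `frames` is an iterable of (f1, f2, f3) tuples in Hz (assumed layout).
    """
    kept = []
    for f1, f2, f3 in frames:
        if 0 < f1 < f2 < f3:
            kept.append((f1, f2, f3))
    return kept


# Hypothetical example values, not measured data.
frames = [
    (950.0, 3100.0, 5200.0),   # plausible ordering, kept
    (3100.0, 950.0, 5200.0),   # F1 > F2, rejected
    (1000.0, 1000.0, 4800.0),  # F1 == F2, rejected
]
ordered = keep_ordered_formants(frames)
```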

Vocal tract shape in a one year old child
If we want to estimate the VT shape of children based on data for adults, we should keep in mind the following facts: the VT of children is shorter, the cross-sectional areas of the VT are smaller in children, the articulation of children and adults differs, and the VT morphology of adults and children is different.
According to the literature (Ménard et al., 2007), the average length of the VT is 7.1 cm for a newborn, 10.5 cm for a four year old child, 16 cm for an adult female and 17.3 cm for an adult male. The VT length for a one year old child is 8 cm, but it is convenient to take half of the VT length of an adult. A smaller VT in children implies a smaller length and a lower volume, i.e. a smaller cross-sectional area. Compared with an adult, the VT cross-sectional areas of a one year old child are four times smaller. A linear scaling of the adult VT cannot approximate the child's VT, because there are important differences in shape (a different morphological structure). It is well known (Goldstein, 1980) that the length ratio of the pharyngeal and oral cavities (LHI - Larynx Height Index) differs in children and adults. This ratio is 0.5 in a newborn and 1.1 in adult men. This means that during a child's growth the pharyngeal cavity increases more than the oral cavity. Consequently, the adult VT shape has to be "compressed" in the region of the pharyngeal cavity if one wants to estimate the shape of the VT in a child. The pronunciation of vowels in a one year old child is significantly centralized and the vowels are very similar in the perceptual domain. Discrimination of spoken vowels is quite complicated (Schwartz et al., 2005). In terms of the physical VT shape, this means that the dynamics of cross-sectional area changes will be smaller and the shape of the VT will resemble a uniform tube.
All of the listed factors must be taken into account when modelling the VT in children in order to obtain sufficient precision of the formant frequency variation space. The significance of every single factor is analyzed through the change in formant frequencies. VT shapes of an adult were simulated to show the degree of change in formant frequencies.
The whole analysis was done for an adult.
During the simulation, we used the VT model with losses, in which the impedances of the VT wall, glottis and subglottal system are infinite. The radiation impedance is approximated by the radiation of a circular piston set in a spherical baffle (Vojnović et al., 2005). Starting from Fant's vowels (Fant, 1970), the formant frequencies are estimated for the following cases: unchanged VT length, and then VT length reduced by 12%, 25%, 37.5% and 50%.
Formant frequencies were calculated by the program FFOR (Vojnović, 2008), which is based on the algorithms given in (Badin et al., 1984). This model of the VT was used in all subsequent simulations. Formant frequencies do not depend significantly on the VT volume, but only on its shape (Vojnović, 2013a). However, the small influence of the VT volume on formant frequencies should not be neglected, because there are some other problems related to MVS simulation.
In children under the age of one, the shape of the VT is like a uniform tube and the mobility of the articulation organs is limited. Because of that, there are no significant differences in the VT shape when a child pronounces different vowels. In the domain of VT physical dimensions, the range of cross-sectional area change is smaller in children.
The "grade" of articulation is simulated by gradually changing the VT shape towards a uniform tube, i.e. transforming the VT into a uniform cylindrical tube of the same length. This transformation leads to drastic changes in formant frequencies. The first three formant frequencies gravitate to the following frequencies: 480, 1440 and 2400 Hz.
These frequencies correspond to quarter wave resonances for a tube length of 17.5 cm. The mean VT length for five Russian vowels is 17.6 cm.
The percentage changes are more significant at lower formant frequencies (the first two formants) than at the higher formants (Vojnović, 2013a).
In the simulation of the different articulations of an adult man and a child, we must not forget lip protrusion. In practice this means that, together with a decrease in cross-sectional area perturbation (shallow articulation), there is a smaller range of VT length (less lip protrusion). Finally, a simulation of different larynx height index (LHI) values (Goldstein, 1980) is done. The process of converting the VT to a different LHI is illustrated in Figure 1. The first diagram shows the initial (unchanged) VT shape during the pronunciation of the vowel /a/ (Fant, 1970). In the second diagram, the lengths of all cylindrical segments in the range from 2 cm to 10 cm are scaled by a factor of 0.7, and the lengths of the remaining cylindrical segments are scaled by a factor of 1.4. The lengths of the first four cylindrical segments (range from 0 cm to 2 cm) are unchanged (Vojnović, 2013a). All analyzed parameters show a significant influence on formant frequencies and must be considered in the estimation of the VT geometrical shape in a one year old child. In order to better simulate the centralized vowel pronunciation in children, the cross-sectional area was increased by 20% where it was less than a reference (mean) value, and decreased by 20% where it was greater. The mean cross-sectional area is 5 cm² in an adult male and 1.25 cm² (5/4 = 1.25) in a one year old child. With this correction of the cross-sectional area, the shape of the VT is "smoother", i.e. it becomes more like a uniform cylindrical tube.
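The area correction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the adult areas are first divided by four and the 20% correction is then applied relative to the child mean of 1.25 cm²; the order of these two steps, the function name `child_areas_from_adult` and the sample values are assumptions, not taken from the original procedure.

```python
ADULT_MEAN_AREA_CM2 = 5.0   # mean adult male area quoted in the text
AREA_SCALE = 0.25           # child areas are four times smaller
SMOOTHING = 0.20            # 20% correction towards the mean


def child_areas_from_adult(adult_areas_cm2):
    """Scale adult areas to child size, then pull each value towards the mean."""
    child_mean = ADULT_MEAN_AREA_CM2 * AREA_SCALE  # 1.25 cm^2
    child_areas = []
    for area in adult_areas_cm2:
        scaled = area * AREA_SCALE
        if scaled < child_mean:
            scaled *= 1.0 + SMOOTHING   # narrow sections are widened
        elif scaled > child_mean:
            scaled *= 1.0 - SMOOTHING   # wide sections are narrowed
        child_areas.append(scaled)
    return child_areas


# Hypothetical adult areas in cm^2 (not Fant's data).
smoothed = child_areas_from_adult([1.0, 5.0, 10.0])
```

With these inputs the narrow section grows from 0.25 to 0.30 cm², the mean-sized section stays at 1.25 cm², and the wide section shrinks from 2.5 to 2.0 cm², which is the "smoothing" towards a uniform tube that the text describes.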
With regard to the different articulations in adults and one year old children, a correction of the VT length was made to simulate the more limited lip protrusion in children. The average length of an adult male VT is about 17.5 cm. According to the criteria adopted in this paper, the average VT length of a one year old child is 8.75 cm (17.5/2 = 8.75). Limited lip protrusion implies, in a sense, equalizing the VT length across vowels. We use the following principle for equalizing the VT length: if the length of the VT during the pronunciation of a vowel is greater than 8.75 cm, the VT length is reduced; if the length of the VT is smaller than 8.75 cm, its length is increased.
Figure 2 shows the estimated VT shapes of a one year old child (solid line) for the pronunciation of five vowels. VT shapes for an adult male are presented with thin dashed lines, with the length halved and the cross-sectional area reduced four times. This scaling is done in order to compare the VT shapes easily. As can be seen, there are differences caused by the different LHI and by differences in articulation (a smaller range of change in cross-sectional area and in total VT length in the case of a one year old child). The shape of the VT for a newborn (Goldstein, 1980) is slightly different from the results presented in Figure 2. In (Goldstein, 1980) only three VT configurations are presented: for the vowels /i/, /a/ and /u/. Figure 3 shows the VT configurations for these vowels together with the VT configurations estimated in Figure 2. The biggest differences between the VT shapes are in the values of the cross-sectional area. They are considerably higher in the configurations of (Goldstein, 1980). There are also significant differences in VT length, for example for the vowel /i/. In principle, the VT in (Goldstein, 1980) is shorter because it is a newborn VT model. There is a big difference in the cross-sectional area of the oral cavity, which is up to three times greater for the vowel /a/. The vowel /u/ also shows differences in the cross-sectional area of the oral cavity. The cross-sectional area of the pharyngeal cavity is different as well; according to (Goldstein, 1980) it is two times greater for the vowel /i/. It is clear that the VT configurations in Figure 2 are hypothetical and can be changed. They are introduced so that the MVS can be simulated. The validity of the hypothetical configurations should be assessed based on the similarity of the simulated and the real MVS. If these two areas of formant frequencies overlap, that is sufficient for accepting the proposed configurations. Otherwise, the configurations can be corrected until the two MVS agree.
In order to define the borders of vowel formant frequencies in real speech, the first step is to estimate the MVS based on simulation of vowel pronunciation. This requires knowing the shape of the VT and its acoustical model. A procedure has been proposed for the VT of a one year old child, and Figure 2 shows the VT configurations for the pronunciation of five vowels. The problem of MVS simulation will be considered for the case of an adult male. There is an enormous number of configurations, so a selection strategy must be defined. The VT configurations of an adult male during the pronunciation of Russian vowels are well known (Fant, 1970), so it is easier to verify the simulation procedure for this particular case. Once the procedure is verified for an adult male, it is applied to the case of a one year old child, i.e. to the configurations from Figure 2.
If we assume that the average length of an adult male VT is 17.5 cm, his VT can be modelled with 35 cylindrical tubes with cross-sectional areas in the range from 0.16 to 16 cm². According to (Fant, 2004), a discrete logarithmic distribution of the cross-sectional area should be used, so that this range contains 16 different values of the cross-sectional area.
This means that we should analyze about 16^35 ≈ 1.4 × 10^42 different acoustic configurations, which is, of course, impossible. From this huge number of possible VT configurations we should choose a small subset that remains representative of the entire set. This is the main problem in estimating the MVS.
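The size of the configuration space can be checked with a short calculation. The sketch below assumes that the 16 area values are spaced geometrically with both endpoints (0.16 and 16 cm²) included; the exact values used in (Fant, 2004) may differ, and the function name `log_spaced_areas` is hypothetical.

```python
def log_spaced_areas(a_min=0.16, a_max=16.0, n_levels=16):
    """Return n_levels cross-sectional areas (cm^2) with a constant ratio."""
    ratio = (a_max / a_min) ** (1.0 / (n_levels - 1))
    return [a_min * ratio ** i for i in range(n_levels)]


areas = log_spaced_areas()

vt_length_cm = 17.5
n_segments = 35                                # number of tubes from the text
segment_length_cm = vt_length_cm / n_segments  # 0.5 cm per tube

# Every segment can take any of the 16 area values independently.
n_configs = len(areas) ** n_segments           # 16**35, an exact integer
print(f"{n_configs:.2e}")
```

Because 16^35 = 2^140, the count is roughly 1.39 × 10^42, in line with the figure quoted above.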

MVS estimation of an adult male
Before we turn to the problems of MVS estimation for accurate VT models, a four-tube VT model (Fant, 1970; Flanagan, 1972) will be considered. This acoustic VT model is presented on the left side of Figure 4. The acoustic model is completely determined by the length (l) and cross-sectional area (A) of each individual cylindrical segment. Each of these cylindrical segments can be transformed into an electrical model in the form of a symmetrical T-network. The T-network impedances (Z_a and Z_b) are defined by the physical dimensions of the VT: the lengths and cross-sectional areas of the cylindrical segments. In this model (Figure 4), the first tube simulates the mouth opening, the second tube the mouth cavity, the third tube the "tongue-palate" narrowing, and the fourth tube the pharyngeal cavity. If we assume that the total length of the VT is 17.5 cm and that the four cylindrical segments are of equal length, then each tube is 4.375 cm long. For 16 different values of the cross-sectional area, the total number of possible combinations is 16^4 = 65536. This is not a huge number of configurations, so all of them can be used in the MVS simulation process. A more detailed analysis can be found in Vojnović (2013b).
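To make the idea of computing resonances from tube lengths and areas concrete, here is a simplified sketch. It is not the FFOR program or the lossy model with radiation impedance used in this paper: it assumes lossless tubes, a closed glottis and an ideal open end at the lips, and it multiplies the chain (transmission) matrices of the segments from glottis to lips. Resonances are the frequencies where the D element of the product vanishes. The function names and the example areas are hypothetical; the sound speed of 35300 cm/s is the value quoted later in the text.

```python
import math

SOUND_SPEED_CM_S = 35300.0


def chain_d(freq_hz, lengths_cm, areas_cm2):
    """D element of the glottis-to-lips chain matrix of a lossless tube cascade.

    Segments are listed from the glottis to the lips. The off-diagonal
    elements b and c carry an implicit factor j, so all arithmetic is real.
    """
    k = 2.0 * math.pi * freq_hz / SOUND_SPEED_CM_S
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for length, area in zip(lengths_cm, areas_cm2):
        z = 1.0 / area  # characteristic impedance up to the constant rho*c
        cs, sn = math.cos(k * length), math.sin(k * length)
        a, b, c, d = (a * cs - b * sn / z, a * z * sn + b * cs,
                      c * cs + d * sn / z, d * cs - c * z * sn)
    return d


def resonances(lengths_cm, areas_cm2, f_max=5000.0, step=5.0):
    """Find zeros of D by scanning a frequency grid and bisecting sign changes."""
    roots = []
    f = step
    d_lo = chain_d(f, lengths_cm, areas_cm2)
    while f < f_max:
        f_hi = f + step
        d_hi = chain_d(f_hi, lengths_cm, areas_cm2)
        if d_lo * d_hi < 0.0 or d_hi == 0.0:
            lo, hi, d_left = f, f_hi, d_lo
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(mid, lengths_cm, areas_cm2)
                if d_left * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_left = mid, d_mid
            roots.append(0.5 * (lo + hi))
        f, d_lo = f_hi, d_hi
    return roots


four_equal_lengths = [17.5 / 4] * 4  # 4.375 cm each
uniform = resonances(four_equal_lengths, [5.0, 5.0, 5.0, 5.0])
# Hypothetical /a/-like shape: narrow pharynx at the glottis end, wide mouth.
shaped = resonances(four_equal_lengths, [1.0, 1.0, 8.0, 8.0])
```

For the uniform case the zeros coincide with the quarter wave resonances of a 17.5 cm tube, which gives a quick sanity check of the matrix product. Note that the text numbers the tubes from the mouth opening towards the pharynx, while this sketch lists them from the glottis to the lips.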
The example of the four-tube VT model shows that, in MVS simulation:
- the VT model should incorporate the radiation impedance of the mouth opening,
- the simulation should be done for at least three different VT lengths, which correspond to the vowels /u/, /a/ and /i/,
- more precise VT modelling should be used (with more cylindrical segments, 5 mm in length), and
- cross-sectional area values should be chosen according to a logarithmic distribution.
A more accurate VT model implies shorter cylindrical segments, with a length not exceeding 5 mm. The number of configurations for this VT model is huge, so some kind of random sampling method has to be used (Monte Carlo method). The results of the first examples of MVS estimation for a more accurate VT model are shown in Figure 5. VT shapes were chosen randomly and there are 3 × 50,000 in total. Three groups of estimated resonant frequencies are presented, corresponding to three VT lengths: 16.5, 17 and 19.5 cm. These lengths correspond to the pronunciation of the Russian vowels /i/, /a/ and /u/, respectively. The three spaces of resonant frequencies are marked with different shades of grey:
- light grey - VT length of 16.5 cm (vowel /i/),
- dark grey - VT length of 17 cm (vowel /a/), and
- black - VT length of 19.5 cm (vowel /u/).
There is a clear distinction between the spaces of resonant frequencies for the VT length of 19.5 cm (black dots) and the length of 17 cm (dark grey dots). The spaces of resonant frequencies for the VT lengths of 16.5 cm (light grey dots) and 17 cm are very similar, so no clear boundary can be seen between them. More importantly, the vowels /e/ and /i/ are not covered by the estimated spaces, and the vowels /a/ and /o/ lie on the very border. Figure 5 shows the reason for doing the MVS estimation for an adult first. It can be seen that choosing VT configurations crudely is not adequate. The real formant frequencies (white squares) are not included in the MVS, although they should be. The conclusion is that the VT configurations that were picked are not representative.
Due to the huge number of possible VT configurations and their random sampling, the analysis does not include the real VT configurations (or ones close to them) that correspond to the pronunciation of the back and middle vowels. This shows that the estimation should be adapted to the real situation, i.e. it should allow sampling of only those configurations that are possible in real speech.
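One possible way to restrict random sampling to smoother, more speech-like shapes is sketched below. It is only an illustration: the actual selection criteria are described in Vojnović (2013b) and are not reproduced here. The sketch assumes 35 segments and 16 area levels, and limits the change between neighbouring segments to a hypothetical `max_step` of two levels; the function name and the seed are also assumptions.

```python
import random


def sample_area_indices(rng, n_segments=35, n_levels=16, max_step=2):
    """Draw one VT configuration as a list of area-level indices.

    Neighbouring segments may differ by at most `max_step` levels, which
    suppresses abrupt jumps in cross-sectional area (assumed constraint).
    """
    index = rng.randrange(n_levels)
    indices = [index]
    for _ in range(n_segments - 1):
        lowest = max(0, index - max_step)
        highest = min(n_levels - 1, index + max_step)
        index = rng.randint(lowest, highest)  # inclusive on both ends
        indices.append(index)
    return indices


rng = random.Random(2013)  # fixed seed so the draw is reproducible
configurations = [sample_area_indices(rng) for _ in range(1000)]
```

Each index would then be mapped to one of the 16 logarithmically spaced area values before the resonant frequencies are computed. A tighter `max_step` yields smoother shapes but explores a smaller part of the configuration space, so the value is a trade-off rather than a fixed rule.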
After a series of analyses (Vojnović, 2013b), all five vowels are in the simulated space. Only the third formants of the vowels /u/ and /i/ are not "deep" in the simulated space. The other formant frequencies of the Russian vowels are "deep" in the simulated spaces. Figure 6 shows the results, which coincide with (Carré, 2009), although there the area of resonant frequencies was analyzed and simulated with a different method.

MVS estimation of a one year old child
The MVS simulation procedures shown previously were applied to the case of a one year old child whose VT configurations are shown in Figure 2.
The results of this simulation are shown in Figure 7. As can be seen, the formant frequencies are within the simulated space. Therefore, there is good compatibility between the suggested VT shapes of a one year old child (Figure 2) and the simulated MVS (Figure 7). This compatibility implies that this sampling strategy of VT configurations is good. That still does not confirm that the proposed VT shapes (Figure 2) are good and match the real situation. The research presented in (Goldstein, 1980) gives somewhat different VT configurations during the pronunciation of three vowels: /u/, /a/ and /i/. The VT configurations in Figure 3 relate to the infant VT.
The biggest difference between the VT configurations in Figure 2 and Figure 3 concerns the cross-sectional area. In some regions of the VT, the cross-sectional area is about two to three times higher in (Goldstein, 1980), although those data relate to an infant. The larger changes in the cross-sectional area of the newborn VT produce a significantly larger MVS (Figure 8).
The range of variation of the first formant is 450 Hz greater in (Goldstein, 1980). The ranges of the second and third formants are greater by about 950 Hz and 1100 Hz, respectively. In percentage terms, that is 37% for the first formant, 28% for the second and 21% for the third formant. Roughly speaking, the MVS area of the first and second formants is 75% greater in (Goldstein, 1980). For the second and third formants the MVS area is 50% greater. These numbers confirm significant differences in the MVS estimation when the VT configurations from Figure 3 are used. The validity of the hypothetical VT models can be verified by analyzing the voice of a one-year-old child and estimating the MVS from these data.

RESULTS AND DISCUSSION
The simulated MVS from Figure 7 is strongly linked with the VT configurations in Figure 2. These results can be used for software-based estimation of formant frequencies. If the vowels are pronounced by a one year old child, then F1 is in the range from 400 to 1600 Hz, F2 is in the range from 1350 to 4000 Hz and F3 is in the range from 2500 to 8000 Hz. The mean values of the first, second and third formants in the estimated MVS are 1000, 3175 and 5250 Hz, respectively. For a VT length of 8.25 cm and a sound speed of 35300 cm/s, the first four quarter wave resonant frequencies are 1070, 3210, 5350 and 7490 Hz, which is relatively close to the estimated MVS. If it is necessary to analyze the first three formant frequencies of vowels pronounced by a one year old child, the upper frequency limit needs to be set to 6500 Hz.
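The quarter wave values quoted above follow from the resonance formula of a uniform tube closed at one end and open at the other, f_n = (2n - 1) c / (4 L). A minimal sketch, using the tube length and sound speed from the text (the function name is hypothetical):

```python
def quarter_wave_resonances(length_cm, sound_speed_cm_s=35300.0, count=4):
    """Resonances (Hz) of a uniform tube closed at the glottis, open at the lips."""
    return [(2 * n - 1) * sound_speed_cm_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


child_resonances = quarter_wave_resonances(8.25)
rounded_to_10_hz = [round(f, -1) for f in child_resonances]
```

Rounded to the nearest 10 Hz, the four values are 1070, 3210, 5350 and 7490 Hz, matching the figures in the text.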
Figure 9 shows the results of formant frequency estimation for the real speech of a one year old child. Each chart represents the theoretical and the estimated (real) MVS of the child between two and twelve months of age. Gray dots represent the space of simulated resonant frequencies, and black dots represent the resonant frequencies estimated from the recordings of the child's speech.
It should be noted that all charts are very similar, except the chart for the seventh month. This chart has a small number of dots (the recordings are short), which may be the reason for the differences. The similarity of the charts in Figure 9 shows that the articulation capability of a child is very large and that it does not change much during the first year of life. The match is good for the first formant, slightly weaker for the second formant and weakest for the third formant. The theoretical and experimentally obtained MVS are well matched in general. This indicates that the estimated VT shapes (Figure 2) are very similar to the real VT shapes of a one year old child.
In some charts (fourth and eighth month), a slightly wider first formant frequency space can be seen compared with the simulated (theoretical) space.
The reason for this is that the child was recorded while vocalizing with a higher fundamental frequency (F0): squealing, screaming and crying. It is known that in these cases the reliability of formant frequency estimation decreases (van der Stelt et al., 2005). The partial discrepancy in the second and third formant frequency space should be analyzed in more detail. That analysis should determine whether the real MVS is narrower than the simulated MVS, or whether a child at that age hardly or rarely pronounces the front vowels /e/ and /i/.

Figure 1 - Dependence of the VT cross-sectional area on distance from the glottis for two larynx height index values (pronunciation of the Russian vowel /a/)

The second formants of the middle and back vowels (/a/, /o/ and /u/) show the biggest changes in the simulation of different LHI values. The changes in these formant frequencies are about 10%. Accordingly, this parameter has a significant influence on vowel formant frequencies and should be incorporated in the process of generating the VT acoustical model of a child.

Figure 2 - Estimated VT shapes of a one-year-old child during vowel pronunciation (thick line) and the scaled VT shapes of an adult male (thin dashed line)

Figure 3 - Estimated VT shapes of a one year old child (solid line) compared with the data from (Goldstein, 1980) (thin dashed line)

Figure 4 - Acoustic and electrical four-tube VT model

Figure 6 - Resonant frequencies modelled with four cylindrical tubes of 5 mm length, with rapid changes in the cross-sectional area values disabled

Figure 7 - Estimated MVS of a one-year-old child

Figure 9 - Simulated MVS (gray dots) and estimated MVS (black dots) for a child from two to twelve months of age

Formant frequencies of vowels have a tendency to decrease over time. As a child gets older, the VT gets longer and consequently the formant frequencies decrease.