Vowel height and velum position in German: Insights from a real-time magnetic resonance imaging study

Velum position was analysed as a function of vowel height in German tense and lax vowels preceding a nasal or oral consonant. Findings from previous research suggest an interdependence between vowel height and the degree of velum lowering, with a higher velum during high vowels and a more lowered velum during low vowels. In the current study, data were presented from 33 native speakers of Standard German who were measured via non-invasive high quality real-time magnetic resonance imaging. The focus was on exploring the spatiotemporal extent of velum lowering in tense and lax /a, i, o, ø/, which was done by analysing velum movement trajectories over the course of VN and VC sequences in CVNV and CVCV sequences by means of functional principal component analysis. Analyses focused on the impact of the vowel category and vowel tenseness. Data indicated that not only the position of the velum was affected by these factors but also the timing of velum closure. Moreover, it is argued that the effect of vowel height was to be better interpreted in terms of the physiological constriction location of vowels, i


I. INTRODUCTION
This study is concerned with the temporal and spatial extent of velum movement patterns during German tense and lax vowels in nasal and oral contexts. It has been frequently reported that low vowels in nasal, and sometimes also in oral, environments are produced with a lower soft palate than non-low vowels (Amelot and Rossato, 2007; Bell-Berti, 1973, 1976; Kuehn, 1976; Moll and Shriner, 1967, among others). In addition, low vowels are found to require a larger amount of velopharyngeal port (VP) opening than non-low vowels to be perceived as nasalised (Abramson et al., 1981; House and Stevens, 1956; Maeda, 1993, although the opposite finding is also attested, e.g., Ali et al., 1971; Lintz and Sherman, 1961). The evidence that vowel height and velum position interact has served as an explanatory approach to the development of contrastive nasal vowels, in which low vowels are often affected by contrastive nasality before mid or high vowels (Chen, 1972; Ruhlen, 1973; Schourup, 1973).
The relation between velum height and vowel height is commonly explained both by anatomical constraints and perceptual factors. First, the process of lowering and raising the soft palate during speech production involves several velopharyngeal muscles, of which the levator palatini takes a primary role by showing high activity during VP closure (Bell-Berti, 1973, 1976; Lubker, 1968). Correspondingly, velum lowering is accompanied by levator relaxation (Bell-Berti, 1973; Lubker, 1968; Lubker et al., 1970), although for some speakers, the velum position during nasal stops can be different from rest position (Moll and Shriner, 1967). During vowels, levator activity is reduced more for low vowels than for higher vowels (Bell-Berti, 1973, 1976; Lubker, 1968; but see Seaver and Kuehn, 1980, for different results), commonly resulting in a lower velum position (Lubker, 1968; Moll and Shriner, 1967). In addition to the differences in velum position relative to vowel height, the lowering gesture may be initiated at varying time points during vowels of different height, such that pre-nasal low vowels are temporally nasalised to a greater extent than high vowels (Clumeck, 1976). Compatibly, during post-nasal vowels, levator activity for raising the soft palate occurs later when the vowel is low (Bell-Berti, 1973). For front vs back vowels, no systematic lowering patterns have been reported so far (Clumeck, 1976; Lubker, 1968), although varying activities of the velopharyngeal muscles are observed for different nasal vowels (Dixit et al., 1987).
In addition to the levator palatini, the palatoglossus muscle is likely to contribute to the specific velum position observed for different vowels. As this muscle establishes a connection between the soft palate and the lateral rims of the tongue, it is considered to induce some pull-down effect on the soft palate when the tongue is in a low position, especially if levator activity is decreased in the production of nasals (Bell-Berti, 1993; Lubker et al., 1970; Moll and Shriner, 1967). Moreover, the palatoglossus is involved in raising and retracting the posterior part of the tongue body during articulation (Bell-Berti, 1973; Dixit et al., 1987; Moon et al., 1994). Higher palatoglossus electromyographic (EMG) potentials are found for central and back vowels, particularly for /a/ (Bell-Berti, 1973; Dixit et al., 1987). However, studies on palatoglossus function during speech show a wide variability of muscle activity patterns, with some speakers using the palatoglossus only for velar gestures independent of nasality (Bell-Berti, 1973, 1976) or during vegetative occasions, while others show activity during nasal stops, indicating active usage to lower the velum (Lubker et al., 1970). Moreover, the anatomical conditions of this muscle are not consistent across speakers. Kuehn and Azzam (1978) find some overall consistency of the termination of palatoglossus in the tongue rims but clearly more variation in its attachments, which sometimes are located closer to the uvula and sometimes closer to the hard palate. The authors suggest that the point of attachment may directly affect the interplay between tongue raising and velum lowering at least during the act of swallowing.
Acoustic and perceptual factors may also account for the different degrees of velum lowering during vowels. It has been frequently reported that synthesised low vowels tolerate more VP opening before they are perceived as nasalised, while for high vowels only a small amount of VP opening is required (House and Stevens, 1956; Lubker, 1968; Maeda, 1993). In general, nasal and nasalised vowels exhibit a complex acoustic spectrum consisting of both oral and nasal formants from the coupled oro-nasopharyngeal cavities. For example, while the F1 amplitude is generally lowered, its bandwidth is increased (Delvaux et al., 2002; Fujimura and Lindqvist, 1971; House and Stevens, 1956). For French, a decrease in F1 in low vowels has been observed (House, 1957; Serrurier and Badin, 2008) as well as an increase in F1 in high vowels (Delvaux et al., 2002; Fujimura and Lindqvist, 1971) and a decrease in F2 in nonback vowels (Delvaux et al., 2002). It should be taken into consideration, however, that phonemically nasal vs oral vowels are commonly produced with different tongue configurations (Carignan, 2014; Carignan et al., 2013; Shosted et al., 2012), which may also contribute to the specific patterns observed. Although there is wide variability across languages regarding the frequency shifts of F1 and F2 due to nasal coupling, the manipulation of F1 is likely to play a major role, as the general effect of nasalisation is a perceptual compression of vowel height (cf. Beddor et al., 1986, pp. 198-204, providing evidence that nasal vowels are commonly more centralised than their oral counterparts).
Moreover, the shifts induced by nasal coupling have different impacts on the frequency spectra of the specific vowels, such that high vowels are affected with minimal nasal coupling, whereas low vowels tolerate much more nasal coupling before significant changes in the spectrum become apparent (House and Stevens, 1956; Maeda, 1993). The perceptual account of the relation between velum position and vowel height presumes that speakers plan speech production based on a knowledge of the acoustic consequences of velum lowering and that they are capable of controlling the soft palate gestures, such that higher vowels are produced with a higher velum to prevent nasal coupling which otherwise would distort the acoustic characteristics of the specific vowel (Bell-Berti, 1993). However, there are further factors that need to be taken into account. As pointed out by Hajek and Maeda (2000), experiments suggesting high vowels to be more readily perceived as nasalised were usually run with synthesised stimuli. Where natural stimuli were used, studies found the low vowel to preferentially elicit a nasal percept (Ali et al., 1971; Lintz and Sherman, 1961). The two conflicting outcomes are explained by the difference in the control of the nature of the vowels that were used in these studies (Beddor, 1993; Hajek and Maeda, 2000). While natural low vowels, unlike high vowels in the same contexts, are likely to be produced with an inherently lower velum, which may be perceived as nasalised, synthetic vowels are controlled for VP opening, such that at a given degree of nasal coupling, the low-frequency prominence is more affected in high vowels than in low vowels (cf. Beddor, 1993, p. 178).
Hajek and Maeda (2000) propose that the conflict between the perceptual findings from experiments with synthetic vs natural stimuli may be better resolved by considering the duration of the vowel, which is typically not controlled for in natural stimuli but is for synthetic items. That vowel length is related to the degree of perceived nasalisation has been shown in previous research (Delattre and Monnot, 1968; Hajek and Watson, 1998; Whalen and Beddor, 1989), suggesting that an increased vowel duration induces an increase in perceived nasalisation. Since across languages, low vowels are typically longer than mid-high and high vowels (Clumeck, 1976; Laver, 1994), this may be the primary reason why low vowels are more readily perceived as nasalised (Hajek and Maeda, 2000, p. 11).
The experiments on velum behaviour during vowels in nasal and oral environments provide valuable findings on the basic mechanisms of velum function during fluent speech and offer considerations as to perceptual reasons for the observed patterns. However, more data are required to obtain insights into this field for several reasons. First, most of the studies on velum behaviour involve only a handful of participants who in addition are often reported to show variations in their velum lowering patterns, which raises the question of how well the findings are applicable to a larger group of speakers. Second, the bulk of the data analysed has been obtained from a limited number of languages, with the focus on American English and French. Much less is known about velum movement patterns in other languages, especially in those lacking extensive coarticulatory nasalisation (some research has been provided on Spanish, see e.g., Solé, 1992). Third, many of the measurement techniques used for the direct observation of velum movements are highly invasive and uncomfortable for the participant, such as EMG and electromagnetic articulography (EMA) measurements or experiments using fiberoptic devices (Amelot and Rossato, 2006; Bell-Berti and Hirose, 1975; Matsuya et al., 1974), which have the potential to constrict velar movement or to impede natural breathing.
The present study was designed taking into account the following factors. First, it provides data on velum behaviour during tense and lax vowels in fluent speech that were acquired via the non-invasive imaging method of high quality real-time magnetic resonance imaging (MRI), which allows for unrestricted gestural interactions between the lips, tongue, and soft palate. It should be noted, of course, that this method requires the participants to produce speech in a supine position, a body posture that may affect articulatory movements (Kitamura et al., 2005; Stone et al., 2007). However, while slight effects have been commonly recognised for tongue movements (e.g., Badin et al., 2002), only minimal variations in velopharyngeal structures were observed in upright vs supine position (Perry, 2011) and vowel-specific differences in velum height were found to be predominantly preserved (Whalen, 1990). More significant is the finding that the use of long sustained sounds in earlier investigations is much more likely to lead to atypical articulatory configurations (Engwall, 2006), an issue that can be avoided with real-time MRI. Moreover, running speech has been found to be much less prone to gravity effects than sustained sounds (Tiede et al., 2000). Second, the articulatory data were obtained from 33 (originally recorded: 36) native speakers of Standard German. The relatively high number of participants increases statistical power and helps to obtain insights across a larger group of speakers. Third, as the focus is on Standard German, this study provides findings about velum behaviour in a language that exhibits no nasal vowels or strong coarticulatory vowel nasalisation, which allows the articulatory mechanisms to be tested that are fundamental for the production of extensively nasalised or even contrastively nasalised vowels. Our study extends previous findings by focusing on German tense and lax vowels /aː, a, iː, ɪ, oː, ɔ, øː, œ/ that precede either a nasal or oral consonant in CVNV and CVCV sequences.
Data are analysed with respect to the spatiotemporal extent of velum lowering in both consonantal contexts. Following previous research, we expect pre-nasal vowels to exhibit significantly more anticipatory velum lowering than vowels in CVCV contexts. We also expect /a/ to be accompanied by a significantly lower soft palate relative to the other vowels tested in both consonantal environments. Furthermore, we discuss whether differences in velum movement patterns for tense and lax vowels are more related to phonetic vowel height or can be better described in terms of the corresponding physiological tongue position. Phonetic vowel height basically refers to an auditory quality that is defined in acoustic terms (Jones, 1962; Ladefoged, 1971). As reference points, the highest most front cardinal vowel /i/ and the lowest most back cardinal vowel /ɑ/ are articulatorily specified, while the other cardinal vowels are determined in auditorily/acoustically equidistant steps relative to the two extremes.
Thus, phonetic height, and consequently the height and backness continua that are used in phonological models, refers to acoustic divisions along the axes of F1 and F2 rather than to the physiological tongue position. In the current study, we consider how velum position is affected by the vowel category on the one hand but also how it is related to the actual corresponding tongue position in these vowels. Moreover, the effects of vowel duration on velum position are considered in more detail in the concluding discussion. Findings may help to contextualise the broad evidence that low vowels are often the first vowels to be affected in a sound change in which coarticulatorily nasalised vowels become contrastive nasal vowels over time.

A. Speakers
Data were originally acquired from 36 monolingual native speakers of Standard German (22 female) who were aged between 19 and 35 years (mean age = 24.36 years, standard deviation, SD = 4.22) and recruited from the University of Göttingen, Lower Saxony, Germany. Detailed demographic information is included as supplementary materials.¹ To ensure comparability with respect to the pronunciation patterns, participants were speakers of Standard German, sometimes with slight regional characteristics. Three speakers had grown up in the more southern part of Germany, but all participants showed clear pronunciation distinctions between lax vs tense vowels. The present study involves data from only 33 speakers due to issues with image registration for three of the participants and subsequent problems with generating the velum signal from their images. Participants gave written information about the town and region in which they grew up and went to school. All speakers reported normal hearing and speaking function and each of them filled out additional forms determining compatibility for an MRI measurement. Participants gave written consent before the MRI measurement and were paid for their participation.

B. Stimuli
The speech material used in this study consists of a subset of a larger speech corpus that originally involved 152 German monosyllabic and disyllabic natural words, which, if necessary, were inflected to achieve the required sound sequence. The items of the overall corpus were embedded in carrier phrases with varying prosodic conditions in which the location of the nuclear accent varied (an overview of the prosodic conditions is included as supplementary materials¹). Each speaker read out ≈350 stimuli. For the current study, a subset of the original corpus was selected with target words being positioned sentence medially and with nuclear accent. The stimuli consist of disyllabic CVNV and CVCV sequences (the only exception to this is the monosyllabic CVC item /baːt/ "asked for") with the primary accent on the first vowel, which was also the target vowel in these contexts. The second consonant was either /n/ or /t/. The second vowel was either /ə/ or /ɐ/. For each context separately, the effects of vowel height and tensity were considered with respect to the spatiotemporal extent of velum lowering. In both contexts, the target vowels were /aː, a, iː, ɪ, oː, ɔ, øː, œ/. In total, the subset includes 997 items from 30 target words, divided into 500 CVNV and 497 CVCV items (a list of the target words used in this study is included as supplementary materials¹).

C. Imaging
Real-time magnetic resonance imaging (rt-MRI) data were acquired at the Max Planck Institute for Multidisciplinary Sciences in Göttingen, Germany. For image acquisition, a 3 T MRI system was used (Magnetom Prisma Fit, Siemens Healthineers, Erlangen, Germany). Participants were measured in supine position via a 64-channel head coil with the radiofrequency (RF)-spoiled FLASH sequence. This method is based on highly undersampled radial gradient echo acquisitions and is combined with serial image reconstruction by regularised non-linear inversion (Uecker et al., 2010). Individual images were obtained from a single set of nine spokes [repetition time (TR) = 2.22 ms], which resulted in a temporal resolution of 19.98 ms per reconstructed frame, i.e., 50.05 frames per second (fps). An in-plane pixel size of 1.41 × 1.41 mm and a slice thickness of 8 mm were applied, which yielded images of 136 × 136 voxels [i.e., three-dimensional (3D) volume elements] in a field of view of 192 × 192 mm.
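The frame timing reported above follows from the acquisition parameters by simple arithmetic. The following is only a quick consistency check; the variable names are illustrative and are not taken from the acquisition software.

```python
# Acquisition parameters as reported in the text.
SPOKES_PER_FRAME = 9     # radial spokes used to reconstruct one image
TR_MS = 2.22             # repetition time per spoke, in milliseconds
FOV_MM = 192.0           # field of view along one in-plane axis
PIXEL_MM = 1.41          # in-plane pixel size

# One frame is reconstructed from one set of spokes.
frame_duration_ms = SPOKES_PER_FRAME * TR_MS
frames_per_second = 1000.0 / frame_duration_ms

# Approximate matrix size implied by field of view and pixel size.
matrix_size = round(FOV_MM / PIXEL_MM)

print(f"{frame_duration_ms:.2f} ms per frame")
print(f"{frames_per_second:.2f} frames per second")
print(f"{matrix_size} pixels per axis")
```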

D. Data collection
Before the MRI recording, each participant was instructed in a separate preparation meeting which took place in the same week as the MRI measurement. During preparation, participants filled out the required forms and gave written consent before they made themselves familiar with the reading task. Speakers were asked to read out the stimuli (i.e., the complete carrier phrases), which were presented in blocks on a notebook screen. Each block comprised 13 to 14 consecutive slides that switched automatically after four seconds and each block started with one dummy sentence to give participants the chance to adjust to the task. During the instruction session, participants sat in a quiet room and were asked to read out the stimulus sentences with a normal speech volume. In total, the preparation session took about one hour.
Before entering the MRI machine, participants filled out further consent forms at the institute and were checked again for MRI compatibility. All images were obtained from a mid-sagittal slice. In total, 25 reading blocks were recorded per participant. The sentences appeared on a screen projected onto a mirror just above the head coil. Depending on the exact number of sentences, one block took about 60 s of recording time. With the temporal resolution of 19.98 ms, ≈2800 images were acquired per block, which resulted in a total of 70 000 to 80 000 images per participant. While the order of the prosodic conditions was the same for all speakers, the stimuli within the blocks and the blocks within their specific condition were randomised to avoid habituation effects. At least two blocks of the same condition were consecutively presented before another condition was introduced. In general, each target word was produced only once by each participant. In some cases, a reading block had to be repeated due to technical issues or too many mispronunciations. In addition to the image recordings, synchronous acoustic recordings were made by an optical microphone with integrated software for adaptive noise cancelling (Dual Channel-FOMRI, Optoacoustics, Yehuda, Israel). The microphone was adjusted close to the lips once at the start of the recording. The overall measurement procedure took one and a half hours per participant.

Velum movement
The images were processed in MATLAB (The Mathworks Inc., details in Carignan et al., 2021; Carignan et al., 2020).
For each speaker's data set, the images were first registered by pre-creating a region of interest (ROI) that covered the upper portion of the head. By this method, each image was aligned to the first image of the measurement such that small movements of the head that occurred during the recording were compensated. To create a velum signal, a second ROI was manually defined for each speaker around the spatial range boundaries of the velum lowering and raising gestures and comprised approximately 600 voxel sites, which were defined as dimensions in principal component analysis (PCA). As there was only one primary degree of freedom associated with the lowering and raising gesture,² the first principal component (PC1) necessarily referred to the velum movement and explained 52.7% (SD = 9.4) of the data variance (cf. Carignan et al., 2021). To create the velum signal, the scores from PC1 were logged for each individual image, which is exemplified in Fig. 1. By this method, a time-varying signal was obtained with a sampling rate of 50.05 Hz [Fig. 2(a), lower panel]. Participants' individual morphology was taken into account as each speaker's data set was registered individually. The PC1 scores were scaled between 0 and 1, referring to the minimum and maximum PC1 scores for each speaker's data set. Hence, the values can be interpreted as follows: high values correspond to a lower velum position and low values indicate a more raised velum. Thus, this method does not make statements about velum opening in the sense of nasal coupling, i.e., calculating the distance between the velum and the pharyngeal wall, but allows for the investigation of velum movements that may occur even with a constantly closed velopharyngeal port, as indicated by Fig. 2(b) (panels for the first three time points).
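The per-speaker scaling of the PC1 scores can be illustrated with a minimal sketch. It assumes the PCA has already been run and that the scores are oriented so that larger values mean a lower velum. The function name and the example scores are hypothetical, and the study's own processing was done in MATLAB.

```python
def scale_pc1_scores(scores):
    """Min-max scale one speaker's PC1 scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores are constant; scaling is undefined")
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw PC1 scores for five consecutive frames of one speaker.
raw_scores = [-12.0, -8.0, 0.0, 6.0, 8.0]
velum_signal = scale_pc1_scores(raw_scores)

# Time stamps implied by the 50.05 Hz sampling rate of the image series.
FRAME_RATE_HZ = 50.05
times_s = [i / FRAME_RATE_HZ for i in range(len(velum_signal))]

for t, v in zip(times_s, velum_signal):
    print(f"t = {t:.3f} s, relative velum lowering = {v:.2f}")
```

Because the minimum and maximum are taken within each speaker's own data set, the scaled values are comparable across speakers only in relative terms.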

Tongue position
In addition to the analysis of the velar movements, another method was applied to capture the lingual and pharyngeal movement patterns during the target words, based on software provided by C. Carignan.³ After image registration, a semi-polar grid consisting of 28 lines was applied semi-manually to the vocal tract, reaching from the glottis up to the alveolar ridge (Fig. 3, left). This was achieved by manually selecting the locations of the glottis, velopharyngeal port, and alveolar ridge as well as a location of air. The midpoint of the line from the alveolar ridge to the glottis was accordingly located within the genioglossus muscle in all subjects and served as the origin for the semi-polar grid.
The gridlines terminated at the automatically detected posterior or superior boundary as in Fig. 3 (right). The pixel intensities along each of the 28 gridlines were further processed in the following way:
(1) Mean pixel intensity per gridline: The mean intensities per gridline were then grouped into five articulatory regions, yielding the signals "alv," "pal," "velar," "hyperph," and "hypoph." Thus the alv signal, for example, is the mean of the mean pixel intensity for gridlines 25-28, with subsequent subject-specific scaling of the resulting minimum and maximum value from 0 to 1.
(2) The pixel intensities along each gridline were thresholded and scaled to maximise the contrast between air and soft tissue and then summed. The lower threshold was defined as 0.25 × (maximum pixel value − minimum pixel value) on a reference gridline in a reference image with a clear contrast between air and soft tissue. The upper threshold was defined as 3 × lower threshold. Pixel values were then rescaled such that the lower threshold corresponded to zero and the upper threshold to 255. All values below 0 or above 255 were clipped to these values. Then the sum of these scaled and thresholded values was calculated for each gridline. The gridlines were then grouped into articulatory regions and normalised to 0-1 as above. We refer to the articulatory signals resulting from this second procedure as "alv2," "pal2," "velar2," "hyperph2," "hypoph2."
The polarity of all the signals from approaches (1) and (2) is such that greater values correspond to greater articulatory constriction, i.e., more high intensity pixels corresponding to soft tissue relative to low intensity pixels corresponding to air. In practice, procedures (1) and (2) give very similar results. The main difference is that procedure (2) results in signals with a greater tendency to saturate at a plateau value when there is complete articulatory constriction.
[Figure caption fragment: Corresponding images show the velum position at t = 0.8, 0.9, 1.0, 1.1 s. These time points roughly correspond to onset of /ɐ/ in /viːdɐ/, midpoint of /z/ in /zaːnə/, midpoint of /aː/ in /zaːnə/ and the onset of the /n/ in /zaːnə/. The PC score at each of these time points is given in the title bar of each panel ("v=").]
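The thresholding and summation in procedure (2) can be sketched as follows. This is a minimal illustration that follows the textual description literally; the function name, the reference values, and the pixel values are hypothetical, and the subsequent grouping into regions and 0-1 normalisation are not shown.

```python
def gridline_constriction(pixels, ref_min, ref_max):
    """Sum of thresholded, rescaled pixel intensities along one gridline.

    ref_min and ref_max are the minimum and maximum pixel values on the
    reference gridline of the reference image.
    """
    lower = 0.25 * (ref_max - ref_min)   # lower threshold, as stated in the text
    upper = 3.0 * lower                  # upper threshold
    total = 0.0
    for p in pixels:
        # Rescale so that lower -> 0 and upper -> 255, then clip to [0, 255].
        scaled = (p - lower) / (upper - lower) * 255.0
        total += min(255.0, max(0.0, scaled))
    return total

# Hypothetical intensities: air (dark) at the start, soft tissue (bright) at the end.
example_pixels = [50, 100, 200, 300, 350]
signal_value = gridline_constriction(example_pixels, ref_min=0, ref_max=400)
print(signal_value)
```

The clipping at 255 is what makes the resulting signals saturate once a gridline is fully covered by soft tissue, in line with the plateau behaviour noted for procedure (2).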
The tongue configuration for the vowels was analysed by means of PCA; the input to the PCA consisted of the eight signals pal, pal2, velar, velar2, hyperph, hyperph2, hypoph, hypoph2 at the vowel midpoint, i.e., all signals assumed to correspond to the dorsal and radical region of the tongue.

F. Acoustic analysis
The acoustic data were processed via MATLAB (version 9.3.0.713579, R2017b) to achieve further noise cancelling of the scanner tone. Acoustic analyses were performed for reasons of segmentation and measurement of vowel duration. The temporal boundaries were further used for time alignment in the subsequent analytic processes (see Sec. I). The acoustic analyses were performed manually by means of the PRAAT software (Boersma and Weenink, 2017). The onset of the target vowel was defined either as the point of release of the preceding stop (i.e., post-consonantal aspiration was part of the vowel) or, if preceded by a fricative, as the transition from high frequencies into clear formant structures (i.e., abrupt modifications in F1, F2, F3). The vowel offset boundary was defined either at the oral stop closure or at the transition into clear spectral frequency changes for the nasal stop and coincided with the onset of the following consonant. The offset of the nasal stop was defined at the transition into frequency changes related to the following vowel. The offset of the oral stop was defined as the point at which high frequencies occurred due to the burst. In general, the acoustic energy apparent in the oscillogram was additionally used for validation.

G. Modelling the shape of the velum signal
A multi-step procedure was set up in order to model the relation between relevant shape traits of the velum signals and a number of variables related to the vowels and their properties. For reasons set out in Sec. II, context (nasal vs oral) was not modelled directly, but two distinct analyses were carried out, one for each context. Relevant shape traits were extracted by applying functional principal component analysis (FPCA, Ramsay and Silverman, 2005) separately on each context-specific subset of velum tracks. FPCA provides a data-driven parametrisation of a set of input curves, each parameter or score quantifying an independent shape variation mode consistently found in the data. Once scores were obtained, these were used as response variables in a set of linear mixed-effects (LME) models where predictors were combinations of variables describing the vowel itself, either as a discrete category (/a, i, o, ø/) or as a continuous measure of tongue position (see Sec. II), as well as its tensity. Indirectly, this is a way to relate these predictors to different aspects of the velum lowering tracks' shape, as each score modulates a distinct and independent variation mode in the tracks' data set. In the following, first we describe data pre-processing, then FPCA, and finally the structure of the LME models.

Pre-processing: Smoothing and boundary alignment
Velum signal tracks were processed following the procedure outlined in Gubian et al. (2015). First, each sampled signal underwent smooth B-spline interpolation in order to obtain a continuous functional representation of each track, which is the required input format for FPCA. Then, the acoustic boundary between the vowel and the following consonant was used as an inner boundary to time-align each track in such a way that vowel onset (start), vowel offset (inner boundary), and consonant offset (end) occur at the same time across all tracks (see landmark registration in Ramsay and Silverman, 2005, and Gubian et al., 2015). By eliminating phase differences across tracks, the downstream analysis is based on a manipulated time axis that allows us to attribute shape traits as belonging to specific segments. Note that the duration difference between tense and lax vowels that is removed by the time-alignment procedure will be considered in more detail in the discussion (Sec. IV).
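The idea behind the boundary alignment can be illustrated with a simplified sketch. It swaps in piecewise-linear time warping and linear interpolation for the B-spline smoothing and landmark registration used in the study, and it assumes that the inner boundary is placed at the midpoint of the normalised time axis. All function names and values are hypothetical.

```python
def normalise_time(t, onset, inner, end, inner_target=0.5):
    """Map time t so that onset -> 0, inner boundary -> inner_target, end -> 1."""
    if t <= inner:
        return inner_target * (t - onset) / (inner - onset)
    return inner_target + (1.0 - inner_target) * (t - inner) / (end - inner)

def interpolate(xs, ys, x):
    """Linearly interpolate the samples (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])

def align_track(times, values, onset, inner, end, n_points=11):
    """Resample one track on a common grid after warping its time axis."""
    warped = [normalise_time(t, onset, inner, end) for t in times]
    grid = [i / (n_points - 1) for i in range(n_points)]
    return [interpolate(warped, values, g) for g in grid]

# Hypothetical track: vowel from 0.00 to 0.06 s, consonant from 0.06 to 0.10 s.
times = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10]
values = [0.20, 0.25, 0.35, 0.50, 0.65, 0.70]
aligned = align_track(times, values, onset=0.00, inner=0.06, end=0.10)
print(aligned)
```

After this step every track has the same number of samples and the vowel-consonant boundary falls at the same index, which is what allows shape traits to be attributed to specific segments.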

FPCA
FPCA (Ramsay and Silverman, 2005) provides a data-driven parametrisation of a set of input curves represented by continuous functions defined on the same time interval. The FPCA parametrisation is expressed by the following equation:

$$F_i(t) \approx \mu(t) + \sum_{k=1}^{K} s_{i,k}\,\mathrm{PC}_k(t), \tag{1}$$

where $i$ is the index identifying each curve, $F_i(t)$ is the $i$-th curve expressed as a function of time $t$, $\mu(t)$ is the mean curve, $\mathrm{PC}_k(t)$ are principal components (PCs), $k = 1, \ldots, K$, which are computed on the basis of the entire curve data set, and $s_{i,k}$ are weights or scores, which modulate PCs differently for each curve. Formally, Eq. (1) follows the same structure as ordinary PCA, namely, any input curve $F(t)$ is approximately decomposed into a linear combination of $K$ PCs added to the data set mean $\mu(t)$.
We performed two independent FPCAs, one on the N and one on the C context data. This choice was dictated by the necessity to exploit the descriptive power of FPCA for the variables of interest, i.e., the vowels. Had FPCA been applied on the entire curve data set, most of the variance, and hence the first and most reliable PCs, would have captured the dominant shape difference between N and C context, confining the differences between vowels to higher order PCs, which are more sensitive to noise. We computed the first $K = 3$ PCs, which when combined explained 99% of the velum curve variance both in the N and C context (N context: PC1: 85%, PC2: 12%, PC3: 3%; C context: PC1: 96%, PC2: 3%, PC3: 1%). Figures 4(a) and 4(b) show, respectively, each PC's variation mode separately in the N and C contexts. The vertical dashed line indicates the segment boundary between the vowel and the following consonant. Thick black curves are the mean velum curves $\mu(t)$ for each context, and are the same across PC panels for the same context. The colour-coded curves illustrate the modification of the velum signal shapes by each PC in Eq. (1). In each PC$k$ panel, a range of equidistant values between −1 (blue) and +1 (red) standard deviations of $s_k$ ($\sigma_k$) were substituted in Eq. (1), setting all other scores to zero.
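The construction of such variation-mode curves can be sketched as follows, assuming that the mean curve and the PCs are available as samples on a shared time grid. The numerical values are toy inputs chosen for illustration and are not estimates from the study.

```python
def reconstruct(mean_curve, pcs, scores):
    """Evaluate mean(t) + sum_k scores[k] * pcs[k](t) on a shared time grid."""
    curve = list(mean_curve)
    for pc, s in zip(pcs, scores):
        for j, v in enumerate(pc):
            curve[j] += s * v
    return curve

# Toy inputs sampled at five time points (illustrative only).
mean_curve = [0.2, 0.3, 0.5, 0.6, 0.7]
pc1 = [1.0, 1.0, 1.0, 1.0, 1.0]      # vertical translation
pc2 = [-1.0, -0.5, 0.0, 0.5, 1.0]    # tilt
sigma = [0.1, 0.05]                  # assumed standard deviations of s_1 and s_2

def variation_mode(k, steps=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Curves obtained by varying score k from -1 to +1 SD, other scores zero."""
    curves = []
    for step in steps:
        scores = [0.0] * len(sigma)
        scores[k] = step * sigma[k]
        curves.append(reconstruct(mean_curve, [pc1, pc2], scores))
    return curves

pc1_family = variation_mode(0)
for curve in pc1_family:
    print([round(v, 3) for v in curve])
```

With a score of zero the reconstruction reduces to the mean curve, which is why the mean appears in every panel of a variation-mode plot.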
Figures 4(a) and 4(b) allow us to relate quantitative changes in PC scores to qualitative, dynamic changes in velum signal tracks. To a first approximation, the first three PCs model similar variation modes in both N and C contexts. PC1 captures a rigid vertical translation, whereby a positive or negative s_1 moves the curve higher or lower. Thus a positive s_1 corresponds to a lower velum and when s_1 is negative, velum lowering decreases. PC2 captures a tilt, such that a positive or negative s_2 tilts the curve downwards or upwards. PC3 roughly captures a concavity/convexity feature, with positive s_3 corresponding to a U-shape. Despite these broad similarities in corresponding PCs across the two contexts, important differences need to be pointed out, which directly affect interpretation. First, there is a large scale difference in velum lowering between the two contexts. This is found both in the respective mean curves across conditions, where the N context mean spans roughly between 0.2 and 0.7 relative velum lowering while the C context mean is basically flat at around 0.2, and in the larger vs smaller vertical translation effect of s_1 in the N and C condition, respectively. Second, the tilt encoded by PC2 has to be interpreted differently in the two contexts. In the N context, the tilt works as a non-linear compensation to the vertical shift encoded by PC1. This is because the FPCA parametrisation in Eq. (1) is a linear model which cannot directly incorporate the fact that the velum curves are bounded between 0 and 1. As a consequence, curves associated with a particularly high or low s_1 (vertical shift) require high positive values of s_2 as they need to be tilted downwards, otherwise they would take values above 1 or below 0, respectively (see Appendix A for more detail). Therefore, in the N context, s_2 is not considered in the analysis. In the C context, the tilt encoded by PC2 is less pronounced and does not fulfill any obvious compensation effect. For PC3 in the N context, a negative s_3 causes a steeper shape of the curve at the beginning and a more distinct asymptote at the end. Therefore, a negative s_3 corresponds phonetically to less velum lowering in the beginning as well as an earlier peak of lowering and thus an earlier initiation of the velum raising movement (a positive s_3 has the reverse effect). For PC3 in the C context, a negative s_3 causes a higher degree of bending, i.e., more negative values in the beginning, followed by a higher peak during the vowel segment and again more negative values in the consonantal segment. Thus a negative s_3 corresponds to a higher amplitude of the velum lowering gesture during the vowel and a more distinct raising gesture during the consonantal segment. Despite PC3 explaining only a small fraction of the curve variance in both contexts, its effect on the shape is sufficiently noticeable and interpretable, especially in the N case, where it is concentrated at the curve extremities (which make up a small fraction of the time axis, hence the small fraction of explained variance).
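The curve model referred to as Eq. (1) can be read as μ(t) plus a score-weighted sum of PC curves, which is how the variation modes in Fig. 4 are built. The following is a minimal sketch of that idea only, not the analysis pipeline used in this study: it assumes synthetic curves (a hypothetical linear rise plus random shift and tilt), uses a discretised PCA via SVD as a stand-in for FPCA, and all function and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical synthetic data: 200 curves sampled at 50 normalised time points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
shift = rng.normal(0.0, 0.10, size=(200, 1))   # curve-specific vertical offset
tilt = rng.normal(0.0, 0.03, size=(200, 1))    # curve-specific slope change
curves = 0.2 + 0.5 * t + shift + tilt * (1.0 - 2.0 * t)


def fit_pcs(curves, n_components=3):
    """Discretised PCA via SVD (an assumed stand-in for FPCA)."""
    mean_curve = curves.mean(axis=0)
    centred = curves - mean_curve
    _, sing, vt = np.linalg.svd(centred, full_matrices=False)
    pcs = vt[:n_components]                    # PC_k(t), one row per component
    scores = centred @ pcs.T                   # s_k for every curve
    explained = sing**2 / np.sum(sing**2)      # proportion of variance per PC
    return mean_curve, pcs, scores, explained[:n_components]


def reconstruct(mean_curve, pcs, scores):
    """Evaluate mu(t) + sum_k s_k * PC_k(t) for the given scores."""
    return mean_curve + np.asarray(scores) @ pcs


mean_curve, pcs, scores, explained = fit_pcs(curves)

# Variation mode of PC1: -1 and +1 SD of s_1, all other scores set to zero.
sd1 = scores[:, 0].std()
lower_mode = reconstruct(mean_curve, pcs, [-sd1, 0.0, 0.0])
upper_mode = reconstruct(mean_curve, pcs, [+sd1, 0.0, 0.0])
```

Setting all scores to zero returns the mean curve, which is why the thick black curve is identical across the PC panels of one context.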

LME models
LME models were run with PC scores as response variables. The base models have vowel (factor with four levels: /a, i, o, ø/) and tensity (factor with two levels: tense, lax) as interacting fixed factors; speaker and word onset were random intercepts.⁴ In reporting results, canonical symbols will be used for the vowels, e.g., /ɪ/ as opposed to lax /i/. While all 3 (PCs) × 2 (contexts) models were estimated, the model predicting s_2 for the N context was not analysed for the reasons discussed in Sec. II and Appendix A. For each model, the following statistics were computed: (i) a type III analysis of variance (ANOVA) on the fixed factors; (ii) marginal pseudo-R² (Johnson, 2014; Nakagawa and Schielzeth, 2013), which provides an estimate of the proportion of variance explained by the fixed effects only; (iii) estimated marginal means (EMMs) for each combination of vowel and tensity, which are substituted into Eq. (1) to obtain predicted velum lowering curves; and (iv) post hoc significance tests between pairs of vowels within the same tensity value (six pairs for each tensity) and between corresponding vowels across tensity values (four comparisons). The resulting 16 tests are Bonferroni-corrected for multiple comparisons. Finally, LME models using an alternative codification of vowel as a continuous covariate related to tongue position (cf. Sec. II) were considered and compared to those with vowel as four-level factor. LME models and ANOVAs were computed using R library lmerTest (Kuznetsova et al., 2017), EMMs and post hoc comparisons with R library emmeans (Lenth, 2022), pseudo-R² with R library MuMIn (Barton, 2022).¹
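The count of 16 post hoc tests follows directly from the design described above. As a minimal illustration (in Python rather than the R libraries used for the actual analysis), the sketch below enumerates the comparisons and applies a Bonferroni adjustment in its usual formulation; the helper name `bonferroni` is hypothetical and no p-values from the study are reproduced.

```python
from itertools import combinations

vowels = ["a", "i", "o", "ø"]      # the four vowel categories
tensities = ["tense", "lax"]

# (1) All vowel pairs within each tensity level: 6 pairs x 2 levels.
within = [((v1, ten), (v2, ten))
          for ten in tensities
          for v1, v2 in combinations(vowels, 2)]

# (2) Each vowel against its counterpart across tensity: 4 comparisons.
across = [((v, "tense"), (v, "lax")) for v in vowels]

comparisons = within + across


def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Because adjusted values are capped at 1, a corrected result of p = 1 simply indicates that the raw p-value was at least 1/16.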

III. RESULTS
Results for these models are reported separately for the N and the C contexts in Secs. III A and III B, respectively, while a comparison between contexts is reported in Sec. III C. Models including tongue position as covariate are presented in Sec. III D.

A. CVNV context
Figure 5 shows the distribution of s_1 and s_3 scores separated by tensity and vowel category. Higher s_1 scores correspond to more velum lowering, while higher s_3 scores indicate a later peak of the velum lowering gesture and thus a later time point of the raising gesture. Figure 6 shows reconstructed vowel-specific velum signal curves based on the corresponding estimated marginal means of s_k provided by the LME models with s_1, s_2, and s_3 as response variables. The curve reconstruction in Fig. 6(a) comprises s_1, s_2, and s_3. To disentangle the effects of s_1 and s_3 on the individual vowel curves, Figs. 6(b) and 6(c) depict the effects of s_1 and s_3 separately.

Differences in s_1
With respect to s_1 [Figs. 5(a) and 6(b)], differences were apparent between tense and lax vowels. Within the tense vowels, /aː/ showed the highest degree of velum lowering followed by /oː/, while /iː/ and /øː/ showed quite similar patterns of velum lowering. The patterns for lax vowels, in contrast, were split into two groups, with /a/ and /ɔ/ on the one hand and /ɪ/ and /œ/ on the other hand. However, the visual impression gained from Figs. 5(a) and 6(b) did not always match the statistical results. There was a significant influence of tensity (F[1, 18] = 5.54, p < 0.05) and vowel (F[3, 22] = 77.8, p < 0.001) on s_1 as well as a significant interaction between these factors (F[3, 22] = 20.7, p < 0.001). Post hoc tests showed that all tense vowels significantly differed from each other with respect to s_1 scores (p < 0.001) except for /iː/ vs /øː/ (p = 1), whereas for lax vowels, significant s_1 differences were found only for /ɪ/ vs /a/ (p < 0.05). Moreover, significant s_1 differences between lax and tense vowels were found for /a/ vs /aː/ (p < 0.001), but not for the other vowels. Table I provides details on the direction of the respective vowel contrasts.

Differences in s_3
Considering s_3, Figs. 5(b) and 6(c) suggest that tense vowels showed more variation with respect to the shape of the asymptote than lax vowels, indicating that tense vowels exhibited more differences in the time point of the velum lowering peak and the initiation of velum closure during the VN sequence. Although s_3 variation between the vowels was not as distinct as for s_1, the most evident difference was between /iː/ and /aː/, with /aː/ showing initiation of velum closure in the nasal segment, while in /iː/ no raising gesture was evident [Fig. 6(c)]. In addition, it is striking that tense /iː/ and /øː/ showed highly similar s_1 contours but different s_3 patterns (note the strong overlap of the /iː/ and /øː/ curves for s_1 and of /oː/ and /øː/ for s_3). A significant effect was found for vowel category (F[3, 8] = 4.78, p < 0.05) and tensity (F[1, 7] = 36.1, p < 0.001) as well as their interaction (F[3, 8] = 12.8, p < 0.01). Post hoc tests indicated a significant contrast between lax /ɔ/ vs /a/ (p < 0.05), with /a/ exhibiting higher s_3 scores, and between all tense vowels (p < 0.001) except for /øː/ vs /oː/ (p = 1), with /aː/ showing lower s_3 scores than /øː/, /oː/, and /iː/. In addition, significant s_3 score differences were found for lax vs tense /ɔ/ vs /oː/ (p < 0.001) and /a/ vs /aː/ (p < 0.001). Table I provides an overview of the respective directions of the contrasts.

B. CVCV context
In the C context, distribution patterns of the scores were less distinct with respect to the vowel category. Figure 7 shows the distribution of s_1, s_2, and s_3 scores separated by tensity and vowel category. As with the N context, a positive s_1 corresponds to more overall velum lowering and a negative s_1 indicates less velum lowering. s_2 is more related to the change in velum height across the vowel and consonant, such that negative s_2 values are associated with a steeper incline of the curve, i.e., a greater change from a raised to a more lowered velum position. A negative s_3 causes a higher degree of curve bending, i.e., more negative values in the beginning, followed by a higher peak during the vowel segment and again more negative values in the consonantal segment. Figure 8 illustrates the effect of s_1, s_2, s_3 scores on the mean curve with estimated marginal means for the individual vowels predicted by the corresponding LME model. Note that the scaling of the y axis is markedly different from that of the N context discussed before, which indicates a generally decreased range of velum lowering. When the s_k were combined [Fig. 8(a)], tense /aː/ showed the steepest slope and the highest degree of velum lowering, with its peak just before the vowel offset.

Differences in s_2
For s_2 [Figs. 7(b) and 8(c)], tense /aː/ showed the steepest incline that continued throughout the consonant, while /oː/ and /øː/ had near-identical curves. Thus, tense /aː/ not only exhibited the lowest velum position compared to the other vowels but it also showed the largest positional variation during the vowel, whereas tense /oː/, /øː/, and /iː/ were only slightly affected. Lax /a/ and /ɔ/ showed similar contours and exhibited a slightly steeper contour than /ɪ/ and /œ/, which for their part showed quite similar patterns. s_2 was significantly affected by vowel (F[3, 458] = 23.3, p < 0.001); there was also a significant interaction between vowel and tensity (F[3, 457] = 3.59, p < 0.05). Post hoc tests indicated significantly lower s_2 values for lax /a/ compared to /ɪ/ (p < 0.001) and for tense /aː/ compared to all other tense vowels (p < 0.001). Moreover, a significant s_2 difference was found for lax vs tense /a/ vs /aː/, with the lax vowel showing higher values (p < 0.01).

C. CVNV vs CVCV
To illustrate the inherent difference in velum lowering degree between N and C contexts, Fig. 9 contrasts the reconstructed velum curves for tense and lax vowels between the contexts when the respective mean curve is subtracted. Subtracting the context-specific mean curves allows for a visualization of the pure vocalic effects in the N vs C context: this is because the curves in the N context are otherwise dominated by the overall transition from V into N, which may mask the actual vocalic effects when comparing N and C contexts. The range between the lowest and highest point in the N context for tense vowels was about 0.35 and for lax vowels 0.2, while in the C context, it was about 0.09 for tense vowels and 0.06 for lax vowels. Thus, in our data velum lowering was overall increased by a factor of 3.6 (tense) and 3.5 (lax) for vowels in the nasal context compared to vowels in the oral environment.
Despite the large differences in the range of velum lowering, the course patterns in Figs. 6 and 8 and also in Fig. 9 show striking similarities in terms of the effects of the vowel on velum lowering. In both N and C contexts, a decrease in velum lowering is evident in the order of /aː/ > /oː/ > /iː, øː/ for tense vowels and /a, ɔ/ > /œ/ > /ɪ/ for lax vowels. As indicated in Secs. III A 1, III A 2, and III B 1, however, the visually evident patterns are not always upheld by the statistical results (cf. Tables I and II).
FIG. (Color online) Velum signal curves as modulated by Eq. (1), where s_1, s_2, s_3 values are estimated marginal means predicted by the corresponding LME model for the individual vowels. Panels show the effect of s_1, s_2, s_3 in combination and in isolation.

D. Velum height and tongue position
So far, velum lowering has been considered in terms of vowel category and vowel tensity, i.e., abstract and categorical factors. From a physiological point of view, however, especially with respect to a possible relationship between tongue height and velum height, the velum signal data can also be considered in terms of the actual physiological tongue position captured during the different vowel types. To investigate this issue, the tongue position was determined at the vowel midpoint using the procedures described in Sec. II.
Figure 10 shows the tongue position in tense and lax vowels calculated by PCA based on combined data from the palatal, velar, hyper-, and hypopharyngeal region at the vowel midpoint. N and C contexts were combined as a preliminary inspection indicated negligible differences between the contexts for tongue positions. Increasing PC2 scores (t_2) on the x axis can be roughly interpreted as increasing backness of the tongue in the vocal tract. The y axis reflects the axis between the two extremes of palatal and pharyngeal constriction, i.e., between a narrow oral cavity and wide pharyngeal tract and a wide oral cavity and narrow pharyngeal tract. Decreasing PC1 scores (t_1) correspond to a more palatal constriction. In agreement with previous research on German tense and lax vowel production (Hoole and Mooshammer, 2002), Fig. 10 shows that lax vowels were generally produced with a more centralised tongue position compared to the physiologically more peripheral tense vowels. Large lax vs tense differences were especially apparent in /ɔ/ vs /oː/ and in /ɪ/ vs /iː/. Tense /øː/ was produced further front than its lax counterpart. In contrast, tense /aː/ and lax /a/ showed only slight differences, suggesting that these were produced in a similar way (cf. Cunha et al., 2013; Gao et al., 2020; Hoole and Mooshammer, 2002, for similarities of the tongue position in German tense /aː/ vs lax /a/).
To investigate the relationship between velum height and tongue position more closely, Fig. 11 shows s_1 as a function of t_1 in both N and C context (as s_1 corresponds to changes specifically in velum height). In the N context, a positive relationship was apparent between t_1 and s_1, more so for tense than for lax vowels. In the C context, such a relationship was not that obvious. LME was applied with s_1 [...] Post hoc tests showed a significant increase in s_1 with increasing t_1 for both tense and lax vowels. Similarly, for the C context significant effects were found for tensity (F[1, 462] = 28.9, p < 0.001) and t_1 (F[1, 466] = 42.4, p < 0.001). As with the N context, post hoc tests showed a significant increase in s_1 with increasing t_1 for tense and lax vowels.
Although the model suggested an increase in s_1 with increasing t_1, Figs. 11(a) and 11(b) indicate that this relationship is not always strictly reflected by the data. For example, tense /iː/ and /øː/ showed highly similar s_1 values in the N context but were distinct in t_1. Similar patterns were evident for the C context. Considering lax /a/ vs tense /aː/ in both C and N contexts, Figs. 11(a) and 11(b) show similar t_1 values, whereas s_1 is quite distinct. Moreover, tense /iː/ showed lower t_1 values than lax /ɪ/, while s_1 was very similar in the N context and showed even higher values for tense /iː/ than for lax /ɪ/ in the C context. These specifics are considered in more detail in Sec. IV.

IV. DISCUSSION
Results are in general agreement with previous findings on overall velum position differences between vowels in nasal vs oral contexts (Bell-Berti et al., 1979; Lubker et al., 1970; Moll, 1962). Vowels preceding nasal consonants were typically produced with a more lowered velum than vowels preceding oral stops, and the range of velum lowering was increased in the N context by a factor of 3.5 (lax vowels) and 3.6 (tense vowels) relative to the C context. This result indicates that the influence of the following nasal consonant is quite robust throughout the vowel. Despite the large differences in velum lowering between the contexts, the pattern of the vowel-specific velum lowering stages was strikingly similar, in the order of /aː/ > /oː/ > /iː, øː/ in the tense vowels and /a, ɔ/ > /œ/ > /ɪ/ in the lax vowels for both N and C contexts. It must be mentioned, though, that this order was not statistically upheld in all cases.
One primary concern of this study was to investigate how differences in vowel height affect the extent of spatial velum lowering and whether these differences are more associated with biomechanical effects, i.e., relations between the tongue and the soft palate, rather than with phonetic height. Our results suggest that the tongue position and the velum position are indeed highly interrelated, provided that larger vowel length differences are additionally taken into account (see below). This may prompt a reconsideration of the role of the palatoglossus (PG) connection between the tongue and the soft palate, albeit in a more mechanical way than has been commonly suggested. Studies on the role of the PG during speech have mainly focused on activity patterns of the velopharyngeal muscles during the production of different sounds. While the levator palatini (LP) has been identified as one of the primary muscles for velum closure, the process of velum lowering, and especially the role of the PG muscle, is less clear. While speakers show consistent PG activity patterns during vegetative processes such as swallowing (Bell-Berti, 1973; Lubker et al., 1973; Lubker et al., 1970), they differ considerably in their usage of the PG during speech (Bell-Berti, 1976; Benguerel et al., 1977; Moon et al., 1994). Such inconsistency may come from the fact that the PG is a relatively small muscle consisting of a large amount of connective tissue and showing varying attachment locations across speakers (Gick et al., 2014; Kuehn and Azzam, 1978). Nonetheless, the PG may still be involved in the varying velum height patterns that are related to the tongue position in a more mechanical sense. Because the PG establishes a mechanical connection between the tongue and the soft palate, PG activity is not required for a pull-down effect on the velum. Thus, if the tongue is in a low back position, the velum may be lowered because it is connected to the tongue via the PG, not because the PG is activated. This point of view has also been addressed by Bell-Berti, who refers to "the downward pull of palatoglossus that occurs as a result of its contraction to narrow the faucial isthmus for open vowels [...] and its resistance to being stretched during the articulation of low vowels [...]." (Bell-Berti, 1993, p. 69). Following this view, under the consideration that LP activity decreases during a pre-nasal vowel in anticipation of the upcoming nasal (Bell-Berti, 1973, 1976), the effect of the PG connection becomes apparent especially in environments in which LP activity is suppressed.⁵ In contrast, since in oral contexts LP activity is relatively high, the LP force overrides the pull-down effect of the PG that is visible in the nasal context. The varying balance of forces depending on the consonantal environment may thus account for the clear difference in the velum lowering range in the N vs C context.
One key finding that was repeatedly observed in the current study is that the tense low vowel /a/ was predestined to be produced with a lower velum than the other vowels, which was true in the C context but even more clearly so in the N context [cf. Figs. 4(a) vs 4(b); note the large scaling difference between the two y axes]. This finding is consistent with the consideration suggested above, namely, that velum position during vowel production is guided by the tongue position, but the extent of this effect depends on how much LP activity is involved in the respective sound sequence. That LP strength slightly declines during vowels in oral context was reported for other languages (e.g., for American English and French: Amelot and Rossato, 2007; Bell-Berti, 1973; Clumeck, 1976; Rossato et al., 2003) and is also compatible with our data, which indicated small but visible vowel-specific differences in velum lowering. Overall, our findings are consistent with the assumption that in oral contexts, LP strength may override the effect of PG pull-down, whereas in the nasal context, it becomes much more visible due to decreasing LP activity.
Next, velum pattern differences were most distinct between tense vowels, which was consistent with the more peripheral lingual constrictions in the vocal tract during tense vowel production. In contrast, highly similar velum lowering patterns were obtained especially for lax /a, ɔ/, which matches the similar tongue position found during the production of these vowels. Some of the findings, however, do not seem to satisfy the assumption of a biomechanical relation between the tongue and the velum at first glance. For example, the tongue data indicated a very similar position for both tense and lax /a/, but the patterns in velum lowering differed considerably. This observation may be ascribed to the duration difference between tense and lax vowels, or more precisely, to the effect of the production of the initial consonant on the subsequent velum lowering gesture. In all items under investigation, the initial consonant consisted of an oral stop or fricative and thus required a closed velopharyngeal port, i.e., a raised soft palate. The velum data indicated that some lowering in /a/ was present as soon as the vowel was initiated, but in both N and C contexts lax /a/ never reached the extent of velum lowering characteristic of the tense counterpart. We suggest that this is because velum lowering proceeds less far in the short lax vowel than in the longer tense vowels⁶ due to the lingering effect of LP activity in the preceding oral consonant. That is, the short vowel duration may contribute to a less extreme low position because it may take a certain time to adjust inherent muscle activity programming and execution both for activation and relaxation (cf. Bell-Berti, 1976; Lubker, 1968, p. 14).⁷ Following this argument, anticipatory lowering during lax /a/ is present and some slight pull-down effect is evident, but it does not reach the lowering level of the tense vowel.
This consideration is also in line with the observation that in oral lax /ɪ/ [Fig. 8(a)], there was no tendency for velum lowering at all, suggesting that due to the extremely short vowel and subsequent oral stop, the LP is activated throughout the whole sound sequence, such that no time is available for any PG pull.
Further, the velum curves in both contexts showed very similar contours for tense /iː, øː/, whereas there were very clear tongue height differences between these vowels. In the C context, the velum contour corresponding to tense /øː/ was even below that of /iː/ [Fig. 8(a)], although /iː/ was articulated with a clearly higher tongue position (Fig. 10). We suggest that if tongue-pull on the velum is responsible for velum lowering differences, then velum height is more likely to be conditioned by broader constriction locations of the vowels, i.e., by pre-dorsal /i, ø/, post-dorsal /o/, and radical /a/ rather than by phonetic height. In agreement with this assumption, pre-dorsal vowel height is known to be associated more with mylohyoid support and posterior genioglossus contraction to bulge the tongue hydrostat anteriorly. Thus, height differences among the pre-dorsal vowels may just not affect the more posteriorly located PG pull very much.
Another aspect that is to be considered is the difference in the velum raising movement, i.e., differences in s_3. The data for the vowels in the C context must be handled with care, as the score differences between the vowels were generally much smaller than in the nasal context. Nonetheless, Fig. 8 suggests that the velum lowering contours for s_2 and even more so for s_3 showed greater distinctions between tense vowels compared to lax vowels. Again, this may be related to the clearly distinct vowel length differences between tense and lax vowels, such that sufficient time was provided for the small vowel-specific velum lowering differences to become visible.
For the N context, Figs. 6(a) and 6(c) and the respective results suggest that lax vowels virtually did not differ in the initiation of the raising gesture. In the tense vowels, this point was first achieved by /aː/ and last by /iː/. The overall differences between tense and lax vowels may be again explained by a length effect, as there was more time available during the tense vowels to reach the maximum of velum lowering in the nasal, and this point was achieved first in /aː/ not only due to its greater length but also due to the generally lower velum position during the vowel. Also consistent with this consideration of the length effect is the pattern evident for /iː/ vs /øː/, with small differences in velum closure (s_3). Table III (see Appendix B) indicates that /øː/ was on average considerably longer than /iː/, suggesting that in /øː/, there was a longer time span to reach the reversal point from velum lowering to the raising gesture. The case of /iː/ vs /øː/ brings up additional considerations about the role of vowel length for vowel nasalisation. It has been argued that contrastive vowel nasalisation preferentially affects vowels with greatest duration (Hajek and Maeda, 2000) and that in cases in which /a/ becomes contrastively nasalised, this is because low vowels often show the greatest intrinsic length in many languages. Our data suggest that although vowel duration may influence the timing of the closure gesture (s_3), it is unlikely to have any primary influence on the extent of velum lowering. This can be deduced from at least two findings: first, /iː/ vs /øː/ showed considerable differences in duration and also in phonetic height, but still had highly similar s_1 patterns; and second, /øː/ was (on average) even slightly longer than /aː/ but was produced with much less velum lowering.
Overall, our results suggest that most likely neither length nor traditional phonetic height is the primary influence for the s_1 effects, but that velum lowering patterns can be better explained by broader constriction location categories for vowels.
Looking toward the future, we note that for the interpretation of our data we have made a number of assumptions that would warrant further investigation. In particular, we have suggested that the time course of levator relaxation could be relevant for explaining some apparently duration-related differences between vowels. Within the context of the time-normalized movement patterns we have shown, this should translate into the expectation that at the beginning of the vowel the velum lowering gradient should be more positive for the longer tense vowel than for the shorter lax vowel (since a given time increment in normalized time translates to a longer absolute time increment in the longer, i.e., tense, vowels, and thus more time for the levator to relax and the velum to open). For assessing this, Fig. 9 is the most useful figure since the subtraction of the mean curve makes it easier to view the more subtle vowel-height and tenseness related differences: by and large, it is indeed the case that the gradient of the curves for the tense vowels (dashed) at the beginning of the vowels (say up to about 0.05 in relative time) is more positive (or less negative) than the lax counterpart (solid). However, it must be admitted that it would be highly desirable to have explicit data on timing patterns of levator activation. Ideally, this should be for vowels of differing durations, also include synchronous information on velum height, and either through cross-language or cross-speaker variation include a range of variation in levator timing. Clearly, recording such a data set would be quite challenging (as far as we are aware, published data does not cover a sufficient range of variation, and in any case typically does not report the results in sufficient temporal detail). Moreover, for detailed elucidation of the relation between levator timing and velum height it would also be necessary to factor in detailed consideration of the time-course of tongue movement. While MRI data of the kind analysed here (or, e.g., EMA data) could perhaps provide this in a parallel recording to EMG investigation of levator activity, there remains a further assumption in our interpretation that ideally also requires resolution, if the aim is to arrive at a comprehensive dynamic account of the relation between velum height on the one hand, and levator activation and tongue movement on the other. This is the idea that palatoglossus pull on the tongue is not just a question of tongue height, but also depends on the constriction location of the vowels, i.e., is less differentiated for palatal vowels (such as /i/ and /ø/) than pharyngeal vowels such as /a/. This could probably best be investigated by means of a sufficiently refined biomechanical model of the oral structures. Recent work in the Artisynth framework indicates that this may well be starting to become feasible (see, e.g., Anderson et al., 2019).

V. CONCLUSION
This study has elucidated the role of vowel height on velum position during German tense and lax vowels in nasal and oral contexts, with "vowel height" being better interpreted in terms of physiological tongue position, i.e., the constriction location for vowels instead of phonetic height. Our findings are compatible with an account that considers a purely mechanical effect caused by the palatoglossus connection between the soft palate and the tongue, without palatoglossus activity being required. As the effects of such a mechanical pull-down become visible only with sufficient time (meaning the time for the slight LP relaxation we assume to be characteristic of vowels vs consonants to take effect), clear vowel length differences (as in lax vs tense vowels) must be additionally taken into consideration to account for the patterns found in our data.

ACKNOWLEDGMENTS
This research was funded by DFG (German Research Foundation) Grant No. HA 3512/15-1 "Nasal coarticulation and sound change: A real-time MRI study" (J.H. and J.F.) and supported also by European Research Council Advanced Grant No. 742289 "Human interaction and the evolution of spoken accent" (J.H.).

APPENDIX A: INTERPRETATION OF PC2 FOR THE NASAL CONTEXT
In Sec. II G 2 it is stated that PC2 in the N context has to be interpreted as a non-linear compensation to the vertical shift encoded by PC1. Here, we demonstrate that statement with the help of data visualisation.
Figure 12 shows the distribution of (s_1, s_2) values for the N context data, where each black dot corresponds to a velum lowering contour. Despite FPCA being bound to produce (Pearson) uncorrelated score distributions, a pattern is visible, marked by the gray triangle. The boundaries of the data distribution are remarkably straight instead of being rounded, which indicates the presence of a dependence between s_1 and s_2 in the form of a constraint. In fact, the further s_1 values are from zero in either direction, the more s_2 values become constrained towards high positive values.
In order to interpret the meaning of such a constraint, six representative pairs of (s_1, s_2) values (coloured diamonds in Fig. 12) are used to reconstruct six velum height curves by means of Eq. (1). The result is shown in Fig. 13, where a gray shaded area identifies the admissible values for velum lowering, i.e., between 0 and 1. The curves corresponding to probe points that lie outside the gray triangle in Fig. 12 are exactly those that partly take values outside the admissible values in Fig. 13. One is the combination (s_1 = −0.15, s_2 = −0.04) (bottom left diamond in Fig. 12), which takes values below 0, the other is (s_1 = 0.15, s_2 = −0.04) (bottom right diamond), which takes values above 1.
Going back to the general interpretation of PC1 and PC2 given in Sec. II G 2 [see Fig. 4(a)], PC1 approximately encodes a rigid vertical shift, while PC2 encodes a tilt. While the independence of PC scores would imply that these two shape variation modes are found in arbitrary combinations within the curve data set, the data distribution in Fig. 12 shows that this is not the case. In fact, such independence is there only for curves that are neither too high nor too low (near s_1 = 0), where a tilt in either direction (applied by s_2) does not make the curve trespass the admissible range of values (mid panel in Fig. 13). On the other hand, when a curve is high (s_1 > 0) or low (s_1 < 0), a downwards tilt (s_2 > 0) must be applied in order to remain in the admissible range (side panels in Fig. 13). The data model provided by FPCA [Eq. (1)] is linear in the PC scores and does not have the expressive power to encode a constraint like the one present in this data set. As a consequence, part of the patterns of regularities are not explicitly expressed by the model and need to be interpreted post hoc. This is analogous to, e.g., using ordinary linear regression to model a proportion, which is constrained between 0 and 1, but such a constraint cannot be enforced by the model itself. This is why we excluded PC2 from the analysis, as it appears to have the function of applying a constraint or correction that is not expressed by PC1 alone, and it would not provide interpretable results on its own.
FIG. 12. (Color online) Scatter plot of (s_1, s_2) for the N context data (black dots). The gray shaded triangle suggests a constraint in the distribution. Diamonds are probe points used in Fig. 13.
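The compensation argument can be illustrated with a toy calculation. The sketch below is not based on the study's data: it assumes a hypothetical linear mean curve rising from 0.2 to 0.7, a constant PC1 (rigid shift), and a linear PC2 whose positive scores raise the start and lower the end of the curve; the score values are likewise invented and not on the scale of Fig. 12.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 101)
mean_curve = 0.2 + 0.5 * t     # hypothetical mean, rising from 0.2 to 0.7
pc1 = np.ones_like(t)          # assumed PC1: rigid vertical shift
pc2 = 1.0 - 2.0 * t            # assumed PC2: tilt (positive score lowers the end)


def curve(s1, s2):
    """Linear model: mean curve plus score-weighted shift and tilt."""
    return mean_curve + s1 * pc1 + s2 * pc2


def admissible(s1, s2):
    """True if the reconstructed curve stays within the bounds [0, 1]."""
    c = curve(s1, s2)
    return bool(c.min() >= 0.0 and c.max() <= 1.0)


# Near s1 = 0, a tilt in either direction stays within bounds.
print(admissible(0.0, 0.1), admissible(0.0, -0.1))
# A high or low curve leaves the bounds unless a positive s2 is applied.
print(admissible(0.35, 0.0), admissible(0.35, 0.1))
print(admissible(-0.25, 0.0), admissible(-0.25, 0.1))
```

In this toy setting the admissible (s_1, s_2) region narrows towards positive s_2 as |s_1| grows, which mirrors the straight-edged, triangular distribution described above.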

FIG. 1. (Color online) Image analysis by PCA. PCA loadings were estimated for a ROI, i.e., the region delimited by the irregular white line in all three panels. The loadings are shown colour-coded in the ROI in (a), ranging from strongly negative (dark blue) via zero (light blue-green) to strongly positive (bright yellow). For computational convenience, subsequent calculations use the rectangular region delimited in red in (a). This is achieved by setting all pixel loadings to zero in the region between the irregular white line (the actual ROI) and the red rectangle. Note that the color-coding for all pixels in panel (a) outside the red rectangle and for all pixels everywhere in (b) and (c) corresponds to the MRI-determined pixel intensities. PC1 scores represent the correlation coefficient between each frame and the loadings, i.e., between the raw pixel intensities colour-coded in the ROI of panels (b) and (c) and the loadings colour-coded in the ROI of panel (a). High scores are obtained if the velum closely resembles the positive loadings, exemplified by the frame with low velum position in panel (b). Low scores result from a high velum, exemplified by the frame in panel (c), where high pixel intensities coincide with low loadings, and low pixel intensities with high loadings.
FIG. 3. Semi-polar grid lines for the estimation of vocal tract aperture values. Left: basic grid. Right: grid lines terminated at the estimated tissue-air boundaries.

FIG. 4. (Color online) Effect of the first three PCs for the nasal (a) and the oral (b) context. Each PC$k$ panel reproduces the variation around the mean curve $\mu(t)$ (thick black curve) as $\mu(t) + s_k \cdot \mathrm{PC}k(t)$, where $s_k$ ranges from $-\sigma_k$ to $\sigma_k$, $\sigma_k$ being the standard deviation of $s_k$. Note that higher values on the y axis correspond to an increase in velum lowering.
FIG. 5. (Color online) Distribution of $s_1$ and $s_3$ scores. Higher $s_1$ scores (a) correspond to more velum lowering; higher $s_3$ scores (b) indicate a later velum lowering peak and a later raising gesture.
FIG. 6. (Color online) Velum signal curves modulated by Eq. (1), where $s_1$, $s_2$, $s_3$ are estimated marginal means predicted by the corresponding LME model for the individual vowels. (a) Curves based on $s_1$, $s_2$, $s_3$ scores. (b) Curves based only on $s_1$. (c) Curves based only on $s_3$.
FIG. (Color online) Tongue position for individual tense and lax vowels indicated by PC1 scores ($t_1$) and PC2 scores ($t_2$). Data comprise the palatal, velar, hyper- and hypopharyngeal region at the vowel midpoint. Ellipses are based on a 95% confidence level.
FIG. 13. (Color online) Velum lowering curves obtained by substituting the $(s_1, s_2)$ coordinates of the six diamond points in Fig. 12 into Eq. (1). The shaded gray area indicates the admissible values for the curves.

TABLE I. Statistically significant vowel ordering differences for $s_1$ and $s_3$ (N context).
FIG. 7. (Color online) Distribution of $s_1$, $s_2$, and $s_3$. Higher $s_1$ scores correspond to more velum lowering; lower $s_2$ scores indicate a steeper velum lowering contour; lower $s_3$ corresponds to a more distinct elbow in the direction of velum lowering during the vowel segment.
https://doi.org/10.1121/10.0016366

TABLE III. Mean vowel duration of tense and lax vowels under consideration (N context).