Tuesday, 25 August 2009

Aspects of Connected Speech

Many years ago scientists tried to develop machines that produced speech from a vocabulary of pre-recorded words; the machines were designed to join the words together to form sentences. For very limited messages, such as those of a “talking clock”, this technique was usable, but for other purposes the quality of the speech was so unnatural that it was practically unintelligible. The failure of this “mechanical speech” approach (which eventually led to the development of speech synthesis by rule) has many lessons to teach us about pronunciation teaching and learning, and in looking at connected speech it will be useful to bear in mind the difference between the way humans speak and what would be found in “mechanical speech”.

Speech is a continuous stream of sounds, without clear-cut borderlines between them, and the different aspects of connected speech help to explain why written English is so different from spoken English.

A. Rhythm

There are many parallels between speech and music, and one thing that is always found in music is rhythm. In music, rhythm is usually produced by making certain notes in a sequence stand out from the others by being louder, longer or higher. We should not make the mistake of thinking that musical rhythm is just an unvarying repetition of beats at equal intervals. This may be true of commercial pop music (as can be heard coming out of someone’s headphones, or through the wall from the room next door), but throughout the world, in traditional folk music and other serious musical forms, we can find some amazingly complex rhythms which are still immediately recognizable as regular. In speech, we find that syllables take the place of musical notes or beats, and in many languages the stressed syllables determine the rhythm. If you were asked to clap your hands in time with the sentence

You would be most likely to clap at the points marked with the stress mark ‘. It is often claimed that English speakers try to keep an equal time between the stressed syllables, so that the time between claps would be roughly constant; English is therefore described as stress-timed, and it is claimed that the unstressed syllables between the stressed syllables are squeezed into the time available, with the result that they may become very short. In fact, this is only found in a style of speech (slow, emphatic) where the rhythm is strong; in ordinary conversational speech it is much harder to make a convincing case for this isochronous rhythm (where the time intervals between stressed syllables are equal), and, as with music, we should not expect rhythm to be simple. Other languages have different rhythms, as you can easily hear by listening to them. To the ears of English speakers, Italian and Swedish have a very different rhythm from English. Spanish, French and Chinese sound syllable-timed to English-speaking listeners: it sounds as though all the syllables are of equal length, and the dominant role of stressed syllables in making up the rhythm is much less noticeable. But these judgments are very subjective, and finding scientific evidence about what makes us hear languages as rhythmically different is proving to be very difficult. What does seem clear is that rhythm is useful to us in communicating: it helps us to find our way through the confusing stream of continuous speech, enabling us to divide speech into words or other units, to signal changes of topic or speaker, and to spot which items in the message are the most important.

The notion of rhythm involves some noticeable event happening at regular intervals of time; one can detect the rhythm of a heartbeat, of a flashing light or of a piece of music. It has often been claimed that English speech is rhythmical, and that the rhythm is detectable in the regular occurrence of stressed syllables; of course, it is not suggested that the timing is as regular as a clock, since the regularity of occurrence is only relative. The theory that English has stress-timed rhythm implies that stressed syllables will tend to occur at relatively regular intervals of time. In the following sentence the stressed syllables are given numbers: syllables 1 and 2 are not separated by any unstressed syllable, 2 and 3 are separated by one unstressed syllable, 3 and 4 by two, and 4 and 5 by three.

1 2 3 4 5

‘Walk ‘down the ‘path to the ‘end of the ca‘nal

The stress-timed rhythm theory states that the time from each stressed syllable to the next will tend to be the same, irrespective of the number of intervening unstressed syllables. The theory also claims that while some languages (e.g. Russian and Arabic) have stress-timed rhythm similar to that of English, others (such as French, Telugu and Yoruba) have a different rhythmical structure called syllable-timed rhythm; in these languages all syllables, whether stressed or unstressed, tend to occur at regular time intervals, and the time between stressed syllables will be shorter or longer in proportion to the number of syllables. Some writers have developed theories of English rhythm in which a unit of rhythm, the foot, is used (with an obvious parallel in the metrical analysis of verse); the foot begins with a stressed syllable and includes all following unstressed syllables up to (but not including) the next stressed syllable. The example sentence given above would be divided into feet as follows:

1 2 3 4 5

| ‘Walk | ‘down the | ‘path to the | ‘end of the ca | ‘nal |
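The grouping of syllables into feet described above is mechanical enough to express as a short procedure. The Python sketch below is illustrative only: it assumes the utterance has already been split into syllables and that stressed syllables are marked with a leading apostrophe (standing in for the stress mark ‘), and the function name `split_into_feet` is hypothetical. Unstressed syllables before the first stress, which the text does not discuss, are simply kept together as an opening partial foot.

```python
def split_into_feet(syllables):
    """Group syllables into feet.

    A syllable starting with an apostrophe (') counts as stressed. Each foot
    begins with a stressed syllable and takes in every following unstressed
    syllable up to, but not including, the next stressed one. Unstressed
    syllables before the first stress form an opening partial foot.
    """
    feet = []
    for syllable in syllables:
        if syllable.startswith("'") or not feet:
            feet.append([syllable])  # stressed syllable (or very first one) opens a foot
        else:
            feet[-1].append(syllable)  # unstressed syllable joins the current foot
    return feet


# 'Walk 'down the 'path to the 'end of the ca'nal, split into syllables
sentence = ["'walk", "'down", "the", "'path", "to", "the",
            "'end", "of", "the", "ca", "'nal"]
feet = split_into_feet(sentence)
print(" | ".join(" ".join(foot) for foot in feet))
print([len(foot) for foot in feet])
```

For the example sentence the feet contain 1, 2, 3, 4 and 1 syllables, which are exactly the figures on which the stress-timing and syllable-timing theories make different predictions.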

It follows from what was said above that in a stress-timed language all the feet are supposed to be of roughly the same duration. Many foreign learners of English are made to practise speaking English with a regular rhythm, often with the teacher beating time or clapping hands on the stressed syllables. It must be pointed out, however, that the evidence for the existence of stress-timed rhythm is not strong. There are many laboratory techniques for measuring time in speech, and measurement of the time intervals between stressed syllables in connected English speech has not shown the expected regularity; moreover, using the same measuring techniques on different languages, it has not been possible to show a real difference between “stress-timed” and “syllable-timed” languages. Experiments have shown that we tend to hear speech as more rhythmical than it actually is, and one suspects that this is what the proponents of the stress-timed rhythm theory have been led to do in their auditory analysis of English rhythm. However, one ought to keep an open mind on the subject, remembering that the large-scale, objective study of suprasegmental aspects of real speech is only just beginning, and much research remains to be done.
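To make the two theories' predictions concrete, the following Python sketch (using only the standard `statistics` module) computes idealized foot durations for the example sentence under each model and measures how irregular a sequence of intervals is with the coefficient of variation (standard deviation divided by mean). The durations of 0.5 s per foot and 0.2 s per syllable are arbitrary values chosen for illustration, not measurements, and using the coefficient of variation as the regularity measure is an assumption of this sketch, not necessarily the technique used in the measurement studies mentioned above. The function names are hypothetical.

```python
from statistics import mean, pstdev


def predicted_foot_durations(foot_sizes, model, foot_time=0.5, syllable_time=0.2):
    """Idealized foot durations in seconds under the two textbook models.

    "stress":   every foot lasts foot_time, whatever its number of syllables.
    "syllable": every syllable lasts syllable_time, so a foot lasts in
                proportion to its number of syllables.
    The default times are arbitrary illustrative values, not measurements.
    """
    if model == "stress":
        return [foot_time for _ in foot_sizes]
    if model == "syllable":
        return [n * syllable_time for n in foot_sizes]
    raise ValueError(f"unknown model: {model!r}")


def interval_variability(intervals):
    """Coefficient of variation of a list of time intervals.

    0.0 means perfectly regular (isochronous) timing; larger values mean
    less regular timing.
    """
    return pstdev(intervals) / mean(intervals)


# Syllables per foot in 'Walk 'down the 'path to the 'end of the ca'nal
sizes = [1, 2, 3, 4, 1]
for model in ("stress", "syllable"):
    durations = predicted_foot_durations(sizes, model)
    print(model, [round(d, 2) for d in durations],
          round(interval_variability(durations), 3))
```

Under pure stress-timing the variability is zero; under pure syllable-timing it grows with the differences in foot size. Applying `interval_variability` to inter-stress intervals measured from real recordings is one simple way to ask how close actual speech comes to either idealization.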

There is a rather attractive compromise solution to this argument, in the form of a claim that in speaking English we vary in how rhythmically we speak: sometimes we speak ……. (this is typical of some styles of public speaking), while at other times we speak arhythmically (that is, without rhythm), for example when we are hesitant or nervous. Stress-timed rhythm is thus characteristic of one style of speaking, not of English speech as a whole; one always speaks with some degree of rhythmicality, but the degree will vary between a minimum value (arhythmical) and a maximum (completely stress-timed rhythm). What, then, is the practical value of the traditional “rhythm exercise” for foreign learners? The argument about rhythm should not make us forget the very important difference in English between strong and weak syllables; some languages do not have such a noticeable difference (which may, perhaps, explain the subjective impression of “syllable-timing”), and for native speakers of such languages learning English it can be helpful to practise repeating strongly rhythmical utterances, since this forces the speaker to concentrate on making unstressed syllables weak. Speakers of languages like Japanese, Hungarian and Spanish, which do not have weak syllables to anything like the same extent as English does, may well find such exercises of some value, as long as they are not overdone to the point where learners feel they have to speak English as though they were reciting verse.

It has been claimed that stress placement is conditioned to some extent by the influence of rhythm. Examples such as ‘fourteen’ /fɔːˈtiːn/ and ‘Westminster’ /westˈmɪnstə/ (isolated forms), compared with ‘fourteenth day’ /ˈfɔːtiːnθ ˈdeɪ/ and ‘Westminster Abbey’ /ˈwestmɪnstər ˈæbi/ (where a stressed syllable follows the word in question), are said to be caused by a tendency in English to avoid two strong stresses near each other. This explanation may have some validity, but it is difficult to see how it could be established as a proven fact rather than just an opinion.

B. Assimilation

Let us look at some examples of assimilation. In French, a word-final voiceless consonant will often become voiced if followed by a voiced segment. For example, the word ‘avec’ on its own is pronounced /avek/, but when it is followed by a word beginning with a voiced consonant such as /v/ in ‘vous’ /vu/, we usually hear /aveg/. So the phrase ‘avec vous’ is often pronounced /aveg vu/. In English, we also find assimilations of voice, but it is more common to find them in the form of loss of voice, or devoicing. If the word ‘have’ occurs in final position, its final consonant /v/ will usually have some voicing, but when that /v/ is followed by a voiceless consonant it normally becomes completely voiceless; thus ‘I have to’ is likely to have the pronunciation /aɪ hæf tu/.

Assimilation, then, is concerned with one sound becoming phonetically similar to an adjacent sound. The examples given so far are of anticipation, where a sound is influenced by the sound which follows it; another term frequently used for this type is regressive assimilation. We also find cases where the assimilation can be called progressive: here, not surprisingly, the process is for a sound to take on characteristics from a sound which precedes it. In general, this effect is less frequently found, though it is difficult to explain why this should be so. Historically, it must have been effective in English in order to produce the different pronunciations of the ‘-s’ ending: the plural of ‘cat’ /kæt/ is ‘cats’ /kæts/ with a final /s/; the plural of ‘dog’ /dɒg/ is ‘dogs’ /dɒgz/ with /z/. The voicing of the suffix is conditioned by the voicing of the preceding final consonant.
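The choice of plural ending described above can be written as a small rule in which the suffix copies the voicing of the segment before it. This Python sketch is a simplification under stated assumptions: phonemes are plain strings, the consonant sets are a partial, hypothetical inventory rather than a complete description of English, and the /ɪz/ ending after sibilants (as in ‘horses’), which the text does not mention, is included so that the rule does not give wrong answers for those stems. The name `add_plural` is hypothetical.

```python
# Simplified, partial sets of English phonemes (not a complete inventory).
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}


def add_plural(stem):
    """Return the stem (a list of phoneme strings) with the regular plural ending.

    Progressive assimilation: the ending copies the voicing of the
    stem-final segment (/s/ after voiceless, /z/ after voiced).
    After sibilants the ending is /ɪz/ (a case not covered in the text).
    """
    final = stem[-1]
    if final in SIBILANTS:
        ending = ["ɪ", "z"]
    elif final in VOICELESS_NON_SIBILANTS:
        ending = ["s"]
    else:  # voiced consonants and all vowels
        ending = ["z"]
    return stem + ending


print("".join(add_plural(["k", "æ", "t"])))  # 'cat'
print("".join(add_plural(["d", "ɒ", "g"])))  # 'dog'
```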

Assimilations are traditionally classified into three main types, though as we shall see this classification is not completely adequate.

(1) One type is assimilation of voice (we have seen examples of this taken from French and English); this may take the form of a voiced segment becoming voiceless as a consequence of being adjacent to a voiceless segment; alternatively, a voiceless segment may become voiced.

(2) Another type is assimilation of place: this refers to changes in the place of articulation of a segment (usually a consonant). A well-known case is that of English word-final alveolar consonants such as /t, d, n/: if a word ending in one of these consonants is followed by a word whose initial consonant has a different place of articulation, the word-final alveolar consonant is likely to change so that it has the same place of articulation. Thus the word ‘that’ /ðæt/ may be followed by ‘boy’ /bɔɪ/ and become /ðæp/ (thus ‘that boy’ /ðæp bɔɪ/), or it may be followed by ‘girl’ and become /ðæk/ (thus ‘that girl’ /ðæk gɜːl/).

(3) A third type is assimilation of manner: here one sound changes the manner of its articulation to become similar in manner to a neighbouring sound. Clear examples of this type are not easy to find; generally, they involve a change from a “stronger” consonant (one making a more substantial obstruction to the flow of air) to a “weaker” one, and are typical of rapid speech. An English example could be a rapid pronunciation of “Get some of that soap”, where instead of the expected /get sʌm əv ðæt səʊp/ we might hear /ges sʌm əv ðæs səʊp/, with /s/ replacing /t/ in two words.

We should now consider what the reason is for these processes. We must remember that in most cases several articulators are involved in making a speech sound, and that they are not capable of moving instantaneously. In the example of French consonant voicing, the final consonant is intrinsically voiceless, but in the example given, it is preceded by a fully voiced vowel, and is followed by a voiced consonant. To produce a voiceless consonant usually requires the opening of the vocal folds to prevent voicing from happening. If the vocal folds are instead left in the position appropriate for the voicing of the vowel context, the result is likely to be that the consonant is produced with voicing, and we can suppose that this is why the consonant becomes voiced. This argument suggests that when we find assimilation, we can usually find an explanation based on what we know about how the relevant sounds are produced.

An important question arises at this point, which concerns the role of the phoneme in assimilation processes. Much of the earlier writing on assimilation has suggested that assimilatory changes generally involve a change from one phoneme to another; for example, ‘I have to’ is described as showing a change from /v/ to /f/, and ‘that girl’ is supposed to show final /t/ changing to /k/ in /ðæk gɜːl/. Does this mean that all assimilations involve phonemic change of this sort? The answer must be ‘no’: we can observe many cases in which there is a clear assimilation that does not involve phonemic change. An easy process to observe is the position of the lips. In a vowel such as English /iː/ (as in ‘see’), the lips are spread, as for a smile. In a vowel such as English /ɔː/ (as in ‘saw’), the lips are rounded and pushed forward. This spreading and rounding of the lips is quite a slow process, and it often happens that preceding and following sounds are also affected by it, even when they belong to a different word. Thus, the /s/ at the end of ‘this’ will tend to have spread lips in the phrase ‘this evening’ (where it precedes /iː/) and rounded lips in the phrase ‘this autumn’ (where it precedes /ɔː/). The effect is even more noticeable within a word: for example, the two /s/ sounds in ‘see-saw’, which precede /iː/ and /ɔː/ respectively, usually have very different lip shapes. You can easily observe this effect in a mirror. The difference between rounded and non-rounded /s/ is not phonemic in English.

Can we always find an articulatory explanation for assimilation? These explanations seem to assume that we are basically lazy, and do as little work as possible; this is sometimes called the “principle of least effort”, and it does seem to explain a lot of human activity (or lack of it) in a very simple way. A good example is nasalization, particularly of vowels, and to understand this process we need to look at the activity of the soft palate, or velum. When we produce a nasal consonant such as [m] or [n], the soft palate must be lowered to allow air to escape through the nasal cavity; for most vowels, however, the velum is raised, preventing the escape of air by this route. In the English sentence “I know” /aɪ nəʊ/ we would expect that, if each segment were produced independently of its neighbours, the soft palate would first rise for /aɪ/, then be lowered for /n/, then be raised again for /əʊ/. But speech research has shown that the soft palate moves slowly and begins its movement some time before the completion of that movement is needed; in other words, we can see anticipation in its activity. As a result, the diphthong preceding [n] will be nasalized. We can see a more extreme example in a word like ‘morning’ /mɔːnɪŋ/, where all the vowels are next to nasal consonants and the soft palate is often left in the lowered position for the whole word, producing nasalization of each of the vowels. In some languages the difference between nasalized and non-nasalized vowels is phonemic, but this is not the case in English.
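Anticipatory nasalization can be sketched as a rule that looks one segment ahead. In this Python sketch (the function and set names are hypothetical, and the vowel set covers only the examples in the text) a vowel is marked with the IPA combining tilde whenever the next segment is a nasal consonant. This single anticipatory rule happens to nasalize both vowels of ‘morning’, since each is followed by a nasal; it does not model the velum simply staying lowered through the word, nor any carry-over nasalization after a nasal.

```python
NASALS = {"m", "n", "ŋ"}
# Hypothetical, partial vowel set: only the vowels used in the examples.
VOWELS = {"aɪ", "əʊ", "ɔː", "ɪ"}
TILDE = "\u0303"  # IPA combining tilde: marks nasalization
LENGTH_MARK = "ː"


def mark_nasalized(vowel):
    """Put a tilde on every vowel letter (skipping the length mark)."""
    return "".join(ch if ch == LENGTH_MARK else ch + TILDE for ch in vowel)


def nasalize(segments):
    """Nasalize any vowel whose next segment is a nasal consonant.

    This models anticipation: the velum starts to lower before the nasal.
    """
    result = []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg in VOWELS and nxt in NASALS:
            result.append(mark_nasalized(seg))
        else:
            result.append(seg)
    return result


print(" ".join(nasalize(["aɪ", "n", "əʊ"])))            # 'I know'
print(" ".join(nasalize(["m", "ɔː", "n", "ɪ", "ŋ"])))  # 'morning'
```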

We have seen, then, that the picture of assimilation as a process which causes phonemic change is not adequate. The next point to make is that the simple idea of one sound influencing one neighbour is also unsatisfactory. Let us begin with an example where there is a regular process of a sound being changed only when it is both preceded and followed by an appropriate neighbour. In Tokyo Japanese, the vowels /i/ and /u/ regularly change into voiceless segments if they occur between voiceless consonants. Thus in the word ‘futon’ (the word for a type of bed), the /u/ vowel of the first syllable becomes a voiceless vowel, or simply a short burst of fricative noise, since the /u/ is preceded by the voiceless consonant /f/ and followed by the voiceless consonant /t/.
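The Tokyo Japanese rule is a clear case of a sound conditioned by both neighbours at once, and a sketch makes that explicit: the condition below checks the preceding and the following segment together. This Python sketch assumes a simplified, hypothetical set of voiceless consonants and writes the first consonant of ‘futon’ as /f/, as the text does; it marks voicelessness with the IPA ring-below diacritic and ignores any other factors that may affect devoicing in real speech.

```python
# Simplified, hypothetical set of voiceless consonants for this sketch.
VOICELESS = {"p", "t", "k", "s", "ʃ", "h", "f", "ts", "tʃ"}
HIGH_VOWELS = {"i", "u"}
RING_BELOW = "\u0325"  # IPA diacritic for a voiceless segment


def devoice_high_vowels(segments):
    """Devoice /i/ and /u/ when BOTH neighbours are voiceless consonants."""
    result = list(segments)
    for i in range(1, len(segments) - 1):
        if (segments[i] in HIGH_VOWELS
                and segments[i - 1] in VOICELESS
                and segments[i + 1] in VOICELESS):
            result[i] = segments[i] + RING_BELOW
    return result


print(" ".join(devoice_high_vowels(["f", "u", "t", "o", "n"])))  # 'futon'
```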

The device mentioned earlier that produces “mechanical speech” would contain all the words of English, each having been recorded in isolation. A significant difference in natural connected speech is the way that sounds belonging to one word can cause changes in sounds belonging to neighbouring words. Assuming that we know how the phonemes of a particular word would be realized when the word was pronounced in isolation, when we find a phoneme realized differently as a result of being near some other phoneme belonging to a neighbouring word we call this an instance of assimilation. Assimilation varies in extent according to speaking rate and style: it is more likely in rapid, casual speech and less likely in slow, careful speech. Sometimes the difference caused by assimilation is very noticeable, and sometimes it is very slight. Generally speaking, the cases that have most often been described are all assimilations affecting consonants.

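As an example, consider a case where two words are combined, the first of which ends with a single final consonant (which we will call Cf) and the second of which starts with a single initial consonant (which we will call Ci); we can construct a diagram like this: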

- - - - Cf | Ci - - - -

(the vertical bar marks the word boundary)

If Cf changes to become like Ci in some way, the assimilation is called regressive (the phoneme that comes first is affected by the phoneme that comes after it); if Ci changes to become like Cf in some way, the assimilation is called progressive. In what ways can a consonant change? We have seen that the main differences between consonants are of three types:

1. Differences in place of articulation

2. Differences in manner of articulation

3. Differences in voicing.

In parallel with this, we can identify assimilation of place, assimilation of manner and assimilation of voice. Assimilation of place is most clearly observable in some cases where a final consonant (Cf) with alveolar place of articulation is followed by an initial consonant (Ci) with a place of articulation that is not alveolar. For example, the final consonant in ‘that’ ðæt is alveolar t. In rapid, casual speech the t will become p before a bilabial consonant, as in ‘that person’ ðæp pɜːsn̩, ‘light blue’ laɪp bluː and ‘meat pie’ miːp paɪ. Before a dental consonant, t will change to a dental plosive, for which the symbol is t̪, as in ‘that thing’ ðæt̪ θɪŋ, ‘get those’ get̪ ðəʊz and ‘cut through’ kʌt̪ θruː. Before a velar consonant, the t will become k, as in ‘quite good’ kwaɪk gʊd. In similar contexts d would become b, d̪ and g, respectively, and n would become m, n̪ and ŋ. However, the same is not true of the other alveolar consonants: s and z behave differently, the only noticeable change being that s becomes ʃ and z becomes ʒ when followed by ʃ or j, as in ‘this shoe’ ðɪʃ ʃuː and ‘those years’ ðəʊʒ jɪəz. It is important to note that the consonants that have undergone assimilation have not disappeared; in the above examples, the duration of the consonants remains more or less what one would expect for a two-consonant cluster. Assimilation of place is only noticeable in this regressive assimilation of alveolar consonants; it is not something that foreign learners need to learn to do.
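The place assimilations listed above amount to a lookup table keyed on the final alveolar and the place of articulation of the following consonant. The Python sketch below encodes only the cases given in the text (t, d, n before bilabial, dental and velar consonants; s and z before ʃ or j). Its names are hypothetical, it represents words as lists of phoneme strings, and it always applies the rule, whereas in real speech the change is optional and depends on rate and style.

```python
# Place of articulation for the consonants that can trigger the change.
TRIGGER_PLACE = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "θ": "dental", "ð": "dental",
    "k": "velar", "g": "velar", "ŋ": "velar",
}

# Assimilated form of each final alveolar before each place.
# "\u032a" is the IPA combining bridge below, marking a dental consonant.
ALVEOLAR_TARGETS = {
    "t": {"bilabial": "p", "dental": "t\u032a", "velar": "k"},
    "d": {"bilabial": "b", "dental": "d\u032a", "velar": "g"},
    "n": {"bilabial": "m", "dental": "n\u032a", "velar": "ŋ"},
}

# s and z change only before ʃ or j.
SIBILANT_TARGETS = {"s": "ʃ", "z": "ʒ"}


def assimilate_place(word1, word2):
    """Regressive place assimilation across a word boundary.

    word1 and word2 are lists of phoneme strings. Only the final consonant
    of word1 (Cf) can change, conditioned by the first consonant of word2 (Ci).
    """
    cf, ci = word1[-1], word2[0]
    new_cf = cf
    if cf in ALVEOLAR_TARGETS and ci in TRIGGER_PLACE:
        new_cf = ALVEOLAR_TARGETS[cf][TRIGGER_PLACE[ci]]
    elif cf in SIBILANT_TARGETS and ci in ("ʃ", "j"):
        new_cf = SIBILANT_TARGETS[cf]
    return word1[:-1] + [new_cf], word2


for w1, w2 in [(["ð", "æ", "t"], ["p", "ɜː", "s", "n"]),  # 'that person'
               (["ð", "æ", "t"], ["θ", "ɪ", "ŋ"]),        # 'that thing'
               (["k", "w", "aɪ", "t"], ["g", "ʊ", "d"]),  # 'quite good'
               (["ð", "ɪ", "s"], ["ʃ", "uː"])]:           # 'this shoe'
    a, b = assimilate_place(w1, w2)
    print("".join(a), "".join(b))
```

Keeping the trigger places in a separate table means a new trigger consonant only needs one entry, rather than one entry per final alveolar.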

Assimilation of manner is much less noticeable, and is only found in the most rapid and casual speech; generally speaking, the tendency is again for regressive assimilation, and the change in manner is likely to be towards an “easier” consonant, one which makes less obstruction to the airflow. It is thus possible to find cases where a final plosive becomes a fricative or nasal (e.g. ‘that side’ ðæs saɪd, ‘good night’ gʊn naɪt), but it is most unlikely that a final fricative or nasal would become a plosive.
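Manner assimilation can be sketched in the same way, but only with the two changes exemplified in the text (‘that side’ and ‘good night’); the table is deliberately not extended to other consonant pairs, since which further changes occur is not stated here. The `casual` flag is a hypothetical stand-in for “the most rapid and casual speech”, and no entry turns a fricative or nasal into a plosive, reflecting the one-way tendency just described. All names are hypothetical.

```python
# Only the two changes exemplified in the text:
# (final plosive, initial consonant) -> new final consonant.
MANNER_TARGETS = {
    ("t", "s"): "s",  # 'that side'  -> ðæs saɪd (plosive becomes fricative)
    ("d", "n"): "n",  # 'good night' -> gʊn naɪt (plosive becomes nasal)
}


def assimilate_manner(word1, word2, casual=True):
    """Regressive manner assimilation, applied only in rapid, casual speech.

    There are deliberately no entries that turn a fricative or nasal into
    a plosive, so such changes never happen here.
    """
    if not casual:
        return word1, word2
    cf, ci = word1[-1], word2[0]
    new_cf = MANNER_TARGETS.get((cf, ci), cf)
    return word1[:-1] + [new_cf], word2


print("".join(assimilate_manner(["g", "ʊ", "d"], ["n", "aɪ", "t"])[0]))
print("".join(assimilate_manner(["g", "ʊ", "d"], ["n", "aɪ", "t"], casual=False)[0]))
```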

Assimilation of voice is also found, but again only in a limited way. Only regressive assimilation of voice is found across word boundaries, and then only of one type; since this matter is important for foreign learners, we will look at it in some detail. If Cf is a lenis (“voiced”) consonant and Ci is fortis (“voiceless”), we often find that the lenis consonant has no voicing; this is not a very noticeable case of assimilation since, as was explained in Chapter 4, initial and final lenis consonants usually have little or no voicing anyway. When Cf is fortis (“voiceless”) and Ci lenis (“voiced”), a context in which in many languages Cf would become voiced, assimilation of voice never takes place; consider the following example:

‘I like that black dog’ = aɪ laɪk ðæt blæk dɒg

It is typical of many foreign learners of English to allow regressive assimilation of voicing to change the final k of ‘like’ to g, the final t of ‘that’ to d and the final k of ‘black’ to g. This creates a very strong impression of a foreign accent, and is something that should obviously be avoided.
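The contrast between the English pattern and the learner's error can be shown by running two rules over the same sentence. In this Python sketch (all names hypothetical; the fortis/lenis pairs are a partial list), `english_voicing` only devoices a final lenis consonant before a fortis one, marking it with the IPA ring below, while `learner_voicing` applies a French-style rule, as in ‘avec vous’, that makes a final fortis consonant lenis before a lenis one.

```python
# Partial, hypothetical list of fortis/lenis pairs.
FORTIS_TO_LENIS = {"p": "b", "t": "d", "k": "g", "f": "v", "θ": "ð", "s": "z", "ʃ": "ʒ"}
FORTIS = set(FORTIS_TO_LENIS)
LENIS = set(FORTIS_TO_LENIS.values())
RING_BELOW = "\u0325"  # IPA diacritic: marks a devoiced segment


def english_voicing(words):
    """English pattern: a final lenis consonant before an initial fortis one
    loses its voicing; a final fortis consonant is never voiced."""
    result = [list(w) for w in words]
    for word, following in zip(result, words[1:]):
        if word[-1] in LENIS and following[0] in FORTIS:
            word[-1] += RING_BELOW
    return result


def learner_voicing(words):
    """Hypothetical learner (French-style) rule: a final fortis consonant
    becomes lenis before an initial lenis consonant."""
    result = [list(w) for w in words]
    for word, following in zip(result, words[1:]):
        if word[-1] in FORTIS and following[0] in LENIS:
            word[-1] = FORTIS_TO_LENIS[word[-1]]
    return result


def show(words):
    return " ".join("".join(w) for w in words)


sentence = [["aɪ"], ["l", "aɪ", "k"], ["ð", "æ", "t"], ["b", "l", "æ", "k"], ["d", "ɒ", "g"]]
print(show(english_voicing(sentence)))  # unchanged: aɪ laɪk ðæt blæk dɒg
print(show(learner_voicing(sentence)))  # aɪ laɪg ðæd blæg dɒg
```

The English rule leaves ‘I like that black dog’ untouched, because every word-final consonant in it is fortis; the learner rule produces exactly the three changes the text warns against.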
