

Basic issues in speech perception
Author: Paul Warren
Source: Introducing Psycholinguistics
Part and page: P105
2025-11-04
As already mentioned, human auditory perception is especially well tuned to speech sounds. Our hearing is most sensitive to sounds in the frequency range in which most speech sounds are found, i.e. between 600 Hz and 4000 Hz.
It is also the case that the human perceptual system streams language and non-language signals, i.e. treats them as separate inputs, thereby reducing the distracting effect of non-speech signals on speech perception. This has been shown through phoneme restoration effects (Samuel, 1990). When listeners hear words in which a speech sound (phoneme) has been replaced by a non-speech sound such as a cough, they are highly likely to report the word as intact, i.e. the cough is treated as part of a separate stream. There is a clear linguistic influence here, as the restoration effect is stronger with real words than with nonsense words. It has also been shown that when the word-level information is ambiguous, the word that is restored is one which matches the sentence context. For example, the sequence /#iːl/, where # indicates a non-speech sound replacing or overlaid on a consonant, could represent many possible words (deal, feel, heal, etc.). In the different conditions shown in (7.2), a word will be reported that is appropriate to the context shown by the final word in the utterance (originally demonstrated by Warren & Warren, 1970). So if that word is orange, then peel is reported; if it is table, then meal; and so on. Linguistic effects on perception are powerful, and will be discussed in more detail later in this chapter.
The language-specific nature of streaming effects is indicated by anecdotal evidence from students who are asked to listen to recordings from a language with a very different sound inventory from their own, and who experience some of the sounds as non-speech sounds external to the speech stream. A good example of this is when English-speaking students first listen to recordings of a click language, with many students reporting the click consonants as a tapping or knocking sound happening separately from the speech.
Despite the evidence that listeners segregate speech and non-speech signals, it is also clear that the perceptual system will integrate these if at all plausible. That is, if a non-speech sound could be part of the simultaneous speech signal, we will generally perceive it as such. For example, if the final portion of the /s/ sound in the word slit is replaced by silence, then this silence is interpreted as a /p/ sound, resulting in the word split being heard (see the exercises at the end of this chapter). A stretch of silence is one of several cues to a voiceless plosive such as /p/ (the silence results from the closure of the lips with no simultaneous voicing noise), and is sufficient in this context to result in the percept of a speech sound.
Frequently there are multiple cues to a speech sound, or to the distinction between that speech sound and a very similar one. The voiceless bilabial plosive /p/ sound is cued not just by the silence during the lip closure portion of that consonant, but also by changes that take place in the formant structure of any preceding vowel as the lips come together to make the closure, by the duration of a preceding vowel (voiceless stops in English tend to be preceded by shorter variants of a vowel than voiced stops such as /b/), by several properties of the burst noise as the lips are opened, and so on. While some of these cues may be more important or more reliable than others, it is clear that the perception of an individual sound depends on cue integration, involving a range of cues that distinguish this sound from others in the sound inventory of the language.
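To make the idea of cue integration concrete, the following sketch (a toy Python illustration, not a model from the speech perception literature) combines several partially reliable cues into a single /p/-versus-/b/ decision. The cue names, the scaling of cue values, and the weights are all invented for the example.

```python
# Toy illustration of cue integration: several acoustic cues, each only
# partially reliable on its own, are combined into one /p/-vs-/b/ decision.
# All numbers here are invented for illustration, not empirical values.

def integrate_cues(cues, weights):
    """Weighted sum of cue evidence; positive favours /p/, negative /b/."""
    return sum(weights[name] * value for name, value in cues.items())

# Each cue is scaled to [-1, +1]: +1 = strongly /p/-like, -1 = strongly /b/-like.
cues = {
    "closure_silence": 0.8,   # long silent closure: /p/-like
    "preceding_vowel": 0.5,   # shortened preceding vowel: /p/-like
    "burst_noise": -0.2,      # burst spectrum slightly /b/-like
}

# Hypothetical reliabilities: more informative cues get larger weights.
weights = {"closure_silence": 0.5, "preceding_vowel": 0.3, "burst_noise": 0.2}

score = integrate_cues(cues, weights)
print("/p/" if score > 0 else "/b/", f"(score = {score:+.2f})")
```

The point of the weighted sum is simply that no single cue decides the outcome: a slightly /b/-like burst can be outvoted by /p/-like silence and vowel-duration cues.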
A fascinating instance of cue integration comes from studies of speech perception that involve visual cues. We are often able to see the people we are listening to, and their faces tell us much about what they are saying. A particular set of cues comes from the shape and movements of the mouth. For a bilabial plosive /b/ or /p/ there will be a visible lip closure gesture; for an alveolar plosive /d/ or /t/ it might be possible to see the tongue making a closure at the front of the mouth, just behind the top teeth; for a velar plosive /g/ or /k/ the closure towards the back of the mouth will be visually less evident. Normally, these visual cues will be compatible with the auditory cues from the speech signal, and therefore will supplement them. If, however, the visual cues and the auditory cues have been experimentally manipulated so that they are no longer compatible, then they can converge on a percept that is different from that signalled by either set of cues on their own. This is known as the McGurk effect, after one of the early researchers to identify the phenomenon (McGurk & MacDonald, 1976). For instance, if the auditory information indicates a /ba/ syllable, but the visual information is from a /ga/ syllable, showing no lip closure, then the interpretation is that the speaker has said /da/. Examples of this effect are available on the website for this book (see also the exercises at the end of this chapter).
Another and at first glance somewhat bizarre cue integration effect has been reported in what have been referred to as the ‘puff of air’ experiments. In these experiments participants listen to stimulus syllables that are ambiguous between, say, /ba/ and /pa/. For speakers of English and many other languages, one of several characteristics that distinguish the /b/ and /p/ sounds in these syllables is that there is a stronger puff of air that accompanies the /p/ than is found with the /b/. In phonetics terminology, the /p/ is aspirated and the /b/ is unaspirated. In the experiments, it was found that participants were more likely to report the ambiguous stimulus as /pa/ if they also felt a puff of air that was presented simultaneously with the speech signal. The effect was found whether the puff of air was directed at the hand or at the neck (Gick & Derrick, 2009), or even at the ankle (Derrick & Gick, 2010).
It has also been shown that cue trading is involved in speech perception. For instance, if the release burst of a /p/ is unclear, perhaps because of some non-speech sound that happened at the same time, then the listener may assign greater perceptual significance to other cues such as the relative duration of the preceding vowel and movements in the formants at the end of that vowel.
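Cue trading can be grafted onto the same toy scheme from above: when one cue is degraded, its weight is reduced and the remaining cues carry a larger share of the decision. The reliability value and the renormalisation rule below are again illustrative assumptions, not claims about how the perceptual system actually reweights evidence.

```python
# Toy illustration of cue trading: when one cue is degraded (e.g. the burst
# is masked by a simultaneous non-speech sound), its weight shrinks and the
# remaining cues carry more of the decision. Renormalisation keeps the
# weights summing to 1.

def trade_cues(weights, degraded, reliability):
    """Scale the degraded cue's weight by its reliability, then renormalise."""
    adjusted = dict(weights)
    adjusted[degraded] *= reliability          # e.g. 0.1 = mostly masked
    total = sum(adjusted.values())
    return {name: w / total for name, w in adjusted.items()}

weights = {"closure_silence": 0.5, "preceding_vowel": 0.3, "burst_noise": 0.2}
print(trade_cues(weights, "burst_noise", 0.1))
# burst_noise now contributes little; the silence and vowel-duration cues dominate.
```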
These cues in the formant movements are a result of coarticulation – the articulation of one sound is influenced by the articulation of a neighbouring sound. It appears that our perceptual system is so used to the phenomenon of coarticulation that it will compensate for it in the perception of sounds. For instance, Elman and McClelland (1988) asked participants to identify a word as capes or tapes. Their experiment hinged on the fact that a /k/ is pronounced further forward in the mouth, so closer to a /t/, when it follows /s/ (as in Christmas capes) than when it follows /ʃ/ (as in foolish capes). This is because /s/ is itself further forward in the mouth than /ʃ/, and there is coarticulation of the following /k/ towards the place of articulation of the /s/. Elman and McClelland manipulated the first speech sound in capes or tapes to make it sound more /k/-like or more /t/-like. One of the cues to the difference between the /k/ and /t/ sounds is the height in the frequency scale of the burst of noise that is emitted when the plosive is released – it is higher for front sounds like /t/ than it is for back sounds like /k/. In the experiment, the noise burst of the initial consonant in tapes or capes was manipulated to produce a range of values that were intermediate between the target values for /t/ and /k/. Participants heard tokens from this range of tapes/capes stimuli after either Christmas or foolish, and had to report whether they heard the word as tapes or capes. The results were very clear – after the word Christmas, tokens on the /t/–/k/ continuum were more likely to be heard as /k/ than when the same tokens followed foolish (see Figure 7.4). That is, the participants expected the coarticulation effect to lead to a ‘fronted’ /k/ after /s/, and compensated for this in their interpretation of the frequency level of the burst noise.
Our perception and comprehension of speech is also affected by signal continuity. That is, listeners are better able to follow a stream of speech if it sounds like it comes in a continuous fashion from one source. This lies behind the cocktail party effect, where we are able to follow one speaker in a crowded room full of conversation despite other talk around us (Arons, 1992). This effect can be demonstrated in various ways. In one task, participants hear two voices over stereo headphones, and are asked to focus on what is being said on just one of the headphone channels, the left channel for example. If the voice on the left channel switches to the right part way through the recording, then participants find that their attention follows the voice to the right channel. They then report at least some of what is then said on the right channel, despite the instructions to focus on the left channel (Treisman, 1960). The strength of this effect is reduced if the utterance prosody is disrupted at the switch. The importance of signal continuity is also demonstrated in the relative unnaturalness of some computer-generated or concatenated speech, such as is found for instance in the automated speech of some phone-in banking systems.
Active and passive speech perception
There have been numerous attempts to frame aspects of speech perception in models or theories (some of these are reviewed by Klatt, 1989). One distinction that has been made between different models concerns the degree of involvement of the listener as speaker, characterised as a difference between active and passive perception processes.
Passive models of speech perception assume that we have a stored system of patterns or recognition units, against which we match the speech that we hear. Depending on the specific claims of the model, these stored patterns might be phonetic features or perhaps templates for phonemes or diphone sequences, and so on. A phoneme-based perception model might for example include a template for the /i/ phoneme that shares some of the common characteristics of the spectrogram slices shown in Figure 7.3. A feature-based model might include a voicing detector that examines the input for the presence of the regular repetition of speech waves that corresponds to vocal cord vibration, and would have similar detectors for other features that define a speech sound.
Incoming speech data is matched against the templates, and a score is given for how well the data matches each template. These scores are evaluated and a best match determined. Many automatic speech recognition systems operate like this – they have templates for each recognition unit and match slices of the input speech data against these templates. Such systems perform best when they have had some training, usually requiring the user to repeat some standard phrases so that the speech processing system can develop appropriate templates.
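The matching-and-scoring idea behind such passive, template-based recognition can be caricatured in a few lines of Python. In this sketch the stored templates are tiny invented feature vectors (standing in, very loosely, for the spectrogram slices of Figure 7.3), and the best match is simply the template at the smallest distance from the input; real recognisers are of course far more elaborate.

```python
import math

# Toy passive perception: match an incoming feature vector against stored
# phoneme templates and report the best-scoring match. The three-number
# "feature vectors" stand in for real spectral measurements and are invented.

templates = {
    "/i/": (280.0, 2250.0, 2890.0),   # rough F1/F2/F3 targets (illustrative)
    "/a/": (710.0, 1100.0, 2540.0),
    "/u/": (310.0, 870.0, 2250.0),
}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(input_slice):
    """Score every template and return the closest one."""
    scores = {ph: distance(input_slice, t) for ph, t in templates.items()}
    return min(scores, key=scores.get), scores

phoneme, scores = best_match((300.0, 2200.0, 2800.0))
print(phoneme, scores)   # this input comes out closest to /i/
```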
Active models of speech perception argue that our perception is influenced by or depends on our capabilities as producers of speech. One model of this type involves analysis-by-synthesis. Here, the listener matches the incoming speech data not against a stored template for input units of speech, but against the patterns that would result from the listener’s own speech production, i.e. synthesises an output or a series of alternative outputs and matches that against the analysis of the input.
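By way of contrast with the template sketch above, a toy rendering of analysis-by-synthesis might look as follows. The "production model" here is a wholly invented mapping from articulatory settings to formant estimates; the essential point is only that candidate percepts are synthesised from the perceiver's own production knowledge and then compared with the analysed input.

```python
import math

# Toy analysis-by-synthesis: candidate percepts are run through a forward
# production model, and the candidate whose synthesised output lies closest
# to the analysed input wins. The forward model is a made-up mapping from
# articulatory settings to (F1, F2) estimates, for illustration only.

def synthesise(tongue_height, tongue_backness):
    """Invented forward model: articulatory settings -> (F1, F2) estimate."""
    f1 = 800.0 - 500.0 * tongue_height        # higher tongue -> lower F1
    f2 = 2400.0 - 1500.0 * tongue_backness    # backer tongue -> lower F2
    return (f1, f2)

# Candidate percepts, each with assumed articulatory settings.
candidates = {"/i/": (1.0, 0.0), "/a/": (0.0, 0.5), "/u/": (1.0, 1.0)}

def analyse_by_synthesis(input_formants):
    """Synthesise each candidate and return the closest match to the input."""
    def dist(c):
        return math.dist(synthesise(*candidates[c]), input_formants)
    return min(candidates, key=dist)

print(analyse_by_synthesis((320.0, 2300.0)))   # comes out as /i/
```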