Language
Speech Recognition and Processing
The Bonobos – Video Clip
The Bonobo Brain
From ape to human. Magnetic resonance images of a bonobo brain are warped onto the shape of a human cortex, viewed from (left to right) the side, top, and front. Red and yellow areas in the temporal region (linked to language) and in the prefrontal and occipital regions had to be stretched the most to reach the human configuration, whereas blue areas are similar in apes and humans.
Note that Brodmann's area 44 is enlarged in apes and bonobos.
A Model of Speech Comprehension and Production
This lecture concerns processes on the right of this diagram
How do we process sentences?
Sometimes the context helps:
Please pass me the book on the table.
Often the context does not help:
Giant lizard-like creatures are descending
from spaceships and attacking Tableview.
A body-building Austrian nicknamed “The
Terminator” will be elected to a major political office.
The Nature of the Acoustic Signal
· Spectrogram shows frequency against time, with intensity shown by darkness
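The frequency-against-time picture a spectrogram gives can be sketched in code. A minimal short-time Fourier transform, assuming NumPy; the window size, hop length, and test tone are illustrative values, not standard parameters:

```python
import numpy as np

def spectrogram(signal, sample_rate, window_size=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    each row is one moment in time, each column one frequency band,
    and the values play the role of 'darkness' on a printed spectrogram."""
    window = np.hanning(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    magnitudes = np.array(frames)
    times = np.arange(len(frames)) * hop / sample_rate
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    return times, freqs, magnitudes

# A pure 440 Hz tone: energy should concentrate near 440 Hz in every frame.
rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t)
times, freqs, mags = spectrogram(tone, rate)
peak_freq = freqs[np.argmax(mags[0])]  # close to 440 Hz, within one frequency bin
```

Real speech, unlike this tone, shows the bands of energy (formants) moving over time, which is what the slides' spectrogram illustrates.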
The Nature of the Acoustic Signal
Gaps do not occur between words but do occur with certain consonants (those that restrict the flow of air)
Problem of segmenting continuous input (ice cream v. I scream)
Problem of inter-speaker differences; pitch affected by age and sex; different dialects, talking speeds etc.
Co-articulation = consecutive speech sounds blend into each other due to mechanical constraints on articulators
- E.g. "add" and "adder" (the d is softer when followed by a vowel)
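The segmentation problem above ("ice cream" v. "I scream") can be framed computationally: an unbroken stream of segments must be carved into words using lexical knowledge alone. A minimal sketch with a made-up toy lexicon, where the "phonemic" spellings are illustrative stand-ins, not real transcriptions:

```python
def segmentations(stream, lexicon):
    """Enumerate every way to carve an unbroken stream into lexicon words.
    Models the listener's problem: the acoustic input contains no spaces."""
    if not stream:
        return [[]]
    parses = []
    for word in lexicon:
        if stream.startswith(word):
            for rest in segmentations(stream[len(word):], lexicon):
                parses.append([word] + rest)
    return parses

# Toy forms: ai = "I", ais = "ice", skrim = "scream", krim = "cream"
lexicon = {"ai", "ais", "skrim", "krim"}
parses = segmentations("aiskrim", lexicon)
# The same signal yields two segmentations: "I scream" and "ice cream".
```

The ambiguity is only resolved by context, which is why continuous input is a genuine problem for the listener.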
How Do Listeners Deal with Variability in Acoustic Input?
Categorical perception: continuous changes in input are mapped on to discrete percepts
- E.g. voicing: "da" and "ta" do not sound the same (they fall into different categories)
- E.g. the d in "add" and "adder" sounds the same (despite the acoustic differences due to co-articulation)
These may be mapped on to abstract representations that specify nature of acoustic signal (e.g. voicing, timing), phonemes, syllables
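Categorical perception can be sketched as a mapping from a continuous acoustic cue (e.g. voice onset time) onto discrete percepts. The boundary location and steepness below are illustrative values, not measured ones:

```python
import math

def p_ta(vot_ms, boundary=30.0, steepness=0.5):
    """Probability of hearing 'ta' given a continuous voice-onset time (ms).
    Identification changes abruptly near the boundary (illustrative values)."""
    return 1.0 / (1.0 + math.exp(-steepness * (vot_ms - boundary)))

def category(vot_ms):
    return "ta" if p_ta(vot_ms) > 0.5 else "da"

# The same 20 ms physical step is barely noticed within a category
# but is clearly heard when it crosses the boundary:
within = abs(p_ta(0) - p_ta(20))    # "da" vs "da"
across = abs(p_ta(20) - p_ta(40))   # "da" vs "ta"
```

The key property is that `across` is much larger than `within`: equal physical steps are perceived unequally, which is what makes the percepts discrete.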
How Do Listeners Deal with Variability in Acoustic Input?
Could also be mapped on to units of articulation (i.e. understanding what other people are saying by figuring out how I could say it) – motor theory of speech perception
There is a link between the speech production and speech processing areas in the brain
Evidence: Watkins and Paus (2004) found increased motor excitability during speech perception (Transcranial Magnetic Stimulation (TMS) combined with PET)
Speech Perception in the Brain
Auditory nerve passes through medial geniculate nucleus on way to primary auditory cortex (A1) in temporal lobes, Heschl's gyrus (gyrus = ridge, sulcus = groove)
Thalamus involved in processing sensory information (lesions can lead to thalamic aphasia)
Tendency for each ear to project to the opposite (contralateral) cortex
(also some subcortical routes in hearing as in vision)
Primary auditory cortex is tonotopically organised (i.e. neurons are arranged according to the frequency of sound to which they respond best)
Neuroanatomy
Hierarchical brain systems for word recognition:
First, the stream of auditory information proceeds from auditory cortex in Heschl’s gyri to the superior temporal gyrus (STG). Here, no distinction is made between speech and non-speech sounds.
Distinction is made between speech and non-speech sounds in the adjacent superior temporal sulcus (STS), but no lexical-semantic information is processed in this area.
Hierarchical brain systems for word recognition:
From the STS, the information proceeds to the middle and inferior temporal gyri, where phonological and lexical-semantic aspects of words are processed.
The next stage involves analysis in the angular gyrus (involved in naming).
Broca's area may be important for processing syntactic information.
Another area for syntactic processing is Brodmann area 22 in the STG.
Temporal Lobe
Superior Temporal Sulcus (orange)
- divides the superior temporal gyrus (peach) from the middle temporal gyrus (lime)
Inferior Temporal Sulcus (blue)
- not usually very continuous
- divides the middle temporal gyrus from the inferior temporal gyrus (lavender)
Naming of objects not arbitrary
A remote tribe calls one of these shapes Booba
and the other Kiki. Decide which is which.
Connecting visual stimuli to language
Abnormalities in the fusiform gyrus (FG) are implicated in synesthesia (e.g. "seeing" music) and also in out-of-body experiences
Speech Perception in the Brain
Primary auditory cortex responds equally to speech and other sounds in both left and right hemispheres – lesions result in a loss of awareness of sound, but patients can still react reflexively to sound
Areas more anterior to this in left hemisphere respond more to intelligible speech relative to unintelligible speech of similar acoustic complexity (Scott et al., 2000)
Left hemisphere damage can result in a type of auditory agnosia (pure word deafness) in which environmental sounds and music are identified, but not speech – speech appears "too fast" or is “distorted”
Disconnection theory proposes that inputs from both Heschl's gyri are cut off from Wernicke's area in the left hemisphere, where sounds are decoded into language.
Dual Routes for Speech Perception: "What" and "How"
· As with vision, it has been suggested that speech perception has two functionally distinct pathways
· A phonological buffer holds phonologically coded information, but can maintain it only for brief periods (think of STM)
Dual Routes for Speech Perception: "What" and "How"
"What" route…
- Ventral route along the temporal lobe
- Recognizes speech acoustically
- Important for speech comprehension (i.e. makes contact with semantic knowledge)
green = dorsal (back)
purple = ventral (bottom)
Dual Routes for Speech Perception: "What" and "How"
"How" route…
- Dorsal route involving a parieto-frontal circuit
- Recognizes speech motorically (i.e. motor theory of speech perception)
- Used to say and learn unfamiliar words
- Part of Wernicke's area responds to silent articulation (by the speaker) and also to viewing lip movements in others (Wise et al.)
- Evidence for phonological STM in the angular gyrus (Paulesu et al., 1993), which may be refreshed by frontal rehearsal mechanisms (e.g. the phonological loop component of working memory)
Dual Routes for Speech Perception: "What" (ventral) and "How" (dorsal)
Deficits in repeating and learning new phonology linked to phonological STM impairments
- = deficit in "how" route, intact "what"?
Patients with deep dysphasia cannot repeat non-words and make semantic errors in repetition (e.g. hear "cat", say "dog")
- = deficit in "how" route, rely on impoverished "what"?
Language
Speech Processing
Recognizing Spoken Words: The Cohort Model
All candidates considered in parallel
Candidates eliminated as more evidence becomes available in the speech input
Uniqueness point occurs when only one candidate remains
Reaction-time priming shows that less common words are activated less strongly (e.g. "speed" > "species")
However, semantic context does not alter the pattern
Suggests semantics occurs late (i.e. after spoken word recognition)
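The cohort mechanism described above (parallel candidates, elimination as evidence arrives, uniqueness point) can be sketched directly. The mini-lexicon is hypothetical and letters stand in for phonemes:

```python
def cohort_trace(word, lexicon):
    """Track the cohort of candidate words as each segment arrives,
    and return the uniqueness point: the position at which only one
    candidate remains."""
    trace, uniqueness_point = [], None
    for i in range(1, len(word) + 1):
        candidates = {w for w in lexicon if w.startswith(word[:i])}
        trace.append((word[:i], candidates))
        if uniqueness_point is None and len(candidates) == 1:
            uniqueness_point = i
    return trace, uniqueness_point

# Hypothetical mini-lexicon (letters stand in for phonemes):
lexicon = {"speed", "species", "special", "speak", "spell"}
trace, up = cohort_trace("speed", lexicon)
# "spe" is still compatible with all five words; "spee" leaves only "speed".
```

All candidates sharing the prefix are "considered in parallel" (the set comprehension), and recognition can in principle occur at the uniqueness point, before the word ends.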
Recognizing Spoken Words: The Cohort Model
· Evidence for a late influence of semantics comes from the N400 in ERPs (event-related potentials in electrophysiological studies)
Putting Words into Sentences: Role of Syntax and Semantics
Parsing = putting words into sentences
A & B have different meaning but same syntax
A & C have same meaning but different syntax
A = The boy hit the girl
B = The girl hit the boy
C = The girl was hit by the boy
Putting Words into Sentences: Role of Syntax and Semantics
When parsing a sentence are all possible sentence constructions considered in parallel or is just one syntactic structure considered? Read this…
- "The fireman told the man that he had risked his life for to install a smoke detector"
Did it make sense? The fact that it probably didn't implies that not all syntactic frames were considered (garden-path sentences). The early part of the sentence biases a certain syntactic interpretation that turns out to be incorrect.
However, semantics can bias syntax. E.g. if preceded by “The fireman braved a dangerous fire in a hotel. He rescued one of the men at great danger to himself”
A Historical Preamble
· It was previously believed that Broca's aphasia (and Broca's area) was related to speech production and Wernicke's aphasia (and Wernicke's area) to speech comprehension
Broca’s and Wernicke’s area
An Example of Broca’s Aphasia
· “cookie jar… fall over… chair… water… empty…”
Against the 19th-Century Model
Broca’s aphasic patients also have some problems in comprehension
A deficit in "motor images" doesn’t explain the main symptom (agrammatism)
Many patients who meet criteria for Broca's aphasia have damage in the temporal lobes, not Broca's area (Dronkers)
Wernicke’s aphasic patients also have problems in speech production (e.g. neologisms)
Wernicke’s area involved in a variety of functions including linking acoustic information with visual and motoric information
So What does Broca’s Area do? Compute Syntax?
· In the 1970s the view shifted from a deficit in speech production to a deficit in syntax (in both comprehension and production)
So What does Broca’s Area do? Challenges to the Syntactic Theory
In the 1970s the view shifted from a deficit in speech production to a deficit in syntax (in both comprehension and production)
This has also fallen out of favour because…
- Broca's aphasia is not necessarily a reliable or unitary disorder – patients can be agrammatic in some respects but not others
- Some Broca's aphasics are agrammatic in production but not comprehension
- Sentence-processing deficits could be explained by the use of Broca's area in working memory rather than syntax
So What does Broca’s Area do? The Current Consensus
UNLIKELY to be a central syntactic device (although some researchers still hold this view)
UNLIKELY to store actual motor programs for speech (Broca’s original idea) but may be involved in higher-level planning of speech
LIKELY to be involved in linking action perception to action production (region 44); this includes speech as well as other actions (mirror neurons)
LIKELY to be involved in verbal working memory (region 45), including establishing semantic and thematic coherence
Language
Speech Production and Retrieval
Retrieving Spoken Words
After the syntactic and semantic aspects of an utterance are put in place, the speaker must select the individual words comprising the utterance
Lexicalization = selecting a single word based on the meaning one wishes to convey (constrained by pragmatics)
Grammatical properties of words must be specified (noun, verb, etc.)
Word form (phonemes, syllables, etc.) must be retrieved
Evidence that these occur in different stages
Retrieving Spoken Words: What Matters?
Studies of Speech Errors
Provide evidence of separate linguistic units (words, morphemes, phonemes) because like tends to substitute for like
Semantic errors: say "dog", intend "cat" (evidence of competition at semantic level)
Freudian slip: say "weapons of mass distraction", intend "destruction"? (Freud believed these errors revealed hidden intentions of the speaker)
More Studies of Speech Errors
Lexical transpositions: "guess whose mind came to name" (noun for noun) or "I randomed some samply" (stranded morphemes)
Malapropisms: say "hysterical" instead of "historical" (evidence of competition at word-form level)
Spoonerisms: say "hissed mystery lectures", intend "missed history lectures" (phonemes in same position swapped)
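The regularity above, that phonemes swap only with phonemes in the same position, can be sketched as a toy spoonerism generator. It uses spelling as a stand-in for sound, so the output only approximates the real error:

```python
def spoonerize(phrase):
    """Swap the onsets (initial consonants before the first vowel letter)
    of the first two words -- the classic spoonerism pattern, where units
    exchange only with units in the same syllable position.
    Spelling approximates sound here, so output is only illustrative."""
    def split_onset(word):
        for i, ch in enumerate(word):
            if ch in "aeiouy":
                return word[:i], word[i:]
        return word, ""
    words = phrase.split()
    (on1, rest1), (on2, rest2) = split_onset(words[0]), split_onset(words[1])
    words[0], words[1] = on2 + rest1, on1 + rest2
    return " ".join(words)

result = spoonerize("missed history lectures")
# "hissed mistory lectures" -- orthographic approximation of the spoken
# error "hissed mystery lectures".
```

Note what the toy respects: onsets trade places with onsets, never with codas or vowels, which is the evidence that phonemes are real units of planning.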
Tip-of-tongue Phenomenon
Speaker knows, conceptually, the intended word but is unable to produce spoken output
Accompanied by "feeling of knowing" and very frustrating
More common for low-frequency words (and in older people)
Speakers often know first letter, length of word, etc.
Italian speakers often know word gender
Suggests that spoken word retrieval occurs in chunks rather than "all or nothing"
Levelt’s Discrete Stages Model
Levelt’s Discrete Stages Model: Evidence in Favour
Explains TOT as retrieval of lemma with partial or no retrieval of lexeme
Explains patients with anomia (pathological word finding problems), which can arise at two stages – choosing a concept (accompanied by deficits in semantic memory) versus choosing a word (intact semantic memory)
Discrete stages because…
- "sheep" primes "goat" (competition at semantic level)
- "goat" primes "goal" (competition at phonological level)
- "sheep" does NOT prime "goal" (suggests the two stages of competition do not interact)
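The discrete-stages prediction behind the "sheep"/"goat"/"goal" priming pattern can be sketched as a toy rule. The semantic links and the "first two letters" relatedness criterion are illustrative assumptions, not real model parameters:

```python
# Hypothetical mini-network; the links below are illustrative only.
semantic_neighbours = {"sheep": {"goat", "lamb"}, "goat": {"sheep"}}

def sounds_alike(a, b):
    """Toy phonological relatedness: share the first two letters."""
    return a != b and a[:2] == b[:2]

def primes_discrete(prime, target):
    """Discrete-stages prediction: a word activates its semantic neighbours
    and its own phonological relatives, but activation does NOT cascade from
    a semantic neighbour onward to that neighbour's phonological relatives --
    the two stages of competition do not interact."""
    return target in semantic_neighbours.get(prime, set()) or sounds_alike(prime, target)
```

Because activation stops at the lemma level, "sheep" reaches "goat" and "goat" reaches "goal", but "sheep" never reaches "goal" in one step, matching the priming data.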
Levelt’s Discrete Stages Model: Evidence Against
High proportion of mixed errors (with both semantic and phonological characteristics) suggests interaction between levels? E.g. "rat" for cat, "oyster" for lobster
Caramazza and Miozzo: in TOT it is possible to retrieve gender without the first letter (lemma access without lexeme access), but also possible to retrieve the first letter without gender (lexeme access without lemma access). The latter is not allowed in Levelt's model.
Dell’s Interactive Stages Model
· Similar stages to Levelt's, but interactivity explains the high proportion of mixed errors
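The contrast between Dell's interactive model and a strictly discrete one can be sketched with toy activation values; the similarity scores are illustrative, not fitted to data:

```python
# Hypothetical similarity scores for the target "cat" (illustrative values).
candidates = {
    "dog": (0.8, 0.0),  # semantic-only neighbour
    "mat": (0.0, 0.8),  # phonological-only neighbour
    "rat": (0.8, 0.8),  # mixed neighbour
}

def interactive_activation(sem, phon):
    """Interactive model: activation cascades and feeds back between levels,
    so semantic and phonological support sum -- mixed neighbours win."""
    return sem + phon

def discrete_activation(sem, phon):
    """Strictly discrete model: only one level competes at a time, so a
    mixed neighbour is no stronger than the best single-source neighbour."""
    return max(sem, phon)

interactive = {w: interactive_activation(s, p) for w, (s, p) in candidates.items()}
discrete = {w: discrete_activation(s, p) for w, (s, p) in candidates.items()}
# Only the interactive model predicts "rat" as the most likely error for "cat".
```

This is why a disproportionately high rate of mixed errors like "rat" for "cat" counts as evidence for interactivity between levels.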
Articulating an Utterance
Patients with articulation problems have lesions in insula and basal ganglia but not Broca’s area (Dronkers, 1996)
This is called apraxia of speech (it can sometimes sound like a foreign accent)
fMRI of articulation relative to speech perception also activates insula and frontal-motor regions but not Broca’s area (Wise et al., 1999)
Others (e.g. Indefrey & Levelt) suggest that Broca’s area is, however, involved in overt and covert planning of speech production even if motor commands do not reside there – evidence for mirror neurons here could be consistent with this