RESPIRATION

s/z ratio: The s/z ratio is a standard test of vocal function, computed by dividing the longest sustained duration of the voiceless sound /s/ by the longest sustained duration of the voiced sound /z/. The voiceless fricative /s/ is produced by forcing air through a narrow constriction without vocal fold vibration, whereas /z/ is produced with vocal fold vibration. The s/z ratio helps to differentiate between respiratory inefficiency (both sounds would be short) and laryngeal/vocal fold dysfunction (/s/ would be considerably longer than /z/). Typically, the ratio is close to 1, meaning both sounds can be sustained for similar lengths of time. A ratio considerably greater than 1 may indicate laryngeal dysfunction or airflow issues (Boone et al., 2010). The ratio and the maximum durations it is based on can be highly variable (Kent et al., 1987).
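
The calculation described above can be sketched as follows. This is a minimal illustration rather than a clinical protocol: the function name sz_ratio and the use of several trials per sound are assumptions made for the example, and durations are assumed to have been measured in seconds beforehand.

```python
def sz_ratio(s_durations, z_durations):
    """Return the s/z ratio from repeated trials (durations in seconds).

    The longest /s/ trial is divided by the longest /z/ trial.
    """
    if not s_durations or not z_durations:
        raise ValueError("need at least one /s/ and one /z/ trial")
    longest_s = max(s_durations)
    longest_z = max(z_durations)
    if longest_z <= 0:
        raise ValueError("longest /z/ duration must be positive")
    return longest_s / longest_z


# Hypothetical trials: /s/ sustained for 12 s at most, /z/ for 8 s at most.
ratio = sz_ratio([10.5, 12.0], [7.5, 8.0])
print(f"s/z ratio: {ratio:.2f}")
```

A value near 1 with short durations for both sounds and a value well above 1 point in different directions, so the two maximum durations are worth reporting alongside the ratio itself.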

PHONATION

Maximum Sound Prolongation (MSP): This is a measure of how long a vowel (usually /a/) can be sustained, in seconds (s) (Kent et al., 1987; Wit et al., 1993). The vowel /a/ is produced with vibrating vocal folds and an open vocal tract, allowing a continuous flow of air. The measure is used to assess phonatory efficiency and respiratory support. MSP is also referred to as maximum phonation time or maximum phonation duration (Kent et al., 1987). MSP varies with age, and variability in typical children’s performance can be large (Kent et al., 1987).
Fundamental frequency (F0): F0 is determined by the physical properties of the vocal folds during vibration. The rate of these vibrations, measured in cycles per second (hertz, Hz), sets the F0, which corresponds to the perceptual cue of pitch, that is, how high or low a voice sounds. F0 and its range are key measures in voice and speech assessments for evaluating vocal function and speech characteristics. For instance, increased laryngeal tension makes the vocal folds vibrate faster, raising the F0.
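
Because F0 is the number of vibratory cycles per second, it is the reciprocal of the duration of one cycle. The sketch below illustrates that relationship and a simple F0 range. It assumes that glottal cycle durations have already been extracted from the recording; the function names are hypothetical and not taken from any particular analysis tool.

```python
def f0_per_cycle(periods_s):
    """Convert glottal cycle durations (seconds) to per-cycle F0 values in Hz."""
    if not periods_s:
        raise ValueError("need at least one cycle duration")
    if any(p <= 0 for p in periods_s):
        raise ValueError("cycle durations must be positive")
    return [1.0 / p for p in periods_s]


def f0_range_hz(periods_s):
    """Return the difference between the highest and lowest per-cycle F0 (Hz)."""
    f0_values = f0_per_cycle(periods_s)
    return max(f0_values) - min(f0_values)


# Hypothetical cycle durations: 4 ms and 5 ms, i.e. roughly 250 Hz and 200 Hz.
print(f0_per_cycle([0.004, 0.005]))
print(f0_range_hz([0.004, 0.005]))
```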

Perturbation measures

Shimmer, jitter, Harmonics-to-Noise Ratio (HNR) and (Smoothed) Cepstral Peak Prominence (CPPS) are perturbation measures. Perturbation refers to small variations in the fundamental frequency (F0) and amplitude of the voice signal across successive glottal cycles. It reflects irregularities in vocal fold vibration and is therefore commonly used in the assessment of speech and voice disorders. 

Shimmer: Shimmer measures the cycle-to-cycle variation in the amplitude of a vocal signal. It is indicative of voice stability and is influenced by physiological factors such as vocal fold tension, subglottal pressure fluctuations and airflow irregularities. The higher the shimmer values, the higher the perturbation, which may indicate voice quality issues. The literature indicates that values up to 8% fall within the range observed in typically developing 4- to 12-year-old children (Tavares et al., 2010).
Jitter: Jitter measures the cycle-to-cycle variation in the fundamental frequency (perceived as pitch) of a vocal signal (Tavares et al., 2010). Like shimmer, it is indicative of voice stability and is influenced by physiological factors such as vocal fold tension, subglottal pressure fluctuations and airflow irregularities. The higher the jitter values, the higher the perturbation, which may indicate voice quality issues. The literature indicates that values up to 2% fall within the range observed in typically developing 4- to 12-year-old children (Tavares et al., 2010).
Harmonics-to-Noise Ratio (HNR): HNR is another measure of voice quality. It compares the periodic (harmonic) component of a speech signal with the aperiodic (noise) component, thereby reflecting the degree of hoarseness or breathiness in a voice. The lower the HNR values, the more likely it is that voice quality issues are present. HNR is expected to continue rising in typically developing children until it reaches adult levels. HNR values of 0.11-0.18 have been observed in typically developing 4- to 12-year-olds (Tavares et al., 2010).
(Smoothed) Cepstral Peak Prominence (CPPS): CPP is considered a robust indicator of voice disorders, as it measures the amount of extra noise in the vocal signal. The smoothed version (CPPS) improves accuracy by reducing small fluctuations in the measurement. Compared with jitter, shimmer or HNR, CPPS is less affected by pitch variability, making it more reliable for clinical voice assessment. A higher CPP value indicates more periodic (i.e. harmonic) speech; values between 2.85 and 17.71 dB have been reported in 4- to 18-year-olds (Kent et al., 2021).
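
To make the cycle-to-cycle idea behind jitter and shimmer concrete, the sketch below computes one common relative ("local") variant: the mean absolute difference between consecutive cycles, divided by the mean value and expressed as a percentage. Several other definitions exist, and the variant used in the cited studies is not stated here, so this should be read as an illustration only. The function names are hypothetical, and cycle durations and peak amplitudes are assumed to have been extracted already.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values, relative to the mean (%)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    if mean_value <= 0:
        raise ValueError("mean value must be positive")
    return 100.0 * mean_diff / mean_value


def jitter_local_percent(periods_s):
    """Local jitter: perturbation of consecutive glottal cycle durations."""
    return local_perturbation_percent(periods_s)


def shimmer_local_percent(peak_amplitudes):
    """Local shimmer: perturbation of consecutive cycle peak amplitudes."""
    return local_perturbation_percent(peak_amplitudes)


# Hypothetical cycle durations (s) and peak amplitudes (arbitrary linear units).
periods = [0.0040, 0.0041, 0.0040, 0.0039, 0.0040]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]
print(f"jitter:  {jitter_local_percent(periods):.2f} %")
print(f"shimmer: {shimmer_local_percent(amplitudes):.2f} %")
```

A perfectly regular signal yields 0% on both measures; larger cycle-to-cycle differences push the percentages up.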

RESONANCE 

Amplitude difference between the first formant and an extra peak (A1–P1): This is an acoustic measure that is primarily used in speech analysis research to quantify nasalisation. Specifically, A1 refers to the amplitude of the first formant (F1) of the vowel; P1 refers to the amplitude of an additional spectral peak near 1 kHz (Chen, 1997). The difference between these two values has been suggested to be an acoustic correlate of vowel nasalisation (Chen, 1997). A lower A1–P1 value can indicate increased nasalisation, because A1 decreases and P1 increases during vowel nasalisation.
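
The A1–P1 difference can be illustrated with a small sketch that operates on a vowel spectrum given as (frequency in Hz, level in dB) pairs. The search bands used here (100 Hz either side of the F1 estimate, and 800-1200 Hz for the extra peak) are illustrative assumptions, not values from Chen (1997), and the function names are hypothetical. The sketch also ignores the case where F1 itself lies close to the extra peak.

```python
def peak_level_db(spectrum, low_hz, high_hz):
    """Return the highest level (dB) among spectrum points within [low_hz, high_hz]."""
    levels = [level for freq, level in spectrum if low_hz <= freq <= high_hz]
    if not levels:
        raise ValueError("no spectrum points in the requested band")
    return max(levels)


def a1_minus_p1_db(spectrum, f1_hz, f1_tolerance_hz=100.0, p1_band_hz=(800.0, 1200.0)):
    """A1-P1 in dB: peak level near F1 minus peak level in the assumed P1 band."""
    a1 = peak_level_db(spectrum, f1_hz - f1_tolerance_hz, f1_hz + f1_tolerance_hz)
    p1 = peak_level_db(spectrum, p1_band_hz[0], p1_band_hz[1])
    return a1 - p1


# Hypothetical spectra of the same vowel (F1 estimated at 500 Hz).
oral = [(300, 40.0), (500, 55.0), (700, 42.0), (950, 30.0), (1150, 25.0)]
nasalised = [(300, 41.0), (500, 50.0), (700, 43.0), (950, 38.0), (1150, 29.0)]
print(a1_minus_p1_db(oral, 500.0))       # 55.0 - 30.0 = 25.0
print(a1_minus_p1_db(nasalised, 500.0))  # 50.0 - 38.0 = 12.0
```

In this made-up example the nasalised token has a weaker A1 and a stronger P1, so its A1–P1 value is lower than that of the oral token.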

ARTICULATION

Maximum Repetition Rate (MRR): MRR, also referred to as diadochokinetic (DDK) rate (Diepeveen et al., 2019), measures how fast syllables can be repeated (Rvachew, 2005; Thoonen et al., 1996). Tasks can include repetition of non-word monosyllabic sequences (e.g. /papapa/), bisyllabic sequences (e.g. /pata/, /taka/) and trisyllabic sequences (e.g. /pataka/). MRR can provide information on the speed and accuracy of the articulators (i.e. the lips, tongue and jaw) when producing speech (Wit et al., 1993).
Speech Rate: Speech rate is the number of words spoken per minute, with pause intervals, hesitations and fillers included. Whilst speech rate varies depending on context, language and speaker intent, it plays a critical role in diagnosing and assessing motor speech disorders such as dysarthria.
Articulation Rate: Like speech rate, articulation rate is a relevant measure for assessing motor speech disorders, but it is typically calculated over speaking time only, with pauses excluded. It is considered a global index of speech production ability, as it is influenced by coordination and timing at all subsystem levels.
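
The three rate measures above can be sketched as simple counts over time. The function names are hypothetical, and counts and durations are assumed to have been obtained beforehand. The articulation rate function rests on the common convention that pause time is excluded, and it uses words per minute only to stay comparable with speech rate; articulation rate is often reported in syllables per second instead.

```python
def repetition_rate(syllable_count, duration_s):
    """Maximum repetition rate: syllables produced per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_s


def speech_rate_wpm(word_count, total_duration_s):
    """Speech rate: words per minute over the whole sample, pauses included."""
    if total_duration_s <= 0:
        raise ValueError("duration must be positive")
    return 60.0 * word_count / total_duration_s


def articulation_rate_wpm(word_count, total_duration_s, pause_duration_s):
    """Articulation rate (assumed convention): words per minute with pause time removed."""
    speaking_time_s = total_duration_s - pause_duration_s
    if speaking_time_s <= 0:
        raise ValueError("speaking time must be positive")
    return 60.0 * word_count / speaking_time_s


# Hypothetical samples: 20 syllables of /pataka/ in 4 s; 120 words in 60 s with 12 s of pauses.
print(repetition_rate(20, 4.0))
print(speech_rate_wpm(120, 60.0))
print(articulation_rate_wpm(120, 60.0, 12.0))
```

Because pause time is removed from the denominator, articulation rate computed this way is never lower than speech rate for the same sample.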

References