Bibliography

Peer-reviewed articles (total impact factor: 347.595)

highlighted
A71. Christian T. Herbst, Angela S. Stoeger, Roland Frey, Jörg Lohscheller, Ingo R. Titze, Michaela Gumpenberger, W. Tecumseh Fitch (2012). How Low Can You Go? Physical Production Mechanism of Elephant Infrasonic Vocalizations. Science, 337 (6094), 595-599
Elephants can communicate using sounds below the range of human hearing ("infrasounds" below 20 hertz). It is commonly speculated that these vocalizations are produced in the larynx, either by neurally controlled muscle twitching (as in cat purring) or by flow-induced self-sustained vibrations of the vocal folds (as in human speech and song). We used direct high-speed video observations of an excised elephant larynx to demonstrate flow-induced self-sustained vocal fold vibration in the absence of any neural signals, thus excluding the need for any "purring" mechanism. The observed physical principles of voice production apply to a wide variety of mammals, extending across a remarkably large range of fundamental frequencies and body sizes, spanning more than five orders of magnitude.
A70. Andrea Ravignani, Christian T. Herbst (2023). Voices in the Ocean. Science, 379 (6635), 881-882
The ability of humans to sing and speak requires precise neural control of the larynx and other organs to produce sounds. This neural control is limited in most mammals (1). For animals that create complex sounds, less is known about how peripheral anatomical structures enable vocal feats (2). On page 928 of this issue, Madsen et al. (3) demonstrate that toothed whales, such as dolphins and killer whales, have a distinct nasal structure that produces diverse sounds in a broad frequency range that spans >4 orders of magnitude.
A69. Takeshi Nishimura, Isao T. Tokuda, Shigehiro Miyachi, Jacob C. Dunn, Christian T. Herbst, Kazuyoshi Ishimura, Akihisa Kaneko, Yuki Kinoshita, Hiroki Koda, Jaap P. P. Saers, Hirohiko Imai, Tetsuya Matsuda, Ole Naesbye Larsen, Uwe Jürgens, Hideki Hirabayashi, Shozo Kojima, W. Tecumseh Fitch (2022). Evolutionary loss of complexity in human vocal anatomy as an adaptation for speech. Science, 377 (6607), 760-763
Human speech production obeys the same acoustic principles as vocal production in other animals but has distinctive features: A stable vocal source is filtered by rapidly changing formant frequenci...
A68. Christian T. Herbst, Tamara Prigge, Maxime Garcia, Vit Hampala, Riccardo Hofer, Gerald E. Weissengruber, Jan G. Svec, W. Tecumseh Fitch (2023). Domestic cat larynges can produce purring frequencies without neural input. Current Biology, early online
Most mammals and birds produce vocal sounds according to the myo-elastic aero-dynamic (MEAD) principle, through the self-sustaining oscillation of laryngeal or syringeal tissue [#VandenBerg1958, #Titze1980a]. In contrast, cats have long been believed to produce their low-frequency purr vocalizations through a radically different mechanism involving active high-speed muscle contractions (AMC), where neurally driven EMG burst patterns (typically at 20-30 Hz for cat purrs) cause the intrinsic laryngeal muscles to actively modulate the respiratory airflow. Direct empirical evidence for this AMC mechanism is sparse [#Remmers1972].

Here, the fundamental frequency (f_{o}) ranges of eight domestic cats (Felis silvestris catus) were investigated in an excised larynx setup using computer-controlled and manual pressure sweeps, to test the prediction of the AMC hypothesis that vibration should be impossible without neuromuscular activity, and thus unreachable in an excised larynx setup based on MEAD principles.

Surprisingly, all eight larynges produced self-sustained oscillations at the typical rates of cat purring. In six of the eight specimens gradual f_{o} variation in the ranges of about 15 to 200 Hz occurred, thus creating an f_{o} continuum between purrs and other stereotypical call types. Histological analysis of the investigated larynges revealed the presence of connective tissue masses, up to 4 mm in diameter, embedded in the vocal fold [#Szedenik2008]. This vocal fold specialization appears to be responsible for achieving the low f_{o} values observed, as validated in a computer simulation.

While our data do not outright reject the AMC mechanism for purring, they show that cat larynges can easily produce sounds in the purr regime with fundamental frequencies of 25 to 30 Hz without neural input or muscular contraction. This indicates that the physical and physiological basis of cat purring may instead rely on the same MEAD-based mechanisms as other cat vocalizations (e.g. meows) and most other vertebrate vocalizations.
A67. Christian T. Herbst, Stellan Hertegard, Daniel Zangger-Borch, Per-Ake Lindestad (2017). Freddie Mercury – Acoustic Analysis of Speaking Fundamental Frequency, Vibrato and Subharmonics. Logopedics Phoniatrics Vocology, 42 (1), 29-38
Freddie Mercury was one of the 20th Century's best known singers of commercial contemporary music. This study presents an acoustical analysis of his voice production and singing style, based on perceptual and quantitative analysis of publicly available sound recordings. Analysis of six interviews revealed a median speaking fundamental frequency of 117.3 Hz, which is typically found for a baritone voice. Analysis of voice tracks isolated from full band recordings suggested that the singing voice range was 37 semitones, within the pitch range of F#2 (about 92.2 Hz) to G5 (about 784 Hz). Evidence for higher phonations up to a fundamental frequency of 1347 Hz was not deemed reliable. Analysis of 240 sustained notes from 21 a-cappella recordings revealed a surprisingly high mean fundamental frequency modulation rate (vibrato) of 7.0 Hz, reaching the range of vocal tremor. Quantitative analysis utilizing a newly introduced parameter to assess the regularity of vocal vibrato corroborated its perceptually irregular nature, suggesting that vibrato (ir)regularity is a distinctive feature of the singing voice. Imitation of subharmonic phonation samples by a professional rock singer, documented by endoscopic high-speed video at 4132 frames per second, revealed a 3:1 frequency locked vibratory pattern of vocal folds and ventricular folds.
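The 37-semitone range quoted above follows from the two endpoint frequencies. A minimal sketch of that arithmetic, assuming equal temperament (12 semitones per octave); the function name is hypothetical and not taken from the paper:

```python
import math


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Endpoints as quoted in the abstract: F#2 (about 92.2 Hz) and G5 (about 784 Hz).
span = semitones_between(92.2, 784.0)
print(round(span))  # 37
```

An octave (frequency ratio 2:1) yields exactly 12 under the same formula, which is a quick sanity check.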
A66. Christian T. Herbst, Hanspeter Herzel, Jan G. Svec, Megan Wyman, W. T. S. Fitch (2013). Visualization of system dynamics using phasegrams. Journal of the Royal Society Interface, 10 (85), 1-14
A new tool for visualization and analysis of system dynamics is introduced: the phasegram. Its application is illustrated with both classical nonlinear systems (logistic map and Lorenz system) and with biological voice signals. Phasegrams combine the advantages of sliding-window analysis (such as the spectrogram) with well-established visualization techniques from the domain of nonlinear dynamics. In a phasegram, time is mapped onto the x-axis, and various vibratory regimes, such as periodic oscillation, subharmonics or chaos, are identified within the generated graph by the number and stability of horizontal lines. A phasegram can be interpreted as a bifurcation diagram in time. In contrast to other analysis techniques, it can be automatically constructed from time-series data alone: no additional system parameter needs to be known. Phasegrams show great potential for signal classification and can act as the quantitative basis for further analysis of oscillating systems in many scientific fields, such as physics (particularly acoustics), biology or medicine.

others (in chronological order)
A65. Christian T. Herbst, Kate Emerich, Michaela Anna Mayr, Ansgar Rudisch, Christian Kremser, Helena Talasz, Markus Kofler (2023). Time-synchronized MRI-assessment of respiratory apparatus sub-systems - a feasibility study. Journal of Voice, early online
The thorax (TH), the thoracic diaphragm (TD), and the abdominal wall (AW) are three sub-systems of the respiratory apparatus whose displacement motion has been well studied with the use of magnetic resonance imaging (MRI). Another sub-system, which has however received less research attention with respect to breathing, is the pelvic floor (PF). In particular, there is no study that has investigated the displacement of all four sub-systems simultaneously. Addressing this issue, it was the purpose of this feasibility study to establish a data acquisition paradigm for time-synchronous quantitative analysis of dynamic MRI data from these four major contributors to respiration and phonation (TH, TD, AW, and PF). Three healthy females were asked to breathe in and out forcefully while being recorded in a 1.5-Tesla whole body MR-scanner. Spanning a sequence of 15.12 seconds, 40 MRI data frames were acquired. Each data frame contained two slices, simultaneously documenting the mid-sagittal (TH, TD, PF) and transversal (AW) planes. The displacement motion of the four anatomical structures of interest was documented using kymographic analysis, resulting in time-varying calibrated structure displacement data. After computing the fundamental frequency of the cyclical breathing motion, the phase offsets of the TH, PF, and AW with respect to the TD were computed. Data analysis revealed three fundamentally different displacement patterns. Total structure displacement was in the range of 0.94 cm (TH) to 4.27 cm (TD). Phase delays of up to 90° (i.e., a quarter of a breathing cycle) between different structures were found. Motion offsets in the range of -28.30° to 14.90° were computed for the PF with respect to the TD. The diversity of results in only three investigated participants suggests a variety of possible breathing strategies, warranting further research.
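The relation between a time lag and a phase offset in degrees, as used in the abstract above (90° corresponding to a quarter cycle), can be sketched as follows. This is only an illustration of the unit conversion, not the study's analysis pipeline; the function name and the example breathing frequency of 0.25 Hz are assumptions made for the example:

```python
def phase_offset_deg(lag_s: float, f0_hz: float) -> float:
    """Convert a time lag into a phase offset in degrees, wrapped to (-180, 180].

    lag_s: delay of one structure relative to the reference (seconds).
    f0_hz: fundamental frequency of the cyclical motion (Hz).
    """
    phase = (360.0 * f0_hz * lag_s) % 360.0
    return phase - 360.0 if phase > 180.0 else phase


# Hypothetical breathing cycle of 4 s (0.25 Hz): a 1 s lag is a quarter cycle.
print(phase_offset_deg(1.0, 0.25))   # 90.0
print(phase_offset_deg(-0.5, 0.25))  # -45.0
```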
A64. Theodora Nestorova, Manuel Brandner, Bruno Gingras, Christian T. Herbst (2023). Vocal Vibrato Characteristics in Historical and Contemporary Opera, Operetta, and Schlager. Journal of Voice, early online
Objectives/Hypothesis
Vibrato is a core aesthetic element in singing. It varies considerably by both genre and era. Though studied extensively in Western classical singing over the years, there is a dearth of studies on vibrato in contemporary commercial music. In addressing this research gap, the objective of this study was to find and investigate common crossover song material from the opera, operetta, and Schlager singing styles from the historical early 20th to the contemporary 21st century epochs.

Study Design/Methods
A total of 51 commercial recordings of two songs, "Es muss was Wunderbares sein" by Ralph Benatzky, and "Die ganze Welt ist himmelblau" by Robert Stolz, from "The White Horse Inn" ("Im weißen Rößl") were collected from opera, operetta, and Schlager singers. Each sample was annotated using Praat and analyzed in a custom Matlab- and Python-based algorithmic approach of singing voice separation and sine wave fitting novel to vibrato research.

Results
With respect to vibrato rate and extent, the three most notable findings were that (1) fo and vibrato were inherently connected; (2) Schlager, as a historical aesthetic category, has unique vibrato characteristics, with higher overall rate and lower overall extent; and (3) fo and vibrato extent varied over time based on the historical or contemporary recording year for each genre.

Conclusions
Though these results should be interpreted with caution due to the limited sample size, conducting such acoustical analysis is relevant for voice pedagogy. This study sheds light on the complexity of vocal vibrato production physiology and acoustics while providing insight into various aesthetic choices when performing music of different genres and stylistic time periods. In the age of crossover singing training and commercially available recordings, this investigation reveals important distinctions regarding vocal vibrato across genres and eras that bear beneficial implications for singers and teachers of singing.
A63. Christian T. Herbst, Brad H. Story, David Meyer (2023). Acoustical theory of vowel modification strategies in belting. Journal of Voice, early online
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts – (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 – can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from −4 to −30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation – considering the relative prevalence of vowels in American English – suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification.
We thus argue that – on acoustical grounds – the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
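The simulation count reported above is the product of the three parameter dimensions (37 × 27 × 13 = 12987). A minimal sketch of such a grid, assuming semitone steps in equal temperament with A4 = 440 Hz, uniform 1 dB/octave steps for the spectral slope, and placeholder vowel labels; none of these details are stated in the abstract, and all names are hypothetical:

```python
from itertools import product

A4_HZ = 440.0  # assumed tuning reference


def midi_to_hz(midi_note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


# C3 (MIDI 48) to C6 (MIDI 84) in semitone steps: 37 pitches.
pitches_hz = [midi_to_hz(m) for m in range(48, 85)]

# -4 to -30 dB/octave in assumed 1 dB steps: 27 settings.
slopes_db_per_octave = list(range(-4, -31, -1))

# Placeholder labels standing in for the 13 IPA vowels (not listed in the abstract).
vowels = [f"vowel_{i}" for i in range(13)]

grid = list(product(pitches_hz, slopes_db_per_octave, vowels))
print(len(grid))  # 12987
```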
A62. Christian T. Herbst, Coen P. H. Elemans, Isao Tokuda, Vasileios Chatziioannou, Jan G. Svec (2023). Dynamic system coupling in voice production. Journal of Voice, early online
Voice is a major means of communication for humans, non-human mammals and many other vertebrates like birds and anurans. The physical and physiological principles of voice production are described by two theories: the MyoElastic-AeroDynamic (MEAD) theory and the Source-Filter Theory (SFT). While MEAD employs a multiphysics approach to understand the motor control and dynamics of self-sustained vibration of vocal folds or analogous tissues, SFT predominantly uses acoustics to understand spectral changes of the source via linear propagation through the vocal tract. Because the two theories focus on different aspects of voice production, they are often applied distinctly in specific areas of science and engineering. Here, we argue that the MEAD and the SFT are linked integral aspects of a holistic theory of voice production, describing a dynamically coupled system. The aim of this manuscript is to provide a comprehensive review of both the MEAD and the source-filter theory with its nonlinear extension, the latter of which suggests a number of conceptual similarities to sound production in brass instruments. We discuss the application of both theories to voice production of humans as well as of animals. An appraisal of voice production in the light of non-linear dynamics supports the notion that it can be best described with a systems view, considering coupled systems rather than isolated contributions of individual sub-systems.
A61. Sarah Lehoux, Christian T. Herbst, Martin Dobiáš, Jan G. Švec (2023). Frequency jumps in excised larynges in anechoic conditions: A pilot study. Journal of Sound and Vibration, early online (117607)
Sudden fundamental frequency jumps between chest and falsetto registers are among the least understood phenomena occurring in human voice. Such jumps are recognized as bifurcation events and have been assumed to result from nonlinear-dynamic properties of the vocal folds interacting with acoustic resonances of the vocal tract and of the subglottal tract (ST). Here, we explored an anechoic (resonance-free) ST and investigated these frequency jumps in five excised human larynges under two conditions: (a) anechoic and (b) subglottally resonant without a vocal tract. When smoothly elongating the vocal folds, we observed consistent jumps in anechoic conditions, proving that subglottal and vocal tract resonances are not necessary for the jumps to arise. The presence of a resonant ST did not result in more numerous jumps compared to anechoic conditions, indicating that the inherent nonlinear-dynamic properties of the larynges were the primary cause for the jumps. Nevertheless, the resonant ST slightly altered the initial and terminating frequencies of the jumps, suggesting that the role of interaction of the vocal fold oscillations with subglottal acoustics should not be neglected. These experimental findings should be considered when validating mathematical models simulating the nonlinear-dynamic behavior of the human vocal apparatus.
A60. Ales Zita, Adam Novozamsky, Barbara Zitova, Michal Šorel, Christian T. Herbst, Jitka Vydrova, Jan G. Švec (2022). Videokymogram Analyzer Tool: Human–computer comparison. Biomedical Signal Processing and Control, 78, 103878
Videokymography (VKG) is a modern video recording technique used in laryngology and phoniatrics to examine vocal fold vibrations. To obtain quantitative information on the vocal fold vibration, VKG image analysis is needed but no software has yet been validated for this purpose. Here, we introduce a validated software tool that aids clinicians to evaluate diagnostically important vibration characteristics in VKG and other types of kymographic recordings. State-of-the-art methods for automated image evaluation were implemented and tested on a set of videokymograms with a wide range of vibratory characteristics, including healthy and pathologic voices. The automated image segmentation results were compared to manual segmentation results of six evaluators revealing average differences smaller than one pixel. Furthermore, the automatically categorized vibratory parameters precisely agreed with the average visual assessment in 84 and 91 percent of the cases for pathological and healthy patients, respectively. Based on these results, the newly developed software was found to be a valid, reliable automated tool for the quantification of vocal fold vibrations from VKG images, offering a number of novel features relevant for clinical practice.
A59. Christian Thomas Herbst, Brad H. Story (2022). Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing. Journal of the Acoustical Society of America, 152 (6), 3548
A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of −6, −12, and −18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
A58. Christian T. Herbst (2021). The Snake Pit of Voice Pedagogy PART II: Mixed Voice, Vocal Tract Influences, Individual Teaching Systems. Journal of Singing, 77 (3), 345-358
A57. Josipa Bainac Hausknecht, Kristen Murdaugh, Elke Nagl, Christian T. Herbst (2021). Global Inventory and Similarity Rating of Singing Voice Assessment Terms used at English Speaking Academic Institutions. Journal of Voice
The choice of terms to describe and assess the singing voice is an essential part of vocal pedagogy. However, previous work suggested that singing terminology used in academia may be somewhat ambiguous. To address this issue, the authors a) compiled a comprehensive inventory of singing voice assessment terms used by English-speaking academic institutions worldwide and b) with the help of 22 highly experienced singing voice teachers, grouped the most prevalent terms based on their conceptual similarity. Only about a fifth of all targeted institutions provided materials and information online. Overall, a total of 292 different terms were found in the 64 available sources. This surprisingly large number of terms could be reduced by approximately 61% through lexical grouping. In the resulting data set, only 24 of the 114 terms occurred in at least 20% of the online sources, suggesting a rather low current density of information as well as little to no systematic and coordinated use of terms across institutions. The singing voice experts' similarity rating of the 24 most prevalent terms revealed a non-uniform distribution, suggesting that only some of these terms can be used interchangeably. Overall, these findings hint at the underlying complexity of voice assessment on a descriptive and qualitative level, highlighting the need for further research in this area.
A56. Christian T. Herbst, Takeshi Nishimura, Maxime Garcia, Kishin Migimatsu, Isao T. Tokuda (2021). Effect of ventricular folds on vocalization fundamental frequency in domestic pigs (Sus scrofa domesticus). Journal of Voice, early online access
This study investigates the effect of the ventricular folds on fundamental frequency (f_{o}) in the voice production of domestic pigs (Sus scrofa domesticus). The excised larynges of six subadult pigs were phonated in two preparation stages, with the ventricular folds present (PS1) and removed (PS2). Vocal fold resonances were tested with a laser vibrometer, and a four-mass computational model was created. Highly significant f_{o} differences were found between PS1 and PS2 (means at 93.7 Hz and 409.3 Hz, respectively). Two tissue resonances were found at 115 Hz and 250-290 Hz. The computational model had unique solutions for abducted and adducted ventricular folds at about 150 Hz and 400 Hz, roughly matching the f_{o} measured ex vivo for PS1 and PS2. The differing f_{o} encountered across preparation stages PS1 and PS2 is explained by distinct activation of either a high or a low eigenfrequency mode, depending on the engagement of the ventricular folds. The inability of the investigated larynges to vibrate at frequencies below 250 Hz in PS2 suggests that in vivo low-frequency calls of domestic pigs (pre-eminently grunts) are likely produced with engaged ventricular folds. Allometric comparison suggests that the special, mechanically coupled "double oscillator" has evolved to prevent signaling disadvantages. Given these traits, the porcine larynx might – apart from special applications relating to the involvement of ventricular folds – not be an ideal candidate for emulating human voice production in excised larynx experimentation.
A55. Kristen Murdaugh, Josipa Bainac Hausknecht, Christian T. Herbst (2021). In-Person or Virtual? – Assessing the Impact of COVID-19 on the Teaching Habits of Voice Pedagogues. Journal of Voice, early online access
The social distancing measures implemented world-wide in the wake of the novel Coronavirus (COVID-19) crisis have forced voice pedagogues to alter their teaching habits, likely shifting from customary in-person teaching to virtual teaching. An online survey, distributed world-wide in April/May 2020, investigated how singing voice pedagogues were impacted by the COVID-19 crisis. The collected responses from 387 survey participants suggest that, overall, voice teachers were only moderately satisfied with having to teach virtually, indicating that virtual voice teaching is not a sufficient replacement for in-person teaching. The participants indicated that during virtual teaching the singing voice can be assessed relatively well through features which provide both acoustic and visual clues. In contrast, depending on utilized technology, it may be harder to judge those aspects of the singing voice that are solely defined acoustically, such as dynamic range and spectral composition. This may be explained by limitations imposed by "out of the box" technology for online communication, which is typically optimized for speech instead of singing. This calls for better information on technological solutions for virtual voice teaching.
A54. Christian T. Herbst (2021). Register – die Schlangengrube der Gesangspädagogik (Teil 5): Klangbeispiel und Pädagogische Relevanz. Vox Humana, 17 (1)
A53. Christian T. Herbst (2021). Register – die Schlangengrube der Gesangspädagogik (Teil 6): Individuelle Didaktische Systeme, Schlussbemerkung. Vox Humana, 17 (2)
A52. Matthias Echternach, Christian T. Herbst, Marie Köberlein, Brad Story, Michael Döllinger, Donata Gellrich (2021). Are source-filter interactions detectable in classical singing during vowel glides? Journal of the Acoustical Society of America, 149 (6), 4565
A51. Christian T. Herbst (2020). Electroglottography – an update. Journal of Voice, 34 (4), 503-526
Electroglottography (EGG) is a low-cost, non-invasive technology for measuring changes of relative vocal fold contact area during laryngeal voice production. EGG was introduced about 60 years ago and has gone through a "golden era" of increased scientific attention in the late 1980s and early 90s. During that period, four eminent review papers were written. Here, an update to these reviews is given, recapitulating some earlier landmark contributions and documenting noteworthy developments during the past 25 years.

After presenting an algorithmic bibliographic analysis, some methodological aspects pertaining to measurement technology, qualitative and quantitative analysis, and respective interpretation are discussed. In particular, the interpretation of landmarks in the (first derivative of the) EGG waveform is critically examined. It is argued that, because of inferior-superior and anterior-posterior phase differences of vocal fold vibration, vocal fold (de)contacting does not occur instantaneously, but over an interval of time. For this reason, instants of vocal fold closing and opening cannot be resolved exactly from the EGG signal. Consequently, any quantitative analysis parameter relying on the determination of (de)contacting events (such as the EGG contact quotient) should be interpreted with great care.

Finally, recent developments are reviewed for the various fields of application of EGG, including basic voice science and voice production physiology, speech signal processing and classification, clinical practice including swallowing, phonetics, hearing sciences, psychology, singing, trumpet playing, and mammalian and avian bioacoustics. Overall, EGG has over the past six decades developed into a mature technology with a wide range of applications. However, due to current limitations, the full potential of the methodology has as yet not been fully exploited. Future development may occur on three levels: (a) rigorous validation of existent measurement approaches; (b) introduction and rigorous validation of novel quantitative and interpretative approaches; and (c) advancement of the measurement technology itself.
A50. Christian T. Herbst (2020). Performance evaluation of subharmonic-to-harmonic ratio (SHR) computation. Journal of Voice, early online access
Subharmonics are an important class of voice signals, relevant for speech, pathological voice, singing, and animal bioacoustics. They arise from special cases of amplitude (AM) or frequency modulation (FM) of the time-domain signal. Surprisingly, to date there is only one open source subharmonics detector available to the scientific community: Sun's subharmonic-to-harmonic ratio (SHR) [Sun, 2000]. Here, this algorithm was subjected to a formal evaluation with two data sets of synthesized and empirical speech samples.

Both data sets consisted of electroglottographic (EGG) signals, i.e., a physiological correlate of vocal fold oscillation that bypasses vocal tract acoustics. Data Set I consisted of 2560 synthesized EGG signals with varying degrees of AM and FM, fundamental frequency (fo), periodicity, and signal-to-noise ratio (SNR). Data Set II was made up of 25 EGG samples extracted from the CMU Arctic speech database. For a "ground truth" of subharmonicity, these samples were manually annotated by a group of five external experts.

Analysis of the synthesized data suggested that the SHR metric is relatively robust as long as the subharmonic modulation extent is below 0.35 and 0.7 for the FM and AM scenarios, respectively. In the CMU Arctic speech data samples, the SHR analysis reached a maximum sensitivity of about 87 % at a specificity of over 90 %, but only for adaptive algorithm parameter settings. In contrast, the algorithm's default parameter settings could only successfully classify about 9 % of all subharmonic instances.

The SHR is a useful metric for assessing the degree of subharmonics contained in voice signals, but only at adaptive parameter settings. In particular, the frequency ceiling should be chosen as five times the highest fo, and the frame length as at least five times the largest fundamental period of the analyzed signal. For subharmonic classification a threshold of SHR ≥ 0.01 is recommended.
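The parameter recommendations above can be turned into numbers with a small helper. This is only a minimal sketch, not Sun's algorithm or the paper's code: the function names `shr_settings` and `is_subharmonic` are hypothetical, and the helper merely applies the stated rules of thumb to an assumed fo range of the analyzed signal.

```python
def shr_settings(fo_min_hz, fo_max_hz):
    """Derive adaptive SHR analysis settings from the fo range of a signal.

    Hypothetical helper: applies the rules of thumb from the abstract.
    Returns (frequency_ceiling_hz, min_frame_length_s).
    """
    if not 0 < fo_min_hz <= fo_max_hz:
        raise ValueError("expected 0 < fo_min_hz <= fo_max_hz")
    # Frequency ceiling: five times the highest fo.
    frequency_ceiling_hz = 5.0 * fo_max_hz
    # Frame length: at least five times the largest fundamental period,
    # i.e. five periods of the lowest fo.
    min_frame_length_s = 5.0 / fo_min_hz
    return frequency_ceiling_hz, min_frame_length_s


def is_subharmonic(shr_value, threshold=0.01):
    """Classify a frame as subharmonic using the recommended SHR threshold."""
    return shr_value >= threshold
```

For a signal with fo between 100 Hz and 400 Hz, this gives a frequency ceiling of 2000 Hz and a minimum frame length of 50 ms.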
A49. Christian T. Herbst (2020). Register -- die Schlangengrube der Gesangspädagogik (Teil 2): laryngeale Mechanismen. Vox Humana, 16 (1), 16-19
A48. Christian T. Herbst (2020). Register -- die Schlangengrube der Gesangspädagogik (Teil 3): laryngeale Mechanismen (Fortsetzung). Vox Humana, 16 (2), 14-16
A47. Christian T. Herbst (2020). Register -- die Schlangengrube der Gesangspädagogik (Teil 4): Der Einfluss des Vokaltraktes. Vox Humana, 16 (3), 14-16
A46. Christian T. Herbst (2020). The Snake Pit of Voice Pedagogy PART I: Proprioception, Perception, and Laryngeal Mechanisms. Journal of Singing, 77 (2), 173-188
A45. Christian T. Herbst (2019). Physiologische Grundlagen der Sängerstütze (Teil 2): Eine konzeptionelle Begriffserweiterung. Vox Humana, 15 (1), 36-39
A44. Christian T. Herbst (2019). Register -- die Schlangengrube der Gesangspädagogik (Teil 1): propriozeptive und psychoakustische Wahrnehmung. Vox Humana, 15 (3), 44-48
A43. Jeppe Have Rasmussen, Christian T. Herbst, Coen Elemans (2018). Quantifying syringeal dynamics in vitro using electroglottography. Journal of Experimental Biology, 221: jeb172247
Birdsong is an indispensable model for imitative vocal learning in humans. Considerable knowledge of the neurological circuits responsible for vocal learning and control in songbirds is now available, but unfortunately the associated peripheral biomechanics are still poorly understood. To get the full picture of how activity in individual neurons is translated into vocalization, we need to better understand the kinematics in the song organ, the syrinx. Using an experimental setup, we examine the kinematics in vitro and investigate whether electroglottography (EGG) can be used to quantify the observed dynamics. We surgically insert miniaturized EGG electrodes into the pigeon (Columba livia) syrinx and examine the correlation between the EGG signal and the observed kinematics. The pigeon syrinx only contains one sound generator, in contrast to the two sound generators of songbirds, making it an ideal species for initial investigations validating the use of EGG in birds. We establish that there is no dorso-ventral phase component in the motion of the sound-producing labia, and we establish the avian equivalent to vocal fold contact area. EGG is a very solid predictor of the fundamental frequency of the sound being produced and a good indicator of the timing of key events happening during sound-producing oscillations.
A42. Christian T. Herbst, Hiroki Koda, Takumi Kunieda, Juri Suzuki, Maxime Garcia, W. Tecumseh Fitch, Takeshi Nishimura (2018). Japanese macaque phonatory physiology. Journal of Experimental Biology, 221: jeb171801
While the call repertoire and its communicative function are relatively well explored in Japanese macaques (Macaca fuscata), little empirical data is available on the physics and the physiology of this species' vocal production mechanism. Here, a 6 year old female Japanese macaque was trained to phonate under an operant conditioning paradigm. The resulting "coo" calls, and spontaneously uttered "growl" and "chirp" calls, were recorded with sound pressure level (SPL) calibrated microphones and electroglottography (EGG), a non-invasive method for assessing the dynamics of phonation. A total of 448 calls were recorded, complemented by ex vivo recordings on an excised Japanese macaque larynx. In this novel multidimensional investigative paradigm, in vivo and ex vivo data were matched via comparable EGG waveforms. Subsequent analysis suggests that the vocal range (range of fundamental frequency and SPL) was comparable to that of a 7-10 year old human, with the exception of low-intensity chirps, whose production may be facilitated by the species' vocal membranes. In coo calls, redundant control of fundamental frequency in relation to SPL was also comparable to humans. EGG data revealed that growls, coos, and chirps were produced by distinct laryngeal vibratory mechanisms. EGG further suggested changes in the degree of vocal fold adduction in vivo, resulting in spectral variation within the emitted coo calls, ranging from "breathy" (including aerodynamic noise components) to "non-breathy". This is again analogous to humans, corroborating the notion that phonation in humans and non-human primates is based on universal physical and physiological principles.
A41. Christian T. Herbst, Jacob C. Dunn (2018). Non-invasive documentation of primate sound production using electroglottography. Anthropological Science, 126 (1), 19-27
Electroglottography (EGG) is a low-cost, non-invasive method for documenting laryngeal sound production during vocalization. The EGG signal represents relative vocal fold contact area and thus delivers physiological evidence of vocal fold vibration. While the method has received much attention in human voice research over the last five decades, it has seen very little application in other mammals.

Here, we give a concise overview of mammalian vocal production principles. We explain how mammalian voice production physiology and the dynamics of vocal fold vibration can be documented qualitatively and quantitatively with EGG, and we summarize and discuss key issues from research with humans.

Finally, we review the limited number of studies applying EGG to non-human mammals, both in vivo and in vitro. The potential of EGG for non-invasive assessment of non-human primate vocalization is demonstrated with novel in vivo data of Cebus albifrons and Ateles chamek vocalization. These examples illustrate the great potential of EGG as a new minimally invasive tool in primate research, which can provide important insight into the "black box" that is vocal production. A better understanding of vocal fold vibration across a range of taxa can provide us with a deeper understanding of several important elements of speech evolution, such as the universality of vocal production mechanisms, the independence of source and filter, the evolution of vocal control, and the relevance of non-linear phenomena.
A40. Christian T. Herbst, Jacob C. Dunn (2018). Fundamental frequency estimation of low-quality electroglottographic signals. Journal of Voice, in press
Fundamental frequency (fo) is often estimated based on electroglottographic (EGG) signals. Due to the nature of the method, the quality of EGG signals may be impaired by certain features like amplitude or baseline drifts, mains hum or noise. The potential adverse effects of these factors on fo estimation have to date not been investigated. Here, the performance of thirteen algorithms for estimating fo was tested, based on 147 synthesized EGG signals with varying degrees of signal quality deterioration. Algorithm performance was assessed through the standard deviation σfo of the difference between known and estimated fo data, expressed in octaves. With very few exceptions, simulated mains hum, and amplitude and baseline drifts did not influence fo results, even though some algorithms consistently outperformed others. When increasing either cycle-to-cycle fo variation or the degree of subharmonics, the SIGMA algorithm had the best performance (max. σfo = 0.04). That algorithm was however more easily disturbed by typical EGG equipment noise, whereas the NDF and Praat's auto-correlation algorithms performed best in this category (σfo = 0.01). These results suggest that the algorithm for fo estimation of EGG signals needs to be selected specifically for each particular data set. Overall, estimated fo data should be interpreted with care.
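The error measure described above (standard deviation of the difference between known and estimated fo, in octaves) can be sketched as follows. This is an illustrative reading of the abstract, not the paper's code: the function name `sigma_fo_octaves` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import math
import statistics


def sigma_fo_octaves(known_hz, estimated_hz):
    """Standard deviation of the fo estimation error, expressed in octaves.

    Hypothetical helper. The per-frame error in octaves is log2(estimated / known),
    so an octave jump (a doubling of fo) counts as an error of exactly 1.
    """
    if len(known_hz) != len(estimated_hz) or not known_hz:
        raise ValueError("expected two non-empty sequences of equal length")
    errors = [math.log2(est / ref) for ref, est in zip(known_hz, estimated_hz)]
    # Assumption: population standard deviation over all analyzed frames.
    return statistics.pstdev(errors)
```

A perfect estimate yields 0, while a track in which half of the frames are estimated one octave too high yields 0.5.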
A39. Maxime Garcia, Christian T. Herbst (2018). Excised Larynx Experimentation: history, current developments, and prospects for bioacoustics research. Anthropological Science, 126 (1), 9-17
Sound production mechanisms in vertebrates are a crucial, yet understudied aspect of vocal communication research. Excised larynx experimentation (ELE) provides unique insights into these mechanisms in vitro and allows inference to in vivo conditions. Here we provide a historical overview of how this method has been implemented from antiquity to the current state-of-the-art setups. We review how useful the application of ELE has been to human voice and biophysics research. We then highlight the promising research output resulting from ELE in animal bioacoustics, a research field that overlooked this method until very recently and that is increasingly relying on it. We continue by outlining the limitations of ELE, which should be accounted for depending on the focus of investigation. Finally, we suggest how this approach should be implemented and how it could benefit various research questions. We conclude by underlining the value of ELE for the understanding of human voice and animal vocal communication within an interdisciplinary approach.
A38. Matthias Echternach, Fabian Burk, Florian Rose, Christian T. Herbst, Michael Burdumy, Michael Döllinger, Bernhard Richter (2018). Auswirkungen von Phonationsverdickungen auf die Biomechanik der Stimmlippenschwingungen in den Registerübergangsregionen bei professionellen Sängerinnen. HNO, 66 (4), 308-320

Introduction: The influence of functional phonatory thickenings of the vocal folds on vocal fold vibratory behavior during vocally demanding tasks is not understood in detail.

Materials and Methods: In this study, glissandi from 220 Hz to 440 Hz and from 440 Hz to 880 Hz on the vowel [a] were examined in four professional female singers each (a) without organic findings and without dysphonia (group A), (b) with functional phonatory thickenings (group B), and (c) with organic dysphonia (group C), using high-speed endoscopy (HSDI, 20,000 frames per second) together with acoustic and electroglottographic (EGG) signals. Based on the EGG sample entropy, time windows were defined for the analysis of register transition phenomena. In addition, all voice signals (glottal area waveform (GAW), acoustic signal, and EGG signal) were subjected to a perceptual rating regarding the occurrence of registration events.

Results: The absolute sample entropy showed maxima in fundamental frequency ranges where register transitions are typically found. The absolute sample entropy values for group C were higher than those of the other two groups only for the lower glissando. Group B did not differ clearly from group A, neither in the rating nor in the sample entropy values.

Conclusion: Functional phonatory thickenings do not adversely affect the biomechanics in technically challenging regions such as register transitions. The use of sample entropy as a criterion for detecting register transitions is promising, but requires further validation.
A37. Christian T. Herbst (2018). Physiologische Grundlagen der Sängerstütze (Teil 1): Et hi tres unum sunt -- Wechselwirkungen der Teilsysteme des Stimmapparates. Vox Humana, 14 (3), 50-55
A36. Matthias Echternach, Fabian Burk, Michael Burdumy, Christian T. Herbst, Marie Köberlein, Michael Döllinger, Bernhard Richter (2017). The influence of vocal mass lesions on the passaggio region of professional singers. Laryngoscope, 127 (6), 1392-1401
OBJECTIVES/HYPOTHESIS: In professional classical singing, an even voice quality throughout the entire singing voice range is essential. Transitions between vocal registers (passaggio) are the technically most challenging aspects in classical singing. It is hypothesized that they are most affected by vocal fold mass lesions (VFML).

STUDY DESIGN: Cohort study.

METHODS: In this study, the effect of VFML on vocal fold vibration in the passaggio regions was analyzed in four female and three male singers suffering from organic dysphonia. The singers were asked to sing an ascending glissando through the passaggio regions, before and after treatment. The vocal fold vibration was documented with transnasal endoscopic high-speed imaging recordings at 20,000 frames per second, supplemented by synchronized acoustic and electroglottographic recordings.


RESULTS: Major irregularities were found in the passaggio region of four singers before treatment, whereas the respective phonations below the passaggio were almost regular. In two female singers only the upper, but not the lower passaggio was affected. In all four of these participants, the passaggio region was more regular after treatment. In the remaining three participants, the VFML showed no effect on the passaggio region. However, the singers' ability to reach higher pitches was impaired, but was resolved after treatment.


CONCLUSIONS: The data in this case study strongly suggest that the passaggio region could be affected by VFML, even if phonation outside the passaggio regions is unimpaired. When planning surgical procedures for professional singers, clinical examination protocols should therefore include phonatory tests across the passaggio regions.
A35. Matthias Echternach, Fabian Burk, Marie Köberlein, Christian T. Herbst, Michael Döllinger, Bernhard Richter (2017). Oscillatory characteristics of the vocal folds across the tenor passaggio. Journal of Voice, 31 (3), 381.e5-381.e14
Introduction: Recent research has revealed that classically trained tenors tend to constrict epilaryngeal structures when singing in and above the passaggio (i.e., the frequency region where register events typically occur). These constrictions impede the visibility of vocal fold oscillatory patterns with transoral rigid high-speed video endoscopy, thus limiting the current understanding of laryngeal dynamics in the passaggio region of tenors.

Materials and Methods: This investigation analyzed seven professionally trained western classical tenors using high-speed digital imaging (HSDI) at 20,000 frames per second via transnasal flexible endoscopy. The participants produced transitions (a) from modal to falsetto register and (b) from modal to stage voice above the passaggio (SVaP) during ascending pitch glides from A3 (220 Hz) to A4 (440 Hz) on vowel /i/. HSDI data were complemented by simultaneous acoustic and electroglottographic recordings.

Results: For many subjects both transition types were associated with constrictions of the epilaryngeal structures during the pitch glide. These constrictions appeared to be more distinct for the SVaP than for falsetto. No major irregularities of vocal fold oscillations in the sense of fundamental frequency jumps were observed for either transition type. However, during the transitions, the open quotient derived from the glottal area waveform (OQGAW) increased; in falsetto, the OQGAW was greater and the electroglottographic cepstral peak prominence was lower than in SVaP.

Conclusions: Epilaryngeal constrictions should be considered typical for tenors singing at high fundamental frequencies. Vocal fold oscillatory patterns are changing not only for the register shift from modal to falsetto but also for the transition from modal to SVaP, indicating a need for laryngeal adjustments during these transitions.
A34. Maxime Garcia, Christian T. Herbst, Daniel L. Bowling, Jacob Dunn, W. Tecumseh Fitch (2017). Acoustic allometry revisited: morphological determinants of fundamental frequency in primate vocal production. Scientific Reports, 7 (10450), 1 - 11
A fundamental issue in the evolution of communication is the degree to which signals convey accurate ("honest") information about the signaler. In bioacoustics, the assumption that fundamental frequency (fo) should correlate with the body size of the caller is widespread, but this belief has been challenged by various studies, possibly because larynx size and body size can vary independently. In the present comparative study, we conducted excised larynx experiments to investigate this hypothesis rigorously and explore the determinants of fo. Using specimens from eleven primate species, we carried out an inter-specific investigation, examining correlations between the minimum fo produced by the sound source, body size and vocal fold length (VFL). We found that, across species, VFL predicted minimum fo much better than body size, clearly demonstrating the potential for decoupling between larynx size and body size in primates. These findings shed new light on the diversity of primate vocalizations and vocal morphology, highlighting the importance of vocal physiology in understanding the evolution of mammal vocal communication.
A33. Matthias Echternach, Fabian Burk, Marie Köberlein, Andreas Selamtzis, Michael Döllinger, Michael Burdumy, Bernhard Richter, Christian T. Herbst (2017). Laryngeal evidence for the first and second passaggio in professionally trained sopranos. PLoS ONE, 12 (5), e0175865
Introduction
Due to a lack of empirical data, the current understanding of the laryngeal mechanics in the passaggio regions (i.e., the fundamental frequency ranges where vocal registration events usually occur) of the female singing voice is still limited.

Material and Methods
In this study the first and second passaggio regions of 10 professionally trained female classical soprano singers were analyzed. The sopranos performed pitch glides from A3 (fo = 220 Hz) to A4 (fo = 440 Hz) and from A4 (fo = 440 Hz) to A5 (fo = 880 Hz) on the vowel [i:]. Vocal fold vibration was assessed with trans-nasal high speed videoendoscopy at 20,000 fps, complemented by simultaneous electroglottographic (EGG) and acoustic recordings. Register breaks were perceptually rated by 12 voice experts. Voice stability was documented with the EGG-based sample entropy. Glottal opening and closing patterns during the passaggi were analyzed, supplemented with open quotient data extracted from the glottal area waveform.

Results
In both the first and the second passaggio, variations of vocal fold vibration patterns were found. Four distinct patterns emerged: smooth transitions with either increasing or decreasing durations of glottal closure, abrupt register transitions, and intermediate loss of vocal fold contact. Audible register transitions (in both the first and second passaggi) generally coincided with higher sample entropy values and higher open quotient variance through the respective passaggi.

Conclusions
Noteworthy vocal fold oscillatory registration events occur in both the first and the second passaggio even in professional sopranos. The respective transitions are hypothesized to be caused by either (a) a change of laryngeal biomechanical properties; or by (b) vocal tract resonance effects, constituting level 2 source-filter interactions.
A32. Christian T. Herbst, Vit Hampala, Maxime Garcia, Riccardo Hofer, Jan G. Svec (2017). Hemi-laryngeal setup for studying vocal fold vibration in three dimensions. Journal of Visualized Experiments, 129 (e55303), 1-9
The voice of humans and most non-human mammals is generated in the larynx through self-sustaining oscillation of the vocal folds. Direct visual documentation of vocal fold vibration is challenging, particularly in non-human mammals. As an alternative, excised larynx experiments provide the opportunity to investigate vocal fold vibration under controlled physiological and physical conditions. However, the use of a full larynx merely provides a top view of the vocal folds, thus excluding crucial portions of the oscillating structures from observation during their interaction with aerodynamic forces. This limitation can be overcome by utilizing a hemi-larynx setup where one half of the larynx is mid-sagittally removed, thus providing both a superior and a lateral view of the remaining vocal fold during self-sustained oscillation.

Here, a step-by-step guide for the anatomical preparation of hemi-laryngeal structures and their mounting on the laboratory bench is given. Exemplary phonation of the hemi-larynx preparation is documented with high-speed video data captured by two synchronized cameras (superior and lateral views), showing three-dimensional vocal fold motion and corresponding time-varying contact area. The documentation of the hemi-larynx setup in this publication will facilitate application and reliable repeatability in experimental research, thus providing voice scientists with the potential to better understand the biomechanics of voice production.
A31. Christian T. Herbst, Harm K. Schutte, Daniel L. Bowling, Jan G. Svec (2017). Comparing chalk with cheese - The EGG contact quotient is only a limited surrogate of the closed quotient. Journal of Voice, 31 (4), 401-409
The electroglottographic (EGG) contact quotient (CQegg), an estimate of the relative duration of vocal fold contact per vibratory cycle, is the most commonly used quantitative analysis parameter. The purpose of this study is to quantify the CQegg's relation to the closed quotient, a measure more directly related to glottal width changes during vocal fold vibration and the respective sound generation events.

Thirteen singers (six females) phonated in four extreme phonation types, while independently varying the degree of breathiness and vocal register. EGG recordings were complemented by simultaneous videokymographic (VKG) endoscopy, which allows for calculation of the videokymographic closed quotient (CQvkg). The CQegg was computed using five different algorithms, all used in previous research.

All CQegg algorithms produced CQegg values that clearly differed from the respective CQvkg, with standard deviations around 20 % of cycle duration. The difference between CQvkg and CQegg was generally greater for phonations with lower CQvkg. The largest differences were found for low-quality EGG signals with a signal-to-noise ratio (SNR) below 10 dB, typically stemming from phonations with incomplete glottal closure. Disregarding those low-quality signals, the best match between CQegg and CQvkg was found for a CQegg algorithm operating on the first derivative of the EGG signal.

These results show that the terms "closed quotient" and "contact quotient" should not be used interchangeably. They relate to different physiological phenomena. Phonations with incomplete glottal closure having an EGG SNR below 10 dB are not suited for CQegg analysis.
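To make the notion of a contact quotient concrete, the sketch below computes a criterion-level CQ for a single EGG cycle. It is an assumption-laden illustration rather than any of the five algorithms evaluated in the study (the abstract does not specify them): the function name `contact_quotient`, the 50 % default criterion level, and the convention that larger EGG values mean more vocal fold contact are all assumptions.

```python
def contact_quotient(cycle, criterion=0.5):
    """Criterion-level contact quotient of one EGG cycle (hypothetical helper).

    `cycle` holds the samples of exactly one vibratory cycle, with larger
    values assumed to indicate more vocal fold contact. The result is the
    fraction of the cycle during which the signal exceeds a level placed at
    `criterion` of the cycle's peak-to-peak amplitude.
    """
    if not 0.0 < criterion < 1.0:
        raise ValueError("criterion must lie strictly between 0 and 1")
    lowest, highest = min(cycle), max(cycle)
    if highest == lowest:
        raise ValueError("flat cycle: no contact variation to analyze")
    level = lowest + criterion * (highest - lowest)
    # Count the samples above the criterion level.
    samples_in_contact = sum(1 for sample in cycle if sample > level)
    return samples_in_contact / len(cycle)
```

Because the result depends on the chosen criterion level and on signal quality, such a value describes relative contact duration only and is not a direct measure of glottal closure.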
A30. Christian T. Herbst (2017). A review of singing voice sub-system interactions - towards an extended physiological model of "support". Journal of Voice, 31 (2), 249.e13-249.e19
During phonation, the respiratory, the phonatory and the resonatory parts of the voice organ can interact, where physiological action in one sub-system elicits a direct effect in another. Here, three major synergies of this kind are reviewed, creating a model of voice sub-system interactions: (a) Vocal tract adjustments can influence the behavior of the voice source via non-linear source-tract interactions; (b) the type and degree of vocal fold adduction controls the expiratory airflow rate; and (c) the tracheal pull caused by the respiratory system affects the vertical larynx position and thus the vocal tract resonances.
The relevance of the presented model is discussed, suggesting, amongst others, that functional voice building work concerned with a particular voice sub-system may evoke side effects or benefits on other sub-systems, even when having a clearly defined and isolated physiological target.
Finally, four seemingly incongruous historic definitions of the concept of singing voice "support" are evaluated, showing how each of these pertains to different voice sub-systems at various levels of detail. It is argued that presumed discrepancies between these definitions can be resolved by putting them into the wider context of the sub-system interaction model presented here, thus offering a framework for reviewing and potentially refining some current and historical pedagogical approaches.
A29. Christian T. Herbst, Jakob Unger, Hanspeter Herzel, Jan G. Svec, Jörg Lohscheller (2016). Phasegram analysis of vocal fold vibration documented with laryngeal high-speed video endoscopy. Journal of Voice, Feb 12. pii: S0892-1997(15)00257-X. doi: 10.1016/j.jvoice.2015.11.006. [Epub ahead of print]
Introduction. In a recent publication, the phasegram, a bifurcation diagram over time, has been introduced as an intuitive visualization tool for assessing the vibratory states of oscillating systems. Here, this non-linear dynamics approach is augmented with quantitative analysis parameters, and it is applied to clinical laryngeal high-speed video (HSV) endoscopic recordings of healthy and pathologic phonations.
Methods. HSV data from a total of 73 females diagnosed as healthy (n=42), or with functional dysphonia (n=15) or unilateral vocal fold paralysis (n=16), were quantitatively analyzed. Glottal area waveforms (GAW) as well as left and right hemi-GAWs (hGAW) were extracted from the HSV recordings. Based on Poincaré sections through phase space embedded signals, two novel quantitative parameters were computed: The phasegram entropy (PE), and the phasegram complexity estimate (PCE), inspired by signal entropy and correlation dimension computation, respectively.
Results. Both PE and PCE assumed higher average values (suggesting more irregular vibrations) for the pathological as compared to the healthy participants, significantly discriminating the healthy from the paralysis group (p=0.02 for both PE and PCE). Comparisons of individual PE or PCE data for the left and right hGAW within each subject resulted in asymmetry measures for the regularity of vocal fold vibration. The PCE-based asymmetry measure revealed significant differences between the healthy and the paralysis group (p=0.03).
Conclusions. Quantitative phasegram analysis of GAW and hGAW data is a promising tool for the automated processing of HSV data in research and in clinical practice.
A28. Maxime Garcia, Bruno Gingras, Daniel L. Bowling, Christian T. Herbst, Markus Böckle, Yann Locatelli, W. Tecumseh Fitch (2016). Structural classification of Wild Boar (Sus scrofa) vocalizations. Ethology, 122 (4), 329-342
Determining whether a species' vocal communication system is graded or discrete requires definition of its vocal repertoire. In this context, research on domestic pig (Sus scrofa domesticus) vocalizations, for example, has led to significant advances in our understanding of communicative functions. Despite their close relation to domestic pigs, little is known about wild boar (Sus scrofa) vocalizations. The few existing studies, conducted in the 1970s, relied on visual inspections of spectrograms to quantify acoustic parameters and lacked statistical analysis. Here, we use objective signal processing techniques and advanced statistical approaches to classify 616 calls recorded from semi-free ranging animals. Based on four spectral and temporal acoustic parameters - quartile Q25, duration, spectral flux and spectral flatness - extracted from a multivariate analysis, we refine and extend the conclusions drawn from previous work and present a statistically validated classification of the wild boar vocal repertoire into four call types: grunts, grunt-squeals, squeals and trumpets. While the majority of calls could be sorted into these categories using objective criteria, we also found evidence supporting a graded interpretation of some wild boar vocalizations as acoustically continuous, with the extremes representing discrete call types. Using objective criteria based on modern techniques and statistics with respect to acoustic continuity, examining both production and perception levels, advances the understanding of vocal variation. Integrating our findings with recent studies on domestic pig vocal behavior and emotions, we emphasize the importance of grunt-squeals for acoustic approaches to animal welfare and underline the need for further research investigating the role of domestication in animal vocal communication.
A27. Brian P. Gill, Christian T. Herbst (2016). Voice Pedagogy - What Do We Need? Logopedics Phoniatrics Vocology, 41 (4), 168-173
The final keynote panel of the 10th Pan-European Voice Conference (PEVOC) was concerned with the topic "Voice Pedagogy - What do we need?" In this communication the panel discussion is summarized and the authors provide a more in-depth discussion of one of the key questions, addressing the roles and tasks of people working with voice students. In particular, a distinction is made between (a) voice building (derived from the German term "Stimmbildung"), primarily comprising the functional and physiological aspects of singing; (b) coaching, mostly concerned with performance skills; and (c) singing voice rehabilitation. Both public and private educators are encouraged to apply this distinction to their curricula, in order to arrive at more efficient singing teaching and to reduce the risk of vocal injury to the singers concerned.
A26. Laura Enflo, Christian T. Herbst, Johan Sundberg, Anita McAllister (2016). Comparing vocal fold contact criteria derived from electroglottographic and acoustic signals. Journal of Voice, 30 (4), 381-388

Objectives: Collision threshold pressure (CTP), i.e., the lowest subglottal pressure facilitating vocal fold contact during phonation, is likely to reflect relevant vocal fold properties. The amplitude of an electroglottographic (EGG) signal or the amplitude of its first derivative (dEGG) has been used as criterion of such contact. Manual measurement of CTP is time-consuming, making the development of a simpler, alternative method desirable.

Method: In this investigation we compare CTP values measured manually to values automatically derived from dEGG, and to values derived from a set of alternative parameters, some obtained from audio and some from EGG signals. One of the parameters was the novel EGG wavegram, which visualizes sequences of EGG or dEGG cycles, normalized with respect to period and amplitude. Raters with and without previous acquaintance with EGG analysis marked the disappearance of vocal fold contact in dEGG and in wavegram displays of /pa:/-sequences produced with continuously decreasing vocal loudness by seven singer subjects.

Results: Vocal fold contact was equally accurately identified in displays of both dEGG amplitude and wavegram. Automatically derived CTP values showed high correlation with those measured manually, and with those derived from the ratings of the visual displays. Seven other parameters were tested as criteria of such contact. Mainly due to noise in the EGG signal, most of them yielded CTP values differing considerably from those derived from the manual and the automatic methods, while the EGG spectrum slope showed a high correlation.

Conclusion: The possibility of measuring CTP automatically seems promising for future investigations.
A25. Vit Hampala, Maxime Garcia, Jan G. Svec, Ronald C. Scherer, Christian T. Herbst (2016). Relationship between the Electroglottographic Signal and Vocal Fold Contact Area. Journal of Voice, 30 (2), 161-171
Objective. Electroglottography (EGG) is a widely used non-invasive method that purports to measure changes in relative vocal fold contact area (VFCA) during phonation. Despite its broad application, the putative direct relation between the EGG waveform and VFCA has to date only been formally tested in a single study, suggesting an approximately linear relationship. However, in that study flow-induced vocal fold vibration was not investigated. A rigorous empirical evaluation of EGG as a measure of VFCA under proper physiological conditions is therefore still needed.

Methods/Design. Three red deer larynges were phonated in an excised hemilarynx preparation utilizing a conducting glass plate. The time varying contact between the vocal fold and the glass plate was assessed by high-speed video recordings at 6000 fps, synchronized to the EGG signal.

Results. The average differences between the normalized [0,1] VFCA and EGG waveforms for the three larynges were 0.180 (±0.156), 0.075 (±0.115) and 0.168 (±0.184) in the contacting phase, and 0.159 (±0.112), -0.003 (±0.029) and 0.004 (±0.032) in the de-contacting phase.

Discussion and Conclusion: Overall there was a better agreement between VFCA and the EGG waveform in the de-contacting phase than in the contacting phase. Disagreements may be caused by non-uniform tissue conductance properties, electrode placement, and electroglottograph hardware circuitry. Pending further research, the EGG waveform may be a reasonable first approximation to change in medial contact area between the vocal folds during phonation. However, any quantitative and statistical data derived from EGG should be interpreted cautiously, allowing for potential deviations from true VFCA.
A24. Coen Elemans, Jeppe Have Rasmussen, Christian T. Herbst, Daniel Düring, Sue Anne Zollinger, Henrik Brumm, Kyle Srivastava, Niels Svane, Ming Ding, Ole Larsen, Samuel Sober, Jan G. Svec (2015). Universal mechanisms of sound production and control in birds and mammals. Nature Communications, 6 (8978), 1-13
As animals vocalize, their vocal organ transforms motor commands into vocalizations for social communication. In birds, the physical mechanisms by which vocalizations are produced and controlled remain unresolved because of the extreme difficulty in obtaining in vivo measurements. Here, we introduce an ex vivo preparation of the avian vocal organ that allows simultaneous high-speed imaging, muscle stimulation and kinematic and acoustic analyses to reveal the mechanisms of vocal production in birds across a wide range of taxa. Remarkably, we show that all species tested employ the myoelastic-aerodynamic (MEAD) mechanism, the same mechanism used to produce human speech. Furthermore, we show substantial redundancy in the control of key vocal parameters ex vivo, suggesting that in vivo vocalizations may also not be specified by unique motor commands. We propose that such motor redundancy can aid vocal learning and is common to MEAD sound production across birds and mammals, including humans.
A23. Ingo R. Titze, Ronald Baken, Kenneth Bozeman, Svante Granqvist, Nathalie Henrich-Bernardoni, Christian T. Herbst, David Howard, Eric Hunter, Dean Kaelin, Raymond Kent, Jody Kreiman, Malte Kob, Anders Lofqvist, Scott McCoy, Donald Miller, Hubert Noe, Ronald C. Scherer, John Smith, Brad Story, Jan G. Svec, Sten Ternström, Joe Wolfe (2015). Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization. Journal of the Acoustical Society of America, 137 (5), 3005-3007
A22. Hana Sramkova, Svante Granqvist, Christian T. Herbst, Jan G. Svec (2015). The softest sound levels of the human voice in normal subjects. Journal of the Acoustical Society of America, 137 (1), 407-418
Accurate measurement of the softest sound levels of phonation presents technical and methodological challenges. This study aimed at (1) reliably obtaining normative data on sustained softest sound levels for the vowel [a:] at comfortable pitch; (2) comparing the results for different frequency and time weighting methods; and (3) refining the Union of European Phoniatricians' recommendation on allowed background noise levels for scientific and equipment manufacturers' purposes. Eighty healthy untrained participants (40 females, 40 males) were investigated in quiet rooms using a head-mounted microphone and a sound level meter at 30 cm distance. The one-second-equivalent sound levels were more stable and more representative for evaluating the softest sustained phonations than the fast-time-weighted levels. At 30 cm, these levels were in the range of 48-61 dB(C)/41-53 dB(A) for females and 49-64 dB(C)/35-53 dB(A) for males (5% to 95% quantile range). These ranges may serve as reference data in evaluating vocal normality. In order to reach a signal-to-noise ratio of at least 10 dB for more than 95% of the normal population, the background noise should be below 25 dB(A) and 38 dB(C), respectively, for the softest phonation measurements at 30 cm distance. For the A-weighting, this is 15 dB lower than the previously recommended value.
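The recommended noise limits in this abstract can be reproduced with simple arithmetic. The sketch below assumes (this is a reading of the abstract, not a procedure stated in it) that each limit is the lowest 5% quantile of the softest levels across both sexes, minus the required 10 dB signal-to-noise ratio. The dictionary layout and function name are illustrative only.

```python
# Lower bounds (5% quantile) of the softest sound levels at 30 cm,
# taken from the ranges quoted in the abstract.
SOFTEST_5PCT = {
    "dB(A)": {"female": 41, "male": 35},
    "dB(C)": {"female": 48, "male": 49},
}

REQUIRED_SNR_DB = 10  # minimum signal-to-noise ratio named in the abstract


def max_background_noise(levels_by_sex, snr_db=REQUIRED_SNR_DB):
    """Assumed rule: quietest 5% quantile across sexes minus the SNR."""
    return min(levels_by_sex.values()) - snr_db


limits = {weighting: max_background_noise(levels)
          for weighting, levels in SOFTEST_5PCT.items()}
print(limits)  # {'dB(A)': 25, 'dB(C)': 38}
```

Under this assumption the result matches the 25 dB(A) and 38 dB(C) limits given in the abstract.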
A21. Shaheen N. Awan, Andrew R Krauss, Christian T. Herbst (2015). An Examination of the Relationship Between Electroglottographic Contact Quotient (CQEGG), EGG Decontacting Phase Profile, and Acoustical Spectral Moments. Journal of Voice, 29 (5), 519-529
OBJECTIVES: To date, only a few studies have examined the possible relationship between electroglottographic (EGG) data and spectral characteristics of the voice. This study examined the possible association between EGG signal data (contact quotient [CQ] and decontacting phase profile) and spectral moments of the acoustic signal (spectral mean, spectral standard deviation (SD), spectral skewness, and spectral kurtosis). Furthermore, the possible effects of gender on these measurements were analyzed.
METHODS: Sustained vowel /ɑ/ productions were obtained from 48 normophonic individuals (24 adult males and 24 adult females). The central 1-second portions of the acoustic vowel samples were analyzed for spectral moments, and the EGG signal was analyzed for CQ (CQEGG), fundamental frequency (F0), and decontacting phase profile.
RESULTS: Across all subjects, the spectral characteristics of the voice (in particular, spectral SD, skewness, and kurtosis) are significantly related to changes in the relative duration of vocal fold contact (as measured via CQEGG). In addition, significant effects of the profile of the EGG decontacting phase (ie, concave down/"knee" vs concave up/"no knee") on spectral SD were also observed, as well as a strong trend for decontacting phase profile to influence the spectral mean.
DISCUSSION: Although the degree of vocal fold contact and differences in decontacting phase profile may have an influence on the spectral characteristics of the acoustic voice signal, the strength of correlations between CQEGG values and measures of spectral moments only accounted for approximately 13-16% of the variation in spectral distribution characteristics. These results stress the importance of the transformative role of the supraglottal vocal tract in producing an acoustic output that maintains some of the characteristics of the glottal source, but which modifies the source characteristics in ways not completely accounted for by single parameters such as CQEGG or EGG profile.
A20. Christian T. Herbst, Jinook Oh, Jitka Vydrova, Jan G. Svec (2015). DigitalVHI -- a freeware open source software application to capture the Voice Handicap Index and other questionnaire data in various languages. Logopedics Phoniatrics Vocology, 40 (2), 70-74
In this short report we introduce DigitalVHI, a free open-source software application for obtaining Voice Handicap Index (VHI), and other questionnaire data, which can be put on a computer in clinics and used in clinical practice. The software can be downloaded from http://www.christian-herbst.org/DigitalVHI/
A19. Christian T. Herbst, Markus Hess, Frank Müller, Jan G. Svec, Johan Sundberg (2015). Glottal adduction and subglottal pressure in singing. Journal of Voice, 29 (4), 391-402
Previous research suggests that independent variation of vocal loudness and glottal configuration (type and degree of vocal fold adduction) does not occur in untrained speech production. This study investigated whether these factors can be varied independently in trained singing, and how subglottal pressure is related to average glottal airflow, voice source properties and sound level under these conditions.

A classically trained baritone produced sustained phonations on the endoscopic vowel [i:] at pitch D4 (approx. 294 Hz), exclusively varying either (a) vocal register; (b) phonation type (from "breathy" to "pressed" via cartilaginous adduction); or (c) vocal loudness, while keeping the others constant. Phonation was documented by simultaneous recording of videokymographic, electroglottographic, airflow and voice source data, and by percutaneous measurement of relative subglottal pressure.

Register shifts were clearly marked in the EGG wavegram display. As compared with chest register, falsetto was produced with greater pulse amplitude of the glottal flow, H1-H2, mean airflow, and with lower MFDR, subglottal pressure, and sound pressure. Shifts of phonation type (breathy/flow/neutral/pressed) induced comparable systematic changes. Increase of vocal loudness resulted in increased subglottal pressure, average flow, sound pressure, MFDR, glottal flow pulse amplitude and H1-H2.

When changing either vocal register or phonation type, subglottal pressure and mean airflow showed an inverse relationship, i.e., variation of glottal flow resistance. The direct relation between subglottal pressure and flow when varying only vocal loudness demonstrated independent control of vocal loudness and glottal configuration. Achieving such independent control of phonatory control parameters would be an important target in vocal pedagogy and in voice therapy.
A18. Christian T. Herbst (2015). Glottale Adduktion im Gesang. Vox Humana, 11 (1), 12-17
A17. David Howard, Jenevora Williams, Christian T. Herbst (2014). "Ring" in the solo child singing voice. Journal of Voice, 28 (2), 161-169
Objectives/Hypothesis. Listeners often describe the voices of solo child singers as being 'pure' or 'clear'; these terms would suggest that the voice is not only pleasant but also clearly audible. The audibility or clarity could be attributed to the presence of high-frequency partials in the sound: a 'brightness' or 'ring'. This paper aims to investigate spectrally the acoustic nature of this 'ring' phenomenon in children's solo voices, in particular relating it to their 'non-ring' production. Additionally, this is set in the context of establishing to what extent, if any, the spectral characteristics of 'ring' are shared with those of the singer's formant cluster associated with professional adult opera singers in the 2.5 to 3.5 kHz region.

Methods. A group of child solo singers, acknowledged as outstanding by a singing teacher who specializes in teaching professional child singers, were recorded in a major UK concert hall performing Come unto him, all ye that labour, from the aria He shall feed his flock from The Messiah by GF Handel. Their singing was accompanied by a recording of a piano played through in-ear headphones. Sound pressure recordings were made from well within the critical distance in the hall. The singers were observed to produce notes with and without 'ring', and these recordings were analyzed in the frequency domain to investigate their spectra.

Results. The results suggest that 'ring' in child solo singers is carried in two areas of the output spectrum: firstly in the singer's formant cluster region, centered around 4 kHz, which is more than 1000 Hz higher than what is observed in adults; and secondly in the region around 7.5-11 kHz, where a significant strengthening of harmonic presence is observed. A perceptual test has been carried out demonstrating that 94% of 62 listeners label a synthesized version of the calculated overall average 'ring' spectrum for all subjects as having 'ring' when compared to a synthesized version of the calculated overall average 'non-ring' spectrum.

Conclusions. The notion of 'ring' in the child solo voice manifests itself not only with spectral features in common with the projection peak found in adult singers but also in a higher frequency region. It is suggested that the formant cluster at around 4 kHz is the children's equivalent of the singers' formant cluster; the frequency is higher than in the adult, most likely due to the smaller dimensions of the epilaryngeal tube. The frequency cluster observed as a strong peak at about 7.5-11 kHz, when added to the children's singers' formant cluster, may be the key to cueing the notion of 'ring' in the child solo voice.
A16. Christian T. Herbst (2014). Glottal efficiency of periodic and irregular in vitro red deer voice production. Acta Acustica united with Acustica, 100 (4), 724-733
Two female red deer larynges were artificially phonated in an excised larynx setup by varying subglottal pressure as the independent parameter. The acquired data were annotated as periodic, subharmonic and irregular by means of the recently developed phasegram technique. Glottal efficiency was non-linearly dependent on subglottal pressure. Above 1 kPa subglottal pressure the glottal efficiency increased linearly by about 3.1 and 3.7 dB per kPa, respectively, in the two larynges. At subglottal pressures above 1.5 kPa the glottal efficiency of the irregular segments was on average about 2.5 to 3 dB greater than that of the periodic and subharmonic segments. The results of this pilot study suggest that an irregular sound production mechanism at higher subglottal pressures could be a means to gain an energetic advantage in animal vocal communication when converting metabolic to acoustic energy.
A15. Christian T. Herbst, Jörg Lohscheller, Jan G. Svec, Nathalie Henrich Bernardoni, Gerald Weissengruber, W. Tecumseh Fitch (2014). Glottal opening and closing events investigated by electroglottography and super-high-speed video recordings. Journal of Experimental Biology, 217 (6), 955-963
Previous research has suggested that the peaks in the first derivative (dEGG) of the electroglottographic (EGG) signal are good approximate indicators of the events of glottal opening and closing. These findings were based on high-speed video (HSV) recordings with frame rates 10 times lower than the sampling frequencies of the corresponding EGG data. The present study attempts to corroborate these previous findings, utilizing super-HSV recordings. The HSV and EGG recordings (sampled at 27 and 44 kHz, respectively) of an excised canine larynx phonation were synchronized by an external TTL signal to within 0.037 ms. Data were analyzed by means of glottovibrograms, digital kymograms, the glottal area waveform and the vocal fold contact length (VFCL), a new parameter representing the time-varying degree of 'zippering' closure along the anterior-posterior (A-P) glottal axis. The temporal offsets between glottal events (depicted in the HSV recordings) and dEGG peaks in the opening and closing phase of glottal vibration ranged from 0.02 to 0.61 ms, amounting to 0.24-10.88% of the respective glottal cycle durations. All dEGG double peaks coincided with vibratory A-P phase differences. In two out of the three analyzed video sequences, peaks in the first derivative of the VFCL coincided with dEGG peaks, again co-occurring with A-P phase differences. The findings suggest that dEGG peaks do not always coincide with the events of glottal closure and initial opening. Vocal fold contacting and de-contacting do not occur at infinitesimally small instants of time, but extend over a certain interval, particularly under the influence of A-P phase differences.
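As a plausibility check on the timing figures, the short sketch below computes the duration of one video frame at the nominal 27 kHz rate and expresses a temporal offset as a percentage of a glottal cycle. Treating the 0.037 ms synchronization tolerance as one frame interval is an assumption, and the cycle duration in the usage line is hypothetical, since the abstract does not list individual cycle durations.

```python
VIDEO_RATE_HZ = 27_000  # nominal super-HSV frame rate from the abstract

# Duration of a single video frame in milliseconds.
frame_interval_ms = 1000 / VIDEO_RATE_HZ
print(round(frame_interval_ms, 3))  # 0.037


def offset_percent(offset_ms, cycle_ms):
    """Express a temporal offset as a percentage of the cycle duration."""
    return 100 * offset_ms / cycle_ms


# Hypothetical example: a 0.5 ms offset within a 5 ms glottal cycle.
print(offset_percent(0.5, 5.0))  # 10.0
```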
A14. Daniel L. Bowling, Christian T. Herbst, W. Tecumseh Fitch (2013). Social Origins of Rhythm? Synchrony and Temporal Regularity in Human Vocalization. PLoS ONE, 8 (11), e80402
Humans have a capacity to perceive and synchronize with rhythms. This is unusual in that only a minority of other species exhibit similar behavior. Study of synchronizing species (particularly anurans and insects) suggests that simultaneous signal production by different individuals may play a critical role in the development of regular temporal signaling. Accordingly, we investigated the link between simultaneous signal production and temporal regularity in our own species. Specifically, we asked whether inter-individual synchronization of a behavior that is typically irregular in time, speech, could lead to evenly-paced or "isochronous" temporal patterns. Participants read nonsense phrases aloud with and without partners, and we found that synchronous reading resulted in greater regularity of durational intervals between words. Comparison of same-gender pairings showed that males and females were able to synchronize their temporal speech patterns with equal skill. These results demonstrate that the shared goal of synchronization can lead to the development of temporal regularity in vocalizations, suggesting that the origins of musical rhythm may lie in cooperative social interaction rather than in sexual selection.
A13. Christian T. Herbst, Jan G. Svec, Jörg Lohscheller, Roland Frey, Michaela Gumpenberger, Angela S. Stoeger, W. Tecumseh Fitch (2013). Complex vibratory patterns in an elephant larynx. Journal of Experimental Biology, 216, 4054-4064
Elephant low-frequency vocalizations are produced by flow-induced self-sustaining oscillations of laryngeal tissue. To date, little is known in detail about the vibratory phenomena in the elephant larynx. Here we provide a first descriptive report of the complex oscillatory features found in the excised larynx of a 25-year-old female African elephant (Loxodonta africana), the largest animal sound generator ever studied experimentally.

Sound production was documented with high-speed video, acoustic measurements, airflow and sound pressure level recordings. The anatomy of the larynx was studied with computed tomography (CT) and dissections. Elephant CT vocal anatomy data were further compared to the anatomy of an adult human male.

We observed numerous unusual phenomena, not typically reported in human vocal fold vibrations. Phase delays along both the inferior-superior and anterior-posterior (A-P) dimension were commonly observed, as well as transverse travelling wave patterns along the A-P dimension, as yet not documented in the literature. Acoustic energy was mainly created during the instant of glottal opening. The vestibular folds, when adducted, participated in the tissue vibration, effectively increasing the generated sound pressure level by 12 dB.

The complexity of the observed phenomena is partly attributed to the distinct laryngeal anatomy of the elephant larynx, which is not simply a large-scale version of its human counterpart. Travelling waves may be facilitated by low fundamental frequencies and increased vocal fold tension. A travelling wave model is proposed to account for three types of phenomena: A-P travelling waves, "conventional" standing wave patterns, and irregular vocal fold vibration.
A12. Bruno Gingras, Markus Boeckle, Christian T. Herbst, W. Tecumseh Fitch (2013). Call acoustics reflect body size across four clades of anurans. Journal of Zoology, 289 (2), 143-150
An inverse relationship between body size and advertisement call frequency has been found in several frog species. However, the generalizability of this relationship across different clades and across a large distribution of species remains underexplored. We investigated this relationship in a large sample of 136 species belonging to four clades of anurans (Bufo, Hylinae, Leptodactylus and Rana) using semi-automatic, high-throughput analysis software. We employed two measures of call frequency: fundamental frequency (F0) and dominant frequency (DF). The slope of the relationship between male snout-vent length (SVL) and frequency did not differ significantly among the four clades. However, Rana call at a significantly lower frequency relative to size than the other clades, and Bufo call at a significantly higher frequency relative to size than Leptodactylus. Because the relationship between F0 and body size may be more straightforwardly explained by biomechanical constraints, we confirmed that a similar inverse relationship was observed between F0 and SVL. Finally, spectral flatness, an indicator of the tonality of the vocalizations, was found to be inversely correlated with SVL, contradicting an oft-cited prediction that larger animals should have rougher voices. Our results confirm a tight and widespread link between body size and call frequency in anurans, and suggest that laryngeal allometry and vocal fold dimensions in particular are responsible.
A11. Jakob Unger, Tobias Meyer, Christian T. Herbst, W. Tecumseh Fitch, Michael Döllinger, Jörg Lohscheller (2013). Phonovibrographic wavegrams: Visualizing vocal fold kinematics. Journal of the Acoustical Society of America, 133 (2), 1055-1064
Recently, endoscopic high-speed laryngoscopy has been established for commercial use as a state-of-the-art technique to examine vocal fold kinematics. Since modern cameras provide sampling rates of several thousand frames per second, a high volume of data has to be considered for visual and objective analysis. A method for visualizing endoscopic high-speed videos in three-dimensional cycle-based graphs, combining and extending the approaches of phonovibrograms and electroglottographic wavegrams, is presented. To build a phonovibrographic wavegram, individual cycles of a phonovibrogram are segmented, normalized in cycle duration, and concatenated over time. For analysis, the emerging three-dimensional scalar field is visualized with different rendering techniques providing information on different aspects of vocal fold kinematics. The phonovibrographic wavegram incorporates information about the glottal closure type, size, and location of the amplitudes, symmetry, periodicity, and phase information. The potential of the approach to visualize the characteristics of vocal fold vibration in a compact and intuitive way is demonstrated in two healthy and three pathologic subjects. The phonovibrographic wavegram allows a comprehensive analysis of vocal fold kinematics and reveals information that remains hidden with other visualization techniques.
A10. Angela S. Stoeger, Daniel Mietchen, Sukhun Oh, Shermin de Silva, Christian T. Herbst, Soonwhan Kwon, W. Tecumseh Fitch (2012). An Asian Elephant Imitates Human Speech. Current Biology, 22, 1-5
Vocal imitation has convergently evolved in many species, allowing learning and cultural transmission of complex, conspecific sounds, as in birdsong. Scattered instances also exist of vocal imitation across species, including mockingbirds imitating other species or parrots and mynahs producing human speech. Here, we document a male Asian elephant (Elephas maximus) that imitates human speech, matching Korean formants and fundamental frequency in such detail that Korean native speakers can readily understand and transcribe the imitations. To create these very accurate imitations of speech formant frequencies, this elephant (named Koshik) places his trunk inside his mouth, modulating the shape of the vocal tract during controlled phonation. This represents a wholly novel method of vocal production and formant control in this or any other species. One hypothesized role for vocal imitation is to facilitate vocal recognition by heightening the similarity between related or socially affiliated individuals. The social circumstances under which Koshik's speech imitations developed suggest that one function of vocal learning might be to cement social bonds and, in unusual cases, social bonds across species.
A9. Christian T. Herbst, Elke Duus, Harald Jers, Jan G. Svec (2012). Quantitative Voice Class Assessment of Amateur Choir Singers: A Pilot Investigation. International Journal of Research in Choral Singing (IJRCS), 4 (1), 47-59
The required pitch range (RPR), i.e. the pitch range that is determined by the music to be sung, is dependent on voice class (most commonly: soprano, alto, tenor or bass). Ideally, it should lie well within the boundaries of the physiologic voice range. In amateur choir singing however, the individual singer's choice of voice class does not necessarily result in optimal use of vocal potential. This study tries to establish an objective, quantitative method to determine voice class, and to highlight unused potential as regards voice range.

Twenty-one members of an amateur choir (15 female, 6 male) were examined by means of standard voice range profile (VRP) measurement. The RPR (as defined by the singers' chosen voice class) was compared to maximum phonational frequency range (MPFR) as determined by the physiological VRP measurement. The difference between the upper limit of the RPR and the highest pitch in the VRP, expressed in semitones, was defined as "upper reserve" (UR); the difference between the lower limit of the RPR and the lowest pitch measured with the VRP was defined as the "lower reserve" (LR). The "tessitura shift" (TS) was defined as half the difference between upper and lower reserve [ TS = (LR - UR) / 2 ]. It is a measure of the offset of the RPR in relation to the MPFR, expressed in semitones.

The average physiologic voice range was 37.7 semitones (min 31, max 45). With the exception of the sopranos, all voice classes had more upper reserve than lower reserve, which was reflected by the average TS per voice class: soprano 2.33; other voice classes: -2.83 to -6.3. Results imply that individual singers might profit from changing their voice class (from soprano to alto, or vice versa), in order to better exploit their physiological voice range.

We concluded that upper and lower reserve measurements are well suited to indicate the degree of voice usage in extreme frequency ranges, whereas the TS can be used as an indicator of the ``alignment'' of RPR within the physiological voice range. Amateur choir singers' choice of voice class is a strategic decision that might crucially influence the singers' phonatory behavior, and thus their long-term vocal health. The indicators presented in this study may be useful for making such a decision.
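The reserve and tessitura shift definitions in this abstract can be illustrated with a minimal sketch. It assumes pitches are given as semitone numbers (for example MIDI note numbers) and that both reserves are counted as positive when the physiological range extends beyond the required range. The function name and the example singer are hypothetical, not data from the study.

```python
def reserves_and_shift(rpr_low, rpr_high, vrp_low, vrp_high):
    """Return (UR, LR, TS) in semitones.

    rpr_low/rpr_high: limits of the required pitch range (RPR).
    vrp_low/vrp_high: lowest and highest pitch in the voice range profile.
    All four values are semitone numbers on a common scale.
    """
    upper_reserve = vrp_high - rpr_high   # UR: headroom above the RPR
    lower_reserve = rpr_low - vrp_low     # LR: headroom below the RPR
    tessitura_shift = (lower_reserve - upper_reserve) / 2  # TS = (LR - UR) / 2
    return upper_reserve, lower_reserve, tessitura_shift


# Hypothetical singer: RPR spans notes 55-74, physiological range 50-86.
ur, lr, ts = reserves_and_shift(rpr_low=55, rpr_high=74, vrp_low=50, vrp_high=86)
print(ur, lr, ts)  # 12 5 -3.5
```

A negative TS, as in this example, means the singer has more upper than lower reserve, which is the pattern the abstract reports for all voice classes except the sopranos.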
A8. Christian T. Herbst, Jan G. Švec (2012). Adjustment of glottal configurations in singing. Journal of Singing, 70 (3), 301-308
A7. Christian T. Herbst (2012). Freddie Mercury - Akustische Stimm-Analyse. L.O.G.O.S. Interdisziplinär, 20 (3), 174-183
In dieser Studie wurde das öffentlich zugängliche Tonmaterial des Sängers Freddie Mercury akustisch analysiert. Es wurde eine mittlere Sprechstimmlage von ungefähr 109 bis 128 Hertz und ein Singstimmumfang von drei Oktaven (G bis g'', ca. 98 -- 784 Hz) festgestellt. Freddie Mercury war von der Sprechstimmlage her Bariton, sang jedoch meistens in Tenorlage. Das Stimmtimbre zeigte sich sehr variabel. Freddie Mercury sang sowohl im Brust- als auch im Falsett-Register, der Grad der glottischen Adduktion wurde abhängig vom ästhetischen Kontext entlang der Dimension "behaucht"/"gepresst" variiert. Die Stimme hatte ein unregelmäßiges und schnelles Vibrato (ca. 7 Hz) mit relativ weiter Auslenkung (ca. 1.5 Halbtöne). Das stellenweise "raue" Stimmtimbre ist auf subharmonische Oszillations-Phänomene (Periodenverdopplung) im Larynx zurückzuführen. Der Gesamteindruck einer Stimme, welche bis ans Limit ausgereizt wurde, ist durchaus kompatibel mit der exzentrischen Künstlerpersönlichkeit Freddie Mercurys.

This study provides an acoustical analysis of Freddie Mercury's voice, mostly based on the commercially available a-cappella sound material. The average speaking fundamental frequency was in the range of 109-128 Hz and the singing voice range stretched across three octaves (G2-G5, ca. 98-784 Hz). These results suggest that Freddie Mercury was a baritone who sang as a tenor. Being able to flexibly adjust his voice timbre, he sang in both chest and head (falsetto) voice. He was capable of manipulating glottal adduction along the dimension of breathy vs. pressed, varying with aesthetic context. Freddie Mercury's voice was characterized by an irregular and fast vibrato (ca. 7 Hz) with a relatively large amplitude of about 1.5 semitones. The perceptually rougher sounds were likely to be caused by subharmonic oscillatory phenomena (period doubling, tripling and quadrupling) in the larynx. In conclusion, the collected data suggest that Freddie Mercury drove his voice well to its limits, which is in good agreement with his eccentric stage persona.
A6. Christian T. Herbst, Qingjun Qiu, Harm K. Schutte, Jan G. Švec (2011). Membranous and cartilaginous vocal fold adduction in singing. Journal of the Acoustical Society of America, 129 (4), 2253-2262
While vocal fold adduction is an important parameter in speech, relatively little is known about the adjustment of vocal fold adduction in singing. This study investigates the possibility of separate adjustments of cartilaginous and membranous vocal fold adduction in singing. Six female and seven male subjects, singers and non-singers, were asked to imitate an instructor in producing four phonation types: "aBducted falsetto" (FaB), "aDducted falsetto" (FaD), "aBducted Chest" (CaB), and "aDducted Chest" (CaD). The phonations were evaluated using videostroboscopy, videokymography (VKG), electroglottography (EGG), and audio recordings. All the subjects showed less posterior (cartilaginous) vocal fold adduction in phonation types FaB and CaB than in FaD and CaD, and less membranous vocal fold adduction (smaller closed quotient) in FaB and FaD than in CaB and CaD. The findings indicate that the exercises enabled the singers to separately manipulate (a) cartilaginous adduction and (b) membranous medialization of the glottis through vocal fold bulging. Membranous adduction (monitored via videokymographic closed quotient) was influenced by both membranous medialization and cartilaginous adduction. Individual control over these types of vocal fold adjustments allows singers to create different vocal timbres.
A5. Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively. Journal of the Acoustical Society of America, 128 (5), 3070-3078
A method for analyzing and displaying electroglottographic (EGG) signals (and their first derivative, DEGG) is introduced: the electroglottographic wavegram ("wavegram" hereafter). To construct a wavegram, the time-varying fundamental frequency is measured and consecutive individual glottal cycles are identified. Each cycle is locally normalized in duration and amplitude, the signal values are encoded by color intensity, and the cycles are concatenated to display the entire voice sample in a single image, similar to sound spectrography. The wavegram provides an intuitive means for quickly assessing vocal fold contact phenomena and their variation over time. Variations in vocal fold contact appear here as a sequence of events rather than single phenomena, taking place over a certain period of time, and changing with pitch, loudness and register. Multiple DEGG peaks are revealed in wavegrams to behave systematically, indicating subtle changes of vocal fold oscillatory regime. As such, EGG wavegrams promise to reveal more information on vocal fold contacting and de-contacting events than previous methods.
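The construction steps named in this abstract (cycle identification, normalization in duration and amplitude, concatenation into an image) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes the glottal cycle boundaries have already been detected and are passed in as sample indices (`cycle_starts` is a hypothetical parameter), and it uses plain linear interpolation for the duration normalization.

```python
import numpy as np

def wavegram(signal, cycle_starts, rows=100):
    """Build a wavegram-like image: one normalized glottal cycle per column.

    signal       -- 1-D EGG (or DEGG) samples
    cycle_starts -- sample indices where consecutive cycles begin (assumed given)
    rows         -- number of points each cycle is resampled to
    """
    signal = np.asarray(signal, dtype=float)
    x_new = np.linspace(0.0, 1.0, num=rows)
    columns = []
    for start, end in zip(cycle_starts[:-1], cycle_starts[1:]):
        cycle = signal[start:end]
        if len(cycle) < 2:
            continue  # too short to resample meaningfully
        # Normalize duration: map every cycle onto the same 0..1 time axis.
        x_old = np.linspace(0.0, 1.0, num=len(cycle))
        resampled = np.interp(x_new, x_old, cycle)
        # Normalize amplitude locally to the range 0..1.
        span = resampled.max() - resampled.min()
        if span > 0:
            resampled = (resampled - resampled.min()) / span
        else:
            resampled = np.zeros(rows)
        columns.append(resampled)
    # Columns are consecutive cycles (time axis), rows are normalized cycle time.
    return np.column_stack(columns)
```

The returned array can be handed to any image-plotting routine, with the array values mapped to color intensity.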
A4. Christian T. Herbst, David Howard, Josef Schlömicher-Thier (2010). Using electroglottographic real-time feedback to control posterior glottal adduction during phonation. Journal of Voice, 24 (1), 72-85
The goal of this pilot study was to determine whether the ability to change the degree of posterior glottal adduction (PGA) during phonation can be acquired more easily with the aid of electroglottographic (EGG) real-time feedback. The subject was a 37-year-old untrained female with a habitually breathy voice. Before the experiment, she participated in one voice coaching session where exercises for increasing PGA were explained and executed. During the experiment, phonation was monitored simultaneously with videostroboscopy, electroglottography, and audio recording. While phonating, the subject saw an amplitude- and period-normalized EGG waveform representing one glottal cycle, changing consecutively over time. The assignment was to increase the width of the EGG waveform during phonation. Laryngeal imaging revealed a posterior glottal chink during habitual phonation. The subject could only introduce intentional changes into the EGG waveform after its relevance had been explained, and after recapitulation of the exercises of the voice coaching session: an increase of the EGG waveform width coincided with an increase of high-frequency partials and an increase of PGA. For pitches B3 and B4, full glottal closure could be achieved. At G5, a reduction of the posterior glottal chink occurred. The findings of this study suggest that the skill to control the degree of PGA can be acquired, and that EGG real-time feedback can be a crucial element in optimizing the process of skill acquisition, but only if (1) the context and nature of the feedback are explained and (2) proper instructions are provided. The EGG contact quotient might not be sensitive to changes of PGA in falsetto phonation.
A3. Christian T. Herbst, Jan G. Švec, Sten Ternström (2009). Investigation of four distinct glottal configurations in classical singing - a pilot study. JASA-EL, 125 (3), EL104-EL109
This study investigates four qualities of singing voice in a classically trained baritone: "naïve falsetto", "countertenor falsetto", "lyrical chest" and "full chest". Laryngeal configuration and vocal fold behavior in these qualities were studied using laryngeal videostroboscopy, videokymography, electroglottography, and sound spectrography. The data suggest that the four voice qualities were produced by independently manipulating mainly two laryngeal parameters: (1) the adduction of the arytenoid cartilages and (2) the thickening of the vocal folds. An independent control of the posterior adductory muscles versus the vocalis muscle is considered to be the physiological basis for achieving these singing voice qualities.
A2. Christian T. Herbst (2007). Der Knabensolist in der Oper - Ein akustisches Portrait. L.O.G.O.S. Interdisziplinär, 15 (3), 166-174
A mezzo-soprano of the Tölzer Knabenchor sang the role of Yniold in Claude Debussy's "Pelléas et Mélisande" at the 2006 Salzburg Easter Festival. During the main rehearsal of the production, the audio signal was captured with a microphone mounted at a fixed distance from the singer's mouth. Additional evidence was obtained by means of electroglottography and video endoscopy. The aim of the study was to clarify whether statements about the singing technique used can be made by collecting and interpreting objective data.

Using an automated computer-based process, segments containing vowels and voiced consonants were extracted from the acoustic signal. The resulting tessiturogram shows a normal distribution of pitch centered on B4 (h' in German notation). In the sound pressure level histogram, tone segments with a sound pressure level of about 90 dB occur most frequently; the peak value measured was 112 dB at a distance of 30 cm between mouth and microphone. The sound pressure level rises quasi-linearly by 10 dB per octave. The long-term spectrum exhibits formant clusters between 3000 and 4000 Hz and between 7300 and 8200 Hz.

The calculated crest factor decreases by about 3 dB per octave, which can be regarded as an indicator of a decline in high-frequency partials with rising pitch. An abrupt drop in the crest factor from F5 (f'', ca. 700 Hz) upward is conspicuous. Corresponding electroglottographic data show a marked change in the EGG waveform from F5 upward; a shortening of the glottal closed phase is evident. An additional examination by video endoscopy shows that, without register blending intended by the singer, no further increase in pitch is possible once the vocalis muscle has reached its point of maximum contraction.

It can be surmised that merely "singing louder" is not a sufficient strategy for the large stage. The required increase in sound pressure level must go hand in hand with an appropriate physiological predisposition and an excellent singing technique. The present study shows that adequate blending of registers and ranges is an essential characteristic of a stage-worthy voice.
A1. Christian T. Herbst, Sten Ternström (2006). A comparison of different methods to measure the EGG contact quotient. Logopedics Phoniatrics Vocology, 31 (3), 126-138
The results from six published electroglottographic (EGG-based) methods for calculating the EGG contact quotient (CQEGG) were compared to closed quotients derived from simultaneous videokymographic imaging (CQKYM). Two trained male singers phonated in falsetto and in chest register, with two degrees of adduction in both registers. The maximum difference between methods in the CQEGG was 0.3 (out of 1.0). The CQEGG was generally lower than the CQKYM. Within subjects, the CQEGG co-varied with the CQKYM, but with changing offsets depending on method. The CQEGG cannot be calculated for falsetto phonation with little adduction, since there is no complete glottal closure. Basic criterion-level methods with thresholds of 0.2 or 0.25 gave the best match to the CQKYM data. The results suggest that contacting and de-contacting in the EGG might not refer to the same physical events as do the beginning and cessation of airflow.
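A basic criterion-level method of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the input is a single, already isolated glottal cycle, larger EGG values are assumed to mean more vocal fold contact, and the function name and signature are hypothetical.

```python
import numpy as np

def contact_quotient(cycle, criterion=0.25):
    """Criterion-level contact quotient of one EGG cycle.

    The cycle is scaled to 0..1 (peak-to-peak); the quotient is the fraction
    of samples lying above the criterion level. Assumes larger values mean
    more vocal fold contact (signal polarity is an assumption).
    """
    cycle = np.asarray(cycle, dtype=float)
    span = cycle.max() - cycle.min()
    if span == 0:
        raise ValueError("flat cycle: contact quotient is undefined")
    normalized = (cycle - cycle.min()) / span
    return float(np.mean(normalized > criterion))

# Usage with a synthetic triangular cycle of 100 samples.
rising = np.linspace(0.0, 1.0, 50, endpoint=False)
falling = np.linspace(1.0, 0.0, 50, endpoint=False)
triangle = np.concatenate([rising, falling])
cq_25 = contact_quotient(triangle, criterion=0.25)
cq_50 = contact_quotient(triangle, criterion=0.5)
```

Raising the criterion level shortens the portion of the cycle counted as contact, which illustrates why the choice of threshold shifts the resulting quotient.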
Grants, scholarships and awards
G16. Van Lawrence Fellowship. National Association of Teachers of Singing (NATS); The Voice Foundation, May 1, 2023. [for the project: "Estimating vocal loudness across the singing voice range"]
G15. Research Grant. Land Salzburg (Salzburg County Government), Referat für Wissenschaft, Erwachsenenbildung, öffentliche Bibliotheken, September 1, 2019
G14. Annual Sataloff Award for Young Investigators. The Voice Foundation, April 1, 2016. [awarded for the Journal of Voice publication "Phasegram analysis of vocal fold vibration documented with laryngeal high-speed video endoscopy" by Christian T. Herbst, Jakob Unger, Hanspeter Herzel, Jan G. Švec, and Jörg Lohscheller.]
G13. APART Grant [Austrian Programme for Advanced Research and Technology]. Austrian Academy of Sciences, November 1, 2014
G12. Award. Croatian Choral Directors Association, April 1, 2014. [for Scientific Research in the Field of Chorusology]
G11. Society of Experimental Biology, Young Scientist Award, Animal Section: Runner Up. SEB Annual Main Meeting 2013, July 1, 2013. [for the contribution Christian T. Herbst, Angela S. Stoeger, Roland Frey, Jörg Lohscheller, Ingo R. Titze, Michaela Gumpenberger, W. Tecumseh Fitch. Sound production mechanism in elephant infrasound vocalizations.]
G10. AQL Best Paper Award. 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, June 1, 2013. [for the contribution Christian T. Herbst, W. Tecumseh Fitch, Jörg Lohscheller, Jan G. Svec. Estimation of the vertical glottal shape based on empirical high-speed video and electroglottographic data.]
G9. 2nd Annual Hamdan International Presentation Award. The Voice Foundation's 42nd Annual Symposium, June 1, 2013. [for the contribution Christian T. Herbst, Angela S. Stoeger, Roland Frey, Jörg Lohscheller, Ingo R. Titze, Michaela Gumpenberger, W. Tecumseh Fitch. Sound production mechanism in elephant infrasound vocalizations.]
G8. Van Lawrence Prize. British Voice Association, May 1, 2012. [in recognition of the contribution to the field of voice: Christian T. Herbst, Jan G. Švec, J. Schlömicher-Thier, W. Tecumseh Fitch: Analyzing the female 'middle register' with EGG wavegrams]
The choice of singing register and the degree of vocal fold adduction are two concepts that are not easily discriminated by inexperienced singers. This is particularly true for the mid range (pitch C4 to C5) of untrained female classical singers, where adducted falsetto, the desired sound quality in this range, is rarely observed. As an underlying physiological principle, vocal fold adduction can be separately controlled by (a) cartilaginous adduction, i.e. the adduction of the posterior glottis via the arytenoids (controlled by the singer with the degree of "breathiness" / "pressedness"); and by (b) membranous medialization through vocal fold bulging (controlled by the choice of vocal register, i.e. chest vs. falsetto) [1].

In this study, singing exercises and instructions for adjusting adductory settings (cartilaginous adduction vs. membranous medialization) in the female mid-range were performed by both trained and untrained female classical singers. Phonation was monitored by acoustic recording, electroglottography (EGG) and laryngeal imaging. EGG wavegrams [2], a novel method for displaying EGG signals, were used for data analysis.

EGG wavegram data revealed distinct differences between the targeted phonation types for each individual. The observed differences established themselves as (a) presence/absence of vocal fold contact; (b) duration of vocal fold contact per glottal cycle; (c) changes in the overall EGG signal amplitude; (d) distinctness of opening/closing events; (e) perturbations seen in the wavegrams. Inter-subject data variation suggests that the individual's anatomy influences vocal fold contact in singing. EGG wavegrams proved to be useful in documenting changes of both singing register and glottal adduction.
G7. Research Grant: VOICE - Vision On Innovation for Choral music in Europe. Research programme on vocal health for amateur singers. European Commission, Education, Audiovisual and Culture Executive Agency (EACEA), March 1, 2012
G6. Dean's Prize. Palacký University Olomouc, Faculty of Science, December 1, 2011. [for the publication Herbst CT, Qiu Q., Schutte HK, Švec JG: Membranous and cartilaginous vocal fold adduction in singing. Journal of the Acoustical Society of America 129(4): 2253-2262 (2011)]
G5. Dean's Prize. Palacký University Olomouc, Faculty of Science, May 1, 2011. [for the best PhD project in Physics]
G3. SEMPRE Conference Award. Society for Education, Music and Psychology Research, May 1, 2006
G2. Promotion Grant ("Förderstipendium"). University Mozarteum, December 1, 2004
G1. ERASMUS Mobility Grant. University Mozarteum, December 1, 2003
Books and book chapters
B6. Mauro Fiuza, Flavia Pl. Caraibas, Filipa M. B. La, Christian T. Herbst (2022). Proceedings of the 7th International Physiology and Acoustics of Singing Conference. Instituto de Formação em Voz - IFV; Faculdade Novo Horizonte FNH; Universidad Nacional de Educación a Distancia (UNED)
B5. Christian T. Herbst (2020). Stimmanalyse und -visualisierung - leicht gemacht? in: Stimmen hören - Potentiale entwickeln - Störungen behandeln. Logos Verlag, Fuchs, Michael, 155-178
B4. Christian T. Herbst, David Howard, Jan G. Švec (2019). The sound source in singing - basic principles and muscular adjustments for fine-tuning vocal timbre. in: The Oxford Handbook of Singing. Oxford University Press, D. Howard, J. Nix, G. Welch Eds.
B3. Christian T. Herbst (2016). Biophysics of Vocal Production in Mammals. in: Vertebrate Sound Production and Acoustic Communication. Springer, Fitch, W. Tecumseh and Popper, Arthur and Suthers, Rod, 159-189
Most mammals, including humans, produce sound in agreement with the myoelastic-aerodynamic theory (MEAD): by converting aerodynamic energy into acoustic energy via flow-induced self-sustaining oscillation of the vocal folds or other laryngeal tissue. The generated laryngeal sound is filtered by the vocal tract and radiated from the mouth and/or the nose.

In this chapter, some basic biophysical principles of the MEAD theory are explained, mostly based on research done in humans. Empirical evidence and concepts for nonhuman mammals are provided when available and applicable.

In particular, biomechanical properties of vibrating laryngeal tissue and respective vibratory modes are described, and the oscillatory components and forces necessary for flow-induced self-sustaining vibration are discussed. The notions of fundamental frequency and its control, periodicity, and irregularity are explored, followed by a basic description of non-linear phenomena (NLP) such as bifurcations, subharmonics, or chaos. Subglottal pressure and glottal airflow are essential parameters of voice production, and their influence on the generated voice source spectrum is considered. Finally, linear and non-linear effects of the vocal tract are reviewed, and the efficiency of sound production is discussed.
B2. Christian T. Herbst, Jan G. Švec (2015). Basics of voice acoustics - a tutorial. in: Sataloff's Textbook of Otolaryngology. JP medical publishers, Sataloff, R. T.
B1. Christian T. Herbst (2012). Investigation of glottal configurations in singing. Palacký University in Olomouc, the Czech Republic (Doctoral Dissertation)
Posters (presented at conferences)
P9. Michaela Mayr, Kate Emerich, Markus Kofler, Christian Kremser, Ansgar Rudisch, Helena Talasz, Christian T. Herbst (2022). Time-synchronized observation of pelvic floor, abdomen, and thorax during singing using MRI - a feasibility study. 7th International Physiology and Acoustics of Singing Conference (PAS7+), Instituto de Formação em Voz - IFV; Faculdade Novo Horizonte FNH; Universidad Nacional de Educación a Distancia (UNED), May 7, 2022.
The pelvic floor (PF) plays a crucial and often-mentioned role in many pedagogical treatises of singing. In contrast, surprisingly little empirical evidence about the PF's actual contribution to respiration in singing is available. In particular, it is unknown to date in which fashion movement of the PF is synchronized with that of the abdomen and the thorax.

Addressing this issue, a series of studies currently pursued by our group investigates the possibility of synchronized quantitative assessment of the movement of the PF, the abdomen, and the thorax during breathing, speech, and singing. For this purpose, a cohort of thirteen female singers performed a number of phonatory tasks during dynamic magnetic resonance imaging (MRI) recordings in a 1.5-Tesla whole body MR-scanner. Here, we present the prospective data analysis approach, applied to a limited pilot data set.

Within the DICOM time-series of a sung phrase, anatomically determined scan lines were used to generate kymographic data of the thoracic diaphragm, the thorax diameter, the pelvic floor, the anterolateral abdominal muscle thickness, and abdominal diameter at the umbilical level. The respective structures were then traced with manually determined Bézier curves. These Bézier curves were then algorithmically converted to time-varying displacement offsets of the structures of interest, and the resulting data were calibrated in time and space using information from the MRI frame rate and voxel size. Due to the quasi-sinusoidal nature of the structure displacements during the analyzed phonatory task, simple sine waves could be fitted to the calibrated displacement data with a Bayesian-enhanced linear regression method. The resulting sine wave phase data enables quantitative assessment of the phase differences in the movement of the structures of interest, allowing further data aggregation and statistical comparison across participants and phonatory conditions.

The results from our series of studies are expected to corroborate known and produce novel insights into the synchronized movement of the various sub-systems of the breathing apparatus during speaking and singing, in particular adding novel and dearly needed knowledge about the contribution of the pelvic floor.
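The idea of fitting a sine wave by linear regression and then comparing phases can be illustrated with a short sketch. It makes two simplifying assumptions that are not taken from the abstract: the oscillation frequency is known in advance (`freq_hz` is a hypothetical parameter), and ordinary least squares is used in place of the Bayesian-enhanced regression the authors describe. With a fixed frequency, the model `A*sin(wt + phase) + c` becomes linear in the coefficients of `sin(wt)`, `cos(wt)` and a constant.

```python
import numpy as np

def fit_sine(t, y, freq_hz):
    """Least-squares fit of y ~ A*sin(2*pi*f*t + phase) + offset for a known f.

    Ordinary least squares is used here as a stand-in (assumption); the
    frequency must be supplied by the caller.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi * freq_hz
    design = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    (a, b, c), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = float(np.hypot(a, b))
    phase = float(np.arctan2(b, a))  # since a = A*cos(phase), b = A*sin(phase)
    return amplitude, phase, float(c)

def phase_difference(phase_1, phase_2):
    """Phase difference in radians, wrapped to the interval (-pi, pi]."""
    return float(np.angle(np.exp(1j * (phase_1 - phase_2))))

# Usage with two synthetic displacement curves (made-up illustration data).
t = np.linspace(0.0, 10.0, 200)
curve_1 = 2.0 * np.sin(2 * np.pi * 0.2 * t + 0.5) + 1.0
curve_2 = 1.5 * np.sin(2 * np.pi * 0.2 * t + 0.1) - 0.3
amp_1, ph_1, off_1 = fit_sine(t, curve_1, 0.2)
amp_2, ph_2, off_2 = fit_sine(t, curve_2, 0.2)
lag = phase_difference(ph_1, ph_2)
```

The wrapped phase difference is the quantity that could then be aggregated across structures or participants.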
P8. Manuel Brandner, Theodora Nestorova, Bruno Gingras, Christian T. Herbst (2022). Vocal vibrato characteristics in Opera, Operetta, and Schlager over the years. 7th International Physiology and Acoustics of Singing Conference (PAS7+), Instituto de Formação em Voz - IFV; Faculdade Novo Horizonte FNH; Universidad Nacional de Educación a Distancia (UNED), May 5, 2022.
One core aspect of the aesthetics in singing is vocal vibrato, which was already extensively discussed in 1936 by Seashore. Four parameters, namely the rate and the extent of the modulation of the frequency and the amplitude, were identified. Since then, vocal vibrato has been studied extensively in the scientific community. To the authors' knowledge, no study has yet compared vibrato characteristics of different genres over the last decades. A data selection was undertaken, specifically looking for songs that have been sung in different styles (Opera, Operetta, and Schlager). This work focused on frequency modulation. The two most notable findings were that (a) overall, vibrato rate was slightly higher in Schlager than in opera and operetta; and (b) in Schlager, vibrato rate decreased over time from about 7 Hz in 1930 to about 6 to 6.5 Hz in 2019. Although these results should be interpreted with caution due to the limited sample size, our data suggest that Schlager, as a historical aesthetic category, has some unique characteristics with respect to vibrato.
P7. Christian T. Herbst, Tamara Mauder, Maxime Garcia, Vit Hampala, Ingo R. Titze, W. Tecumseh Fitch, Gerald Weissengruber (2018). Active or passive? - A Closer Look at the "Purring" Sound Production Mechanism of Domestic Cats (Felis silvestris catus). 47th Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 31, 2018.
Most mammals and birds produce vocal sounds according to the myo-elastic aero-dynamic (MEAD) principle, through self-sustaining oscillation of laryngeal or syringeal tissue. In contrast, purring cats are believed to produce their low-frequency vocalizations through active muscle contractions (AMC), where neurally driven EMG burst patterns (typically at 20 to 30 Hz for cat purrs) cause the intrinsic laryngeal muscles to actively modulate the respiratory airflow. Unfortunately, direct empirical evidence for this AMC mechanism is sparse [1].

Here, the fundamental frequency (fo) ranges of eight domestic cats (Felis silvestris catus) were investigated in an excised larynx setup with computer-controlled and manual pressure sweeps, in an attempt to rule out the MEAD voice production mechanism for low-frequency vocalizations.

Surprisingly, all eight larynges produced self-sustaining oscillations at the typical rates of cat purring. In six of the eight specimens gradual fo variation in the ranges of about 15 to 200 Hz occurred, thus creating an fo continuum between purrs and other stereotypical call types. Histological analysis of the investigated larynges revealed the presence of connective tissue embedded in the vocal fold, with up to 4 mm in diameter [2]. This added mass might be responsible for achieving the low fo values observed.

Our data demonstrate the possibility for purring-like sound production at typical frequencies of 25 to 30 Hz according to the MEAD principle, without the need for cyclic activation of the intrinsic musculature at the rate of vocal fold vibration (AMC). Short of constituting an alternative hypothesis for the purring vocal production in cats, our findings give reason to assume that cat purring is facilitated by special anatomical adaptation, at least in domestic cats.
P6. Jan G. Svec, Hana Sramkova, Svante Granqvist, Christian T. Herbst (2016). Update on the Recommended Maximum Background Noise Levels for Voice Measurements. 10th International Conference on Voice Physiology and Biomechanics (ICVPB), Universidad Tecnica Federico Santa Maria, Vina del Mar, Chile. March 17, 2016. presented by Christian T. Herbst.
P5. Christian T. Herbst, Hiroki Koda, Takumi Kunieda, Juri Suzuki, Takeshi Nishimura (2016). Electroglottographic assessment of in vivo Japanese Macaque sound production. 10th International Conference on Voice Physiology and Biomechanics (ICVPB), Universidad Tecnica Federico Santa Maria, Vina del Mar, Chile. March 16, 2016.
P4. Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). Wavegrams: A new technique for visualizing vocal fold dynamics noninvasively using electroglottographic signals. COST Action 2103 Summer School - Modeling and Assessment of the Human Voice, Erlangen, Germany. September 1, 2010.
Electroglottography (EGG) is a non-invasive low-cost method to monitor relative vocal fold contact area (VFCA) during phonation. Increase and decrease of VFCA is related to glottal closing and opening, respectively. In this study, a new method for analyzing and displaying EGG signals (and their first derivative, DEGG) is introduced: the electroglottographic wavegram (short: wavegram). It (a) allows monitoring the EGG (or DEGG) signal over time; and (b) provides an intuitive means for quickly assessing the duration of glottal closure and its variation over time.

Based on the EGG or DEGG signal, the time-varying fundamental frequency is calculated and consecutive individual glottal cycles are identified. Each cycle is locally normalized in duration and amplitude and the cycles are then plotted consecutively. The plotting process resembles that of a spectrogram, but instead of spectral amplitudes, the signal deflections are encoded by color intensity. The wavegram presents the time on the x-axis, normalized cycle duration on the y-axis and the signal deflection on the color-intensity-coded z-axis.

The wavegram reveals changes of vocal fold contact duration in time. It also shows phenomena that remain overlooked in traditional EGG-display techniques, such as multiple DEGG peaks. While these phenomena have usually been considered artifacts, the wavegram displays revealed consistent behavior of these peaks in a large number of subjects. They indicate subtle changes of vocal fold oscillatory regime.

Wavegram analysis suggests that the phenomenon of vocal fold closing and opening is more complex than commonly assumed. Rather than a single event, vocal fold opening and closing should be considered a sequence of events, taking place over a certain period of time. Data show that the sequence of these events can change with pitch, loudness and register. The EGG signal thus promises to reveal more (physiological) information on vocal fold closure and opening events than previously thought.
P3. Josef Schlömicher-Thier, Donald G. Miller, Hubert Noe, Christian T. Herbst (2009). Yodeling - Acoustic and Physiologic Properties. The Voice Foundation's 38th Annual Symposium, June 1, 2009.
Yodelling is sustained phonation with nonsensical combinations of vowels and consonants. It is characterized by drastic timbral changes, caused by (a) abrupt changes of laryngeal mechanism (chest vs. falsetto registers); and (b) typical choice of vowels. The register transitions coincide with relatively large intervallic leaps.

The goal of this study was to better understand physiologic and acoustic properties of yodelling. In particular, the relationship between voice source characteristics and the vocal tract was investigated. Two yodellers (one female, one male), originating from the Austrian regions of Salzburg and Styria, were examined by means of flexible video-endoscopy, electroglottography and recording of acoustic data.

Preliminary results suggest that formant tuning plays an important role in yodelling. It is hypothesized that yodellers intuitively choose certain combinations of fundamental frequency and vowel, in order to facilitate the abrupt changes of laryngeal mechanism that are typical for yodelling.
P2. Christian T. Herbst, Elke Duus, Harald Jers (2009). Voice category assessment of amateur choir singers. 4th International Conference on the Physiology and Acoustics of Singing, January 1, 2009.
The tessitura, i.e. the pitch range that is determined by the music to be sung, is dependent on voice category. Ideally, it lies well within the boundaries of the physiologic voice range. In amateur choir singing however, the individual singer's choice of voice category does not necessarily result in optimal use of vocal potential. This study tries to establish an objective, quantitative method to determine voice category and to highlight unused potential as regards voice range.

21 members of an amateur choir (15 female, 6 male) were examined by means of standard voice range profile (VRP) measurement. In order to collect data of 'habitual' singing, the subjects were also asked to sing a short piece of music of their own choice in a convenient key, tempo and loudness. The tessitura (as determined by the singer's chosen voice category) was compared with (a) the pitch range as determined by the VRP measurement and (b) the tessitura of the 'habitual' singing. The difference between the upper limit of the tessitura and the highest pitch in the VRP, expressed in semitones, was defined as the 'upper reserve' (UR); the difference between the lower limit of the tessitura and the lowest pitch measured with the VRP was defined as the 'lower reserve' (LR). The 'reserve index' (RI) was defined as the relation between upper and lower reserve [ RI = (UR - LR) / (UR + LR) ].

On average, the sopranos were 10 years older than the altos (overall average age: 49.7 years). The average physiologic voice range was 37.9 semitones (min 31, max 45). Older females had less physiologic voice range, but their habitual singing was generally higher. With the exception of the sopranos, all voice categories had more upper reserve than lower reserve, which is reflected by the average reserve index per voice category: soprano -0.36; other voice categories: 0.11 to 0.54. The reserve index was inversely related to age ( RI = 0.855 - age * 0.013).

In the examined choir, older females sang soprano in a relatively high tessitura (in some cases reaching the upper limit of the physiologic voice range), whereas younger females sang alto in a relatively low tessitura. This sub-optimal situation could be caused by insufficient vocal technique, or it might be explained in a sociologic/social context.
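The reserve measures defined in this abstract can be written out as a small calculation. The sketch below reads the bracketed formula as the difference between upper and lower reserve divided by their sum, an interpretation that matches the reported signs (positive when the upper reserve is larger). Pitches are assumed to be given as semitone numbers such as MIDI note numbers; the function names and the example values are hypothetical, and only the regression coefficients come from the abstract.

```python
def reserve_index(tessitura_low, tessitura_high, vrp_low, vrp_high):
    """Reserve index from pitches given as semitone numbers (assumed MIDI-like).

    upper reserve = highest VRP pitch - upper tessitura limit
    lower reserve = lower tessitura limit - lowest VRP pitch
    index         = (upper - lower) / (upper + lower)
    """
    upper_reserve = vrp_high - tessitura_high
    lower_reserve = tessitura_low - vrp_low
    total = upper_reserve + lower_reserve
    if total == 0:
        raise ValueError("no reserve at either end: index is undefined")
    return (upper_reserve - lower_reserve) / total

def predicted_reserve_index(age_years):
    """Linear age relation reported in the abstract: RI = 0.855 - 0.013 * age."""
    return 0.855 - 0.013 * age_years

# Made-up example: tessitura C4..C5 (60..72), measured range A3..A5 (57..81).
example_ri = reserve_index(60, 72, 57, 81)   # 9 semitones above, 3 below
ri_at_mean_age = predicted_reserve_index(49.7)
```

A positive index means the unused range lies mostly above the tessitura, a negative one that it lies mostly below.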
P1. Christian T. Herbst, Jan G. Švec, Qingjun Qiu, Harm Schutte (2007). Overall and posterior glottal adduction in singing. 7th Pan European Voice Conference (PEVOC), Groningen, The Netherlands. August 1, 2007.
It is known that glottal adduction can be adjusted both posteriorly by the PCA/LCA/IA muscles and overall by the TA muscles. A previously conducted pilot study on a baritone suggested that independent control of the posterior and TA adduction allows better flexibility in controlling the singing voice quality. The goal of this study was to design phonatory exercises to isolate these two types of glottal adduction. Four extreme phonation types were targeted, using the chest and falsetto registers with and without breathiness: a) 'naïve' falsetto (breathy), b) 'resonant falsetto', c) 'light chest' (breathy) and d) 'dramatic/operatic chest'.

6 female and 6 male singers and non-singers were asked to imitate the instructor (i.e. the baritone who participated in the previously conducted pilot study), producing those 4 phonation types at a pitch located within the range of the chest/falsetto register transition (C#4 to F4). In order to maintain the desired registration (chest or falsetto), the target notes were reached by singing a descending (for falsetto) or ascending (for chest) scale of five notes. (The subjects were asked not to 'blend or mix the registers'). The phonation was monitored by videostroboscopy, videokymography (VKG), electroglottography (EGG) and audio recording.

The results showed distinct laryngeal configurations and vocal fold vibration characteristics for the four phonation types. All subjects showed a less adducted posterior glottis in the two breathy phonation types than in the non-breathy phonations. In some cases, the arytenoid processes were clearly vibrating during the breathy phonations. All subjects had mucosal waves and sharp lateral peaks in VKG when phonating in 'dramatic/operatic chest' voice. In 9 subjects, mucosal waves of some degree were found in all phonation types, i.e., even in both falsetto phonations.

The findings of this study suggest that the designed phonatory exercises can be used to produce 4 extreme types of singing voice and to train singers to gain an independent control of the voice register and glottal adduction, making the voice more flexible. The data also showed that the closed quotient can in some subjects achieve larger values in 'resonant falsetto' than in 'light chest' phonations, implying that the closed quotient is not a sole indicator of the voice register in singing.
Other publications
O10. Christian T. Herbst (2024). Aus der Forschung für die Lehre (7): Blick über den Tellerrand: Stimmforschung kritisch beleuchtet. Vox Humana, 20 (1).
O9. Christian T. Herbst (2023). Aus der Forschung für die Lehre (4): Wer braucht formant tuning?. Vox Humana, 19 (1).
O8. Christian T. Herbst (2023). Aus der Forschung für die Lehre (5): Vokalbehandlung beim belting. Vox Humana, 19 (2).
O7. Christian T. Herbst (2023). Aus der Forschung für die Lehre (6): Holistisch oder spezifisch? -- ein systematischer Blick auf gesangspädagogische Konzepte. Vox Humana, 19 (3).
O6. Christian T. Herbst, Jan G. Svec, W. Tecumseh Fitch (2023). About the voice production mechanism of cat purring -- a critical appraisal of Remmers and Gautier, 1972.
O5. Christian T. Herbst (2022). Aus der Forschung für die Lehre (3): Brauchen wir überhaupt Stimmforschung? Bericht von der 14. Pan-European Voice Conference (PEVOC). Vox Humana, 18 (3).
O4. Christian T. Herbst (2022). Aus der Forschung für die Lehre (2): Lautstärke und Tragfähigkeit der Stimme. Vox Humana, 18 (2).
O3. Christian T. Herbst (2022). Aus der Forschung für die Lehre. Vox Humana, 18 (1).
O2. Thomas Ziegler, Christian T. Herbst (2019). On Scattering Coefficients and Fitting Density for Room Acoustic Simulation of Industry Halls. DAGA 2019 - 45. Jahrestagung für Akustik.
O1. Ben Larson, Christian T. Herbst, Eric Hunter (2013). EGG Wavegram Python Source Code Tutorial. The National Center for Voice and Speech Online Technical Memo No. 16.
Conference talks and lectures
C164. Christian T. Herbst (2024). Rückblick und Ausblick: 20 Jahre Stimmforschung. Stimmwelten (invited lecture), Universitätsklinik für Hals-, Nasen- und Ohrenkrankheiten, Kopf- und Halschirurgie, Inselspital Bern, Bern, Switzerland. April 27, 2024.
C163. Christian T. Herbst (2024). Live-Endoskopie mit Einblicken in den Stimmapparat. Stimmwelten (invited lecture), Universitätsklinik für Hals-, Nasen- und Ohrenkrankheiten, Kopf- und Halschirurgie, Inselspital Bern, Bern, Switzerland. April 27, 2024.
C162. Christian T. Herbst (2024). Parallelen der Stimmproduktion bei Menschen und anderen Säugetieren. 20. Leipziger Symposium zur Kinder- und Jugendstimme (invited lecture), Universitätsklinikum Leipzig, Leipzig. February 23, 2024.
C161. Christian T. Herbst (2023). An overview of mammalian voice production mechanisms. XXVIII International Bioacoustics Congress (invited lecture), Sapporo, Japan. October 29, 2023.
C160. Christian T. Herbst (2023). What is voice? - an introduction to the acoustics and physiology of voice production. 11th Voice workshop of the Korean Society of Laryngology, Phoniatrics and Logopedics (invited lecture), Seoul, South Korea. October 21, 2023.
C159. Christian T. Herbst (2023). Mechanisms of production of nonlinear phenomena: from the vocal anatomy to biomechanical simulations. Nonlinear phenomena in vertebrate vocalisations: mechanisms and communicative functions (invited lecture), St. Etienne, France. June 14, 2023.
C158. Christian T. Herbst (2023). Bioacoustic signal analysis with Praat -- lessons learned, and questions asked. Nonlinear phenomena in vertebrate vocalisations: mechanisms and communicative functions (invited lecture), St. Etienne, France. June 14, 2023.
C157. Shaofeng Zheng, Kevin Rose, Christian T. Herbst, David Meyer (2023). Elite Singers Speech and Singing Voice Type - A Correlational Study. 52nd Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 4, 2023. presented by Shaofeng Zheng.
C156. Shanshan Zhang, Christian T. Herbst, Katrina Miller, David Meyer (2023). Quantifying Talk-Time in Singing Instruction: an Automated Method. 52nd Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 4, 2023. presented by Shanshan Zhang.
C155. Christian T. Herbst, Brad Story, David Meyer (2023). Acoustical Theory of Vowel Modification Strategies in Belting. 52nd Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 1, 2023.
Belting is a primary vocal quality of Contemporary Commercial Music (CCM). Various authors have argued that belting is to be produced with a "speech-like" sound production, with the first and second supraglottal vocal tract resonances ($f_{R1}$ and $f_{R2}$) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified by previous authors as a dominant second harmonic in the radiated spectrum, possibly enhanced by tuning the lowest supraglottal resonance to that harmonic ($f_{R1} \approx 2 f_{o}$).

Conceptually, it is not clear how both these concepts -- (a) phonating with "speech-like", unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by the first vocal tract resonance $f_{R1}$ -- can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low $f_{R1}$, such as [i] or [u], might have to be modified considerably (by raising $f_{R1}$) in order to phonate at higher pitches.

These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The strongest harmonic of the radiated spectrum was assessed in a total of 12987 simulations, covering a parameter space of 37 different fundamental frequencies ($f_{o}$) across the musical pitch range from C3 to C6 and 27 voice source spectral slope settings from -4 to -30 dB/octave, computed for 13 different IPA vowels.

The results suggest that, for most vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced within a pitch range of about a musical fifth, centered on a fundamental frequency of about half the respective vowel's first vocal tract resonance (i.e., $f_{o} \approx 0.5 f_{R1}$). In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation -- considering the relative prevalence of the different vowels in American English -- suggests that, historically, belting with an $f_{R1} \approx 2 f_{o}$ resonance tuning was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that -- on acoustical grounds -- the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
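The size of the parameter space quoted in this abstract can be checked with a short sketch. The step sizes (semitones for pitch, 1 dB/octave for spectral slope) and the vowel labels are assumptions chosen because they reproduce the stated counts; the abstract does not spell them out. The helper `belting_range_hz` is a hypothetical illustration of the "musical fifth centered on half the first resonance" finding, not the authors' model.

```python
from itertools import product


def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note (A4 = MIDI 69 = 440 Hz)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


# Assumption: semitone steps from C3 (MIDI 48) to C6 (MIDI 84) give 37 pitches.
pitches_hz = [midi_to_hz(m) for m in range(48, 85)]
# Assumption: 1 dB/octave steps from -4 to -30 give 27 slope settings.
slopes_db_per_octave = list(range(-4, -31, -1))
# Hypothetical placeholder labels; the abstract does not list its 13 vowels.
vowels = [f"vowel_{i}" for i in range(1, 14)]

grid = list(product(pitches_hz, slopes_db_per_octave, vowels))
print(len(grid))  # 37 * 27 * 13 = 12987


def belting_range_hz(f_r1_hz, width_semitones=7.0):
    """Pitch range of the given width, centered (in semitones) on 0.5 * f_R1."""
    center = 0.5 * f_r1_hz
    half = width_semitones / 2.0
    return center * 2.0 ** (-half / 12.0), center * 2.0 ** (half / 12.0)


low, high = belting_range_hz(800.0)  # 800 Hz is an arbitrary example value
print(round(low, 1), round(high, 1))
```

The range is centered on a logarithmic (semitone) scale, so the ratio of its upper to its lower bound is exactly a perfect fifth in equal temperament.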
C154. Christian T. Herbst (2023). Mammalian Voice Production Mechanisms. CogCom/BeCogBio Seminar (invited lecture), University of Vienna, Vienna, Austria. May 15, 2023.
C153. Christian T. Herbst (2023). Mixed Voice. Universal Voice Symposium (invited lecture), Amsterdam, The Netherlands. April 2, 2023.
C152. Christian T. Herbst (2022). Voice Science -- do we need it? Finding meeting points between the science and the craft. 14th Pan-European Voice Conference (PEVOC) (invited lecture), Estonian Academy of Music and Theatre, Tallinn, EE. August 27, 2022.
C151. Christian T. Herbst (2022). From sound intensity to loudness -- a perceptually oriented augmentation of the voice range profile. 14th Pan-European Voice Conference (PEVOC), Estonian Academy of Music and Theatre, Tallinn, EE. August 27, 2022.
C150. Christian T. Herbst, Brad H. Story (2022). Simulation of Vocal Tract Resonance Tuning Strategies With Respect to Fundamental Frequency and Voice Source Spectral Slope. 51st Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 5, 2022.
A well-known concept of voice pedagogy is "formant tuning". Specifically, the lowest two vocal tract resonances (R1, R2) are systematically tuned to harmonics of the laryngeal voice source via the chosen "vowel color", in order to increase the level of radiated sound. While this concept has been well documented for certain vowels and fundamental frequencies (mostly targeting the passaggio region), a systematic evaluation of the concept is still outstanding.

Addressing this issue, the effect of R1 and R2 variation on the generated sound level was systematically evaluated in silico across the entire fundamental frequency range of classical singing. A previously introduced low-dimensional computational model was used to generate 10000 vocal tract transfer functions equally distributed across the entire vowel space of females and males. For all these transfer functions, the relative sound output level was computed for musical pitches from C2 (ca. 65.4 Hz) to G6 (ca. 1568.0 Hz). Simulating different voice source "strengths" from "flutey" to "brassy", the simulations were consecutively run with three harmonic series having spectral slopes of -6, -12, and -18 dB/oct. Addressing the non-linearities of human sound level perception, all resulting relative levels were converted to dB(A).

Substantially different strategies for optimized sound output emerged for low vs. high voices. At low pitches, formant tuning only has a marginal effect, due to the close spacing of the harmonics. Low voices may rather rely on a strong voice source with a shallow spectral slope (stronger high-frequency harmonics). In contrast, at higher pitches, proper resonance tuning strategies become more prevalent, and voice source strength plays an increasingly marginal role as fundamental frequency increases to the upper limits of the soprano range.

While this is only a theoretical study with certain limiting factors (e.g., neglecting non-linear coupling effects between vocal tract and source), the insights are nevertheless highly relevant for voice pedagogy. The data suggest that, in general, different voice classes (e.g. low male vs. high female) likely have fundamentally different strategies for optimizing sound output. In such cases, imitation learning may be less efficient, thus requiring the pedagogue to transcend their own singing technique during the teaching situation.
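The pitch limits, spectral slopes and dB(A) conversion mentioned in this abstract can be illustrated with a minimal sketch. It assumes equal temperament with A4 = 440 Hz, a voice source whose harmonics fall off by a constant number of dB per octave, and the standard analytic A-weighting curve. It covers the source spectrum only and omits the vocal tract transfer functions, so it is not the authors' simulation; all function names are hypothetical.

```python
import math


def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note (A4 = MIDI 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def harmonic_level_db(harmonic_number, slope_db_per_octave):
    """Level of the n-th harmonic relative to the first, for a constant slope."""
    return slope_db_per_octave * math.log2(harmonic_number)


def a_weighting_db(frequency_hz):
    """Analytic A-weighting curve in dB (close to 0 dB at 1 kHz)."""
    f2 = frequency_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


# C2 is MIDI note 36 and G6 is MIDI note 91.
print(round(midi_to_hz(36), 1), round(midi_to_hz(91), 1))

# A-weighted relative levels of a few harmonics at C2 for the three slopes.
f_o = midi_to_hz(36)
for slope in (-6.0, -12.0, -18.0):
    levels = [harmonic_level_db(n, slope) + a_weighting_db(n * f_o) for n in (1, 2, 4, 8)]
    print(slope, [round(level, 1) for level in levels])
```

Because A-weighting strongly attenuates low frequencies, the weighted level of the lowest harmonics at C2 drops well below their unweighted level, which is one reason a shallow source slope matters for low voices.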
C149. Josipa Bainac Hausknecht, Kristen M. Murdaugh, Elke Nagl, Christian T. Herbst (2022). Global Inventory and Similarity Rating of Singing Voice Assessment Terms Used at English Speaking Academic Institutions. 51st Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 5, 2022.
C148. Kristen Murdaugh, Olivier Perrotin, Bruno Gingras, Christian T. Herbst (2022). Correlating Perceptual and Spectral Aspects of Chiaroscuro in Singing -- A Pilot Study. 7th International Physiology and Acoustics of Singing Conference (PAS7+), Instituto de Formação em Voz - IFV; Faculdade Novo Horizonte FNH; Universidad Nacional de Educación a Distancia (UNED), May 7, 2022.
Vocal pedagogy terms rooted in perception have been employed in voice literature and training for centuries. One such traditional, tried and true concept is chiaroscuro (chiaro = bright; scuro = dark). Research observations suggest that chiaroscuro can be examined on two levels: the physiological (physical) level and the perceptual (psychoacoustic) level. Yet to date, no empirical studies have been conducted to investigate what exactly is being altered or modified physiologically to create chiaro and scuro and where exactly these tone qualities exist within the spectrum of the radiated sound wave. To fill this gap, a multipart study with an overarching goal of potentially relating acoustical and spectral sound characteristics at various perceptual levels of chiaroscuro to the respective underlying physiological voice production gestures is being conducted, with the first part detailed in this presentation investigating whether certain aspects of the radiated spectral composition of the voice are systematically relevant for the perception of chiaroscuro and chiaro and scuro, respectively.
Informal listening experiments have a priori identified four potential acoustic and spectral features that may be relevant to the perception of chiaroscuro: (a) overall sound level; (b) global frequency shifts of formant frequencies; (c) spectral slope; and (d) singers' formant cluster level. In this study, a cohort of twelve experienced singing voice pedagogues were asked to rate the effect of these four sound modification classes vis-a-vis their perception of 1) the degree of chiaroscuro as a Gestalt principle (task 1); and 2) the degree of chiaro and scuro individually within each presented sound sample (task 2).
The perceptual ratings of chiaroscuro as a Gestalt principle and chiaro and scuro individually varied drastically from participant to participant, with correlations and trends in the data being largely indeterminable, with the exception of global frequency shifts of formant frequencies and negative spectral slope shifts. It can be theorized, within the context of the limitations of this pilot study and its data set, that singing voice pedagogues have drastically varying perceptual definitions of chiaro, scuro, and chiaroscuro as a whole, highlighting the importance of further chiaroscuro related research.
C147. Matthias Echternach, Christian T. Herbst, Marie Köberlein, Brad H. Story, Michael Döllinger, Donata Gellrich (2022). Non-linear source-tract interactions in classical singing. 7th International Physiology and Acoustics of Singing Conference (PAS7+) (invited lecture), Instituto de Formação em Voz - IFV; Faculdade Novo Horizonte FNH; Universidad Nacional de Educación a Distancia (UNED), May 6, 2022.
In recent studies, it has been assumed that vocal tract formants ($F_n$) and the voice source could interact. However, there are only few studies analyzing this assumption in vivo. Here, the vowel transition /i/--/a/--/u/--/i/ of 12 professional classical singers (6 females, 6 males) when phonating on the pitch D4 [fundamental frequency ($f_o$) ca. 294 Hz] were analyzed using transnasal high speed videoendoscopy (20,000 fps), electroglottography (EGG), and audio recordings. $F_n$ data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of $F_n$/$n f_o$ and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i--a/ and /u--i/ transitions, there were no clear effects at the expected $F_n$/$n f_o$ intersections. In contrast, $f_o$ adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher $n f_o$ and $F_n$. The presented data partially corroborates that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or existence of confounding factors, which may potentially counterbalance the level-two-interactions.
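One way to picture the "algorithmic detection of intersections" between a formant and the harmonics of the fundamental is sketched below. This is an illustrative assumption, not the method used in the study: it simply looks for sign changes of the difference between a sampled formant track and each harmonic n * f_o. The function name and the formant values are hypothetical.

```python
# D4 is MIDI note 62; with A4 = 440 Hz this is ca. 293.7 Hz.
D4_HZ = 440.0 * 2.0 ** ((62 - 69) / 12.0)


def find_crossings(formant_track_hz, f_o_hz, max_harmonic=10):
    """Return (frame_index, harmonic_number) pairs where the formant track
    crosses (or exactly touches) n * f_o between frame i and frame i + 1."""
    crossings = []
    for n in range(1, max_harmonic + 1):
        target = n * f_o_hz
        for i in range(len(formant_track_hz) - 1):
            before = formant_track_hz[i] - target
            after = formant_track_hz[i + 1] - target
            if before == 0 or before * after < 0:
                crossings.append((i, n))
    crossings.sort()
    return crossings


# Hypothetical first-formant track rising from 300 Hz to 800 Hz in 50 Hz steps,
# loosely mimicking an /i/ to /a/ transition.
f1_track = [300.0 + 50.0 * k for k in range(11)]
print(find_crossings(f1_track, D4_HZ))
```

With these made-up values, the track passes the second harmonic (ca. 587 Hz) between frames 5 and 6, which is the kind of event the abstract calls a source-filter interaction candidate.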
C146. Christian T. Herbst (2021). Registration: The Snake Pit of Voice Pedagogy. NATS Chat (invited lecture), National Association of Teachers of Singing (NATS), Jacksonville, FL. December 1, 2021.
C145. Kristen Murdaugh, Olivier Perrotin, Bruno Gingras, Christian T. Herbst (2021). Correlating Perceptual and Spectral Aspects of Chiaroscuro in Singing. KTH Popup Voice Science Break #9, KTH Royal Institute of Technology, Stockholm, Sweden. June 1, 2021. presented by Kristen Murdaugh.
C144. Christian T. Herbst, Vicky Ossio, Marcelo Levy, Jacob C. Dunn (2021). Generating alternative facts? -- Bioacoustic fundamental frequency estimation is a well-educated gamble. 50th Anniversary Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 1, 2021.
Fundamental frequency (fo) is one of the most commonly reported attributes of the voice. In animal vocal communication, fo is one of the key attributes for discriminating the call types that make up a species' vocal repertoire. For reasons of experimental feasibility, fo data is typically extracted from acoustic signals produced in more or less "realistic" situations, which introduces a certain amount of ambient noise into the acquired signals (e.g., simultaneous calls of conspecifics). A common method for fo extraction in bioacoustic research is the default autocorrelation (AC) "pitch" extraction provided in the Praat voice analysis software.

Here, we evaluate this approach, i.e., Praat AC fo computation from acoustic signals, using 1078 calls from twelve New World monkeys at La Senda Verde wildlife sanctuary (a private non-profit organization established in 2003 in the subtropical region east of the Bolivian Andes), representing six species. All calls were documented with simultaneous acoustic and electroglottographic (EGG) recordings, allowing us to compare results from these different methods. fo was analyzed with Praat's AC algorithm, systematically varying the "voicing threshold" and "octave cost" parameters.

The 50th percentiles, i.e., medians of EGG-based fo data per species, were between 11 % and 153 % lower than the fo computed from the corresponding acoustic signals, suggesting a considerable influence of the background noise levels, registered at 54 to 67 dB(C). Systematic variation of Praat's "voicing threshold" and "octave cost" parameters showed an equally dramatic influence on the computed fo. Tests with synthesized EGG signals suggest that these two parameters are expected to influence the computed fo data in the presence of voice irregularity (a-periodicity) and subharmonics, respectively.

In summary, our results suggest that previously reported animal bioacoustic fo data, if based on acoustic signals and computed with Praat's "out of the box" approach, should be interpreted with care. If possible, a physiological correlate of laryngeal voice production (such as the EGG signal), bypassing background noise issues, should be used as the basis of fo evaluation. Further research is needed to quantify the interaction of the examined Praat AC algorithm parameters with various degrees of a-periodicity and subharmonics.
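For readers unfamiliar with autocorrelation-based fo estimation, the basic idea can be sketched in a few lines. This is a deliberately bare-bones illustration and is not Praat's algorithm: it has no windowing correction, no candidate selection, and no equivalent of the "voicing threshold" or "octave cost" parameters discussed above. The function name, search range and test signal are assumptions.

```python
import math


def estimate_fo_autocorr(samples, sample_rate, fo_min=50.0, fo_max=500.0):
    """Estimate fo as sample_rate / lag, where lag maximizes the raw
    autocorrelation within the lag range implied by fo_min and fo_max."""
    min_lag = int(sample_rate / fo_max)
    max_lag = int(sample_rate / fo_min)
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic, noise-free 200 Hz sine (50 ms at 8 kHz) as a sanity check.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
print(estimate_fo_autocorr(tone, sample_rate))
```

On a clean periodic signal the estimate is straightforward; the abstract's point is that background noise, irregularity and subharmonics make the choice among competing autocorrelation peaks far less clear-cut, which is where parameter settings start to dominate the result.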
C143. Christian T. Herbst (2021). Performance evaluation of subharmonic-to-harmonic ratio (SHR) computation. 50th Anniversary Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 1, 2021.
Subharmonics are an important class of voice signals, relevant for speech, pathological voice, singing, and animal bioacoustics. They arise from special cases of amplitude (AM) or frequency modulation (FM) of the time-domain signal. Surprisingly, to date there is only one open source subharmonics detector available to the scientific community: Sun's subharmonic-to-harmonic ratio (SHR) [Sun, 2000, JVoice]. Here, this algorithm was subjected to a formal evaluation with two data sets of synthesized and empirical speech samples.

Both data sets consisted of electroglottographic (EGG) signals, i.e., a physiological correlate of vocal fold oscillation that bypasses vocal tract acoustics. Data Set I contained 2560 synthesized EGG signals with varying degrees of AM and FM, fundamental frequency (fo), periodicity, and signal-to-noise ratio (SNR). Data Set II was made up of 25 EGG samples extracted from the CMU Arctic speech database. For a "ground truth" of subharmonicity, these samples were manually annotated by a group of five external experts.

Analysis of the synthesized data suggested that the SHR metric was relatively robust as long as the subharmonic modulation extent was below 0.35 and 0.7 for the FM and AM scenarios, respectively. In the CMU Arctic speech data samples, the SHR analysis reached a maximum sensitivity of about 87 % with a specificity of over 90 %, but only for adaptive algorithm parameter settings. In contrast, the algorithm's default parameter settings could only successfully classify about 9 % of all subharmonic instances.

The SHR is a useful metric for assessing the degree of subharmonics contained in voice signals, but only at adaptive parameter settings. In particular, the frequency ceiling should be chosen as five times the highest fo, and the frame length as at least five times the largest fundamental period of the analyzed signal. For subharmonic classification a threshold of SHR ≥ 0.01 is recommended.
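The parameter recommendations in the last paragraph translate directly into a small helper. The sketch below only encodes the three rules stated in the abstract (frequency ceiling, minimum frame length, classification threshold); it does not compute the SHR itself, and the function and key names are hypothetical.

```python
def recommended_shr_settings(fo_min_hz, fo_max_hz):
    """Analysis settings derived from the expected fo range of the signal."""
    return {
        # Frequency ceiling: five times the highest fo.
        "frequency_ceiling_hz": 5.0 * fo_max_hz,
        # Frame length: at least five times the largest fundamental period,
        # i.e. five periods of the lowest fo.
        "min_frame_length_s": 5.0 / fo_min_hz,
    }


def has_subharmonics(shr_value, threshold=0.01):
    """Classify a frame as subharmonic if SHR >= threshold."""
    return shr_value >= threshold


# Example with an assumed fo range of 100 to 400 Hz.
settings = recommended_shr_settings(100.0, 400.0)
print(settings)
print(has_subharmonics(0.02), has_subharmonics(0.005))
```

Deriving the settings from the fo range of each signal is what the abstract means by "adaptive" parameter settings, as opposed to one fixed default for all recordings.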
C142. Kristen M. Murdaugh, Josipa Bainac-Hausknecht, Christian T. Herbst (2021). In-Person or Virtual? -- Assessing the Impact of Covid-19 on the Teaching Habits of Voice Pedagogues. 50th Anniversary Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 1, 2021.
The social distancing measures implemented world-wide in the wake of the novel Coronavirus (COVID-19) crisis have forced voice pedagogues to alter their teaching habits, likely shifting from customary in-person teaching to virtual teaching. An online survey, distributed world-wide in April/May 2020, investigated how singing voice pedagogues were impacted by the COVID-19 crisis. The collected responses from 387 survey participants suggest that, overall, voice teachers were only moderately satisfied with having to teach virtually, indicating that virtual voice teaching is not a sufficient replacement for in-person teaching. The participants indicated that during virtual teaching the singing voice can be assessed relatively well through features which provide both acoustic and visual clues. In contrast, depending on utilized technology, it may be harder to judge those aspects of the singing voice that are solely defined acoustically, such as dynamic range and spectral composition. This may be explained by limitations imposed by "out of the box" technology for online communication, which is typically optimized for speech instead of singing. This calls for better information on technological solutions for virtual voice teaching.
C141. Josipa Bainac-Hausknecht, Kristen M. Murdaugh, Johan Thomasson, Hubert Kerschbaum, Christian T. Herbst (2021). Assessing the Effects of Performance-related Arousal on Singing Voice Production -- A Single Subject Feasibility-Study. Virtually PAVA - Pan-American Vocology Association Symposium, Pan-American Voice Association, Salt Lake City, UT. January 1, 2021. presented by Josipa Bainac-Hausknecht.
In contrast to insights from numerous well-conducted laboratory experiments, relatively little is known about the physiological processes of singing voice production on stage, and how these are potentially influenced by performance-related arousal. Addressing this question, an exploratory study was conducted to test the feasibility of wireless acquisition of pulse rate, electroglottographic (EGG) signals, and microphone signals during rehearsal and in a performance situation. A tenor in his second year of graduate singing studies was asked to sing Schubert's ``Der Neugierige'' in four different situations: (a) twice in a rehearsal situation, before and after his teacher's coaching input (R1 and R2), and (b) twice on stage, in a dress rehearsal and in a simulated concert in the presence of a mock theatre agent (S1 and S2). From each recording, the same 30 sustained notes were extracted. For each note, the averaged sound pressure level (SPL), vibrato frequency and amplitude, and EGG contact quotient (CQ_EGG) were computed. During each recording session, the heart rate was continuously monitored. In addition to baseline measurements, saliva-based cortisol levels were measured immediately prior to and after singing. The most notable changes from the rehearsal (R1 and R2) to the stage situations (S1 and S2) were an increase of vibrato amplitude by about 12.8 % and a decrease of CQ_EGG from about 0.56 to about 0.48. Average sound pressure levels were about equal across all conditions, with the exception of R1 (-2 dB). The heart rate during singing (R1, R2, and S1) was at about 80 to 85 beats per minute (BPM), reaching maxima of about 185 BPM in the presence of the mock theatre agent (S2). As compared to the baseline, cortisol levels were about twice as high on the days of both the rehearsal and the recording, but no difference was found between the dress rehearsal (S1) and stage (S2) conditions. 
This study demonstrates the feasibility of wireless bio-signal acquisition during singing on stage. However, a more rigorous trial with a larger number of participants is needed to show whether performance-related arousal has a systematic influence on voice production.
C140. Kristen M. Murdaugh, Olivier Perrotin, Bruno Gingras, Christian T. Herbst (2021). Correlating Perceptual and Spectral Aspects of Chiaroscuro in Singing -- A Pilot Study. Virtually PAVA - Pan-American Vocology Association Symposium, Pan-American Voice Association, Salt Lake City, UT. January 1, 2021. presented by Kristen Murdaugh.
Vocal pedagogy terms rooted in perception have been employed in voice literature and training for centuries. One such traditional, tried and true concept is chiaroscuro (chiaro = bright; scuro = dark). Research observations suggest that chiaroscuro can be examined on two levels: the physiological (physical) level and the perceptual (psychoacoustic) level. Yet to date, no empirical studies have been conducted to investigate what exactly is being altered or modified physiologically to create chiaro and scuro and where exactly these tone qualities exist within the spectrum of the radiated sound wave. To fill this gap, a four-part study with an overarching goal of potentially relating acoustical and spectral sound characteristics at various perceptual levels of chiaroscuro to the respective underlying physiological voice production gestures is being conducted, with the first part detailed in this presentation investigating whether certain aspects of the radiated spectral composition of the voice are systematically relevant for the perception of chiaroscuro and chiaro and scuro, respectively.
Informal listening experiments have a priori identified four potential acoustic and spectral features that may be relevant to the perception of chiaroscuro: (a) overall sound level; (b) global frequency shifts of formant frequencies; (c) spectral slope; and (d) singers' formant cluster level. In this study, a cohort of twelve experienced singing voice pedagogues were asked to rate the effect of these four sound modification classes vis-a-vis their perception of 1) the degree of chiaroscuro as a Gestalt principle (task 1); and 2) the degree of chiaro and scuro individually within each presented sound sample (task 2).
The perceptual ratings of chiaroscuro as a Gestalt principle and chiaro and scuro individually varied drastically from participant to participant, with correlations and trends in the data being largely indeterminable, with the exception of global frequency shifts of formant frequencies. It can be theorized, within the context of the limitations of this pilot study and its data set, that singing voice pedagogues have drastically varying perceptual definitions of chiaro, scuro, and chiaroscuro as a whole, highlighting the importance of further chiaroscuro related research.
C139. Kristen M. Murdaugh, Josipa Bainac-Hausknecht, Christian T. Herbst (2021). In-Person or Virtual? -- Assessing the Impact of COVID-19 on the Teaching Habits of Voice Pedagogues. Virtually PAVA - Pan-American Vocology Association Symposium, Pan-American Voice Association, Salt Lake City, UT. January 1, 2021. presented by Kristen Murdaugh.
The social distancing measures implemented world-wide in the wake of the novel Coronavirus (COVID-19) crisis forced voice pedagogues to alter their teaching habits, shifting from customary in-person teaching to virtual teaching. Aiming to assess the impact which COVID-19 had on the teaching habits of voice pedagogues, particularly pedagogues' technological solutions for virtual teaching and the ability to assess the voice within such solutions, an online survey was distributed to voice pedagogues world-wide in April/May 2020. The collected responses from 387 survey participants suggest that, overall, voice teachers were only moderately satisfied with having to teach virtually, indicating that virtual voice teaching is not a sufficient replacement for in-person teaching. The participants indicated that during virtual teaching the singing voice can be assessed relatively well through features which provide both acoustic and visual clues. In contrast, depending on utilized technology, it may be harder to judge those aspects of the singing voice that are solely defined acoustically, such as dynamic range and spectral composition. This may be explained by limitations imposed by "out of the box" technology for online communication, which is typically optimized for speech instead of singing. This calls for better information on technological solutions for virtual voice teaching.
C138. Christian T. Herbst (2020). Subharmonics: a common (but sometimes overlooked) phenomenon of the speaking and singing voice. NATS 56th National Virtual Conference (invited lecture), National Association of Teachers of Singing, Knoxville. June 27, 2020.
Despite remarkable scientific progress made in the past decades, the notion of singing voice registers remains a "hot topic". In this talk, several strategies for defining and classifying registers will be reviewed, starting with the more traditional proprioceptive and perceptual/psychoacoustic approaches, but also covering the modern views regarding laryngeal mechanisms and vocal tract influences. Rather than proposing a novel scheme for register classification, it is the purpose of this contribution to show how some persistent disagreements -- for instance as regards the number of registers -- might be mitigated when explicitly considering the level at which the issues at hand are discussed: proprioceptive, perceptual/psychoacoustic, or physical/acoustical, with or without considering vocal tract influences. Such an acknowledgement of the increased complexity of the topic of registers (or, rather, the removal of unjustified over-simplification) may eventually lead to increased clarity and better understanding in both science and teaching.
C137. Christian T. Herbst (2020). Vocal Studies and Vocal Research. Wissenschaftsseminar (invited lecture), MDW/MuWi/ÖGfMM, Vienna. June 27, 2020.
C136. T. Nestorova, I. Howell, J. Gilbert, Christian T. Herbst (2020). Does Vibrato Define Genre or Vice Versa? A Novel Approach to Stylistic Vibrato Derivative Analysis. Virtually PAVA - Pan-American Vocology Association Symposium, PAVA, Salt Lake City, UT. January 1, 2020. presented by Theodora Nestorova.
C135. Christian T. Herbst (2019). Et hi tres unum sunt -- Die Teilsysteme des Stimmapparates und ihre Relevanz für die Sängerstütze. 58. Berliner Gesangswissenschaftliche Tagung (invited lecture), Hochschule für Musik "Hanns Eisler", September 28, 2019.
The topic of "Sängerstütze" (singers' support) remains a perennial issue in discussions of voice pedagogy. This talk presents an approach to the topic that is grounded in voice physiology. Four common definitions of "support", which seemingly differ strongly in content, are evaluated with respect to their physiological relevance. It is shown that each definition targets different sub-aspects of singing voice production (respiratory system, larynx, vocal tract). By taking into account the existing physiological and physical interactions between these subsystems of the voice production apparatus, apparent contradictions between the discussed support concepts can be resolved. This results in a broader conceptual framework that helps to interpret historical and current definitions of "Sängerstütze" in a physiologically more relevant way, which can lead to more effective application in singing pedagogy and voice therapy.

This talk is a summary of the following articles published in VOX HUMANA:

Herbst, C. T. (2018) Physiologische Grundlagen der Sängerstütze (Teil 1): Et hi tres unum sunt Wechselwirkungen der Teilsysteme des Stimmapparates, Vox Humana, 3(3), pp. 50-55.
Herbst, C. T. (2019) Physiologische Grundlagen der Sängerstütze (Teil 2): Eine konzeptionelle Begriffserweiterung, Vox Humana, 15(1), pp. 36-39.
C134. Christian T. Herbst, Jacob Dunn (2019). Voice production physiology of non-human primates. 13th Pan-European Voice Conference (PEVOC), University of Copenhagen, Copenhagen, DK. August 27, 2019.
A long-standing scientific debate concerns whether, in comparison to non-human primates and other mammals, the human vocal organ has evolved especially to support speech, or whether the capacity for speech is mainly determined by advanced neural control and cognition in humans. Mounting evidence seems to support the latter hypothesis.

Inverting this argument, we can speculate whether non-human primates would be able to speak and sing like humans if they had the required neuronal and cognitive adaptations. For this, they would require a voice production apparatus that is in essence similar to that of humans, both anatomically and functionally.

While the laryngeal and vocal tract anatomy of non-human primates is fairly well investigated, relatively little is known about the functional aspects of their voice production. In particular, in vivo physiological data are scarce, which is mostly attributed to experimental difficulties.

In this presentation, the available empirical evidence concerning voice production physiology in non-human primates is reviewed. Where possible, comparisons to the respective human traits are made, highlighting analogous mechanisms.
C133. Christian T. Herbst, Hiroki Koda, Takumi Kunieda, Juri Suzuki, Maxime Garcia, W. Tecumseh Fitch, Takeshi Nishimura (2019). Japanese macaque phonatory physiology. 13th Pan-European Voice Conference (PEVOC), University of Copenhagen, Copenhagen, DK. August 27, 2019.
While the call repertoire and its communicative function are relatively well explored in Japanese macaques (Macaca fuscata), little empirical data is available on the physics and the physiology of this species' vocal production mechanism. Here, a 6-year-old female Japanese macaque was trained to phonate under an operant conditioning paradigm. The resulting "coo" calls, and spontaneously uttered "growl" and "chirp" calls, were recorded with sound pressure level (SPL) calibrated microphones and electroglottography (EGG), a non-invasive method for assessing the dynamics of phonation. A total of 448 calls were recorded, complemented by ex vivo recordings on an excised Japanese macaque larynx. In this novel multidimensional investigative paradigm, in vivo and ex vivo data were matched via comparable EGG waveforms. Subsequent analysis suggests that the vocal range (range of fundamental frequency and SPL) was comparable to that of a 7- to 10-year-old human, with the exception of low-intensity chirps, whose production may be facilitated by the species' vocal membranes. In coo calls, redundant control of fundamental frequency in relation to SPL was also comparable to humans. EGG data revealed that growls, coos, and chirps were produced by distinct laryngeal vibratory mechanisms. EGG further suggested changes in the degree of vocal fold adduction in vivo, resulting in spectral variation within the emitted coo calls, ranging from "breathy" (including aerodynamic noise components) to "non-breathy". This is again analogous to humans, corroborating the notion that phonation in humans and non-human primates is based on universal physical and physiological principles.
C132. Christian T. Herbst, Bodo Maass (2019). Workshop: Real-time visualization feedback of voice production physiology with electroglottographic wavegrams. 48th Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 31, 2019.
Electroglottography (EGG) is a non-invasive, low-cost technology for monitoring the relative vocal fold contact area during voice production. The EGG signal may -- under certain conditions -- give insights into the vocal register (chest vs. falsetto, or M1 vs. M2) [1] and the degree of vocal fold adduction (breathy, normal, pressed) [2] of an individual. However, careful interpretation is required, particularly for quantitative analysis parameters, such as the ubiquitous contact quotient [3]. An alternative is constituted by qualitative assessment of EGG data, either through visual (real-time) display of the EGG waveform [4], or via the recently introduced EGG wavegram visualization technique [5].

EGG wavegrams provide an intuitive means for quickly assessing vocal fold contact phenomena and their abrupt or gradual variation over time. This makes it possible to document changes of vocal register and vocal fold adduction, as well as related indirect effects introduced by variation of pitch or loudness.
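
The basic idea of stacking normalized cycles can be illustrated with a short sketch. It is a simplified illustration only, not the published wavegram algorithm or the software demonstrated in the workshop: it assumes that cycle boundaries are already known, and the function names `resample_cycle`, `normalize` and `wavegram` are hypothetical. Each cycle is resampled to a fixed length and scaled to the range 0 to 1, so that consecutive cycles can be displayed side by side as columns of an image.

```python
import math


def resample_cycle(cycle, n_points):
    """Linearly interpolate one cycle onto n_points equally spaced positions."""
    last = len(cycle) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(cycle[i] * (1.0 - frac) + cycle[i + 1] * frac)
    return out


def normalize(values):
    """Scale values to the range [0, 1]; a flat input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def wavegram(signal, boundaries, n_points=50):
    """Return one time- and amplitude-normalized column per cycle.

    boundaries holds the sample indices at which cycles start
    (assumed to be known here; detecting them is a separate problem).
    """
    columns = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        cycle = signal[start:end]
        if len(cycle) < 2:
            continue  # too short to interpolate
        columns.append(normalize(resample_cycle(cycle, n_points)))
    return columns


# Synthetic stand-in for an EGG signal: two cycles of 40 samples,
# followed by two cycles of 50 samples (a change in period).
signal = [math.sin(2 * math.pi * n / 40) for n in range(80)]
signal += [math.sin(2 * math.pi * n / 50) for n in range(100)]
boundaries = [0, 40, 80, 130, 180]

image = wavegram(signal, boundaries)
print(len(image), len(image[0]))
```

Because every column has the same length regardless of the original cycle duration, changes in waveform shape from one cycle to the next become visible independently of changes in pitch.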

Currently available software for generating EGG wavegrams is limited to ex-post (i.e., offline) analysis [6]. In this workshop, two new tools for real-time EGG wavegram feedback are demonstrated: After a brief review of the theoretical background for EGG wavegram interpretation, CTH will introduce a rudimentary freeware prototype for real-time EGG wavegram feedback, documenting the effects of variations of laryngeal voice production settings (such as vocal registers or adduction). In the second part of this workshop, BM will present the novel wavegram extension incorporated into the 2nd generation VoceVista software by Sygyt Software. In both software demonstrations, workshop attendees have the opportunity to test and try various phonation types and the respective EGG wavegram real-time feedback.


References:
[1] N. Henrich, C. d'Alessandro, B. Doval, and M. Castellengo, "Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency," J. Acoust. Soc. Am., vol. 117, no. 3, pp. 1417-1430, 2005.
[2] R. Scherer and I. R. Titze, "The Abduction Quotient Related to Vocal Quality," J. Voice, vol. 1, no. 3, pp. 246-251, 1987.
[3] C. T. Herbst, H. K. Schutte, D. L. Bowling, and J. G. Svec, "Comparing Chalk With Cheese -- The EGG Contact Quotient Is Only a Limited Surrogate of the Closed Quotient," J. Voice, vol. 31, no. 4, pp. 401-409, Jul. 2017.
[4] C. T. Herbst, D. M. Howard, and J. Schlömicher-Thier, "Using Electroglottographic Real-Time Feedback to Control Posterior Glottal Adduction during Phonation," J. Voice, vol. 24, no. 1, pp. 72-85, 2010.
[5] C. T. Herbst, W. T. Fitch, and J. G. Švec, "Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively," J. Acoust. Soc. Am., vol. 128, no. 5, pp. 3070-3078, 2010.
[6] B. Larson, C. T. Herbst, and E. Hunter, "EGG Wavegram Python Source Code Tutorial," 2013.
C131. Christian T. Herbst, Takeshi Nishimura, Maxime Garcia, Kishin Migimatsu, Isao Tokuda (2019). Effect of Ventricular Folds on Vocalization Fundamental Frequency in Domestic Pigs (Sus scrofa domesticus). 48th Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 30, 2019.
This study investigates the effect of the ventricular folds on fundamental frequency (fo) in the voice production of domestic pigs (Sus scrofa domesticus). The larynges of six subadult pigs were phonated ex vivo in two preparation stages, with the ventricular folds present (PS1) and removed (PS2). Vocal fold resonances were tested with a laser vibrometer, and a four-mass computational model was created. Highly significant fo differences were found between PS1 and PS2 (means at 93.7 Hz and 409.3 Hz, respectively). Two tissue resonances were found at 115 Hz and 250-290 Hz. The computational model had unique solutions for abducted and adducted ventricular folds at about 150 Hz and 400 Hz, roughly matching the fo measured ex vivo for PS1 and PS2. The differing fo encountered across preparation stages PS1 and PS2 is explained by distinct activation of either a high or a low eigenfrequency mode, depending on the engagement of the ventricular folds. The inability of the investigated larynges to vibrate at frequencies below 250 Hz in PS2 suggests that in vivo low-frequency calls of domestic pigs (pre-eminently grunts) are likely produced with engaged ventricular folds. Allometric comparison suggests that the special "double oscillator" has evolved to prevent signaling disadvantages.
C130. Christian T. Herbst (2019). Empirische Forschung für das künstlerische Schaffen -- Beispiel: Physiologie und Akustik der Stimme. Seminar (invited lecture), Universität Mozarteum Salzburg, May 7, 2019.
For talent to turn into excellence in music-making, a pedagogical process is needed that takes place on several levels: besides musicality and personal development, a motor component, in the sense of technically optimized playing of the instrument, is also essential. This lecture addresses this "craft" competence, using singing as an example.

After a brief, non-mathematical introduction to the physiology and acoustics of the voice, examples show how objective, empirically obtained findings on voice production in professional and amateur singers can inform and enrich approaches to voice building, both in classical singing and in popular music. Such "evidence-based singing pedagogy" is increasingly discussed within the international community. It aims to enable both students in academic programs and amateurs without professional ambitions to learn efficiently to sing in a way that is aesthetically good, functionally correct, and healthy in the long term.
C129. Christian T. Herbst (2019). Voice production in humans and non-human primates. 30. Konferenz Elektronische Sprachsignalverarbeitung (invited lecture), Technische Universität Dresden, Dresden, Germany. March 6, 2019.
C128. Christian T. Herbst (2019). Analyse und Visualisierung von Stimmklängen für Gesangspädagogen, Ärzte und Therapeuten. 17. Leipziger Symposium zur Kinder- und Jugendstimme (invited lecture), Universitätsklinikum Leipzig, Leipzig. February 24, 2019.
C127. Christian T. Herbst (2018). Akustik und Physiologie der Stimmregister im Gesang. 10. Gesangspädagogisches Symposium (invited lecture), Antonio Salieri Institut für Gesang und Stimmforschung in der Musikpädagogik, Universität für Musik und Darstellende Kunst, Wien. October 13, 2018.
C126. Christian T. Herbst, Brian P. Gill (2018). Voice Building, Coaching or Therapy? -- A Delineation of Areas of Responsibility in Voice Pedagogy. International Voice Symposium Salzburg, Austrian Voice Institute, Salzburg, Austria. August 26, 2018.
The final keynote panel of the 10th Pan-European Voice Conference (PEVOC) was concerned with the topic "Voice Pedagogy -- What do we need?" In this presentation the panel discussion is summarized and a deepening discussion on one of the key questions is provided, addressing the roles and tasks of people working with voice students. In particular, a distinction is made between (a) voice building (derived from the German term "Stimmbildung"), primarily comprising the functional and physiological aspects of singing; (b) coaching, mostly concerned with performance skills; and (c) singing voice rehabilitation. Both public and private educators are encouraged to apply this distinction to their curricula, in order to arrive at more efficient singing teaching and to reduce the risk of vocal injury to the concerned singers.


Reference
Brian P. Gill, Christian T. Herbst (2015). Voice Pedagogy - What Do We Need? Logopedics Phoniatrics Vocology, 41 (4), 168-173


Acknowledgements:
This work has been supported by the European Social Fund and the state budget of the Czech Republic, project no. CZ.1.07/2.3.00/30.0004 'POST-UP', and by an APART grant from the Austrian Academy of Sciences (both to C.T.H.).
C125. Christian T. Herbst (2018). A Review of Singing Voice Subsystem Interactions. 47th Annual Symposium: Care of the Professional Voice, The Voice Foundation, June 3, 2018.
In accordance with the myoelastic-aerodynamic and the source-filter theories, the human voice production system is typically described having three sub-systems: the respiratory system (the power source), the larynx (the sound source), and the oral and nasal vocal tracts (the sound modifiers). These sub-systems can interact during sound generation, where physiological action in one system has a direct mechanical consequence in another sub-system.

Here, three major such synergies are reviewed, creating a pedagogical model of voice sub-system interactions: (1) Vocal tract adjustments can potentially influence the behavior of the voice source via non-linear source-tract interactions; (2) the type and degree of vocal fold adduction controls the expiratory airflow rate; and (3) the tracheal pull caused by the respiratory system affects the vertical larynx position and thus the vocal tract resonances.

The pedagogical relevance of the presented model is discussed, suggesting, amongst others, that functional work on a particular voice sub-system may have side effects or benefits on other sub-systems, even when targeting a clearly defined and isolated physiological goal.


Reference:
Christian T. Herbst (2017). A review of singing voice sub-system interactions - towards an extended physiological model of "support". Journal of Voice, 31 (2), 249.e13-249.e19
C124. Christian T. Herbst (2018). Toward an Extended Physiological Model of "Support". 47th Annual Symposium: Care of the Professional Voice, The Voice Foundation, June 3, 2018.
The notion of "support" and the closely linked concepts of "appoggio" and "breath control" are central elements in singing voice pedagogy. They have received much scholarly attention in the past and present. Over several centuries, scholars have proposed a number of definitions, some of which contain apparent contradictions.

Several of these concepts are reviewed here, and their physiological and physical relevance is discussed in the context of the model of interactive voice sub-systems [1], as outlined in the previous presentation. In particular, I will discuss to which degree the respective definitions of "support" cover or neglect the three voice subsystems: the respiratory system (the power source), the larynx (the sound source), and the oral and nasal vocal tracts (the sound modifiers).

Based on this analysis, I argue that ostensible inconsistencies between various definitions of "support" can be resolved by putting them into the wider context of the subsystem interaction model presented here, thus offering a framework for reviewing and potentially refining some current and historical pedagogical approaches.

In a broader context, this presentation advertises the value of physically and physiologically informed approaches for singing voice instruction, paving the way for evidence-based voice pedagogy.


Reference:
Christian T. Herbst (2017). A review of singing voice sub-system interactions - towards an extended physiological model of "support". Journal of Voice, 31 (2), 249.e13-249.e19
C123. Christian T. Herbst (2018). Common mechanism of voice production in humans, non-human mammals, and birds. Ars Choralis 2018. 5th International Symposium on Chorusology - Choral Art - Singing - Voice (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 5, 2018.
C122. Christian T. Herbst (2018). Freddie Mercury -- acoustic voice analysis. Seminar, Department of Otorhinolaryngology-Head and Neck Surgery (invited lecture), Seoul National University Hospital, Seoul, Republic of Korea. January 26, 2018.
Freddie Mercury was one of the twentieth century's best-known singers of commercial contemporary music. This study presents an acoustical analysis of his voice production and singing style, based on perceptual and quantitative analysis of publicly available sound recordings.

Analysis of six interviews revealed a median speaking fundamental frequency of 117.3 Hz, which is typically found for a baritone voice. Analysis of voice tracks isolated from full band recordings suggested that the singing voice range was 37 semitones within the pitch range of F#2 (about 92.2 Hz) to G5 (about 784 Hz). Evidence for higher phonations up to a fundamental frequency of 1,347 Hz was not deemed reliable.
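
The stated range of 37 semitones can be checked against the two quoted frequencies with the standard equal-temperament relation, in which an interval in semitones is 12 times the base-2 logarithm of the frequency ratio. The helper name `semitones_between` is hypothetical and only serves this worked example.

```python
import math


def semitones_between(f_low_hz, f_high_hz):
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Approximate frequencies quoted in the abstract for F#2 and G5.
f_low_hz = 92.2
f_high_hz = 784.0

span = semitones_between(f_low_hz, f_high_hz)
print(round(span))  # 37
```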

Analysis of 240 sustained notes from 21 a-cappella recordings revealed a surprisingly high mean fundamental frequency modulation rate (vibrato) of 7.0 Hz, reaching the range of vocal tremor. Quantitative analysis utilizing a newly introduced parameter to assess the regularity of vocal vibrato corroborated its perceptually irregular nature, suggesting that vibrato (ir)regularity is a distinctive feature of the singing voice.
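
A modulation rate such as the 7.0 Hz reported above can be estimated from a fundamental frequency contour in several ways. The following is a minimal sketch of one simple approach, counting upward crossings of the contour through its mean; it is an assumption made for illustration and not necessarily the method used in the study. The function name `vibrato_rate_hz`, the frame rate, and the synthetic contour are all hypothetical.

```python
import math


def vibrato_rate_hz(f0_contour, frame_rate_hz):
    """Estimate vibrato rate from upward mean-crossings of an f0 contour.

    f0_contour: fundamental frequency values sampled at frame_rate_hz.
    Returns None if fewer than two upward crossings are found.
    """
    mean_f0 = sum(f0_contour) / len(f0_contour)
    centered = [f - mean_f0 for f in f0_contour]
    crossings = [
        i for i in range(1, len(centered))
        if centered[i - 1] < 0.0 <= centered[i]
    ]
    if len(crossings) < 2:
        return None
    # n upward crossings delimit n - 1 complete vibrato cycles.
    duration_s = (crossings[-1] - crossings[0]) / frame_rate_hz
    return (len(crossings) - 1) / duration_s


# Synthetic 2-second note: 220 Hz with a 7 Hz sinusoidal modulation.
frame_rate_hz = 200.0
contour = [
    220.0 * (1.0 + 0.03 * math.sin(2 * math.pi * 7.0 * n / frame_rate_hz + 0.5))
    for n in range(400)
]

rate = vibrato_rate_hz(contour, frame_rate_hz)
print(f"{rate:.1f} Hz")
```

This estimate only captures the average rate. Assessing the regularity of vibrato, as described in the abstract, would require looking at the variation between individual cycles.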

Imitation of subharmonic phonation samples by a professional rock singer, documented by endoscopic high-speed video at 4,132 frames per second, revealed a 3:1 frequency locked vibratory pattern of vocal folds and ventricular folds. These traits, in combination with the fast and irregular vibrato, might have helped create Freddie Mercury's eccentric and flamboyant stage persona.


Reference:

C. T. Herbst, S. Hertegard, D. Zangger-Borch, and P.-Å. Lindestad, "Freddie Mercury -- acoustic analysis of speaking fundamental frequency, vibrato, and subharmonics," Logop. Phoniatr. Vocology, vol. 42, no. 1, pp. 29-38, Jan. 2017.
C121. Christian T. Herbst (2018). From vocal fold vibration to sound - why is glottal closure important?. Seminar, Department of Otorhinolaryngology-Head and Neck Surgery (invited lecture), Seoul National University Hospital, Seoul, Republic of Korea. January 26, 2018.
When assessing healthy or pathologic voices in laryngology, a major focus is on the vibratory pattern of the vocal folds, as provided by laryngeal endoscopic recordings. In particular, a symmetric and periodic vibration with a pronounced closed phase is often found desirable for healthy voices. The causal connection between vocal fold vibratory patterns and the resulting sound is often only considered implicitly.

In this basic introductory lecture, the fundamental physical and physiological principles of laryngeal sound generation are reviewed. In particular, it is shown that a main determinant of the radiated sound (constituting the "voice") is the glottal airflow, a quantity often overlooked when medically assessing the voice.

Some causal relations between vocal fold vibration, glottal airflow and sound quality will be reviewed. Furthermore, the possibilities to control the quality of glottal airflow (and hence the quality of the produced sound) are discussed on a physiological level, and some psychoacoustic (perceptual) consequences of this airflow control are considered.


References:

[1] C. T. Herbst, D. M. Howard, and J. G. Svec, "The sound source in singing -- basic principles and muscular adjustments for fine-tuning vocal timbre," in The Oxford Handbook of Singing, G. Welch, D. M. Howard, and J. Nix, Eds. Oxford, UK: Oxford University Press, 2016.

[2] C. T. Herbst, "Biophysics of Vocal Production in Mammals," in Vertebrate Sound Production and Acoustic Communication, W. T. Fitch, A. N. Popper, and R. A. Suthers, Eds. New York: Springer, 2016, p. 328.
C120. Christian T. Herbst (2017). Speech and singing voice assessment with electroglottography. VIII Annual COST related Symposium Copenhagen & VII World Voice Consortium Congress (invited lecture), World Voice Consortium, Copenhagen, Denmark. December 8, 2017.
C119. Mona Kirstin Fehling, Bernhard Schick, Jan G. Svec, Christian T. Herbst, Jörg Lohscheller (2017). Zusammenhang zwischen der Morphologie von Stimmlippentrajektorien und vertikaler Schwingungsdynamik. 34. Wissenschaftliche Jahrestagung der Deutschen Gesellschaft für Phoniatrie und Pädaudiologie (DGPP) - Dreiländertagung D-A-CH, Deutsche Gesellschaft für Phoniatrie und Pädaudiologie e. V., September 15, 2017. presented by Mona Kirstin Fehling.
Background: The objective analysis of endoscopic high-speed video recordings of the vocal folds (VF) is based on an initial segmentation of the VF edges. This allows the motion of both VFs to be described individually by trajectories at arbitrary positions along the glottal axis; depending on the vibratory pattern, these trajectories show different time courses. A characteristic feature of trajectory morphology is the instant of maximum displacement, which separates the opening and closing phases and shows varying degrees of curvature.

Material and methods: For a cohort of 100 vocally healthy subjects, the degree of trajectory curvature at the instant of maximum displacement is determined along the entire glottal axis by means of a curvature parameter. Regression analysis is used to determine two angle parameters that quantify the steepness of the opening and closing phases at the instant of maximum displacement. The method is applied to both stationary and non-stationary phonation paradigms.

Results: Analysis of sequences from vocally healthy subjects shows that curvature morphology depends on sex and on frequency. On average, men show lower curvature than women, and curvature decreases with increasing frequency irrespective of sex. In addition, the two angle parameters quantifying the steepness of the opening and closing phases differ, with the opening showing a steeper course. Where lateral asymmetries occurred, phase shifts between the VF trajectories were also identified, with the phase of the VF with lower curvature leading.

Discussion: Differing curvatures can be attributed to altered lateral dynamics and to a pronounced vertical phase shift in the vibratory dynamics. During segmentation, this can cause the VF edges to be extracted at different lateral positions and thus to jump abruptly. This vertical offset occurs at the instant of maximum VF displacement, which is interpreted as a "trajectory kink" or strong curvature. If the VFs are viewed at a slightly oblique angle during endoscopy, this can additionally lead to a marked phase shift between the left and right vocal fold. These effects must be taken into account in clinical assessment and in computer-aided analysis, since the asymmetries that appear in the trajectories are artifacts and may be misinterpreted as genuine vibratory asymmetries.
C118. Christian T. Herbst, Brian P. Gill (2017). Delineation of Three Main Areas of Voice Pedagogy: Voice Building, Coaching, and Voice Rehabilitation. The Voice Foundation's 46th Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 4, 2017. presented by Brian P. Gill.
The final keynote panel of the 10th Pan-European Voice Conference (PEVOC) was concerned with the topic "Voice Pedagogy -- What do we need?" In this presentation the panel discussion is summarized and the authors provide a deepening discussion on one of the key questions, addressing the roles and tasks of people working with voice students. In particular, a distinction is made between (a) voice building (derived from the German term "Stimmbildung"), primarily comprising the functional and physiological aspects of singing; (b) coaching, mostly concerned with performance skills; and (c) singing voice rehabilitation. Both public and private educators are encouraged to apply this distinction to their curricula, in order to arrive at more efficient singing teaching and to reduce the risk of vocal injury to the concerned singers.

Reference
Gill B.P., Herbst C.T. Voice Pedagogy - What do we need? Logop Phoniatr Vocol. 2016, 41 (4), 168-173
C117. Christian T. Herbst, Harm K. Schutte, Daniel L. Bowling, Jan G. Svec (2017). Comparing chalk with cheese -- The EGG contact quotient is only a limited surrogate of the closed quotient. The Voice Foundation's 46th Annual Symposium: Care of the Professional Voice, The Voice Foundation, Philadelphia, PA. June 1, 2017. presented by Jan G. Svec.
The electroglottographic (EGG) contact quotient (CQegg), an estimate of the relative duration of vocal fold contact per vibratory cycle, is the most commonly used quantitative analysis parameter. The purpose of this study is to quantify the CQegg's relation to the closed quotient, a measure more directly related to glottal width changes during vocal fold vibration and the respective sound generation events.

Thirteen singers (six females) phonated in four extreme phonation types, while independently varying the degree of breathiness and vocal register. EGG recordings were complemented by simultaneous videokymographic (VKG) endoscopy, which allows for calculation of the videokymographic closed quotient (CQvkg). The CQegg was computed using five different algorithms, all used in previous research.

All CQegg algorithms produced CQegg values that clearly differed from the respective CQvkg, with standard deviations around 20 % of cycle duration. The difference between CQvkg and CQegg was generally greater for phonations with lower CQvkg. The largest differences were found for low-quality EGG signals with a signal-to-noise ratio (SNR) below 10 dB, typically stemming from phonations with incomplete glottal closure. Disregarding those low-quality signals, the best match between CQegg and CQvkg was found for a CQegg algorithm operating on the first derivative of the EGG signal.

These results show that the terms "closed quotient" and "contact quotient" should not be used interchangeably. They relate to different physiological phenomena. Phonations with incomplete glottal closure having an EGG SNR below 10 dB are not suited for CQegg analysis.
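
The abstract does not name the five CQegg algorithms. As a rough illustration of why results can depend on the algorithm, the sketch below uses a simple criterion-level approach, in which the vocal folds are counted as "in contact" whenever the EGG signal exceeds a fixed fraction of its peak-to-peak amplitude within one cycle. This is an assumed example and not necessarily one of the algorithms tested; the function name `contact_quotient` and the criterion values are hypothetical.

```python
import math


def contact_quotient(cycle, criterion=0.25):
    """Fraction of one EGG cycle spent above a criterion level.

    cycle: samples of a single vibratory cycle (larger value = more contact).
    criterion: threshold as a fraction of the peak-to-peak amplitude.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("cycle has no amplitude variation")
    threshold = lo + criterion * (hi - lo)
    above = sum(1 for x in cycle if x > threshold)
    return above / len(cycle)


# Idealized rectangular contact pulse: 40 of 100 samples in contact.
pulse = [1.0] * 40 + [0.0] * 60
print(contact_quotient(pulse))  # 0.4

# For a smooth waveform, the result depends strongly on the criterion.
sine_cycle = [math.sin(2 * math.pi * n / 360) for n in range(360)]
cq_25 = contact_quotient(sine_cycle, criterion=0.25)
cq_50 = contact_quotient(sine_cycle, criterion=0.50)
print(f"{cq_25:.2f} vs {cq_50:.2f}")
```

The rectangular pulse gives the same value for any criterion, whereas the smooth waveform yields clearly different values for the two criterion levels. Note that either way the quantity describes vocal fold contact, not glottal closure, which is the distinction drawn in the abstract.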
C116. Christian T. Herbst (2017). Gesangstechnik an der Schnittstelle zwischen Pädagogik und Stimmforschung. Stimmwelten (invited lecture), Universitätsklinik für Hals-, Nasen- und Ohrenkrankheiten, Kopf- und Halschirurgie, Inselspital Bern, Bern, Switzerland. April 29, 2017.
C115. Christian T. Herbst (2017). A review of singing voice sub-system interactions - towards an extended physiological model of "support". Chorusology Symposium (invited lecture), Malta Diocese Catholic Institute, Valletta, Malta. April 21, 2017.
C114. Christian T. Herbst (2017). Voice building, coaching, or therapy? - A delineation of areas of responsibility in voice pedagogy. Chorusology Symposium (invited lecture), Malta Diocese Catholic Institute, Valletta, Malta. April 20, 2017.
C113. Christian T. Herbst (2017). Electroglottographic investigation of primate vocalization. SPIRITS program workshop "Biology and Evolution of Speech" (invited lecture), Kyoto, Japan. February 23, 2017.
In analogy to humans, primate sound production can, as a first approximation, be explained by the source-filter theory [1]. While the filter function of the vocal tract can be approximated by acoustic analysis [2], assessment of the sound source would require invasive investigative techniques like laryngeal endoscopy, which are quite challenging in vivo. A low-cost non-invasive alternative is electroglottography (EGG). A low-intensity, high-frequency current is passed between two electrodes placed on each side of the larynx. The admittance variations resulting from vocal fold (de)contacting during laryngeal sound production are largely proportional to the time-varying relative vocal fold contact area [3]. Whereas EGG is relatively widely used in human voice research, it has only been sparsely applied in primate in vivo investigations [4].

In this progress report the potential of EGG for primate voice production analysis is examined in closer detail. Part one critically reviews fundamental frequency (f0) extraction from the EGG signal when applying different computational algorithms. For this purpose, the signal quality of a database of EGG recordings from six different New World monkey species was analyzed, and a total of 15625 synthetic EGG signals were generated while varying six parameters that control EGG signal quality: random f0 variation, subharmonics, amplitude drift, mains hum, baseline drift, and noise components. Preliminary data analysis suggests a strong dependency of f0 results on the chosen algorithm.
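
The total of 15625 equals 5 to the power of 6, which would be consistent with a full factorial design in which each of the six parameters takes five values. The abstract does not state this, so the sketch below is an assumption: the parameter identifiers and the five level values are hypothetical placeholders, and only the six parameter names and the total count come from the text.

```python
from itertools import product

# The six signal-quality parameters listed in the abstract.
PARAMETERS = [
    "random_f0_variation",
    "subharmonics",
    "amplitude_drift",
    "mains_hum",
    "baseline_drift",
    "noise",
]

# Hypothetical levels: five settings per parameter (an assumption).
LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]

# Full factorial grid: every combination of levels across all parameters.
grid = [
    dict(zip(PARAMETERS, combination))
    for combination in product(LEVELS, repeat=len(PARAMETERS))
]

print(len(grid))  # 15625
```

Each entry of `grid` would then describe the settings for one synthetic signal.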

In the second part of this presentation, the findings of a pilot study performed with a female Japanese macaque who was trained to vocalize upon a visual stimulus are summarized. A total of 369 "coo" calls, 17 "grunts", and 5 "chirps" were documented with SPL-calibrated microphone signals and simultaneous EGG recordings, generating the first phonetogram [5] for a non-human species. 26 recorded calls contained transitions between coos and grunts, and the EGG evidence suggests that the coos and the grunts constitute distinct laryngeal mechanisms (comparable to "registers" in human singing), potentially generated by the same vibrating structures.


References:

[1] T. Chiba and M. Kajiyama, The Vowel: Its Nature and Structure. Tokyo, Japan: Tokyo-Kaiseikan, 1941.
[2] L.-J. Boë, F. Berthommier, T. Legou, G. Captier, C. Kemp, T. R. Sawallis, Y. Becker, A. Rey, and J. Fagot, "Evidence of a Vocalic Proto-System in the Baboon (Papio papio) Suggests Pre-Hominin Speech Precursors," PLoS One, vol. 12, no. 1, p. e0169321, Jan. 2017.
[3] V. Hampala, M. Garcia, J. G. Svec, R. C. Scherer, and C. T. Herbst, "Relationship between the Electroglottographic Signal and Vocal Fold Contact Area," J. Voice, vol. 30, no. 2, pp. 161-171, 2016.
[4] C. H. Brown and M. P. Cannito, "Modes of vocal variation in Syke's monkey (Cercopithecus albogularis) squeals," J. Comp. Psychol., vol. 109, no. 4, pp. 398-415, 1995.
[5] P. H. Damste, "The phonetogram," Pr. Otorhinolaryngol, vol. 32, no. 3, pp. 185-187, 1970.
C112. Christian T. Herbst (2016). The myoelastic-aerodynamic theory of voice production in humans, mammals and birds. XXIV Pacific Voice Conference (invited lecture), Pacific Voice & Speech Foundation, Warsaw, Poland. October 5, 2016.
C111. Christian T. Herbst (2016). The sound source in singing: What electroglottography (EGG) can tell us about glottal configurations. The Singing Voice Science Workshop (invited lecture), John J. Cali School of Music at Montclair State University, Montclair, NJ. June 9, 2016.
C110. Christian T. Herbst (2016). "Et hi tres unum sunt" - Interactions between sound source, vocal tract, and pulmonary system in singing. The Singing Voice Science Workshop (invited lecture), John J. Cali School of Music at Montclair State University, Montclair, NJ. June 8, 2016.
C109. Christian T. Herbst, Jakob Unger, Hanspeter Herzel, Jan G. Svec, Jörg Lohscheller (2016). Phasegram Analysis of Vocal Fold Vibration Documented with Laryngeal High-Speed Video Endoscopy. 45th Annual Symposium: Care of the Professional Voice, The Voice Foundation, June 5, 2016. presented by Jan G. Svec.
Objective. In a recent publication, the phasegram, a bifurcation diagram over time, has been introduced as an intuitive visualization tool for assessing the vibratory states of oscillating systems. Here, this non-linear dynamics approach is augmented with quantitative analysis parameters, and it is applied to clinical laryngeal high-speed video (HSV) endoscopic recordings of healthy and pathologic phonations.

Methods/Design. HSV data from a total of 73 females diagnosed as healthy (n=42), or with functional dysphonia (n=15) or unilateral vocal fold paralysis (n=16), were quantitatively analyzed. Glottal area waveforms (GAW) as well as left and right hemi-GAWs (hGAW) were extracted from the HSV recordings. Based on Poincaré sections through phase space embedded signals, two novel quantitative parameters were computed: The phasegram entropy (PE), and the phasegram complexity estimate (PCE), inspired by signal entropy and correlation dimension computation, respectively.

Results. Both PE and PCE assumed higher average values (suggesting more irregular vibrations) for the pathological as compared to the healthy participants, significantly discriminating the healthy from the paralysis group (p=0.02 for both PE and PCE). Comparisons of individual PE or PCE data for the left and right hGAW within each subject resulted in asymmetry measures for the regularity of vocal fold vibration. The PCE-based asymmetry measure revealed significant differences between the healthy and the paralysis group (p=0.03).

Conclusions. Quantitative phasegram analysis of GAW and hGAW data is a promising tool for the automated processing of HSV data in research and in clinical practice.
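The idea of quantifying vibratory regularity from Poincaré sections can be illustrated with a minimal sketch. It is not the authors' definition of PE or PCE: the two-dimensional delay embedding, the nearest-sample crossing detection, the fixed histogram range and the function names `poincare_points` and `histogram_entropy` are all assumptions made for illustration.

```python
import math


def poincare_points(signal, delay):
    """Collect x[n + delay] at each upward crossing of the signal mean.

    Illustrative assumption: a 2-D delay embedding (x[n], x[n + delay]) cut by
    the section x[n] = mean, using the nearest sample (no interpolation).
    """
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]
    points = []
    for n in range(1, len(x) - delay):
        if x[n - 1] < 0.0 <= x[n]:
            points.append(x[n + delay])
    return points


def histogram_entropy(values, bins, lo, hi):
    """Shannon entropy (in bits) of the values binned into a fixed histogram."""
    if not values:
        return 0.0
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        idx = max(0, min(idx, bins - 1))  # clamp values on or outside the range
        counts[idx] += 1
    total = len(values)
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy
```

In this sketch a strictly periodic signal puts all section points into one histogram bin (entropy near zero), while a period-doubled signal alternates between two bins (entropy near one bit). More irregular vibration spreads the points further and raises the value.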
C108. Christian T. Herbst, Brian P. Gill (2016). Voice building, coaching, or therapy? - A delineation of areas of responsibility in voice pedagogy. Ars Choralis 2016. 4th International Symposium on Chorusology - Choral Art - Singing - Voice (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. March 31, 2016.
C107. Christian T. Herbst (2016). A review of singing voice sub-system interactions -- towards an extended physiological model of "support". Ars Choralis 2016. 4th International Symposium on Chorusology - Choral Art - Singing - Voice (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. March 31, 2016.
C106. Christian T. Herbst, Hiroki Koda, Takumi Kunieda, Juri Suzuki, Maxime Garcia, W. Tecumseh Fitch, Takeshi Nishimura (2016). In vivo assessment of Japanese Macaque sound production using electroglottography. 11th International Conference on the Evolution of Language (Evolang XI), University of Southern Mississippi, New Orleans, Louisiana, USA. March 21, 2016. presented by Takeshi Nishimura.
While the call repertoire of Japanese Macaques (Macaca fuscata) has been described based on acoustic evidence (Green, 1975), little is known about the underlying laryngeal function, mostly due to experimental difficulties in vivo. As an alternative to direct laryngeal observation, vocal fold vibration can be assessed non-invasively with electroglottography (EGG). A low intensity, high-frequency current is passed between two electrodes placed on each side of the larynx. The admittance variations resulting from vocal fold (de)contacting during laryngeal sound production are largely proportional to the time-varying relative vocal fold contact area (Hampala et al., 2015).

Here, we present the results of a pilot study performed with a female Japanese Macaque who was trained to vocalize upon a visual stimulus. A total of 369 "coo" calls, 17 "grunts", and 5 "chirps" were documented with SPL-calibrated microphone signals and simultaneous EGG recordings. In the coos and the grunts, an EGG signal with cyclic content corresponding to the microphone signal was found. The absence of an EGG trace for the high-frequency chirps might have been caused by a low-pass filter in the EGG device hardware.

26 recorded calls contained transitions between coos and grunts, and the EGG evidence suggests that the transitions between the individual call types regularly occurred during as little as one to five vibratory cycles. This suggests that the coos and the grunts constitute distinct laryngeal mechanisms (comparable to "registers" in human singing), potentially generated by the same vibrating structures. Excised larynx experiments are warranted to test this hypothesis, also investigating the potential influence of the species' vocal membranes.

References:
Green, S. (1975). "Variation of Vocal Pattern with Social Situation in the Japanese Monkey (Macaca fuscata): A Field Study," in Primate Behaviour. Developments in Field and Laboratory Research, edited by L. A. Rosenblum (Academic Press, New York), pp. 1-102.
Hampala, V., Garcia, M., Svec, J. G., Scherer, R. C., and Herbst, C. T. (2015). "Relationship between the Electroglottographic Signal and Vocal Fold Contact Area," Journal of Voice in press.

C105. Coen Elemans, Jeppe Have Rasmussen, Christian T. Herbst, Daniel Düring, Sue Anne Zollinger, Henrik Brumm, Kyle Srivastava, Niels Svane, Ming Ding, Ole Larsen, Samuel Sober, Jan G. Svec (2016). Universal mechanisms of sound production and control in birds and mammals. 10th International Conference on Voice Physiology and Biomechanics (ICVPB), Universidad Tecnica Federico Santa Maria, Vina del Mar, Chile. March 15, 2016. presented by Christian T. Herbst.
C104. Christian T. Herbst (2015). Monitoring the mammalian and avian sound source with electroglottography. IBAC 2015, XXV International Bioacoustics Congress (invited lecture), International Bioacoustics Council, Murnau, Germany. September 7, 2015.
C103. Christian T. Herbst (2015). Elephant on the bench - ex-vivo investigation of mammalian sound production. Séminaire du département Parole et Cognition (invited lecture), GIPSA-lab, Grenoble, France. June 11, 2015.
C102. Christian T. Herbst, Vit Hampala, Maxime Garcia, Ronald C. Scherer, Jan G. Svec (2015). Electroglottography and Direct Measurement of Vocal Fold Contact Area -- a High-Speed Video Update. 44th Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 27, 2015.
Objective. Electroglottography (EGG) is a popular non-invasive method that purports to measure changes in relative vocal fold contact area (VFCA) during phonation. Despite its broad application, the putative direct relation between the EGG waveform and the VFCA has to date only been formally tested in a single study (Scherer et al., 1988), suggesting an approximately linear relationship between VFCA and the EGG signal magnitude. However, in that study flow-induced vocal fold vibration was not investigated. A rigorous empirical evaluation of EGG as a measure of relative vocal fold contact area under proper physiological conditions is therefore still needed.

Methods/Design. In order to address this issue, three red deer larynges were phonated in an excised hemi-larynx preparation utilizing a conducting glass plate. The time-varying contact between the vocal fold and the glass plate was assessed by high-speed video recordings made in the sagittal plane at 6000 fps, synchronized to the EGG signal (+/- 0.167 ms).

Results and Conclusions. In the contacting phase, the EGG waveform systematically preceded the measured VFCA. The average difference between the normalized [0..1] VFCA and EGG data in the three larynges was 0.180 (+/- 0.156), 0.075 (+/- 0.115) and 0.168 (+/- 0.184) in the contacting phase, and 0.159 (+/- 0.112), -0.003 (+/- 0.029) and 0.004 (+/- 0.032) in the de-contacting phase. In the de-contacting phase, there was thus a good agreement between VFCA and the EGG waveform in two out of three larynges. Disagreements between the VFCA and EGG waveforms could have been caused by errors in data normalization, electrode placement, anisotropic conductance properties of the vocal folds, and possible effects of electroglottograph hardware circuitry. Pending further research to clarify the issue, quantitative EGG data should be interpreted cautiously, allowing for potential errors.
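As a rough illustration of the comparison reported above, the following sketch rescales two waveforms to [0..1] and summarizes their sample-wise difference as mean and standard deviation. The min-max normalization, the sign convention (EGG minus VFCA) and the function names are assumptions; the abstract does not specify how the authors computed the difference.

```python
import statistics


def normalize(values):
    """Min-max rescale a waveform to the range [0..1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_difference(vfca, egg):
    """Mean and sample standard deviation of normalized EGG minus normalized VFCA.

    Assumption: both waveforms are already time-aligned and equally sampled.
    """
    diffs = [e - v for v, e in zip(normalize(vfca), normalize(egg))]
    return statistics.mean(diffs), statistics.stdev(diffs)


def frame_interval_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps
```

Under these assumptions, `frame_interval_ms(6000)` evaluates to about 0.167 ms, which matches the synchronization tolerance quoted in the abstract.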
C101. Christian T. Herbst, Markus Hess, Frank Müller, Jan G. Svec, Johan Sundberg (2015). Glottal Adduction and Subglottal Pressure in Singing. 44th Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 27, 2015.
Previous research suggests that independent variation of vocal loudness and glottal configuration (type and degree of vocal fold adduction) does not occur in untrained speech production. This study investigated whether these factors can be varied independently in trained singing, and how changes of subglottal pressure are related to changes of average glottal airflow, voice source properties and sound level under these conditions.

A classically trained baritone produced sustained phonations on the endoscopic vowel [i:] at pitch D4 (approx. 294 Hz), exclusively varying either (a) vocal register; (b) phonation type (from "breathy" to "pressed" via cartilaginous adduction); or (c) vocal loudness, while keeping the others constant. Phonation was documented by simultaneous recording of videokymographic, electroglottographic, airflow and voice source data, and by percutaneous measurement of relative subglottal pressure.

Register shifts were clearly marked in the EGG wavegram display. As compared with chest register, falsetto was produced with greater pulse amplitude of the glottal flow, H1-H2, mean airflow, and with lower MFDR, subglottal pressure, and sound pressure. Shifts of phonation type (breathy/flow/neutral/pressed) induced comparable systematic changes. Increase of vocal loudness resulted in increased subglottal pressure, average flow, sound pressure, MFDR, glottal flow pulse amplitude and H1-H2.

When changing either vocal register or phonation type, subglottal pressure and mean airflow showed an inverse relationship, i.e., a variation of glottal flow resistance. The direct relation between subglottal pressure and flow when varying only vocal loudness demonstrated independent control of vocal loudness and glottal configuration. Achieving such independent control of phonatory control parameters would be an important target in vocal pedagogy and in voice therapy.
C100. Christian T. Herbst, Jan G. Svec (2015). Electroglottography -- a high-speed video update. 11th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL), The Royal National Throat, Nose and Ear Hospital, London, UK. April 8, 2015.
C99. Christian T. Herbst (2015). Effizienz der Tonproduktion in Sprache und Gesang. 4. Jahrestagung - Symposium 'FIT ON STAGE' (invited lecture), Österr. Gesellschaft für Musikermedizin (ÖGfMM), Vienna, Austria. March 21, 2015.
C98. Maxime Garcia, Markus Boeckle, Christian T. Herbst, Bruno Gingras, Yann Locatelli, W. Tecumseh Fitch (2014). Call classification design of the Wild Boar (Sus scrofa) complex vocalization system. VII European Conference on Behavioural Biology (ECBB), Czech and Slovak Ethological Society, Prague, Czech Republic. July 18, 2014. presented by Maxime Garcia.
C97. Christian T. Herbst, Jinook Oh, Jitka Vydrova, Jan G. Svec (2014). DigitalVHI -- a Multi-Lingual Freeware Software Application to Capture Voice Handicap Index Data. 43rd Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 31, 2014.
The voice handicap index (VHI) is a questionnaire to quantify the functional, physical and emotional impacts of a voice disorder on a patient's quality of life [1]. The VHI has been used in numerous studies as an indicator for finding evidence of voice disorders, and as a retrospective test of the outcome of clinical interventions.

Despite the widespread use of the tool, to the best of our knowledge, there does not seem to be any computer software available to facilitate the computer-aided capture of VHI data. Such software is needed to store the questionnaire results electronically, to automatically calculate the final scores as well as to facilitate handling the data for clinical studies.

Here, we introduce DigitalVHI, a freeware open source software application to capture Voice Handicap Index data [2]. Both a Mac OS X and a Microsoft Windows version, as well as the original Python source code are available for download at http://www.christian-herbst.org/DigitalVHI/

DigitalVHI consists of a simple user interface, which has successfully been tested over a period of two years in the voice clinic led by author JV. The final result of each questionnaire data acquisition is saved as a PDF file, and the collected data is appended to a file in CSV format (data can then be imported to OpenOffice, Microsoft Excel, R, SPSS, etc., for further processing). To maximize data security of sensitive patient data, no internet connection is required to run the software. The DigitalVHI user interface (including all questionnaire data) can be easily translated to any language by creating additional language packs.

Acknowledgement: This project has been co-financed by the European Social Fund and the state budget of the Czech Republic within the project no. CZ.1.07/2.3.00/30.0004 "POST-UP" (CH, JGS) and the projects no. CZ.1.07/2.4.00/17.0009 and CZ 1.07/2.3.00/20.0057 (JV, JGS).

References:

[1] B. Jacobson, et al., "The voice handicap index (VHI): development and validation," J.Speech-Lang.Path., vol. 6, pp. 66-70, 1997.
[2] C. T. Herbst, et al., "DigitalVHI-a freeware open-source software application to capture the Voice Handicap Index and other questionnaire data in various languages," Logoped Phoniatr Vocol, Sep 19 2013 (early online, doi: 10.3109/14015439.2013.830769).
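The score calculation and CSV export described in this abstract can be sketched as follows. This is not the DigitalVHI source code: the function names, the column layout and the patient identifier are hypothetical, and the sketch only assumes the standard VHI format of 30 items rated 0 to 4 (total score 0 to 120).

```python
import csv
import os

ITEM_COUNT = 30  # standard VHI questionnaire length


def total_score(responses):
    """Sum the item ratings after checking the assumed 30 x (0..4) format."""
    if len(responses) != ITEM_COUNT or any(r not in range(5) for r in responses):
        raise ValueError("expected 30 responses, each rated 0..4")
    return sum(responses)


def append_result(csv_path, patient_id, responses):
    """Append one questionnaire result to a CSV file (hypothetical layout)."""
    score = total_score(responses)
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header row only once, when the file is created.
            header = ["patient_id", "total"]
            header += [f"item_{i + 1}" for i in range(ITEM_COUNT)]
            writer.writerow(header)
        writer.writerow([patient_id, score] + list(responses))
```

Appending one row per acquisition keeps the file directly importable into spreadsheet or statistics software, as the abstract describes.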
C96. Christian T. Herbst, Hanspeter Herzel, Jan G. Svec, Megan Wyman, W. Tecumseh Fitch (2014). Visualizing voice dynamics with phasegrams. 43rd Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 31, 2014.
"Normal" voice production is characterized by (nearly) periodic vocal fold vibration. The deviation from periodicity by introducing subharmonic or irregular oscillations is either inside (as, e.g., in the case of some singing styles or in certain mammalian vocalizations) or outside the voice's normal range of operation (in the case of most voice pathologies). In order to assess these different oscillatory states, a novel tool for visualization and analysis of voice dynamics "on the way to chaos" is introduced: the phasegram [1, 2].

Phasegrams combine the advantages of sliding-window analysis (such as the spectrogram) with well-established visualization techniques from the domain of non-linear dynamics. In a phasegram, time is mapped onto the x-axis, and various vibratory regimes, such as periodic oscillation, subharmonics or chaos, are identified within the generated graph by the number and stability of horizontal lines.

A phasegram can be interpreted as a bifurcation diagram in time. In contrast to other analysis techniques, it can be automatically constructed from time-series data alone: no additional system parameter needs to be known. Phasegrams show great potential for signal classification and can act as the quantitative basis for further analysis of oscillating systems in many scientific fields, such as physics (particularly acoustics), biology or medicine. The phasegram's usefulness for voice analysis will be demonstrated by analyzing electroglottographic (EGG) signals of excised larynx experiments, singing and pathologic voice production.
C95. Shaheen N. Awan, Andrew R Krauss, Christian T. Herbst (2014). An Examination of the Relationship Between Electroglottographic (EGG) Contact Quotient, EGG Decontacting Phase Profile, and Acoustical Spectral Moments. 43rd Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 29, 2014. presented by Shaheen Awan.
C94. Christian T. Herbst, Jörg Lohscheller, Jan G. Svec, Nathalie Henrich Bernardoni, Gerald Weissengruber, W. Tecumseh Fitch (2014). Electroglottographic and super high-speed video investigation of glottal opening and closing events. 43rd Annual Symposium: Care of the Professional Voice, The Voice Foundation, May 29, 2014.
Previous research has suggested that the peaks in the first derivative (dEGG) of the electroglottographic signal (EGG) are good approximate indicators of the events of glottal opening and closing. These findings were based on high-speed video (HSV) recordings with frame rates ten times lower than the sampling frequencies of the corresponding EGG data. The current study attempts to corroborate these previous findings, utilizing super-HSV recordings.

The HSV and EGG recordings (sampled at 27 kHz and 44 kHz, respectively) of excised canine larynx vocalization were synchronized by an external TTL signal to within 0.037 ms. Data were analyzed by means of EGG, dEGG, the glottal area waveform, digital kymograms, glottovibrograms, and the vocal fold contact length (VFCL), a new parameter representing the time-varying degree of "zippering" closure along the anterior-posterior (A-P) glottal axis.

The temporal offsets between glottal events (depicted in the HSV recordings) and dEGG peaks in the opening and closing phase of glottal vibration ranged from 0.02 to 0.61 ms, amounting to 0.24 -- 10.88 % of the respective glottal cycle durations. All dEGG double peaks coincided with vibratory A-P phase differences. In two out of the three analyzed video sequences, peaks in the first derivative of the VFCL coincided with dEGG peaks, again co-occurring with A-P phase differences.

The findings suggest that dEGG peaks do not always coincide with the events of glottal closure and initial opening. Vocal fold contacting and de-contacting do not occur at infinitesimally small instants of time, but extend over a certain interval, particularly under the influence of A-P phase differences [1].


Acknowledgements: This research was supported by an ERC Advanced grant no. 230604 'SOMACCA' (C.T.H.), a start-up grant from the University of Vienna (W.T.F.), the European Social Fund Project OP VK CZ.1.07/2.3.00/20.0057 (J.G.S.) and by the DFG grant LO1413/2-2 (J.L.).


Reference:

[1] C. T. Herbst, et al., "Glottal opening and closing events investigated by electroglottography and super-high-speed video recordings," J Exp Biol (accepted).
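The dEGG-based event estimation discussed in this abstract can be sketched in a few lines. The sketch assumes an EGG signal whose amplitude increases with vocal fold contact, so that the strongest positive dEGG peak approximates the closing event and the strongest negative peak the opening event. It uses a simple first difference and a single global extremum per analysis window, and all function names are hypothetical. It does not reproduce the authors' analysis.

```python
def degg(egg, sample_rate):
    """First difference of the EGG signal, scaled to units per second."""
    return [(egg[i + 1] - egg[i]) * sample_rate for i in range(len(egg) - 1)]


def degg_peak_times(egg, sample_rate):
    """Times (s) of the strongest positive and negative dEGG peaks.

    Assumption: `egg` covers one glottal cycle and rises with increasing
    vocal fold contact (closing -> positive peak, opening -> negative peak).
    """
    d = degg(egg, sample_rate)
    closing_idx = max(range(len(d)), key=d.__getitem__)
    opening_idx = min(range(len(d)), key=d.__getitem__)
    return closing_idx / sample_rate, opening_idx / sample_rate


def offset_percent(t_event, t_peak, cycle_duration):
    """Temporal offset between a video-derived event and a dEGG peak,
    expressed as a percentage of the glottal cycle duration."""
    return abs(t_event - t_peak) / cycle_duration * 100.0
```

A single-extremum search like this cannot represent the dEGG double peaks mentioned in the abstract; a local peak detector would be needed for those.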
C93. Christian T. Herbst (2014). Same, but different - physical aspects of mammalian sound production. COSB Seminar (invited lecture), Center for Organismal Systems Biology, Faculty of Life Sciences, University of Vienna, Vienna, Austria. May 19, 2014.
C92. Christian T. Herbst (2014). Phasegrams - a novel method for visualizing oscillations in non-linear systems. Seminar, Dept. of Biophysics, Palacky University, Olomouc, Czech Republic. May 15, 2014.
C91. Christian T. Herbst (2014). Freddie Mercury - Acoustical Voice Analysis. 5th Czech-Slovak Symposium on ART VOICE (invited lecture), Clinic of Otolaryngology, Faculty of Medicine, Comenius University, Bratislava, Slovakia. May 10, 2014.
C90. Christian T. Herbst (2014). Electroglottography -- a low-cost method to non-invasively assess vocal fold vibration. 5th Czech-Slovak Symposium on ART VOICE (invited lecture), Clinic of Otolaryngology, Faculty of Medicine, Comenius University, Bratislava, Slovakia. May 9, 2014.
C89. Christian T. Herbst (2014). Electroglottography -- a low-cost method to non-invasively assess vocal fold vibration. 3rd International Symposium on Chorusology - Choral Art - Singing - Voice (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 25, 2014.
C88. Christian T. Herbst (2014). Mutation und Klang - Physiologische Hintergründe und Rahmenbedingungen. Tag der Kinder- & Jugendsingstimme Salzburg (invited lecture), Universität Mozarteum, Salzburg, Austria. April 5, 2014.
C87. Christian T. Herbst (2014). Assessment of vocal fold vibration with videokymography and related techniques. 1st conference of POST-UP, Palacky University, Olomouc, Czech Republic. January 22, 2014.
C86. Christian T. Herbst (2014). The Phasegram - a new method for visualizing system dynamics. 2nd NYU International Symposium VOICE SOURCE CHARACTERISTICS: Methods and Discoveries (invited lecture), NYU Steinhardt School of Culture, Education and Human Development, Department of Music and Performing Arts Professions, New York. January 11, 2014.
C85. Christian T. Herbst (2013). Vom Knabensopran zur Männerstimme - eine Momentaufnahme im Stimmbruch. Grazer Stimmtage (invited lecture), Hals-, Nasen-, Ohren-Universitätsklinik, Klinische Abteilung für Phoniatrie, Graz, Austria. November 15, 2013.
C84. Christian T. Herbst (2013). Brunftzeit! - Der Sexualdimorphismus bei Vokalisationen von Tier und Mensch. Grazer Stimmtage (invited lecture), Hals-, Nasen-, Ohren-Universitätsklinik, Klinische Abteilung für Phoniatrie, Graz, Austria. November 15, 2013.
C83. Christian T. Herbst (2013). Mythos "Primärklang"? - Physiologische Möglichkeiten zur Beeinflussung der Klangfarbe im Gesang. Fachtagung "Musikpädagogik konkret" (invited lecture), Hochschule für Musik und Darstellende Kunst, Frankfurt, Germany. October 18, 2013.
C82. Christian T. Herbst (2013). Angewandte Stimmphysiologie und Akustik. CAS Singstimme -- Fehlfunktionen erkennen, abbauen, vermeiden (invited lecture), Hochschule der Künste Bern, Bern, Switzerland. October 12, 2013.
The theoretical part presents the basic acoustic and physiological principles of voice production (speech and voice). In the practical part, work is carried out with subjects (diagnosis and didactic work) who are regarded as "borderline cases" between pedagogy and pathology. An approach is taught that focuses on the physiological parametrization of the impaired speaking and singing voice. This module is intended as a bridge between phoniatrics, speech therapy and (singing) didactics.
C81. Christian T. Herbst (2013). Physiologische Wechselwirkungen zwischen Glottiskonfiguration und Sängeratmung. Symposium "Vom Atem zum Gesang" (invited lecture), Austrian Voice Institute, EVTA Austria, Salzburg, Austria. September 28, 2013.
From both a physiological and a voice-pedagogical point of view, the respiratory system cannot be considered and treated in isolation from the vocal apparatus. The influence of the laryngeal configuration on the respiratory airflow and the resulting vocal timbre was already described in the mid-19th century by Manuel Garcia in his "Traité complet de l'art du chant". In addition, the magnitude of the subglottal pressure has a direct effect on the quality of vocal fold vibration and thus also influences vocal timbre. In this talk, these physiological interactions are explained on the basis of current findings in voice science, and pedagogical recommendations are derived from them where possible.
C80. Roland Frey, Elena Volodina, Ilya Volodin, David Reby, Megan Wyman, Christian T. Herbst, Angela S. Stoeger, W. Tecumseh Fitch (2013). The anatomy of low frequency vocalization in mammals. XXIV International Bioacoustics Congress, International Bioacoustics Council (IBAC), Pirenopolis, Brazil. September 10, 2013. presented by Roland Frey.
C79. Christian T. Herbst (2013). Voice pedagogy: What do we need?. 10th Pan European Voice Conference (PEVOC) (invited lecture), Palacky University, Olomouc, Czech Republic. August 23, 2013.
C78. David Howard, Jenevora Williams, Christian T. Herbst (2013). "Ring" in the solo child's singing voice. 10th Pan European Voice Conference (PEVOC), Palacky University, Olomouc, Czech Republic. August 22, 2013. presented by David Howard.
C77. Maxime Garcia, Christian T. Herbst, Bruno Gingras, Markus Boeckle, Yann Locatelli, W. Tecumseh Fitch (2013). Call classification design of the Wild Boar (Sus scrofa) complex vocalization system. 10th Pan European Voice Conference (PEVOC) (invited lecture), Palacky University, Olomouc, Czech Republic. August 22, 2013. presented by Maxime Garcia.
Wild boars live in complex social systems in which individuals interact intensively using multicomponent communication signals such as olfactory and acoustic cues.

Possibly related to their complex vocal tract anatomy, characterized by two pairs of vocal folds, wild boar vocalizations are highly diverse, and their heterogeneity was reported in an empirical study led by Klingholz et al. (1979). This analysis, however, had no statistical support and relied mainly on visual inspections and manual measurements of the parameters generally used in bioacoustics studies at the time. Due to technical advances and deeper knowledge of the physical properties of sounds nowadays, this classification could potentially be validated, or improved, based on a more objective, "hands-off" signal analysis and statistical approach.

Here, following a primary visual inspection and computer-aided extraction of acoustical parameters, we applied several multivariate analysis approaches to the resulting dataset; these approaches have proven useful in the identification of vocal repertoires in various species. We attempted to establish, by comparison, which classification method is the most appropriate, based on objectivity and repeatability of the measurements.

Quantification and structural characterization of the wild boar vocal repertoire is crucial to a better understanding of this species' acoustic communication. This study can provide a solid foundation for further investigation of the production mechanisms (Excised Larynx Experiments), functionality (Playback Experiments), geographical variation, as well as social relevance and transmission of these acoustic signals. Eventually this will help identify the context and selection pressures that drove the emergence of such vocal displays.
C76. Adam Novozamsky, Jiri Sedlar, Christian T. Herbst, Jan G. Svec, Barbara Zitova, Jan Flusser (2013). VKFD: Computerized analysis of videokymographic data. 10th Pan European Voice Conference (PEVOC), Palacky University, Olomouc, Czech Republic. August 22, 2013. presented by Adam Novozamsky.
C75. Christian T. Herbst, Jan G. Svec, Jörg Lohscheller, Roland Frey, Michaela Gumpenberger, Angela S. Stoeger, W. Tecumseh Fitch (2013). Super Size Me! -- Vibratory Characteristics of an Elephant Larynx. 10th Pan European Voice Conference (PEVOC) (invited lecture), Palacky University, Olomouc, Czech Republic. August 22, 2013.
Elephants are the largest land-based mammals. Their low-frequency vocalizations in the infrasonic range (fundamentals below 20 Hz) have been hypothesized to be produced by either of two fundamentally different sound production mechanisms: (a) by a regular pattern of successive EMG bursts (e.g. 20-30 Hz for cat purrs) resulting in consecutive active muscle contractions (AMC); or (b) by flow-induced self-sustaining oscillations in accordance with the myoelastic-aerodynamic (MEAD) theory of sound production.
In a recent publication the author and collaborators have documented self-sustaining, flow-induced vocal fold oscillations in an excised elephant larynx (Loxodonta africana), thus rejecting the AMC mechanism as a plausible cause for elephant infrasound vocal production. Rather, sounds were produced in a manner directly paralleling human speech or song.

Here, a more detailed analysis of the vibratory phenomena seen in the excised elephant larynx is presented. Vocal fold oscillation occurred with a wide variety of vibratory modes, including periodic and complex subharmonic regimes, as well as irregular patterns typically seen in deterministic chaos. Phase delays along the inferior-superior and anterior-posterior (A-P) dimension were commonly observed, as well as travelling wave patterns along the A-P dimension, as yet not documented in the literature. These phenomena might have been facilitated by the large dimensions of the elephant vocal folds (length: 104 mm, thickness: 32 mm). The vestibular folds, when adducted, participated in the tissue vibration, effectively increasing the generated sound pressure level by 12 dB.

In conclusion, the same basic physical principles of voice production apply to mammals of various sizes (e.g. bats, humans, elephants), suggesting that the myoelastic-aerodynamic theory extends across a remarkably wide range of body sizes and vocal frequencies (more than four orders of magnitude). The elephant larynx is, however, not simply a linearly scaled version of the human model, thus giving rise to a range of vibratory phenomena not regularly seen in non-pathologic human phonation.
C74. Laura Enflo, Christian T. Herbst, Johan Sundberg, Anita McAllister (2013). Comparing vocal fold contact criteria derived from electroglottographic and acoustic signals. 10th Pan European Voice Conference (PEVOC), Palacky University, Olomouc, Czech Republic. August 22, 2013. presented by Laura Enflo.
C73. Christian T. Herbst, Svante Granqvist (2013). Voice acoustics, microphones, recording and computers. One-Day Crash Course on 'Voice' (invited lecture), European Academy of Voice, August 21, 2013.
C72. Christian T. Herbst, Angela S. Stoeger, Roland Frey, Jörg Lohscheller, Ingo R. Titze, Michaela Gumpenberger, W. Tecumseh Fitch (2013). Sound production mechanism in elephant infrasound vocalizations. Annual Meeting of the Society of Experimental Biology (invited lecture), Valencia, Spain. July 3, 2013.
The sound production of most mammals can be explained by one of two fundamentally different sound production mechanisms: According to the myoelastic-aerodynamic (MEAD) theory of sound production, the primary sound source is generated by flow-induced self-sustaining oscillations of the vocal folds. In an alternative mechanism, sound is created by active muscle contractions (AMC). Here, a regular pattern of successive EMG bursts (e.g. 20--30 Hz for cat purrs) causes the intrinsic laryngeal muscles to modulate the respiratory airflow. See Fig. 1A for body mass and fundamental frequency data of selected mammals producing either MEAD or AMC driven vocalizations.
Elephants are the largest land mammals. They produce low-frequency vocalizations in the infrasonic range (fundamentals below 20 Hz). Both AMC and MEAD have been suggested in the literature as sound production mechanisms, but to date no physiologic evidence for either case has been produced.
Using high-speed video, acoustic and electroglottographic recordings, we documented flow-induced, self-sustaining oscillations of the vocal folds of an excised elephant larynx (Loxodonta africana) at fundamental frequencies below 20 Hz (Fig. 1B and C). We also observed a range of nonlinear phenomena, which are directly comparable to those documented in humans and other mammals. Due to the absence of any neural signals in the excised larynx setup, the AMC mechanism can be rejected for elephant infrasound vocal production. Rather, sounds are produced in a manner directly paralleling human speech or song.
We conclude that the same physical principles of voice production apply to mammals of various sizes (e.g. bats, humans, elephants), and that the MEAD theory extends across a remarkably wide range of body sizes and vocal frequencies (more than four orders of magnitude).
C71. Christian T. Herbst, W. Tecumseh Fitch, Jörg Lohscheller, Jan G. Svec (2013). Estimation of the vertical glottal shape based on empirical high-speed video and electroglottographic data. 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio. June 3, 2013.
C70. Christian T. Herbst, Angela S. Stoeger, Roland Frey, Jörg Lohscheller, Ingo R. Titze, Michaela Gumpenberger, W. Tecumseh Fitch (2013). Sound production mechanism in elephant infrasound vocalizations. The Voice Foundation's 42nd Annual Symposium: Care of the Professional Voice, Philadelphia, PA. May 30, 2013.
The sound production of most mammals can be explained by one of two fundamentally different sound production mechanisms: According to the myoelastic-aerodynamic (MEAD) theory of sound production, the primary sound source is generated by flow-induced self-sustaining oscillations of the vocal folds. In an alternative mechanism, sound is created by active muscle contractions (AMC). Here, a regular pattern of successive EMG bursts (e.g. 20--30 Hz for cat purrs) causes the intrinsic laryngeal muscles to modulate the respiratory airflow.

Elephants are the largest land mammals. They produce low-frequency vocalizations in the infrasonic range (fundamentals below 20 Hz). Both AMC and MEAD have been suggested in the literature as sound production mechanisms, but to date no physiologic evidence for either case has been produced.

Using high-speed video, acoustic and electroglottographic recordings, we documented flow-induced, self-sustaining oscillations of the vocal folds of an excised elephant larynx (Loxodonta africana) at fundamental frequencies below 20 Hz. We also observed a range of nonlinear phenomena, which are directly comparable to those documented in humans and other mammals. Due to the absence of any neural signals in the excised larynx setup, the AMC mechanism can be rejected for elephant infrasound vocal production. Rather, sounds are produced in a manner directly paralleling human speech or song.

We conclude that the same physical principles of voice production apply to mammals of various sizes (i.e. bats, humans, elephants), and that the myoelastic-aerodynamic theory extends across a remarkably wide range of body sizes and vocal frequencies (more than four orders of magnitude).
____

Reference: C. T. Herbst, A. Stoeger, et al., "How Low Can You Go? Physical Production Mechanism of Elephant Infrasonic Vocalizations," Science, vol. 337, pp. 595-599, 2012.
C69. Christian T. Herbst (2013). Physiologische Grundlagen der Stimmproduktion. 11th World Voice Day (invited lecture), Austrian Voice Institute, Salzburg, Austria. April 15, 2013.
C68. Christian T. Herbst (2013). Vortrag und Praxisdemonstration: Physiologische Grundlagen der Stimmbildung und stimmphysiologische Diagnostik im Gesangsunterricht. Symposium Kinderchorleitung (invited lecture), Universität der Künste, Berlin, Germany. April 13, 2013.
C67. Christian T. Herbst (2013). Assessment of vocal fold vibration with videokymography and related techniques. Seminar, Dept. of Biophysics, Palacky University, Olomouc, Czech Republic. April 3, 2013.
C66. Christian T. Herbst (2013). Of elephants and men - common denominators in mammalian voice production. Seminar (invited lecture), Institute of Biology, University of Southern Denmark, Odense, Denmark. February 1, 2013.
C65. Christian T. Herbst (2012). Python-powered voice analysis. Py4Science lecture series (invited lecture), Research Institute of Molecular Pathology, University of Vienna, Vienna, Austria. December 7, 2012.
This talk will focus on voice analysis tools programmed in Python. In particular, I will (a) show how digital kymograms can be created from high-speed videos with a Python plugin written for the FIJI/ImageJ image analysis framework; (b) present a set of modules for integrating Praat (a powerful scriptable voice analysis application) with Python-powered signal processing algorithms; (c) use Python and ffmpeg to create video animations for teaching and presentations; and (d) demonstrate how a set of signals can be visualized and organized in an HTML browser.
C64. Christian T. Herbst (2012). Investigation of glottal configurations in singing. 6th International Conference on the Physiology and Acoustics of Singing (invited lecture), Department of Music, College of Fine Arts, University of Nevada, Las Vegas, October 18, 2012.
It is well known that the voice timbre can be controlled in the vocal tract in various ways. The adjustment of the voice character at the laryngeal level, however, receives less attention, particularly in the pedagogic literature. Hence, this presentation focuses on the sound source: How can singers control and fine-tune the voice timbre by adjustments of the vocal folds? And what are the possibilities of monitoring these maneuvers in a pedagogical or therapeutic setting?

The timbral voice characteristics can be controlled at the laryngeal level by (a) cartilaginous adduction, i.e. the adduction of the posterior glottis via the arytenoids (controlled by the singer with the degree of "breathiness" / "pressedness"); and by (b) membranous medialization through vocal fold bulging (controlled by the choice of vocal register, i.e. chest vs. falsetto). These two maneuvers can be controlled separately by both trained and untrained singers.

A pedagogical model that incorporates the two described physiological parameters consists of four quadrants: aBducted falsetto, aDducted falsetto, aBducted chest, and aDducted chest. Accomplished singers can "navigate" this map at will, thus facilitating subtle timbral changes at the laryngeal level. This concept is very promising for voice pedagogy and therapy, and for better understanding various singing styles.

To conclude the presentation, a novel method for monitoring vocal fold contact in voice production is put forward: the electroglottographic (EGG) wavegram. It is shown how features seen in this non-invasive technique are related to cartilaginous adduction and membranous medialization. The applicability of the method in the singing studio and in a speech therapy setting is discussed.
C63. Christian T. Herbst (2012). Von der Stimmbandschwingung zum Klang - Qualität und Beeinflussbarkeit des Stimmtimbres auf glottaler Ebene. 51. Berliner Gesangswissenschaftliche Tagung (invited lecture), Universität Potsdam, October 13, 2012.
Voice production in speech and singing requires three physical systems: an energy source (the lungs), a sound source (the vocal folds), and a sound modifier (the vocal tract). This presentation takes a closer look at the function of the sound source at the physical and physiological level. Put casually, the following two questions are discussed: "How is sound actually produced?" and "How can the speaker/singer control it?"

The voice sound is generated by modifying the respiratory airflow coming from the lungs. This gives rise to periodic changes in air pressure, which form the basis of the resulting voice sound. The relevant quantities, namely the time-varying air pressure and airflow, can be measured directly at the laryngeal level only with extremely invasive methods, or not at all. For this reason, voice diagnostics and voice research rely on indirect measurement methods. Two of these methods, videoendoscopic examination (stroboscopy or high-speed video recordings) and electroglottography, are briefly introduced here. The data obtained in this way are related to the resulting voice sound.

When asking how the voice sound can be influenced at the laryngeal level, one must distinguish between factors acting on different time scales: anatomical conditions usually change gradually over many years (aging, hormonal and environmental influences, organic changes caused by voice use). At the motor level, medium-term (muscle tone) to short-term influences on the voice sound are possible. The configuration of the larynx before and during voice production by the extrinsic and intrinsic laryngeal muscles usually takes place within tenths of a second. Vocal fold vibration itself, which as a passive physical phenomenon does not depend on muscular activity, can only be measured in (fractions of) milliseconds (vibration frequencies of approx. 50 - 3500 Hz).

This talk also briefly addresses how vocal fold vibration (and the resulting sound) is influenced by the intrinsic laryngeal muscles. Vocal fold closure can be achieved by two types of glottal adduction (i.e. the approximation of the vocal folds into phonation position): (a) adduction of the membranous part of the glottis (a thickening of the vocal fold through activity of the vocalis muscle, controlled by the singing register used); and (b) adduction of the cartilaginous part of the glottis (through positioning of the arytenoid cartilages, along the dimension "breathy" vs. "pressed"). Both forms of adduction can be controlled independently of each other, even by amateur singers. In this way, the voice timbre can be influenced at the glottal level, in accordance with the aesthetic conventions of the respective singing style. This approach is also promising in a therapeutic context, namely in the treatment of functional voice disorders (e.g. psychogenic dysphonia).
C62. Christian T. Herbst (2012). How does the singer's instrument work? - Some physical and physiological insights for a conductor's daily work. European Academy for Choral Conductors (invited lecture), Chorverband Österreich, Graz, Austria. September 13, 2012.
C61. Christian T. Herbst (2012). Freddie Mercury - Acoustical Voice Analysis. 10th International Voice Symposium Salzburg (invited lecture), Austrian Voice Institute, Salzburg, Austria. August 26, 2012.
C60. Christian T. Herbst (2012). From vocal fold vibration to sound - why is glottal closure important? Voice Symposium Salzburg, pre-symposium workshop: Interventional Laryngology and Indirect Phonosurgery (invited lecture), Austrian Voice Institute, Salzburg, Austria. August 24, 2012.
C59. Christian T. Herbst, Jan G. Švec, Josef Schlömicher-Thier, W. T. S. Fitch (2012). Analyzing the female middle register with EGG wavegrams. The Voice Foundation's 41st Annual Symposium: Care of the Professional Voice, May 31, 2012.
The choice of singing register and the degree of vocal fold adduction are two concepts that are not easily discriminated by inexperienced singers. This is particularly true for the mid-range (pitch C4-C5) of untrained female classical singers, where adducted falsetto, the desired sound quality in this range, is rarely observed. As an underlying physiological principle, vocal fold adduction can be separately controlled by (a) cartilaginous adduction, i.e. the adduction of the posterior glottis via the arytenoids (controlled by the singer with the degree of "breathiness" / "pressedness"); and by (b) membranous medialization through vocal fold bulging (controlled by the choice of vocal register, i.e. chest vs. falsetto) [1].

In this study, singing exercises and instructions for adjusting adductory settings (cartilaginous adduction vs. membranous medialization) in the female mid-range were performed by both trained and untrained female classical singers. Phonation was monitored by acoustic recording, electroglottography (EGG) and laryngeal imaging. EGG wavegrams [2], a novel method for displaying EGG signals, were used for data analysis.

EGG wavegram data revealed distinct differences between the targeted phonation types for each individual. The observed differences manifested as (a) presence/absence of vocal fold contact; (b) duration of vocal fold contact per glottal cycle; (c) changes in the overall EGG signal amplitude; (d) distinctness of opening/closing events; (e) perturbations seen in the wavegrams. Inter-subject data variation suggests that the individual's anatomy influences vocal fold contact in singing. EGG wavegrams proved to be useful in documenting changes of both singing register and glottal adduction.

References
[1] C. T. Herbst, et al., "Membranous and cartilaginous vocal fold adduction in singing," J Acoust Soc Am, vol. 129, pp. 2253-2262, 2011.
[2] C. T. Herbst, et al., "Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively," J Acoust Soc Am, vol. 128, pp. 3070-3078, 2010.

C58. Christian T. Herbst (2012). Introduction to voice acoustics: formants, articulation and formant tuning. BVA Acoustics Study Day (invited lecture), British Voice Association, London, UK. May 20, 2012.
When analyzing the human voice as an acoustical system, it can be decomposed into three parts: the power source (i.e. the lungs); the sound source (i.e. the larynx); and the sound modifiers (i.e. the vocal tract). In this 60-minute tutorial, the basic physical and physiological mechanisms of sound modification through the vocal tract are discussed. After introducing periodic vibration, harmonic series and the sound spectrum, the acoustic filter function of the vocal tract, facilitated by formants, is explained. The concept of formant tuning is established, and two example applications thereof (female and male singing voice) are portrayed.
C57. Christian T. Herbst (2012). Investigation of the mammal voice source in an excised larynx setup. CogBio Seminar, University of Vienna, Department of Cognitive Biology, Vienna, Austria. April 23, 2012.
The source of the human voice originates in the larynx. It is in most cases generated by the vibrating vocal folds. In this presentation, a new method for visualization and analysis of the electroglottographic (EGG) signal (i.e. a physiological correlate of vocal fold vibration) is presented. This method, termed "EGG wavegram", allows EGG signals (and their first derivative, DEGG) to be displayed across various phonations in one graph, whilst retaining the original appearance of the unaltered waveform.

The EGG signal is decomposed into consecutive individual cycles, each of which is normalized in both duration and amplitude, and is displayed on the y-axis, going from bottom to top. Overall time is shown on the x-axis. In a DEGG wavegram, the first derivative of the EGG signal is used as the input signal. In such a display, the contacting and de-contacting phases for each glottal cycle are approximated by (a) one or more dark horizontal line(s) at the lower end of the graph (contacting phase), and (b) one or more light horizontal line(s) in the upper section of the graph (de-contacting phase).

Much like in a sound spectrogram, information on vibratory behavior developing in time is compacted into one single graph, thus providing insight into changes of vocal fold dynamics. As such, the wavegram allows intuitive assessment of the time-varying contact phase of phonation over a longer period of time, indicating physiological changes of laryngeal configuration, such as vocal register. EGG wavegrams promise to be useful in research, clinical diagnostics, voice therapy and voice pedagogy.


References
Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). "Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively." J. Acoust. Soc. Am. 128 (5), 3070-3078
Christian T. Herbst (2012). Investigation of glottal configurations in singing. Palacký University in Olomouc, the Czech Republic (Doctoral Dissertation)
C56. Christian T. Herbst (2012). Control of sound source properties in singing. 2nd International Artistic and Scientific Symposium on Choral Art, Singing and Voice (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 13, 2012.
C55. Christian T. Herbst (2012). The sound source in singing: acoustical and physiological principles. 2nd International Artistic and Scientific Symposium on Choral Art, Singing and Voice (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 12, 2012.
C54. Malte Kob, Jan G. Švec, Christian T. Herbst (2012). Akustische Analyse von Stimmparametern. 9. Wiener Fortbildungskurs "Praxis der Stimmdiagnostik" (invited lecture), Medizinische Universität Wien, Univ.-HNO-Klinik, Klinische Abteilung Phoniatrie-Logopädie, Vienna, Austria. March 31, 2012.
C53. Christian T. Herbst (2012). Registerbeschreibung mit Hilfe der Elektroglottographie. 9. Wiener Fortbildungskurs "Praxis der Stimmdiagnostik" (invited lecture), Medizinische Universität Wien, Univ.-HNO-Klinik, Klinische Abteilung Phoniatrie-Logopädie, Vienna, Austria. March 30, 2012.
C52. Christian T. Herbst (2012). Akustische Analyse der Singstimme von Freddie Mercury. 5. Freiburger Stimmforum "populärer Gesang" (invited lecture), Freiburger Institut für Musikermedizin, Universitätsklinikum Freiburg, Freiburg, Germany. March 24, 2012.
C51. Christian T. Herbst (2012). Electroglottographic Wavegrams - a new tool to assess sound source properties in speech and singing. Seminar, Audio Engineering Society (invited lecture), University of York, UK. February 1, 2012.
The source of the human voice originates in the larynx. It is in most cases generated by the vibrating vocal folds. In this presentation, a new method for visualization and analysis of the electroglottographic (EGG) signal (i.e. a physiological correlate of vocal fold vibration) is presented. This method, termed "EGG wavegram", allows EGG signals (and their first derivative, DEGG) to be displayed across various phonations in one graph, whilst retaining the original appearance of the unaltered waveform.

The EGG signal is decomposed into consecutive individual cycles, each of which is normalized in both duration and amplitude, and is displayed on the y-axis, going from bottom to top. Overall time is shown on the x-axis. In a DEGG wavegram, the first derivative of the EGG signal is used as the input signal. In such a display, the contacting and de-contacting phases for each glottal cycle are approximated by (a) one or more dark horizontal line(s) at the lower end of the graph (contacting phase), and (b) one or more light horizontal line(s) in the upper section of the graph (de-contacting phase).

Much like in a sound spectrogram, information on vibratory behavior developing in time is compacted into one single graph, thus providing insight into changes of vocal fold dynamics. As such, the wavegram allows intuitive assessment of the time-varying contact phase of phonation over a longer period of time, indicating physiological changes of laryngeal configuration, such as vocal register. EGG wavegrams promise to be useful in research, clinical diagnostics, voice therapy and voice pedagogy.


References
Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). "Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively." J. Acoust. Soc. Am. 128 (5), 3070-3078
Christian T. Herbst (2012). Investigation of glottal configurations in singing. Palacký University in Olomouc, the Czech Republic (Doctoral Dissertation)
C50. Jan G. Švec, Jaromir Horacek, Tomas Vampola, Christian T. Herbst, Donald G. Miller, Radovan Havlik, Petr Krupa, Mojmir Lejska (2012). Acoustic and articulatory adjustments in operatic singing: spectral analysis, magnetic resonance imaging and finite-element modeling. International Voice Symposium. Subglottal Pressure Measurement and Source-Filter Interaction: Two Current Issues in Voice Research (invited lecture), New York University, Steinhardt School of Culture, Education, and Human Development, New York. January 7, 2012. presented by Jan G. Švec.
C49. Jakob Unger, Tobias Meyer, Christian T. Herbst, Michael Döllinger, Jörg Lohscheller (2011). PVG-Wavegramm: Dreidimensionale Visualisierung von Stimmlippendynamik. 28. Wissenschaftliche Jahrestagung der Deutschen Gesellschaft für Phoniatrie und Pädaudiologie e. V., Zurich, Switzerland. September 10, 2011. presented by Jakob Unger.
C48. Christian T. Herbst, W. T. S. Fitch, Josef Schlömicher-Thier, Jan G. Švec (2011). Observing the female middle register using EGG wavegrams. 9th Pan-European Voice Conference (PEVOC) (invited lecture), September 1, 2011.
The choice of singing register and the degree of vocal fold adduction are two concepts that are not easily discriminated by inexperienced singers. This is particularly true for the mid-range (pitch C4-C5) of untrained female singers, where sounds are often produced in either (a) fully adducted chest register or (b) breathy falsetto register. An adducted falsetto register, which is the desired sound source function of classical singing above a pitch of D4, is often not observed in untrained females.

As an underlying physiological principle, vocal fold adduction can be separately controlled by (a) cartilaginous adduction, i.e. the adduction of the posterior glottis via the arytenoids (controlled by the singer with the degree of "breathiness" / "pressedness"); and by (b) membranous medialization through vocal fold bulging (controlled by the choice of vocal register, i.e. chest vs. falsetto).[1] The electroglottographic (EGG) signal is well suited to detect changes in both membranous medialization (i.e. registers)[2] and cartilaginous adduction[3] in singing.

In this study, the EGG wavegram[4], a novel method for displaying and analyzing EGG signals, was used as a real-time feedback tool in the voice studio. In the context of singing exercises and instructions designed for this purpose, it was employed to help amateur female singers understand and better control the wide range of adductory settings (cartilaginous adduction vs. membranous medialization) in their middle range.

Wavegram data reveal distinct differences between abducted and adducted falsetto register for each individual. The observed differences manifested as (a) presence/absence of vocal fold contact; (b) degree of irregularities reflected in the EGG signal-to-noise ratio; (c) absence/presence of DEGG double peaks. The results suggest that subjects can learn to increase cartilaginous adduction in their falsetto register using real-time EGG wavegram feedback.


[1] C. T. Herbst, et al., "Membranous and cartilaginous vocal fold adduction in singing," J Acoust Soc Am, vol. 129, pp. 2253-2262, 2011.
[2] N. Henrich, et al., "Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency," J. Acoust. Soc. Am., vol. 117, pp. 1417-1430, 2005.
[3] C. T. Herbst, et al., "Using Electroglottographic Real-Time Feedback to Control Posterior Glottal Adduction during Phonation," J. Voice, vol. 24, pp. 72-85, 2010.
[4] C. T. Herbst, et al., "Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively," J Acoust Soc Am, vol. 128, pp. 3070-3078, 2010.

C47. Christian T. Herbst, Jan G. Švec (2011). Voice acoustics, microphones, recording and computers. One-Day Crash Course on 'Voice' (invited lecture), European Academy of Voice, August 30, 2011.
C46. Jakob Unger, Tobias Meyer, Christian T. Herbst, Michael Döllinger, Jörg Lohscheller (2011). PVG-Wavegrams: Three-dimensional visualization of vocal fold dynamics. 7th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA), August 25, 2011. presented by Jakob Unger.
C45. Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2011). Wavegrams: A new technique for visualizing vocal fold dynamics noninvasively using electroglottographic signals. 40th Annual Symposium: Care of the Professional Voice, The Voice Foundation, June 2, 2011.
A new method for analyzing and displaying EGG signals (and their first derivative, DEGG) is introduced: the electroglottographic wavegram (short: wavegram). It (a) allows monitoring the EGG (or DEGG) signal over time; and (b) provides an intuitive means for quickly assessing the duration of glottal closure and its variation over time.

Based on the EGG or DEGG signal, the time-varying fundamental frequency is calculated and consecutive individual glottal cycles are identified. Each cycle is locally normalized in duration and amplitude and the cycles are then plotted consecutively. The plotting process resembles that of a spectrogram, but instead of spectral amplitudes, the signal deflections are encoded by color intensity. The wavegram maps time on the x-axis, normalized cycle duration on the y-axis and the signal deflection on the color-intensity-coded z-axis.
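
The processing steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cycle detection here uses simple negative-to-positive zero crossings as a stand-in for the fundamental-frequency-based cycle identification mentioned in the abstract, and the function names, the synthetic test signal and the choice of 50 points per cycle are assumptions made only for illustration.

```python
import math

def find_cycle_starts(signal):
    """Indices of negative-to-positive zero crossings (assumed cycle boundaries)."""
    return [i for i in range(1, len(signal)) if signal[i - 1] < 0.0 <= signal[i]]

def resample_cycle(cycle, n_points):
    """Duration normalization: linearly interpolate one cycle onto n_points samples."""
    last = len(cycle) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into the cycle
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(cycle[lo] * (1.0 - frac) + cycle[hi] * frac)
    return out

def normalize_amplitude(cycle):
    """Amplitude normalization: scale one cycle locally to the range 0..1."""
    lowest, highest = min(cycle), max(cycle)
    span = highest - lowest
    if span == 0.0:
        return [0.0] * len(cycle)
    return [(v - lowest) / span for v in cycle]

def wavegram(signal, n_points=50):
    """Return cycle start indices and one normalized column per glottal cycle.

    Each column runs along the y-axis (normalized cycle duration); consecutive
    columns run along the x-axis (time); the values are the color intensities.
    """
    starts = find_cycle_starts(signal)
    columns = []
    for begin, end in zip(starts, starts[1:]):
        cycle = signal[begin:end]
        if len(cycle) < 2:
            continue  # too short to interpolate
        columns.append(normalize_amplitude(resample_cycle(cycle, n_points)))
    return starts, columns

# Synthetic stand-in for an EGG signal: a tone gliding from 100 Hz to 200 Hz.
sample_rate = 8000
signal = [
    math.sin(2.0 * math.pi * (100.0 * t + 100.0 * t * t))
    for t in (n / sample_rate for n in range(4000))
]
starts, columns = wavegram(signal)
print(len(columns), "cycles,", len(columns[0]), "points per cycle")
```

The resulting list of columns can be handed to any image-plotting routine, with the column index on the x-axis and the position within the cycle on the y-axis. Passing the first difference of the signal instead of the signal itself would give the DEGG variant.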

Variations in vocal fold contact appear in the wavegram as a sequence of events, rather than single phenomena. These events take place over a certain period of time and change with pitch, loudness and register. Multiple DEGG peaks are revealed in wavegrams to behave systematically, indicating subtle changes of vocal fold oscillatory regime. As such, EGG wavegrams promise to reveal more information on vocal fold contacting and de-contacting events than previous methods.

In this presentation, wavegrams of human and mammal phonations are shown. Their physiologic relevance is discussed in relation to glottal configurations and vocal fold vibratory patterns, as seen in laryngeal imaging.

Reference: Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). "Electroglottographic wavegrams: a technique for visualizing vocal fold dynamics noninvasively." J. Acoust. Soc. Am. 128 (5), 3070-3078
C44. Christian T. Herbst (2011). Vocal folds and the voice timbre. 3rd Czech-Slovak Symposium on ART VOICE (invited lecture), Hlasové a sluchové centrum Praha, s.r.o., Prague, Czech Republic. May 21, 2011.
It is well known that the voice timbre can be controlled in the vocal tract in various ways. The adjustment of the voice character at the laryngeal level, however, receives less attention, particularly in the pedagogic literature. Hence, this presentation focuses on the sound source: How can singers control and fine-tune the voice timbre by adjustments of the vocal folds?

We recognize three basic vocal fold adjustments: (a) adduction (and abduction) of the posterior glottis; (b) thickening/bulging of the vocal folds; (c) elongation of the vocal folds. In this presentation, the first two adjustments are examined more closely:

(a) Cartilaginous adduction, i.e. adduction of the posterior glottis, is maintained through the (lateral) cricoarytenoid and the interarytenoid muscles. Phonation with a fully adducted glottis is characterized by strong high-frequency partials ("overtones"), thus giving the voice a "brassy", "ringing" or "resonant" quality. On the other hand, phonation with a posterior glottal gap (the posterior glottis is not fully adducted) creates high-frequency partials of lesser strength. The voice then has a "fluty" or "dull" quality, and most likely contains noise components ("breathy voice").

(b) Membranous medialization, i.e. thickening (bulging) of the vocal folds is controlled via the singing register. In chest (modal) register, the thyroarytenoid (vocalis) muscle is contracted, and the vocal folds are medially bulged. This introduces vertical phase differences into the vocal fold vibration, effectively shortening the open phase and increasing the amount of high-frequency partials. Phonation with a relaxed thyroarytenoid muscle, on the other hand, is usually identified as falsetto (or sometimes "head") register. It is characterized by a less "resonant" sound, containing weaker overtones.

It has been shown that these two maneuvers can be controlled separately by both trained and untrained singers [1, 2].
The ability to individually and gradually control cartilaginous adduction and membranous medialization allows experienced singers to produce a great variety of vocal timbres at the laryngeal level, thus increasing the quality of their artistic performance. This concept is also very useful in voice pedagogy, particularly in training the female mid-range in classical singing.

References:
[1] Herbst C., Svec J., Ternström S. (2009) Investigation of four distinct glottal configurations in classical singing - a pilot study. J.Acoust.Soc.Am. 125:EL104-EL109.
[2] Herbst C.T., Qiu Q., Schutte H.K., Svec J.G. (2011) Membranous and cartilaginous vocal fold adduction in singing. J Acoust Soc Am 129:2253-2262
C43. Christian T. Herbst (2011). Wie funktioniert Stimme? Regensburger Stimmtag: ein regionaler Beitrag zum World Voice Day (invited lecture), Regensburger Ärztenetz e.V., Regensburg, Germany. April 16, 2011.
C42. Christian T. Herbst (2011). Beschaffenheit und Ausbaumöglichkeit der Kinderstimme. Kinderchorleitungssymposium (invited lecture), Universität der Künste, Berlin, Germany. February 11, 2011.
C41. Christian T. Herbst (2011). Stimmbildungsunterricht mit Knaben und Mädchen. Kinderchorleitungssymposium (invited lecture), Universität der Künste, Berlin, Germany. February 11, 2011.
C40. Christian T. Herbst (2010). Understanding vocal timbre in singing - a tutorial. Europa Cantat General Assembly 2010 (invited lecture), Europa Cantat, Namur, Belgium. November 28, 2010.
Timbre, known in psychoacoustics as tone quality or tone color, distinguishes different types of sound production. In singing, timbre is mainly influenced by: amplitude; fundamental frequency and variations thereof; the amount of high-frequency energy components ("overtones", singers' formant); vowel quality; and the noise level (degree of breathiness). In a 45-minute tutorial, an overview of those sound qualities is given. It is shown that they are mainly controlled by two physiologic means: adjustments of the vocal tract and adjustments of the sound source, i.e. the laryngeal configuration. The practical application of this knowledge to choir singing practice is briefly discussed.
C39. Christian T. Herbst, Josef Schlömicher-Thier, Matthias Weikert (2010). Berufsstimmbetreuung in der HNO-Praxis - eine stimmdiagnostische, -therapeutische und gesangsdidaktische Synopsis. Stuttgarter Stimmtage (invited lecture), Staatliche Hochschule für Musik und Darstellende Kunst, Stuttgart, Germany. October 2, 2010.
C38. Josef Schlömicher-Thier, Hans E. Eckel, Christian T. Herbst (2010). Interventionelle Laryngologie in der HNO-Praxis. 54th annual Meeting of the Austrian Society of Oto-Rhino-Laryngology, Head and Neck Surgery, Salzburg, Austria. September 17, 2010. presented by Josef Schlömicher-Thier.
Problems of the speaking and singing voice are mainly caused by an impairment of the shape or mobility of the laryngeal structures. While deficits in shape can be detected relatively easily in general ENT practice, deficits in mobility pose a far greater challenge.

Movement of laryngeal structures arises in two ways: (1) larger, relatively slow movements caused by muscle contraction that are visible to the naked eye (laryngeal mirror examination) (< 15 Hz), e.g. adduction of the vocal folds and ventricular folds, longitudinal tension of the vocal folds, change of the vertical larynx position; and (2) smaller and relatively fast movements of the vocal folds, i.e. vocal fold oscillation caused by aerodynamic-mechanical processes (> 50 Hz). The causes of impaired quality or periodicity of those oscillations often cannot be identified by laryngeal mirror examination alone, i.e. without a videostrobolaryngoscopic examination.

This workshop addresses special cases of disorders of vocal fold mobility. In addition to a basic explanation of the acoustic and physiological background, diagnostic and therapeutic approaches are presented. In particular, the following pathologies are addressed:

(1) Neurological voice disorders: a) pareses (vocal fold augmentation technique, Vocastim therapy); b) spasmodic voice disorders (botulinum toxin therapy); (2) management of chronic laryngitis, in particular diagnosis and therapy of reflux disease; (3) indirect phonosurgical procedures for organic voice disorders; (4) psychogenic voice disorder: psychotherapeutic measures, functional voice rebuilding
C37. Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). Wavegrams: A new technique for visualizing vocal fold dynamics noninvasively using electroglottographic signals. 9th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL), September 1, 2010.
Electroglottography (EGG) is a non-invasive low-cost method to monitor relative vocal fold contact area (VFCA) during phonation. Increase and decrease of VFCA is related to glottal closing and opening, respectively. In this study, a new method for analyzing and displaying EGG signals (and their first derivative, DEGG) is introduced: the electroglottographic wavegram (short: wavegram). It (a) allows monitoring the EGG (or DEGG) signal over time; and (b) provides an intuitive means for quickly assessing the duration of glottal closure and its variation over time.

Based on the EGG or DEGG signal, the time-varying fundamental frequency is calculated and consecutive individual glottal cycles are identified. Each cycle is locally normalized in duration and amplitude and the cycles are then plotted consecutively. The plotting process resembles that of a spectrogram, but instead of spectral amplitudes, the signal deflections are encoded by color intensity. The wavegram presents the time on the x-axis, normalized cycle duration on the y-axis and the signal deflection on the color-intensity-coded z-axis.

The wavegram reveals changes of vocal fold contact duration in time. It also shows phenomena that remain overlooked in traditional EGG-display techniques, such as multiple DEGG peaks. While these phenomena have usually been considered artifacts, the wavegram displays revealed consistent behavior of these peaks in a large number of subjects. They indicate subtle changes of vocal fold oscillatory regime.

Wavegram analysis suggests that the phenomenon of vocal fold closing and opening is more complex than commonly assumed. Rather than a single event, vocal fold opening and closing should be considered a sequence of events, taking place over a certain period of time. Data show that the sequence of these events can change with pitch, loudness and register. The EGG signal thus promises to reveal more (physiological) information on vocal fold closure and opening events than previously thought.
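
The cycle normalization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: cycle boundaries are found here with simple upward zero crossings instead of the fundamental-frequency analysis mentioned above, the input is a synthetic sine glide standing in for an EGG signal, and the function names (`find_cycle_starts`, `normalize_cycle`, `wavegram`) are hypothetical.

```python
import math

def find_cycle_starts(signal):
    """Indices of upward zero crossings (assumed here to mark cycle boundaries)."""
    return [i for i in range(1, len(signal))
            if signal[i - 1] < 0.0 <= signal[i]]

def normalize_cycle(cycle, n_points):
    """Resample one cycle to n_points (linear interpolation) and scale it to 0..1."""
    last = len(cycle) - 1
    resampled = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into the cycle
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(cycle[lo] * (1.0 - frac) + cycle[hi] * frac)
    c_min, c_max = min(resampled), max(resampled)
    span = (c_max - c_min) or 1.0         # avoid division by zero on flat cycles
    return [(v - c_min) / span for v in resampled]

def wavegram(signal, n_points=50):
    """Return a list of columns; each column is one duration- and amplitude-normalized cycle."""
    starts = find_cycle_starts(signal)
    return [normalize_cycle(signal[a:b + 1], n_points)
            for a, b in zip(starts, starts[1:])]

# Synthetic stand-in for an EGG signal: a sine gliding from 100 Hz to 200 Hz.
sample_rate = 10000
phase, test_signal = 0.0, []
for n in range(sample_rate // 2):         # 0.5 s of signal
    freq = 100.0 + 200.0 * n / sample_rate
    phase += 2.0 * math.pi * freq / sample_rate
    test_signal.append(math.sin(phase))

columns = wavegram(test_signal)
print(len(columns), "cycles,", len(columns[0]), "points per cycle")
```

Each column corresponds to one position on the x-axis (time), the index within a column to the normalized cycle duration on the y-axis, and the value (0 to 1) to the color intensity. Rendering the columns as an image is left out of the sketch.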
C36. Christian T. Herbst (2010). Das Timbre im klassischen Gesang: akustisches und physiologisches Tutorial. 9th Voice Symposium Salzburg, Austrian Voice Institute, Salzburg, Austria. August 28, 2010.
Timbre, known in psychoacoustics as tone quality or tone color, distinguishes different types of sound production. In singing, timbre is mainly influenced by: amplitude; fundamental frequency and variations thereof; the amount of high-frequency energy components ("overtones", singers' formant); vowel quality; and the noise level (degree of breathiness). This 30-minute tutorial gives an overview of those sound qualities. It is shown that they are mainly controlled by two physiologic means: adjustments of the vocal tract and adjustments of the sound source, i.e. the laryngeal configuration.
C35. Christian T. Herbst, W. T. S. Fitch, Jan G. Švec (2010). Visualizing electroglottographic signals with wavegrams. 5th International Conference on the Physiology and Acoustics of Singing (PAS5), Kungliga Tekniska Högskolan, Stockholm, Sweden. August 11, 2010.
Electroglottography (EGG) is a non-invasive low-cost method to monitor relative vocal fold contact area (VFCA) during phonation. Increase and decrease of VFCA are related to glottal closing and opening, respectively. In this study, a new method for analyzing and displaying EGG signals (and their first derivative, DEGG) is introduced: the electroglottographic wavegram (short: wavegram). It (a) allows monitoring the EGG (or DEGG) signal over time; and (b) provides an intuitive means for quickly assessing the duration of glottal closure and its variation over time.

Based on the EGG or DEGG signal, the time-varying fundamental frequency is calculated and consecutive individual glottal cycles are identified. Each cycle is locally normalized in duration and amplitude and the cycles are then plotted consecutively. The plotting process resembles that of a spectrogram, but instead of spectral amplitudes, the signal deflections are encoded by color intensity. The wavegram presents the time on the x-axis, normalized cycle duration on the y-axis and the signal deflection on the color-intensity-coded z-axis.

The wavegram reveals changes of vocal fold contact duration in time. It also shows phenomena that remain overlooked in traditional EGG-display techniques, such as multiple DEGG peaks. While these phenomena have usually been considered artifacts, the wavegram displays revealed consistent behavior of these peaks in a large number of subjects. They indicate subtle changes of vocal fold oscillatory regime.

Wavegram analysis suggests that the phenomenon of vocal fold closing and opening is more complex than commonly assumed. Rather than a single event, vocal fold opening and closing should be considered a sequence of events, taking place over a certain period of time. Data show that the sequence of these events can change with pitch, loudness and register. The EGG signal thus promises to reveal more (physiological) information on vocal fold closure and opening events than previously thought.
C34. Christian T. Herbst, Jan G. Švec, Qingjun Qiu, Harm Schutte (2010). Membranous and cartilaginous glottal adduction in singing - experimental findings and pedagogic considerations. Choice for Voice Conference, British Voice Association, London, U.K. July 1, 2010.
While it has been recognized that glottal adduction is an important parameter in speech, relatively little is known about how glottal adduction is adjusted when changing the voice quality in singing. Our previous pilot data on a single subject suggest that the cartilaginous and membranous parts of the glottis play different roles in singing -- while the membranous part is expected to aid in switching between the chest and falsetto registers, the cartilaginous part is expected to play a primary role in adjusting the sound quality within the desired register. The goal of this study was to design singing exercises that enable both trained and untrained singers to independently manipulate cartilaginous and membranous glottal adduction, and to verify these exercises laryngoscopically on a set of subjects.

A baritone, who was previously found capable of independently manipulating the cartilaginous and membranous glottal adduction, served as an instructor in this study. Six female and six male subjects, singers and non-singers, were asked to imitate the instructor in producing four phonation types, i.e. (FaB) 'aBducted falsetto'; (FaD) 'aDducted falsetto'; (CaB) 'aBducted chest'; and (CaD) 'aDducted chest', at a pitch located within the range of the chest/falsetto register transition (C#4 to F4). In order to maintain the desired register, the target notes for chest and falsetto were reached by singing an ascending or descending five-note scale, respectively, starting in the desired register. The subjects were asked not to 'blend or mix the registers'. Phonation was monitored by videostroboscopy, videokymography (VKG), electroglottography (EGG) and audio recording.

The results showed distinct laryngeal configurations and vocal fold vibration characteristics for the four phonation types. As expected, all subjects showed a less adducted posterior, i.e. cartilaginous, glottis in phonation types FaB and CaB than in phonation types FaD and CaD. Changes in the membranous part of the vocal folds were reflected in videokymographic imaging, which revealed that the chest phonations, as compared to the falsetto phonations, had larger mucosal waves, sharper lateral peaks and a longer closed quotient.

The findings indicate that the singers succeeded in independently manipulating the membranous and cartilaginous adduction of the glottis. Individual control over these two types of glottal adduction is expected to be a key factor for the experienced singer to create different vocal timbres. The designed singing exercises were found useful in training the subjects for achieving this goal.

In the final part of this presentation, some practical considerations and possible pedagogical strategies for classical singing are discussed.
C33. Jan G. Švec, Jaromir Horacek, Tomas Vampola, Christian T. Herbst, Donald G. Miller, Radovan Havlik, Petr Krupa, Mojmir Lejska (2010). Acoustic and articulatory adjustments in operatic singing: Spectral analysis and magnetic resonance imaging. COST 2103: 4th Advanced Voice Function Workshop AVFA'10, York, U.K. May 20, 2010. presented by Jan G. Švec.
C32. Christian T. Herbst, Josef Schlömicher-Thier (2010). Die Sängerbetreuung in stimmpädagogischer und sängermedizinischer Kooperation. Berner Symposium Medizin, Logopädie, Gesangspädagogik (invited lecture), Hochschule der Künste Bern, Bern, Switzerland. April 17, 2010.
C31. Christian T. Herbst, Josef Schlömicher-Thier (2010). Pedagogical and Medical Cooperation in Voice Patient Care. Symposium Ars Choralis (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 10, 2010. presented by Christian T. Herbst.
In voice patient care, we are approached by two types of clients: those who want to "be able to sing again" and those who want to "be able to sing better". In both cases, the underlying predicament has several aspects: emotional, life-style related, medical, pedagogical. Since those aspects are often related to each other, a multi-dimensional approach for treating the patient is required.

In this presentation, several boundary conditions for a "good" voice are established. To illustrate the effects of medical and pedagogical measures, we show case studies from our regular work with voice patients.
C30. Christian T. Herbst (2010). Voice Timbre in Singing. Symposium Ars Choralis (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 10, 2010.
Timbre, known in psychoacoustics as tone quality or tone color, distinguishes different types of sound production. In singing, timbre is influenced by vowel quality, amount of high-frequency energy components ('overtones', singers' formant) and the noise level (degree of breathiness). Those sound qualities are mainly controlled by two physiologic means: adjustments of the vocal tract and adjustments of the sound source, i.e. the laryngeal configuration.
C29. Christian T. Herbst (2010). Vocal Fold Adduction and Registers in Classical Singing. Symposium Ars Choralis (invited lecture), Croatian Choral Directors Association, Zagreb, Croatia. April 8, 2010.
Register control in singing is physiologically achieved mostly by the vocalis muscle (membranous adduction). On the other hand, the degree of adduction of the posterior part of the glottis (cartilaginous adduction, regulated by laryngeal adductory muscles PCA and IA) is known to have an influence on the 'richness' of the vocal sound source.

A recent study has shown that both trained and untrained singers can independently vary those two types of laryngeal adjustment. The independent control over cartilaginous and membranous adduction allows singers to create different vocal timbres at the laryngeal level. In singing pedagogy, this knowledge can be used to effectively address certain technical problems, such as running out of breath, or register violations.
C28. Ramona Steiner, Christian T. Herbst, David Howard (2009). Electroglottographic (EGG) real-time biofeedback to enhance glottal adduction in patients with unilateral vocal fold pareses. 8th Pan European Voice Conference (PEVOC), Dresden, Germany. August 28, 2009. presented by Ramona Steiner.
The central deficit in unilateral vocal fold pareses (UVFP) is insufficient glottal adduction. Several well-established therapy methods exist, but their efficiency is rarely evaluated objectively. Except for auditory feedback via bone conduction, patients have no means to assess the targeted change in voice quality. In a recent study, electroglottography (EGG) has successfully been used as a real-time biofeedback tool in order to increase the degree of posterior glottal closure in a healthy amateur singer. EGG has also been used recently in an attempt to document voice quality in patients with vocal fold pareses. In this study we investigate whether electroglottographic real-time biofeedback can be used to increase therapy efficiency by enhancing glottal adduction in patients with UVFP.

For this experiment, four patients with diagnosed infranuclear UVFP acted as subjects. Habitual phonation was documented simultaneously by means of videolaryngoscopy, electroglottography and audio recording while sustaining a vowel at a comfortable pitch. In a therapeutic session using phonatory exercises (conservative approach), subjects were shown a real-time EGG-waveform (normalized in both amplitude and time) representing one glottal cycle that changes over time. As they followed the instructions of the therapist, they were asked to consciously introduce changes into the shape of the displayed EGG-waveform. Therapy sessions were documented by means of simultaneous recording of acoustic and electroglottographic data. Immediately after the therapy session, the patients' attempt to apply the potentially improved phonatory behaviour was documented simultaneously by means of videolaryngoscopy, electroglottography and audio recording, again while sustaining a vowel at a comfortable pitch.

First tests showed that the EGG-signal could be detected in a patient with a chronic UVFP, and that he was able to deliberately introduce changes into the displayed EGG-waveform during therapy for a sustained vowel. The sessions further explored the effect of training on the ability of patients to change the shape of the EGG-waveform at will, which provides support for the use of EGG in therapy.
C27. Christian T. Herbst, Jan G. Švec, Qingjun Qiu, Harm Schutte (2009). Membranous and cartilaginous glottal adduction in singing. 8th Pan European Voice Conference (PEVOC), Dresden, Germany. August 27, 2009.
While it has been recognized that glottal adduction is an important parameter in speech, relatively little is known about how glottal adduction is adjusted when changing the voice quality in singing. Our previous pilot data on a single subject suggest that the cartilaginous and membranous parts of the glottis play different roles in singing -- while the membranous part is expected to play an important role in switching between the chest and falsetto registers, the cartilaginous part is expected to play a primary role in adjusting the sound quality within the desired register. The goal of this study was to design singing exercises that enable both trained and untrained singers to independently manipulate cartilaginous and membranous glottal adduction, and to verify these exercises laryngoscopically on a set of subjects.

A baritone, who was previously found capable of independently manipulating the cartilaginous and membranous glottal adduction, served as an instructor in this study. Six female and six male subjects, singers and non-singers, were asked to imitate the instructor in producing four phonation types, i.e. (A) 'naïve falsetto'; (B) 'quality falsetto'; (C) 'light chest'; and (D) 'heavy chest', at a pitch located within the range of the chest/falsetto register transition (C#4 to F4). In order to maintain the desired register, the target notes for chest and falsetto were reached by singing an ascending or descending five-note scale, respectively, starting in the desired register. The subjects were asked not to 'blend or mix the registers'. Phonation was monitored by videostroboscopy, videokymography (VKG), electroglottography (EGG) and audio recording.

The results showed distinct laryngeal configurations and vocal fold vibration characteristics for the four phonation types. As expected, all subjects showed a less adducted posterior, i.e. cartilaginous, glottis in phonation types A and C than in phonation types B and D. Changes in the membranous part of the vocal folds were reflected in videokymographic imaging, which revealed that the chest phonations, as compared to the falsetto phonations, had larger mucosal waves, sharper lateral peaks and a longer closed quotient.

The findings indicate that the singers succeeded in independently manipulating the membranous and cartilaginous adduction of the glottis. Individual control over these two types of glottal adduction is expected to be a key factor for the experienced singer to create different vocal timbres. The designed singing exercises were found useful in training the subjects for achieving this goal.
C26. Jan G. Švec, Jaromir Horacek, Tomas Vampola, Christian T. Herbst, Donald G. Miller, Radovan Havlik, Petr Krupa, Mojmir Lejska (2009). Acoustic and Articulatory Adjustments for Singers' Formant Production: Spectral Analysis, MRI and Finite Element Modeling. The Voice Foundation's 38th Annual Symposium, The Voice Foundation, Philadelphia. June 1, 2009. presented by Jan G. Švec.
C25. Jan G. Švec, Christian T. Herbst, Sten Ternström (2009). Membranous versus cartilaginous glottal adduction in four singing voice qualities: Pilot laryngostroboscopic and videokymographic observations. Proceedings of AVFA09, 3rd Advanced Voice Function Assessment International Workshop, May 18, 2009. presented by Jan G. Švec.
This study investigates four qualities of singing voice in a classically trained baritone: "naïve falsetto", "countertenor falsetto", "lyrical chest" and "full chest". Laryngeal configuration and vocal fold behavior in these qualities were studied using laryngeal videostroboscopy, videokymography, electroglottography, and sound spectrography. The data suggest that the four voice qualities were produced by independently manipulating mainly two laryngeal parameters: (1) the adduction of the arytenoid cartilages and (2) the thickening of the vocal folds. An independent control of the posterior adductory muscles versus the vocalis muscle is considered to be the physiological basis for achieving these singing voice qualities.
C24. Christian T. Herbst, Josef Schlömicher-Thier (2009). Stimme - Ausdrucksmittel und Werkzeug im Kunstbetrieb. Symposium: Internationales Theaterinstitut der UNESCO - Centrum Österreich, Vienna, Austria. March 28, 2009.
C23. Jan G. Švec, Christian T. Herbst, Radovan Havlik, Jaromir Horacek, P. Krupa, M. Lejska, Donald G. Miller (2008). Singer's formant: Preliminary results of MRI and acoustic evaluations of singers. Proceedings Interaction and Feedbacks 2008, Prague: Institute of Thermomechanics AS CR, November 1, 2008. presented by Jan G. Švec.
C22. Christian T. Herbst, Elke Duus (2008). Stimmliche Leistungsbeurteilung von SängerInnen im Amateurchor. 8th Voice Symposium Salzburg, Austrian Voice Institute, Salzburg, Austria. July 27, 2008.
C21. Christian T. Herbst (2008). Pressen, behauchtes Singen und Registerdivergenzen - Einfluss der glottischen Konfiguration auf das Timbre im klassischen Gesang. 8th Voice Symposium Salzburg, Austrian Voice Institute, Salzburg, Austria. July 26, 2008.
C20. Christian T. Herbst (2008). Einfluss der Stimmlippentätigkeit auf das Gesangstimbre. Guest Lecture, Hochschule für Musik, Köln, June 11, 2008.
C19. Christian T. Herbst, Josef Schlömicher-Thier (2007). Visualization and Analysis of Electroglottographic Waveforms. XVI Annual PVSF/UCLA Voice Conference, Los Angeles, CA. October 25, 2007. presented by Josef Schlömicher-Thier.
C18. Christian T. Herbst (2007). Glottal Contact in Singing. 5th international logopedics and phoniatrics course 'THE ARTISTIC VOICE' (invited lecture), La Voce Artistica, Ravenna, October 18, 2007.
C17. Christian T. Herbst (2007). Kehlkopfkonfigurationen beim Singen. Symposium: Stimmbildung in Knaben-, Mädchen- und gemischten Kinderchören (invited lecture), Universität der Künste Berlin, Germany, October 4, 2007.
C16. Christian T. Herbst, Jan G. Švec (2007). Is the degree of posterior glottal adduction relevant for "voix mixte" phonation? 7th Pan European Voice Conference (PEVOC), Groningen, The Netherlands. September 1, 2007.
In the context of a voice coaching situation, a 52-year-old semi-professional baritone was found to have a limited upper range. Starting with a pitch of about C4, the phenomenon of divergent registers occurred: with increasing pitch, phonation was only possible in either loud chest voice or falsetto. Messa-di-voce exercises in the range between Bb3 and F#4 exhibited register breaks in both the crescendo and decrescendo. The full chest voice reached its pitch limit at about 365 Hertz. Even though a voice range profile revealed a dynamic range from 72 to 114 dB at pitches from C4 to F4, the region between 94 and 102 dB could hardly be used for artistic purposes at those pitches.

An a priori stroboscopic examination revealed a habitual increase of the degree of posterior glottal adduction at pitches at and above C4. It was hypothesized that a lesser degree of posterior glottal adduction could increase the artistically usable pitch and dynamic range.

In order to test this hypothesis, the baritone was asked to sing various exercises while having constant visual feedback through videostroboscopic imaging. The aim was to spread the arytenoids slightly apart during phonation in the upper range. The targeted acoustic quality was described as 'almost breathy'.

When phonating in the upper pitch range with a lesser degree of posterior glottal adduction, the cartilaginous portion of the vocal folds became visible. The arytenoids changed their position, suggesting active participation of the posterior cricoarytenoid muscle (PCA). The ventricular folds were slightly more retracted, thus widening the epilaryngeal tube.

Immediately after the session with the visual feedback, the baritone was able to reach pitches as high as Bb4 without audible register breaks. The SPL was within the previously missing range of 90 to 100 dB. Electroglottographic evidence revealed a decreased waveform width, signifying a decreased duration of glottal closure as opposed to phonation with a high degree of posterior glottal adduction.

The findings suggest that posterior glottal adduction is an important physiological parameter in singing. It allows achieving a specific voice quality which is perceptually and dynamically between the chest and falsetto registers and thus could be considered to correspond to that of a 'voix mixte' tone production.
C15. Christian T. Herbst, David Howard, Josef Schlömicher-Thier (2007). Using electroglottographic real-time feedback to control posterior glottal adduction during phonation. 7th Pan European Voice Conference (PEVOC), Groningen, The Netherlands. September 1, 2007.
The goal of this pilot study was to determine whether the ability to change the degree of posterior glottal adduction during phonation can be acquired more easily with the aid of electroglottographic real-time feedback.

A 37-year-old untrained female chorister was asked to participate in the experiment. The initial perceptual evaluation of her singing voice revealed extremely breathy phonation, regardless of pitch and loudness. During the experiment, phonation was monitored simultaneously with videostroboscopy, electroglottography and audio recording. While phonating, the chorister saw the normalized electroglottographic waveform representing one glottal cycle, updated continuously over time. After an initial 'placebo' phase, the actual relevance of the EGG waveform was explained to the chorister. The assignment was to increase the width of the EGG waveform during phonation. Data were collected for sustained notes at pitches of B3, B4 and G5, respectively.

Laryngeal imaging revealed a considerable posterior glottal chink during habitual phonation. No considerable changes of phonatory quality could be documented for the 'placebo' phase. Once the relevance of the EGG waveform had been made clear to the chorister, visual, acoustic and electroglottographic evidence suggested that the subject was able to make intentional changes to the laryngeal configuration during phonation: an increase of the EGG waveform width coincided with an increase of high-frequency partials and an increase of posterior glottal adduction. For pitches B3 and B4, full glottal closure could be achieved. At G5, a reduction of the posterior glottal chink occurred.

The findings of this study suggest that (a) the skill to control the degree of posterior glottal adduction can be acquired, and that (b) electroglottographic real-time feedback can be a crucial element in optimizing the process of skill acquisition, but only if the context and nature of the feedback is explained.
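
The abstract does not state how the "width" of the normalized EGG waveform was quantified. As a hedged illustration only, the sketch below assumes a threshold-based measure: the fraction of one cycle during which the amplitude-normalized signal lies above a chosen level, similar in spirit to criterion-level contact quotient measures. The function names, the 0.5 level and the synthetic half-sine cycles are all assumptions made for this example.

```python
import math

def waveform_width(cycle, level=0.5):
    """Fraction of the cycle in which the amplitude-normalized signal exceeds `level`."""
    c_min, c_max = min(cycle), max(cycle)
    span = (c_max - c_min) or 1.0          # avoid division by zero on flat input
    above = sum(1 for v in cycle if (v - c_min) / span > level)
    return above / len(cycle)

def synthetic_cycle(n_contact, n_total=200):
    """Hypothetical EGG-like cycle: half-sine contact pulse, then a flat open phase."""
    return [math.sin(math.pi * i / n_contact) if i < n_contact else 0.0
            for i in range(n_total)]

breathy = synthetic_cycle(60)    # short contact phase (assumed breathy phonation)
firm = synthetic_cycle(120)      # longer contact phase (assumed firmer adduction)

print(round(waveform_width(breathy), 2), round(waveform_width(firm), 2))
print(waveform_width(breathy) < waveform_width(firm))
```

In a feedback setting, a value of this kind could be recomputed for each displayed cycle, so that a wider waveform shows up as a larger number.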
C14. Christian T. Herbst (2007). Der Knabensolist in der Oper - Akustisches Portrait eines musikalischen Hochleistungssportlers. 75. Kongress der Deutschen Gesellschaft für Sprach- und Stimmheilkunde (DGSS), April 21, 2007.
C13. Gerhard Schmidt-Gaden, Christian T. Herbst, Ernst L. Schmid (2007). Talentschmiede Knabenchor? 19. Jahreskongress des Bundesverbandes Deutscher Gesangspädagogen (BDG), April 20, 2007.
C12. Christian T. Herbst, Josef Schlömicher-Thier (2007). Stimmbildung und Stimmstörungsprävention. Internationale Tagung "Die Stimme Heute", Zentralkrankenhaus Bozen, Abteilung HNO, January 26, 2007. presented by Christian T. Herbst.
C11. Christian T. Herbst (2006). Stimmbandschluss im klassischen Gesang. Grazer Stimmtage, Hals-, Nasen-, Ohren-Universitätsklinik Graz, Klinische Abteilung für Phoniatrie, November 25, 2006.
C10. Christian T. Herbst (2006). Physiologische Vorgänge beim Registerausgleich der Knabenstimme. 1. Internationales Symposium für Kinderstimmbildung, Freunde der Wiener Sängerknaben, November 4, 2006.
C9. Christian T. Herbst (2006). Acoustic Principles of Voice Production - a Tutorial. 7th Voice Symposium Salzburg, Austrian Voice Institute, August 4, 2006.
C8. Josef Schlömicher-Thier, Phillip Janssen, Christian T. Herbst (2006). Phonetogram: Architecture of Speaking and Singing Voice. 3rd World Voice Conference, Istanbul, Turkey. June 22, 2006.
C7. Christian T. Herbst, Josef Schlömicher-Thier, Matthias Weikert (2006). Voice Disorders in Childhood & Management of Mutational Problems in Choirboys. 3rd World Voice Conference, Istanbul, Turkey. June 21, 2006.
C6. Christian T. Herbst, Jan G. Švec (2006). Investigation of four distinct glottal configurations in a classically trained male singer. 3rd physiology and acoustics of singing conference (PAS3-06), University of York, York, U.K. May 11, 2006.
C5. Christian T. Herbst (2006). Untersuchung der Sängerstimme mittels Elektroglottographie (Workshop). 6. Wiener gesangswissenschaftliche Tagung, Institut Antonio Salieri, Universität für Musik und darstellende Kunst, Wien, Vienna, Austria. January 14, 2006.
C4. Christian T. Herbst (2005). The Singer's Voice: Glottal Configurations and Voice Source Properties. Seminar, School of Arts, Culture & Environment, University Edinburgh, Edinburgh, UK. October 27, 2005.
C3. Christian T. Herbst, Sten Ternström (2005). A comparison of different methods for measuring the electroglottographic contact quotient. 6th Pan European Voice Conference (PEVOC), London, UK. September 2, 2005.
C2. Christian T. Herbst (2005). Die Singstimme als physikalisch-akustisches System. Seminar, Musikum Salzburg, Salzburg, Austria. January 22, 2005.
C1. Christian T. Herbst (2004). The EGG Contact Quotient as a Means of Assessing Vocal Registration Quality in Classical Singing. Seminar, Dept. of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden. August 24, 2004.
Media appearances (TV & radio)
M26. Nick Bright (2023). With Nick Bright -- live interview about Freddie Mercury. BBC Radio 5, July 10, 2023
M25. Raphael Krapscha (2023). Wissen Aktuell -- Walgeräusche, Artenschutzabkommen. Österreichischer Rundfunk Ö1, March 3, 2023
Whales: sound production resembles that of humans

Dolphins, orcas and sperm whales resemble us humans more closely than previously assumed. One example is the production of sounds. In a recent study, researchers found that the animals produce their click and whistle sounds in the same way that we humans produce our voice.

Produced by: Raphael Krapscha

With: Christian Herbst, voice researcher
M24. Bea Sommersguter (2022). Moment am Sonntag -- Meine Stimme. Der akustische Fingerabdruck. Österreichischer Rundfunk Ö1, April 10, 2022
M23. Paul Lohberger (2019). Radiokolleg "Das Debüt von Led Zeppelin 1969". Österreichischer Rundfunk Ö1, March 11, 2019
M22. Matheo Duarte Sierra (2016). Christian T. Herbst, el cientifico que hizo un analisis de la voz de Freddy Mercury. La hora del regreso, Radio W Colombia, October 18, 2016
M21. Paul Lohberger (2016). Radiokolleg "Paul Simon - Der Grandseigneur der Popmusik". Österreichischer Rundfunk Ö1, October 11, 2016
M20. Michaela Graichen (2016). The science of Freddie Mercury's voice. BBC Newshour, April 26, 2016
M19. Gabe O'Connor (2016). Why Freddie Mercury's voice was so great - as explained by science. NPR, April 25, 2016
M18. Jim Drury (2014). Vienna study gives voice to elephant rumblings. Reuters TV, February 18, 2014
M17. Rainer Rosenberg (2013). Von Tag zu Tag: Vom Gesang zur Stimmforschung. Die Abenteuer des Biophysikers Christian Herbst (live broadcast, 30 min.). Österreichischer Rundfunk Ö1, July 25, 2013
M16. Uli Pförtner (2013). Operation Dolittle - mit Tierstimmenforschern unterwegs. ARTE, April 18, 2013
M15. Michael de Werd (2012). Olifantentaal - De Ochtend. VRT - Radio 1, August 26, 2012
M14. Christine Ricken (2012). Große Tiere, tiefe Töne. Neue Erkenntnisse über die Sprache der Elefanten. SWR2 Impuls, August 21, 2012
M13. Josef P. Glanz (2012). ZIB Flash. ORF, August 7, 2012
M12. Josef P. Glanz (2012). Salzburg Heute - Elefantenforscher. ORF, August 7, 2012
M11. Josef P. Glanz (2012). Heute in Österreich - Elefantensprache entschlüsselt. ORF, August 7, 2012
M10. Miriam Stumpfe (2012). Brummende Elefanten - Was die Dickhäuter über die Entstehung der Stimme verraten. BR5 - Aus Wissenschaft und Technik, August 5, 2012
M9. Kerry Klein (2012). Science Podcast: How Elephants Vocalize. AAAS, August 3, 2012
M8. Arndt Reuning (2012). Stimmbildung bei Elefanten - Interview mit Christian Herbst, Uni Wien. Deutschlandfunk - Forschung Aktuell, August 3, 2012
M7. Martina Preiner (2012). Elefanten sprechen wie Menschen - Laute entstehen durch Luftstrom durch Stimmlippen. WDR5 - Leonardo, August 3, 2012
M6. Paul Lohberger (2011). Radiokolleg - Die Stimme als Instrument. Österreichischer Rundfunk Ö1, September 5, 2011
M5. Paul Lohberger (2011). Radiodoktor - Das Ö1 Gesundheitsmagazin: Stimmbildung - Therapie und Körpererfahrung. Österreichischer Rundfunk Ö1, March 23, 2011
M4. Katrin Müller-Höcker (2010). Vibrierende Muskeln, klingender Atem - Wie funktioniert das Wunderwerk Stimme? Bayern 2, November 28, 2010
M3. Paul Lohberger (2010). Radiokolleg - Queen. Österreichischer Rundfunk Ö1, May 4, 2010
M2. Paul Lohberger (2009). Radiokolleg - Die Stimme. Österreichischer Rundfunk Ö1, April 29, 2009
M1. Bayerischer Rundfunk (2008). Engelsgleich - Über die Physik der Knabenstimme. Bayern 4 Klassik, April 30, 2008