Håkan Lundström
Jan-Olof Svantesson
Performance templates
Method, results, and implications

This chapter contains a discussion of the methods used in the book. The results are summarized and their implications are discussed in relation to other research in ethnomusicology and linguistics, as well as to emerging interdisciplinary results. It turned out to be possible to recognize forms of human communication that can be described as vocal expressions in the borderland between song and speech. In addition, it was seen to be feasible to design a method for studying them that leads to new knowledge of this borderland.

Our research has focused on vocal expressions in the area where speaking and singing overlap. Our ultimate interest has been neither the description of performance practices nor their relation to possible ‘culture areas’, no matter how interesting these things may be, but how those principles that make vocal expressions possible are constructed. Therefore, the cultures under study have been chosen not for reasons of comparison, but for their suitability in studying the borderland between song and speech, and on the basis of the participating researchers’ special interests. We have found that in all these cultures – cultures which, with the exception of Alaska, are concentrated in East and South-East Asia – there are vocal expressions that can be understood as being made from performance templates. These practices have some basic functions in common, functions that make the following features possible:

  • variation, re-creation, and creation of vocal expressions;
  • specific vocal expressions for situations involving social or communal interaction;
  • specific vocal expressions for situations involving the spiritual world.

This is demonstrated by the Kammu material, which contains a number of genres of vocal expressions, each related to a specific spiritual context, a specific social context, a specific time and/or a specific geographical context such as a village, fields, or a forest [3–10].1 The very general nature of these functions indicates that instantaneous re-creation of vocal expressions is as fundamental to human nature as speech, and can indeed be understood as a different mode of speech.

Obviously, different ways of using vocal expressions for these basic needs of human society have evolved in their separate cultural contexts. The vocal expressions appear to be the result of parallel developments in language and music, and this accounts for similarities as well as differences in the material under study here. The performance templates serve to organize the vocal expression of words in accordance with the principles of several parameters labelled melody, rhythm, form, phrasing, initial/final formulae, word variations, and lexical tones. It has been shown that the contexts studied all have principles for organizing these and other parameters; but each context does it in its own unique way – it is, for instance, easy to tell a Kammu vocal expression from an Athabascan one merely by hearing, while telling it apart from an Akha vocal expression takes a little more experience of the styles. Seediq canonic imitation is unique in the material studied here, but those vocal expressions can still be explained by the use of performance templates [19–20].

We have focused on certain musical and linguistic parameters and have not considered others – for instance gesture, movement, body posture, or musical scales – and we have related this to social or cultural context only in certain cases. These areas could generate additional relevant parameters for the description of performance templates. With these delimitations, we have found that the performance templates have a common feature in that they combine and integrate musical and linguistic parameters in certain ways.

The descriptions of performance templates may appear to be similar to descriptions of musical style in general. The difference is that the relevant parameters are central to how musical and linguistic parameters coexist and are integrated. The template is more than a description in that it aims at capturing what the performer does when applying these principles in performance, which in some cases is done more or less instinctively, in others through strategic planning. This is, in turn, different from a ‘deep structure’ of the kind used in generative grammar in order to explain an underlying general principle for performances.

Performance template as method

Approaching vocal expressions from the perspective of performance templates has certain advantages. The alternative musicological way of studying vocal expressions would be to transcribe variants of expressions and use comparative analysis to find one or more lowest common denominators: a basic or ‘original’ melody. Using performance templates works in the opposite way: one performance, or a few, is analysed. If a performance template can be extracted, it is tested on a few similar performances; and if it meets the test requirements, it has been proven. Since a performance template is basically a set of principles, it can be further refined when and if further study demands more details in some respects. In reality, then, the researcher starts out with the hypothesis that a performance can be explained by a performance template, which is then defined, tested, and, if necessary, further refined.
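
The sequence of defining, testing, and refining can be illustrated schematically. The following is a toy sketch under strong simplifying assumptions, not the procedure actually used in the analyses: a performance is reduced to a list of line lengths in syllables, and a template is treated as a set of named principles, each of which either holds or fails for a given performance. The names `Template` and `check_template`, the principle labels, and the sample data are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# ASSUMPTION: a performance is reduced to its line lengths (in syllables).
Performance = List[int]
Principle = Callable[[Performance], bool]


@dataclass
class Template:
    """A performance template viewed as a named set of principles (hypothetical)."""
    principles: Dict[str, Principle] = field(default_factory=dict)

    def failures(self, performance: Performance) -> List[str]:
        """Return the names of the principles that the performance violates."""
        return [name for name, holds in self.principles.items()
                if not holds(performance)]


def check_template(template: Template,
                   performances: List[Performance]) -> Dict[int, List[str]]:
    """Test a template on further performances.

    Returns a mapping from performance index to violated principles;
    an empty mapping means the template explains every performance.
    """
    report: Dict[int, List[str]] = {}
    for index, performance in enumerate(performances):
        failed = template.failures(performance)
        if failed:
            report[index] = failed
    return report


# Step 1: define a template from one analysed performance (invented data).
template = Template({
    "lines come in pairs": lambda p: len(p) % 2 == 0,
    "permitted line lengths": lambda p: all(n in (5, 7) for n in p),
})

# Step 2: test it on a few similar performances.
others = [[5, 7, 5, 7], [7, 7, 9, 5]]
report = check_template(template, others)

# Step 3: refine only where the test fails, here by admitting prolonged lines.
if report:
    template.principles["permitted line lengths"] = (
        lambda p: all(n in (5, 7, 9, 11) for n in p))
refined_report = check_template(template, others)
```

In this sketch the first test flags the second performance because of its 9-syllable line; after the refinement, both performances are accounted for. Real templates involve several interacting parameters, so the single-list representation here is only a stand-in.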

In many cases when ordinary methods of analysis yield no special results, the use of a performance template as a method produces new knowledge. Performances that may otherwise be disregarded as ‘recitations’, or nothing but ‘heightened speech’, may provide a basis for new knowledge and an explanation of factors inherent in vocal expressions and in the relationship between music and language. What is gained is that (1) the knowledge produced is of a generic kind that says something about how the vocal expression is constructed and how the performer handles the principles of the template in performance, and that (2) qualitative knowledge can be gained from a large amount of material within a comparatively limited time.

For instance, detailed transcription and analysis of some 200 or more vocal expressions in the repertoire of one Kammu singer is time consuming; and the information gained about the expressions is, in principle, limited to the description of melodic and rhythmic movement and comparisons of variants. Defining relevant performance templates from a handful of performances or segments of long performances does not take much time. The information gained gives a fairly deep insight into what the performer is actually doing and affords an understanding of the reasons for some variations between performances that might otherwise have been left unexplained. Once this is done, a researcher could, if necessary, continue studying details of the performance, individual variation, intonation, tonality, and so on.

The most time-consuming task for us has been to explore if and how performance templates can be applied to the Kammu and Athabascan material. In the case studies of the Akha, Seediq, and Ryukyuan/Japanese material, in which the performance template as a method is merely tested, it was easy to see that the method would be useful and could produce knowledge.

This method makes certain demands of the source material. Apart from audio or audio-visual documentation, the words of each performance should be transcribed with the systematic use of symbols employed by linguists, and preferably also glossed word-for-word. In the case of tone languages, the lexical tones need to be included in the transcriptions.2 Though a specialist in a particular language may do this long after the recordings have been made, ideally it should be done when the material is collected in the course of fieldwork, while the researcher has the possibility of discussing words and their meanings and other aspects of the performance with the performer or other persons who know the language well.

The performance template as an approach appears to be particularly useful for analysing and describing large samples of vocal expressions and extremely long performances. The largest of the samples used in this study are those of Kammu and Akha, but on-going digitization shows that there are many similar samples in archives of various kinds, samples that still await analysis.3 The method is probably most useful for genres that are improvisatory or involve much variation, while being rather less useful for more determined musicopoetical forms for which other methodologies already exist. However, it did prove relevant to the study of those Athabascan vocal expressions that are composed beforehand and are rather settled in form [13–18]. No sharp borders were found between the processes of re-creation carried out in the performance situation and composition prepared beforehand. The differences between the two are perhaps better defined by the individual’s aim for the activity. A Kammu performer’s aim would be to re-create a vocal expression and vary it so that the result would be more or less different every time. An Athabascan performer, who is also a composer, would aim at creating the vocal expression and refining it until it has acquired a definite form, and would then expect it to be performed more or less in the same way every time. The use of a performance template in the analysis yields insights into the creative stage in both cases.

The interdisciplinary combination of linguistics and ethnomusicology with the concept of the performance template has proved useful for continued studies of samples of vocal expressions in the borderland between song and speech which – as for instance the Akha material used here – include all the necessary information. This paves the way for research on recent documentation, as well as on many as yet unstudied samples stored away in archives that could increase our knowledge of the relationship between music and language, and of the role of vocal expressions in human life.

Parameters in performance templates

This summary and discussion is organized in accordance with the parameters used in the analyses and in the same order: melody, rhythm, form, phrasing, initial/final formulae, word variations, and lexical tones. The parameters have been divided into sub-sections. While the previous chapters have focused on each separate cultural context, this section is based on a comparative perspective, applied in relation to existing research.


Melody

The melody parameter in our material ranges from speech melody, through melody with a tonal centre, to monotone (Table 9). In speech melody, the intonation of ordinary speech dominates the melodic movement, and pitches are not very fixed. In melody with a tonal centre, pitches are more fixed, and the basic melodic shape is dominated by the relationship between the various pitches and the tonal centre. Since descending motion is dominant, the tonal centre is also often the lowest pitch. In the monotones, the speech intonation is flattened and the fixed pitch – or pitches, where lexical tones are realized – functions as a tonal centre. This pattern occurs in the Kammu prayers [2], which is not unexpected, since ‘leveling speech prosody on a flat, monotone, contour … is usually associated with transcendental speech as in prayers and magic spells’.4 It is also found in the hrlɨ̀ɨ Kammu genre [4] and in the performance of Japanese waka [25].

In the Athabascan material, the overall contour of the Caribou song [12] – starting higher, finishing lower, and with the negative emphasized – closely matches the contour of a Minto intonational unit. Dratakh ch’elik, performed at memorial feasts (potlatches), also display prose-like qualities in wordy sections, with the melody following tone and intonation to a greater or lesser extent. There are examples, though, of one and the same word in different musical settings, so even though the music and prosody interact, melodic shape is not only driven by word prosody, such as lexical tone or morphological stress. Ch’edzes ch’elik are highly rhythmic and the speech tone and intonation are not expressed directly, but melodic patterns dominate. In Seediq, speech rhythm and intonation seem to play important roles for the rhythmic and tonal characteristics of the corresponding musical motifs. Exclamations [19 section A] and questions [19 section B] parallel spoken language rather closely.

Speech melody
  • Closely follows pitch contour of speech [11–12].
  • Melodic movement with relatively fixed pitches and no tonal centre [1–2].
  • Musical phrase pairs: the second is lower than the first, corresponding to downstep [1, 26].
  • Speech intonation is present in a rising start and a falling ending [26].
Melody with a tonal centre
  • Short melodic phrase(s) [3, 7, 10, 19–20].
  • Descending [5–6, 8–9, 12–18, 26].
  • Initially rising, then descending [14, 21–24].
  • Undulating with much downward motion [21–24].
  • Successive lowering of pitches within each prosodic group [5–6].
  • Level contour without speech intonation but with a tonal centre [4, 25].
  • Two pitch levels reflecting lexical tones but without speech intonation [4].

The waka performances reveal a deviation from Japanese speech intonation. The melody of waka performance has a high-level pitch in the same pitch register throughout. Ryūka performance, however, exhibits some similarities to the intonation of the Ryukyuan language: (1) an initial pitch rise; (2) a final pitch fall in the two sections, together with a lower pitch register in the second section; and (3) a lowering of the pitch register in the last intonation unit. Though these three characteristics are equally present in the intonation of Japanese, this is not reflected in the waka performance. In the ryūka performance, a rapid reduction of amplitude occurs, in combination with a lowering of pitch in the final section, which is also a characteristic of Ryukyuan intonation.

In Kammu, lexical tones restrict the use of intonation, which is otherwise the same in both tonal and non-tonal dialects of spoken Kammu. In the case of some melody-centred genres, we found that the relationship between lexical tones and melodic pitches is very similar to that in speech. In both cases, melody/intonation is dominant until it conflicts with lexical tones. The separate sections of the krùu spells are marked by a higher and louder ending [10], much like the narrative style [1] and the Ɔ̀ɔc [3], so the endings of the melodic contour are dominated by the intonation. A thematic episode is marked by a High boundary tone that influences the realization of a final Low lexical tone by raising it. The same pattern is found in speech: intonational marking of boundaries by high-phrase boundary tones influences the realization of lexical tones.5

When all the studied vocal expressions are taken into account, it appears relevant to consider those vocal expressions that permit the incorporation of speech intonation as made up of music-centred and language-centred parts. The role of prosody in the relationship between music and language – or, more precisely, singing and speaking – is important and complex. Ivan Chow and Steven Brown have presented a method for using musical notation in order to obtain a more detailed understanding of speech intonation.6 Our approach, in which we combine musicological and linguistic methods in the study of vocal expressions as totalities with both musical and linguistic ingredients, was also found useful in the study of genres that are close to speech [1–2]. This is partly because certain aspects in the performances, such as the degree of regularity in pitches and rhythm, become more obvious in notation, and partly because the approach supplies a basis for comparison with other genres.


Rhythm

Rhythm is recognized in both language and music, and in vocal expressions the two interact in various ways. Aniruddh Patel formulated a definition of rhythm that was intended to cover both language and music contexts:

[rhythm] denotes periodicity, in other words, a pattern repeating regularly in time … Although all periodic patterns are rhythmic, not all rhythmic patterns are periodic. That is, periodicity is but one type of rhythmic organization … I will define rhythm as the systematic patterning of sound in terms of timing, accent, and grouping.7

The presence of a regular beat is often cited as what distinguishes music from spoken language. Alan Lomax characterized what he calls parlando rubato, ‘in which no regularly recurring beat can be distinguished’, as ‘often close to speech in general effect; accents and rhythmic patterns are grouped in meaningful ways, but without reference to a regular division of time into steady beats’.8 Sometimes the concepts beat and pulse are understood as synonyms.9 Here, pulse refers to the slower rhythm made up of units of beats that correspond to the musical concept of ‘measures’.

In each of the cultural contexts that our sample stems from, there are examples of a strong presence of speech rhythm in the vocal expressions (Table 10). The majority of the vocal expressions have a regular steady rhythm or beat that is closer to music than to speech. Some are syllabic and may be dominated by one tone duration. Iambic rhythms are comparatively common. Even though some may be said to have an even metre, generally 4-beat units, there is no case in which the wider metrical level is regular, consisting of sections of, for instance, 4, 8, or 12 units or measures. Instead, sections are of varying length, even if there is an even metre. This is, in its turn, closely related to phrasing (see below).

Speech rhythm
  • Rhythmic movement in rather uniform repeated patterns [1–2, 26].
  • Rhythmic movement, basically created by the lengthening of final syllables in prosodic phrases [2].
  • Closely adheres to rhythm of speech [11–12].
  • Words are grouped in four-syllable units corresponding to four pulse beats [15–16].
  • Isorhythmic motif corresponds to a morphological stem [14].
  • Certain grammatical forms can only be in certain positions within a rhythm pair, which is achieved by the use of e.g. prefixes, suffixes, and filler syllables [21–24].
  • Speech rhythm plays a role for the rhythmic realization [19–20].
Steady rhythm/beat
  • Regular pulse [3–10, 12–25].
  • [Mainly] syllabic [3, 4, 10, 12–14, 18].
  • One tone duration dominates [4–6, 12, 15, 16, 18].
  • Iambic movement dominates [7–9, 13, 21–24].
  • Dotted rhythms occur [17].
  • Isorhythmic organization [12–18].
  • Prosodic phrases are performed to variants of: - | – - – - –– where the first tone in the bar and the last long one are stressed [19–20].
  • Even metre, generally 4-beat units [25].


Form

There is a close relationship between poetical form and musical form (Table 11). Several vocal expressions from Kammu, Seediq, and Akha are of the litany type, by which is meant a continuous repetition of phrases in groups of two or more that are linked by parallelism and/or rhymes and are performed to one or two musical phrases that are also repeated throughout.10 Two of these are organized as call-and-response between two or more performers [1, 20].

Binary form, described as A–B, is present in a great many cases, particularly in the Athabascan material [11–18]. The pairing principle is a fundamental formal concept that can be varied and developed in many ways, also as extended variants. Many of the Athabascan vocal expressions are strophic, and stanzas are often built on the A–B pattern. In Kammu tə́əm, one may speak of two separate binary forms: a linguistic poetical form spanning over two stanzas and a musical form spanning over one stanza [9]. There are also cases in which the linguistic binary form is not paralleled in the music, which then consists of a short phrase repeated with minor variations. This occurs in some cases where Kammu trnə̀əm poems are performed in other genres than tə́əm [4–7]. The pairing principle will be further discussed in connection with phrasing.

In a study of the music of ethnic groups in Taiwan, particularly Ami and Puyuma, I-to Loh found litany in its different variations to be a common form of performance. He relates form to function and manner of performance:

A Shaman’s ritual conducted with assistants is performed first in responsorial manner; the leader sings phrase a and the assistants sing phrase b. After the second statement, the music becomes antiphonal between the two groups, each repeating its own phrase with little or no variation … The Shaman’s ritualistic action is repeated over and over again, and the words of exorcism, cursing or healing may last a long time. After repetitive singing of the same formula, they may go into a trance in order to attain maximum magic power for executing the rite. This may have accounted for this particular form.11

  • Consecutive prosodic phrase pairs connected by repeated word(s)/rhymes [1–3, 10].
  • A number of consecutive prosodic phrases form a thematic episode [1, 10].
  • Divided into sections of various lengths [10, 19–24].
  • A performer repeats each line using the same rhythmic/melodic motif [19–20].
  • Call (higher pitch level) and response (lower pitch level) [1].
  • Performer 1 starts with one phrase, while performer 2 repeats the same words and melody with a time delay [19–20].
  • One prosodic phrase performed twice [12].
  • A–B [15–18, 21–26].
  • A–A–B–B′ [17, 18].
  • A–B–A–B′–A [13].
  • A–B–C–A′–B′–C [14].
  • Strophic [4–9, 16].
  • A stanza is held together by sound repetition, parallelism, and rhymes [19–20].
  • The initial part of a stanza contains vocables and key words with lexical meaning [13–15, 17–18].

Binary organization also occurs abundantly in most of our material in word-pairs, rhyme-pairs, paired phrases, etc. At the micro-level, this feature is fundamental for the construction of both prosodic and melodic phrases.


Phrasing

In the vocal expressions studied here, language and music are closely integrated and interdependent with regard to phrasing. In the discussion, there has been reason to look at prosodic phrasing and musical phrasing as two aspects of the totality (Table 12). Generally, though, the performances are dominated by verbal phrasing, which, in principle, means that when prosodic phrases are prolonged, the musical phrases are prolonged as well. There is also the technique of squeezing more syllables into a musical phrase by means of placing two or more syllables in the space designated for one syllable, a technique called contraction. In Kammu tə́əm, this results in increased prominence of musical phrasing [9]. Tə́əm performances may thus oscillate between primarily verbal phrasing and primarily musical phrasing. The Athabascan material [11–18] is an exception, partly owing to the dominance of vocables and partly because of the isorhythmic organization and the basic four-syllable pattern (compare Table 10).

The extent of a prosodic phrase is normally marked by a distinct ending, which may be intonational, using boundary pitches [1–2], or may consist of prolongation. The latter is commonly the case in musical phrases in each of the cultures studied. Since musical phrases depend on the prosodic phrases for their length, the two are normally aligned, so that beginnings and endings coincide. There are some exceptions, however, particularly where musical phrases are extended to include two or more prosodic phrases, which sometimes occurs in the Kammu material when speed is increased [9–10].

Without exception, the vocal expressions that were analysed are based on parallel pairs. In many cases, the parallel phrases create A–B forms that coincide at the musical and linguistic levels (compare Table 11). Parallel pairs occur on every level:

  • word-pairs, sometimes rhyme-pairs (Kammu, Akha, Seediq);
  • linguistic phrase-pairs: anaphors that repeat the previous phrase (Kammu), mirrored phrases (Kammu, Seediq);
  • stanza-pairs (actually linguistic phrase-pairs on a more extensive level, Kammu);
  • musical phrase-pairs: musical phrases repeated, often so that the second musical phrase is lower than the first (Akha, Athabascan, Seediq, Kammu, Ryukyuan/Japanese);
  • musical paired phrasing: the repetition of a musical phrase and the corresponding words (Athabascan, Seediq).

A similar practice is described as an organizing principle in Antoinet Schimmelpenninck’s study of the shan’ge traditions in southern Jiangsu:

One such organizing principle is antithesis, the combination of parallel or opposed images … Many dialogue songs in the Wu area are antithetical, with one phrase (or a pair of lines or a stanza) contrasting with the next.12

Prosodic phrasing
  • Successive right-edged phrase boundary tones, with the highest boundary tone coinciding with the end of each episode [1].
  • A high boundary tone coincides with the end of a phrase [3].
  • Small syntactic groups end on a high boundary tone, with a lengthening of the final syllable [2].
  • Prosodic phrases are organized as phrase pairs as repetition, antithetically or as question-and-answer [1–3, 10, 21–24].
  • Prosodic phrases are based on iambic rhythm pairs with variations [21–24].
  • Prosodic phrases may be prolonged by words or additional phrases [8–9].
  • Creakiness occurs at the end of a prosodic group and might signal a phrase boundary [5–6].
  • Verbal phrasing dominates (a 7-syllable line is two tone durations longer than a 5-syllable line, etc.) [3–10, 12, 21–25].
Musical phrasing
  • Prosodic and musical phrases are generally aligned [3–10, 19–25].
  • The last word of a phrase is (normally) prolonged [2, 5–7, 15, 18–25].
  • Musical phrases end with tone repetitions: phrase-final vocable syllables are lengthened over several beats [14–16].
  • Musical phrases end on shortened morae and lowered amplitude [26].
  • Musical phrases are organized in pairs, where the first phrase is generally slow and long while the second is faster and shorter [21–24].
  • Prosodic phrases are divided into prosodic groups marked by a higher first lexical tone [5].
  • In verbal metre, musical phrases are either prolonged or contracted [9–10, 21–25].

There is a rich literature on parallel pairs, from word-pairs to parallel phrases or stanzas.13 Word-pairs are common in the vocal form lam which occurs in Laos and north-eastern Thailand.14 Nguyen Van Huyen saw word-pairs as a fundamental organizational principle in the poetry of Vietnamese alternating songs.15 Emeneau built his analysis of Toda song poetry, India, on ‘three-syllable song-units from which are built the longer syntactic structures and the paired parallel units and sentences’, especially by the construction of parallel pairs.16 From micro to macro level, binary form is sufficiently widespread to be considered as universal in vocal expressions in the borderland between song and speech.

The relationship between phrasing and metre is complex. By analysing a large number of performances, it is, for instance, possible to create basic or underlying forms of the trnə̀əm poetry used for Kammu tə́əm performance. Poetic lines and stanzas can be defined by phrasing and phrase endings. Lines will usually have 5 or 7 syllables [4: Example 27]. In performance, though, they may be combined with new words, and often two or more trnə̀əm are superimposed on one another.17 From our study of the Kammu performance templates, it is evident that in most cases a 7-syllable line is simply longer than a 5-syllable line [4, 8]. In more complex performances, there are lines with slots for 5, 7, 9, and 11 syllables to permit the prolongation of lines [8: Example 46]. A similar technique is used in the Seediq performances, but here each section has its individual number of syllables [20: Example 99]. Prolongation of a similar kind occurs in Chinese practice as well:

How can a singer produce such a line without crippling the basic structure of his melody? The answer is that, during the course of this prolonged phrase, he stays within the realm of one or two intervals of the melody. He freely repeats these intervals until he reaches the end of the line, and then continues with the rest of the melody.18

When even more words are performed in a Kammu tə́əm, some slots are divided to contain the extra syllables.19 This kind of contraction also occurs in the Akha shaman performance [24: Examples 112–113] and in the Japanese waka [25: Example 118]. Only in the case of waka is there a clear adjustment of poetical metre to musical metre [25: Example 118]. In other cases, it seems more relevant to refer to a word-based metre in which poetical and musical metre are treated as one. This also happens in the North Athabascan dratakh ch’elik, while musical phrasing dominates the ch’edzes ch’elik that have very few words with lexical meaning.
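
The slot principle behind prolongation and contraction can be illustrated with a small sketch. It is a toy model under stated assumptions, not the notation used in the analyses: a line offers a fixed number of slots of one tone duration each; a line with no more syllables than slots simply takes one slot per syllable, so that its length follows the words; a line with more syllables than slots has some slots divided to hold the extra syllables. Which slots are divided is not specified in this chapter, so the hypothetical function `fit_to_slots` arbitrarily divides the earliest ones.

```python
from typing import List


def fit_to_slots(syllables: List[str], slots: int) -> List[List[str]]:
    """Distribute the syllables of one line over a fixed number of slots.

    Each inner list is one slot (one tone duration).
    - No more syllables than slots: one syllable per slot, so a
      7-syllable line is two slots longer than a 5-syllable line.
    - More syllables than slots: contraction, i.e. some slots are
      divided and carry two or more syllables.
    ASSUMPTION: the earliest slots are divided first; this choice is
    illustrative only.
    """
    if slots < 1:
        raise ValueError("a line needs at least one slot")
    count = len(syllables)
    if count <= slots:
        return [[syllable] for syllable in syllables]
    # 'extra' slots carry one syllable more than the remaining slots.
    base, extra = divmod(count, slots)
    result: List[List[str]] = []
    position = 0
    for index in range(slots):
        size = base + (1 if index < extra else 0)
        result.append(syllables[position:position + size])
        position += size
    return result


line = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]  # placeholder syllables
contracted = fit_to_slots(line, 5)   # seven syllables squeezed into five slots
plain = fit_to_slots(line, 7)        # one syllable per slot
```

With five slots, the first two slots of `contracted` each hold two syllables and the remaining three hold one; with seven slots, every syllable has its own slot. The model ignores pitch, stress, and phrase-final lengthening, all of which matter in the actual performances.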

There are many examples of similar techniques, especially in East and South-East Asia. Thus ‘added phrases that result in the lengthening of the melodic line are common in the narrative songs in northern China, such as the Peking drumsong and Shandong drumsong’.20 As regards Beijing opera, Elizabeth Wichmann reports a similar practice:

The insertion of padding written-characters increases the number of written-characters in a seven-written-character line to as many as sixteen … they extend the line beyond its standard length to clarify its meaning.21

From these descriptions and from the musical notations, it is obvious that both prolongation and contraction occur in Beijing opera. In studies of the music of Chinese music drama, notations are generally in regular 2- or 4-beat measures, so apparently alterations of syllables or words in performance must relate to a musical metre. Bell Yung identifies seven speech types in Cantonese opera.22 Considering the many levels of speech (or ‘song’) and the line lengthening practice, some of these forms of oral delivery are probably suitable for analysis by means of performance templates. Such analysis might reveal more parallels with the vocal expressions in our research.

Poetical and musical metre on the one hand and word-based phrasing on the other are not total opposites; but it seems reasonable to conclude that re-creation based on orally learnt performance templates permits a flexible relation to metre, in many cases.

Initial/final formulae

Many vocal expressions have beginnings or endings that stand out from the main part of the performance (Table 13). They often consist of a few syllables that are performed at a pitch or duration that contrasts with the main part of the performance.

Certain initial formulae are relatively long and start with a long vocative at a high pitch that descends while the first syllables of the meaningful words begin. Such a beginning marks the start of a prayer [2] and a shaman’s performance [10, 21–24], calling attention to the vocal expression and signalling a genre. This is also the case in social forms of vocal communication such as the Kammu tə́əm, which in addition has an initial formula for later phrases in a stanza and a final formula, and which is highly distinctive as regards genre [9]. In most cases, a final formula signals the end of a phrase or a section of a performance.

Initial formulae
  • Initial vocative phrase (with lengthened syllables) starting at a high pitch and descending [2, 9, 21–24].
  • An initial formula at the very beginning of the performance [3].
  • The first syllable(s) of a phrase is/are performed low [4, 6].
  • Lines start with a glissando from a lower pitch on the first syllable [25–26].
Final formulae
  • In a final formula, the pitches level out at the tonic [21–24].
  • The final syllable of the phrase elongated to 3–4 pulse beats [12].
  • Ends with a tone repetition, mainly on the tonal centre [14–18].
  • The penultimate syllable of a line is lengthened [3–4].
  • The penultimate syllable of a phrase and the very last syllable of a stanza are longer and form a final formula [4].
  • A short final formula in each phrase [3].
Initial and final formulae
  • Both initial and final formulae [8–10].

Initial and final formulae may be purely musical, using pitches or a pattern that deviates from the main part of the performance while using the words that belong to the prosodic phrase or poetic line in question [4]. In many cases, the formulae combine musical and linguistic factors that make them stand out as formulae. One characteristic type consists of an initial high-pitched həəəy (or similar) followed by a descending movement over part of the first prosodic phrase [2, 9, 21–25]. This is common in vocal expressions among South-East Asian ethnic groups. Another variant that combines musical and linguistic factors is initial formulae in the form of shortened musical phrases with words that may be non-lexical or lexical [3: Example 23; 10: Example 62], or a full phrase [10: Example 63]. In such cases, the vocal expressions tend to be aimed at a listener or listeners: human as in the case of social feasting, and both human and spiritual in the case of prayers and spells. In his description of musical litany form, Alan Lomax notes that a simple litany consists of ‘[o]ne or two phrases repeated over and over again in the same order with little or no variation: A A A A or AB AB AB AB … even when such a form is preceded by one or two phrases of introduction’.23

Hans Oesch characterized final formulae in the Yao tradition, Thailand, as ‘musical culminations where music often dominates’.24 Final formulae normally mark the end of musical phrases and coincide with the end of prosodic phrases – sometimes by a lengthening of the last syllable [10], as is often the case in normal speech, sometimes in a manner different from speech, for example by lengthening the penultimate syllable [3], or by a specific musical motif [9].

In the Athabascan tradition, most vocal expressions start with a poignant rhythmic/musical motif, but this functions as a first section of a performance rather than as an initial formula. Endings of phrases or stanzas, however, are very prolonged, with characteristic tone repetitions, often using vocables [13–18]. These are particularly important to the musical form.

Word variations

There are differences in the use and pronunciation of words in many of the vocal expressions studied, in comparison with everyday speech (Table 14). They also include words that are not used in speech, or that acquire a special organizational function in the performances.

Vocables are words without lexical meaning. They occur in several examples in our sample, but are particularly dominant in the North Athabascan performances. This is a common phenomenon in Native American vocal expressions, where performances may consist purely of vocables. Vocables have many functions, including a structural dimension.25 This structural dimension becomes obvious when looking at the vocal genres in performance templates: vocables are indispensable parts of the performance templates, and they play a vital structural role in the composition process.

In Beijing opera, vocables are referred to as ‘empty words’.26 Vocables are common among all ethnic groups in Taiwan, including Seediq, where they are often fixed in each vocal expression. There are also vocal expressions made up entirely of vocables, much like the North Athabascan genres.27

Some words have lexical meaning but are not generally used in daily speech. They are pronounced differently or appear not to belong to the poetry that is being performed. These are generally known as ‘song-words’. Among the Dyirbal, a population of hunter-gatherers in Australia, about 300 song-words were identified, explained as ‘words used only in poetic diction and never in ordinary communication … they made up almost one third of the occurring words’.28 We found that song-words, just like non-lexical words, may be constituent factors in performance templates (Athabascan; Kammu: [9: Example 48]).

In many cases, no distinction is made between long and short vowels, even though vowel length changes the meaning of a word. Reduplication of syllables is common in Kammu vocal expressions. Hans Oesch has published a transcription of a Yao song, from Thailand, with a phenomenon similar to reduplication. In this case, the syllable a is pronounced with the final consonant of a word, regardless of which vowel preceded it. There are other syllables without meaning, too.29

  • Vocables [13–20].
  • The vocables are restricted in vowel quality, comprising only [ei] and [o] [15].
  • A word without lexical meaning at the end of each line: ís [3].
  • Song-words with lexical meaning [8–9, 18].
  • Vocative song-words (Həəəy, Həə, Eee) [9, 23: Example 110].
  • An auxiliary word sáh, ‘I say’, at the start of phrases, performed very short [4, 9].
  • Lexical words are pronounced in ways that differ from ordinary speech [14, 18].
Vowel length
  • Same tone duration for long and short vowels, minor and major syllables [3–9].
  • The moraic nasal N is treated as one mora [25–26].
  • The two light syllables (be- and kw-) and the lexical stems (-ni, -lá) in the text have the duration of one beat each [12].
  • Coda prolongation is frequent [5–7, 9].
Syllabic reduplication
  • Syllabic reduplication is frequent [5–9].
Schwa vowels
  • Schwa vowels have the same duration as all other vowels that are performed as short [3–7, 10, 19–20].
  • All vowels, including schwa (ə), are of approximately the same duration. In addition, schwas are long when falling on the second long tone of an iambic unit [8–9].
  • Schwa vowels are not audible [1].
  • Schwa vowels are very short and hardly audible [2].
  • When a word is not vowel-final, vowel syllables are added [15].

Syllables whose vowels are more or less inaudible in speech, referred to as schwa or epenthetic vowels, are often of the same duration as other vowels in performance templates, pronounced ə in Kammu or u in Seediq.30 This also occurs elsewhere, as demonstrated by François Dell in connection with Tashlhiyt Berber, which he calls ‘a language that allows vowel-less syllables’:

[S]chwas do not play any role in the phonology of the language (e.g. in syllable structure) nor in versification. In text-to-tune alignment, however, schwas acts as carriers for pitch, exactly like bona fide vowels[.]31

In two of the Kammu genres [8, 9], however, schwa (ə) will be performed as long when it is reduplicated or when it falls on the second long tone of an iambic unit.

John D. Smith found three versions in the Rajasthani Epic of Pâbûjî: a nuclear or unembellished underlying text, a sung text, and a declaimed spoken text. Particularly at the beginning of lines and before cadences in the sung version, ‘particles, vocatives, pronouns, and similar redundant sentence-fillers, together with repeated key words’ were added. He concluded that these additional embellishments were made in order to fit the poetry to the demands of different melodies, thereby obscuring the metre.32

Words that appear to be ‘added’ or ‘extra’ are often referred to as padding syllables that have the function of coordinating poetical and musical metre. This is the case in the Akha shaman performance [21–24], where the syllable lɔ̀ is used to create a rhyme-pair of a monosyllabic word and where prefixes or suffixes are used to fill ‘empty’ spaces in pairs where a syllable must have a certain position within the pair. Similarly, in the Athabascan material, words are adapted to fit 4-word units [15–16]. Syllable reduplication in certain iambic Kammu vocal genres can also be partly explained along similar lines [5–9].

This is not the only function of such words, however. In relation to Cantonese opera, Bell Yung defines six types of padding syllables used in performances:

  1. added phrase;
  2. phrase leader syllable: syllables moved from a stressed beat to an upbeat;
  3. multiplets: squeezing several syllables into a slot (the contraction mentioned above);
  4. interlude filler: words are sung to an instrumental interlude;
  5. tail syllables: grammatical particles added to the end of a line;
  6. nonsense syllables: vocables added after a regular syllable.33

Evidently, padding syllables can be many things, and they are common in performances in which the performer re-creates the vocal expression. The words həəəy, approximately ‘hey’, used at the beginning of a Kammu tə́əm performance, and kàay sáh, approximately ‘I say’, used at the end of the last line of a stanza, may appear to be rather arbitrary fillers. However, they coincide with the initial and final formulae of tə́əm. Like the vocables in North Athabascan vocal expressions, they are indispensable for the structure of the performance and are thus integrated into the performance template. Apart from this, they are also important genre markers. They signal that the performance is of the tə́əm variety, and they signal that the performer comes from the Yùan dialect area in northern Laos. Other Kammu dialects in the area around Yùan use different words to start and finish lines in their performance templates, words that are typical of each dialect.34 Similarly, concerning shan’ge in southern Jiangsu, China, Antoinet Schimmelpenninck noted: ‘Some non-semantic syllables are apparently associated with specific songs or specific song regions.’35 Analysing performances from the perspective of a performance template has made it possible to understand meanings of words or syllables that have sometimes been seen as arbitrary ornamentation.

Lexical tones

Tonal languages add the factor of lexical tones (Table 15). In the Kammu language, which has two lexical tones, these are realized in all the different vocal genres performed by the same person [1–10], though there are genres in which the lexical tones are not systematically produced in the performance [8], or where the melodic formula is rather fixed and limits the possibility of realizing lexical tones [3]. Tones are realized on the original syllable (near the vowel onset or throughout the vowel, depending on the genre), while reduplicants ignore tones. In some genres there is tonal movement in each syllable, which implies a different technique for realizing tones, in comparison with genres that have a level tone on each syllable. It also shows that a performer can shift between genres that require lexical tones to be treated differently in performance. The most consistent pattern proved to be the combination of melody-centred parts, mainly at the beginning and end of phrases, where melodic movement dominates over lexical tones, and tone-centred parts, where lexical tones dominate over melodic movement.36 The Akha shaman recitation with three lexical tones evinced a similar order.37

Exact pitch
  • High and low pitch levels correspond to lexical tones [4].
  • Lexical tones are realized at three pitch levels, and a neutral pitch or medium pitch is often used both for High and Low lexical tones [8–9].
  • High lexical tones are generally performed above the tonic, Low lexical tones are performed significantly lower, and Mid lexical tones tend to be between the High and the Low in cases where the lexical tone is realized [21–24].
  • Low lexical tones are generally 1 (but may be higher) or falling (3⇒1) [10].
  • The realization of tones is relative: a High tone is higher than a preceding Low tone, and a Low tone is lower than a preceding High tone [6].
  • Owing to melodic declination, a High tone can start lower than a preceding Low tone in the same prosodic group [5].
  • The high tone of the negative stem in the word bekwlá ‘there are none’ is reflected by a high pitch [12].
Tonal movement
  • Lexical tones are realized at the onset of base syllables [7–9].
  • The melodic movement within syllables is flat or falling [2].
  • There is tonal movement in almost every base syllable (pitch goes up or down, or up–down, down–up) [7].
  • High lexical tones start at a high pitch level, or occasionally at a medium pitch level, in which case the high pitch occurs within the vowel [7].
  • High lexical tones are performed with a sliding upward motion when approached from a lower pitch; a High lexical tone followed by a Low is performed with a sliding downward movement [4, 10, 21–24].
  • A ‘neutral’ pitch (n) is often used for both High and Low lexical tones, but Low and High may be realized by sliding from the initial pitch: Low may be performed l–n or n–h, High may be performed h–n [8, 9].
  • There is very little movement in words with Low lexical tones that are realized as low, while there is movement in words with High tone [4].
  • Low lexical tone may make the High boundary tone at the end of an episode low [1].
Lexical tones not prominent
  • Lexical tones are not very pronounced [2].
  • Lexical tones are often ignored. When realized, lexical tones are relative [5].
  • Lexical tones are not realized in a systematic manner, but contrary movement is rare [3].
  • Reduplicants ignore lexical tones [6–7].
  • In long combinations of phrases, all or most syllables of the second line may be low with no contrary motion in such a passage [9–10].
  • Movement contrary to lexical tones is common [2].

There are genres where lexical tones are realized by fixed pitch, which means that a High lexical tone is performed at a higher pitch-level than a Mid level, which is higher than a Low. In other genres, the realization is relative and similar to spoken language, a High tone being higher than a preceding Low tone and a Low tone lower than a preceding High tone [6]. Because of melodic declination, a High tone may start lower than a preceding Low tone in the same prosodic group [5]. In some Kammu genres, there is also a neutral pitch level that may be used for both High and Low tones. Within syllables the pitch is more or less level, or slides up or down. There may also be sections within the performance of consecutive syllables at the lowest pitch level regardless of lexical tone [9–10], which is due to performance practice when many syllables are repeated rather fast in order for the performer not to run out of breath.

Many languages in China and South-East Asia are tonal, both among the majority populations and in ethnic minorities. There are many tonal languages in Africa, and there are Native American tonal languages as well. Consequently, the relationship between music and lexical tones has been well studied. In the relevant literature, the process of combining lexical tone and melodic pitch is generally called text-setting, a term that also includes the combination of poetic and musical phrasing. Robert Ladd and James Kirby regard text-setting as a concept that is primarily useful in the context of Western-influenced music – such as Cantonese pop songs and Vietnamese tân nhạc or ‘new music’ – that differs from tone/melody correspondence in more performance-based styles:

Correspondences between tone and melody in traditional art forms such as Cantonese opera [are] also well studied …, but the problem of matching tune and text in these cases is to some extent a matter of performance practices rather than text-setting. Roughly speaking, in the acculturated (i.e. Western-influenced) musics of much of East and Southeast Asia, melodies are relatively fixed and texts must be chosen to fit, whereas in many ‘traditional’ forms the melodies are fairly abstract templates and may be modified in performance to achieve optimal tone-melody correspondence with a particular text. This issue also arises in the analysis of tone-melody correspondence in a number of Southeast Asian vocal traditions …38

The concept of text-setting has a history in the study of Western ‘written music’; but it fits less well the process by which lexical tones and musical pitches are combined in vocal expressions re-created in accordance with a performance template. In this context, they are combined simultaneously with all other stylistic factors in a manner that seems to be as instinctive as intonation or lexical tones in ordinary speech. François Dell expresses a similar view:

A song is a composite object with two components, a linguistic object, the ‘text’, and a musical object, the ‘melody’. The two objects have structures that are independent of one another, and each can be realized in the absence of the other. An essential feature of singing is that the text and the melody are produced simultaneously by the same machinery, i.e. the mind and the vocal apparatus of the same person.39

In our approach, however, we do not see the words and melodies as two separate objects. For instance, Kammu trnə̀əm – i.e. the words – and the melody associated with those templates that may be used for performing trnə̀əm seldom if ever appear alone. Nevertheless, in practice, researchers sometimes use these concepts to discuss the same things as in our research. Thus, the outcome of the analysis of differences in handling lexical tones in a number of Kammu performances supports Teresa Proto’s comment that ‘textsetting should be studied while simultaneously taking into account multiple levels of interaction between both the musical and the prosodic structures in a given piece’.40

Ladd and Kirby define four ways in which lexical tones and pitches can be combined. This is illustrated by a disyllabic word consisting of a Low and a High lexical tone (L – H) and different combinations of low and high pitch (l, h): contrary = h – l, similar = l – h, oblique = l – l or h – h. All these combinations occur in the Kammu material. In the tə́əm performance, there are also examples of L – L being performed h – l. This is caused by a ‘neutral’ pitch (n) between high and low, which means that L – L may be performed n – l and H – L may be performed h – l or n – l. In languages with several lexical tones, and perhaps especially in art music forms, the relationship between lexical tone and musical pitch is sometimes very complex. In Thai court song, it has been shown that certain combinations of lexical tones may result in longer sequences of pitches.41
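The classification described above can be illustrated with a minimal sketch. It assumes that lexical tones are coded as "L" and "H" and that sung pitches are coded as numbers (for instance l = 1, n = 2, h = 3); these codings and all function names are hypothetical and serve illustration only. How pairs of identical tones (such as L – L) should be labelled is not specified in the passage above, so their treatment here is an assumption.

```python
from typing import List

# Hypothetical coding: relative height of the two lexical tones.
TONE_HEIGHT = {"L": 0, "H": 1}


def sign(x: float) -> int:
    """Return -1, 0 or 1 depending on the direction of a change."""
    return (x > 0) - (x < 0)


def classify_transition(tone_a: str, tone_b: str,
                        pitch_a: float, pitch_b: float) -> str:
    """Label one tone-to-tone transition as similar, oblique or contrary.

    For an L - H word: l - h is similar, h - l is contrary,
    and l - l or h - h is oblique, as in the text.
    Assumption: when the two tones are identical, level pitch counts
    as similar and any pitch movement counts as oblique.
    """
    tone_dir = sign(TONE_HEIGHT[tone_b] - TONE_HEIGHT[tone_a])
    pitch_dir = sign(pitch_b - pitch_a)
    if tone_dir == pitch_dir:
        return "similar"
    if tone_dir == 0 or pitch_dir == 0:
        return "oblique"
    return "contrary"


def classify_sequence(tones: List[str], pitches: List[float]) -> List[str]:
    """Classify every pair of adjacent syllables in a phrase."""
    if len(tones) != len(pitches):
        raise ValueError("tones and pitches must have the same length")
    return [
        classify_transition(tones[i], tones[i + 1], pitches[i], pitches[i + 1])
        for i in range(len(tones) - 1)
    ]
```

With the pitch coding l = 1, n = 2, h = 3, an L – L pair performed n – l would be passed as `classify_transition("L", "L", 2, 1)`. A count of the three labels over a whole performance would give a rough measure of how far melody follows lexical tone, although it says nothing about where in the syllable the tone is realized.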

George List used graphic transcriptions in the form of spectrograms in his studies of speech and song in central Thailand.42 Modern technology and software like Praat or Melodyne have made detailed studies and experimental studies more feasible.43 In our study of the realization of lexical tones in Kammu vocal expressions, the technology made it possible not only to see the general outline of pitch movements but also to measure factors of tonal movement within single syllables/pitches, and to study where in the syllable a tone is realized. It has also made it possible to discern detailed movements related to lexical tones in passages in which melodic movement is fairly dominant.


The above section summarizes the factors we encountered in the borderland between song and speech. Comparisons with other research show that our results are not limited to the cultures, languages, or language families represented in our material. In the following discussion, our results will be considered in relation to wider research areas.

The continuum from speech to song

The borderland between song and speech was conceived as a segment of a broader continuum from speech to song, or in a still wider perspective: from language to music. This view of a continuum is shared by a number of researchers, especially, perhaps, in ethnomusicology. In Chapter 2 on Kammu vocal genres, that aspect was exemplified in a table of a continuum of poetic complexity (Table 1). In Table 16, the other parameters have been added in order to see how they relate to one another. Generally, the different parameters appear to change gradually with the genres, and basically in a similar fashion. This finding supports the assumed continuum.

Of the other cultures represented in our material, only the Athabascan has more than one genre. The Caribou song (senh ch’eliga’) closely adheres to the pitch contour and rhythm of speech while having a regular pulse and a tonal centre. This would place it to the left of the middle in the continuum in Table 16. The dratakh ch’elik performed at memorial feasts are strophic, with distinct melodic movement and strong musical phrasing, while prosodic rhythm is prominent in sections consisting of words with lexical meaning. It is located further to the right on the spectrum. The ch’edzes ch’elik used for dancing is on the far right of the spectrum. The Athabascan material does not contradict the Kammu continuum.

Two parameters behave slightly differently, though. The initial/final formulae are lacking in Analyses 5 and 7. This might not be one of the most important parameters and it should be noted that there is a floating border between phrase endings and final formulae. More important is the fact that the assumed continuum from speech melody to musical melody is disrupted by the monotone. The monotone includes vocal expressions that lack speech intonation, and it is based on one pitch or – in the case of Kammu with two lexical tones – on two pitches (initial formula disregarded). Since a monotonic scale is recognized, one may speak of a tonal centre in a case of monotone. This is definitely the case with hrlɨ̀ɨ [4]. It could be seen as only partly monotonic; but this would not solve the problem of the continuum if, for instance, waka performances are considered [25]. With regard to rhythm, waka with a musical metre belongs rather far to the right in the spectrum, disrupting the melody continuum.

In his classification of a continuum from speech to song, George List sees two routes between the two phenomena: one goes via expansion of intonation and stability of pitch, the other via negation of intonation and expansion of scalar structure.44 In the latter case, List places the monotone as the extreme. List’s model has the advantage of including the monotone, and it is sometimes used in order to place vocal expressions somewhere on a scale; but since it builds on a presupposed idea of evolution, it also has limitations.

It is notable that certain factors in the continuum in Table 16 coincide: tonal centre, fixed pitches, and regular beat coincide with word variation, namely same tone-duration for long and short vowels and the same duration for schwa vowels and song-words. Hence, it seems that the combination of regular rhythm and tonality provides the basis for durational variations of words. Similarly, the combination of tonality, iambic rhythm, and increased prominence of musical phrasing coincides with syllabic reduplication. These particular word variations are language-specific, or specific to languages that are similar to Kammu in these respects. Nevertheless, the observation that tonality and regular beat coincide and may, in their turn, generate other changes could be more general. That also goes for the addition of a higher proportion of musical phrasing.

In all essential respects, our material supports the idea of a continuum. The exceptions seem to indicate that one cannot expect a continuum that runs parallel in all possible parameters; rather, it needs to be envisioned as a multi-dimensional system.

Vocal expressions and genres

In Chapter 3 on North Athabascan music, two genres were discussed in detail. There are structural similarities between them, for instance the use of tone repetitions at the end of a phrase; but they also differ, particularly in the use of words with or without lexical meaning. Both are performed at potlatch feasts, but they have different functions there. Dratakh ch’elik serve the purpose of honouring the deceased, whereas ch’edzes ch’elik belong to the feasting that follows this ceremonial part. The genres are deeply linked with cultural and social practice.

Lɔ̀ɔŋ (narratives) | Kàm à-thí-tháan (prayers) | Ceremonial expressions (Ɔ̀ɔc, etc.) | Krùu (spells) | Hrlɨ̀ɨ | Hrwə̀, Húuwə̀ | Yàam | Yùun tíiŋ | Tə́əm
Analysis 1 | Analysis 2 | Analyses 3, 8a | Analysis 10 | Analysis 4 | Analyses 5–6 | Analysis 7 | Analysis 8b | Analysis 9
Word variations
Schwa barely audible
Schwa same duration as all other vowels that are performed as short
Same tone duration for long and short vowels
Syllabic reduplication
Schwa can be long | Schwa can be reduplicated
Initial/final formulae
Initial and final | Initial and final
Prosodic phrase pairs
Prosodic and musical phrasing aligned
Verbal phrasing dominates
Musical phrasing gradually more prominent
Speech rhythm
Regular beat
One-beat rhythm
Iambic rhythm
Speech melody
Tonal centre, fixed pitches

The vocal expressions analysed in the Kammu chapter represent several different genres, in most cases performed by the same person. These genres range from mainly language-centred performances to mainly music-centred ones. Several of the genres are distinctly different in the handling of the studied parameters. In Kammu culture, these genres are deeply embedded in cultural practice. The vocal expressions have their different time, place, and function. Kammu music activity, as a whole, is closely tied to different periods in the life cycle and to different stages of the farming year. The formal characteristics of the vocal expressions also signal the situation to which they belong. In the case of the music of the farming year, that function is integrated into other practical and ceremonial activities, thus serving as a calendar.45

The fact that cultural context produces different genres is not unique to North Athabascan and Kammu culture; indeed, as exemplified by a number of studies, it is the norm rather than the exception. In his study of the music of the Venda culture in northern Transvaal, John Blacking presents a graph that shows ‘the relationship between the performance of communal music and the seasonal cycle’.46 Similarly, concerning Aka, on the border between the Central African Republic and the Democratic Republic of the Congo, Susanne Fürniss lists genres, situations, voice, and instruments in a graphic representation called ‘the musical universe of the Aka’.47 Writing about the Kalankira in the Bolivian Andes, with chapters dealing with ‘Orchestrating the year: Seasonal alternation, calendars and powers’ and ‘The music of a year’, Henry Stobart says:

Musical knowledge, discourse and practices in Kalankira are not neatly separated from other spheres, but they are deeply integrated into more general ideas about ‘production’ … The ‘Poetics of production’ in this book’s title aims to stress the mutual independence of musical and socio-economic production …48

This demonstrates the social importance of vocal expressions in human culture; and since many of these exist in the borderland between song and speech, there is good reason to include them in the study of music and language.

The mono-melodic principle

The mono-melodic, sometimes called mono-thematic, phenomenon refers to a practice in which one melodic framework is used for a number of sets of words. In the Yùan dialect area of Kammu culture, for example, the Yùan tə́əm performance template is used for practically all social singing on festive occasions. One collection contains about 150 trnə̀əm performed in accordance with the same Yùan template.49 The mono-melodic organization of vocal genres in Kammu culture is a special case in which a number of different mono-melodic genres are used in very specific contexts, and one and the same trnə̀əm may – at least in theory – be performed in accordance with the performance template of any of these genres [4–9].

Mono-melodic practices are widespread in southern China and South-East Asia, on islands as well as on the mainland, not least in the cultures of the numerous ethnic groups in that part of Asia, and they are closely related to alternating singing.50 For instance, a mono-melodic system is reported by Antoinet Schimmelpenninck concerning the shan’ge traditions in southern Jiangsu:

Every village in the Wu area appears to have its own tune or its own local variant of a regionally popular tune. Within every village, every singer appears to have his own, personal interpretation of such a tune. This ‘monothematism’ of the performance is one of the most remarkable aspects of shan’ge singing in the area … This leaves me with the need to define the notion of a ‘melodic framework’ … No doubt several other ‘monothematic’ regions exist in the Wu area.51

As in Chinese narrative genres, one can also see the ‘fixed tunes’ or ‘tune types’ in Chinese music dramas as examples of mono-melodic systems.52 Similarly, lullabies in South-East Asia and elsewhere often use a particular melodic framework. This is the case in Thailand:

lullabies are generally sung to a short melodic formula which is repeated over and over with more or less variation. In some cases many short nursery rhymes are strung together which results in longer performances … Though the melodies of the different regions differ … they do have a core of tonal material in common.53

Lullabies are often tied together into very long performances built on the same melodic formula. This is due to the function of lullabies as explained by Johannes Kneutgen, who found, by measuring a child’s breathing rhythm, that an Argentine lullaby needed 14 repetitions before the child was asleep.54 In Swedish tradition, a common melodic formula for lullabies is called ‘The fishing ground tune’. Carl-Allan Moberg studied about 100 variants of the tune from different music genres in order to find its origin.55 The 19 lullaby variants show examples of prolonging and contraction, discussed above under ‘phrasing’. This is an example of an approach from the perspective of vocal expression and performance template that might place the focus on things other than – in this case – the perspective of the original and variants. The example also indicates that even though we have studied vocal expressions in a few cultural contexts in East and South-East Asia and Alaska, the method is likely to be useful in all parts of the world.

The cross-cultural aspect

The present study is cross-cultural, since it is based on material from five distinct cultural contexts. We were looking for diversity and richness of vocal expressions rather than similarities, so the comparative perspective has not been in the foreground. Comparative musicology – in German ‘vergleichende Musikwissenschaft’ – was a branch of musicology from the very beginning of the discipline in the late nineteenth century. The extensive material that was collected was analysed, compared, and ordered in accordance with the Kulturkreis school, i.e. Darwin’s theory of evolution applied to human culture and music. Researchers took an interest in basic vocal expressions because they fitted the assumption that everything had developed from the primitive to the advanced, rather than for their role as vehicles for expression and communication. In the case of vocal expressions, the factors that were measured were primarily pitches, the number of tones in scales, types of scales, and rhythmic organization. Another aspect of comparative musicology was diffusionism, according to which all things had a common origin and had spread from one place to another. Comparative musicology was thus much occupied with evolution and history and went hand in hand with racial biology.56

After the Second World War, these views were no longer viable. One critique within the developing field of ethnomusicology was that comparative musicology had built on generalizations concerning music that had been taken out of the musical context and then analysed from an ethnocentric Western perspective. Ethnomusicology instead focused on the study of music in cultures, which was the focus of Alan P. Merriam’s anthropology of music.57 There is now a rich body of studies of music in its cultural contexts in all parts of the world. On the other hand, there is no complete overview of music on a global scale. Linguistics has developed differently and seems to have escaped this dichotomy of views. One reason may be the early influence of structuralist theory, which made it possible to develop a generally accepted set of theoretical parameters for the study of languages.

Against this background, it is not surprising that new theoretical possibilities for cross-cultural studies are now developing in interdisciplinary research. Patrick E. Savage and Steven Brown are aiming at a new comparative musicology:

As with its sister discipline of comparative linguistics, comparative musicology seeks to classify the musics of the world into stylistic families, describe the geographic distribution of these styles, elucidate universal trends in musics across cultures, and understand the causes and mechanisms shaping the biological and cultural evolution of music.58

This field is not seen as a ‘replacement for ethnomusicology or historical musicology but as a specific stream within the overall umbrella of musicology’. Savage and Brown attempt to bridge the dichotomy of between-culture and within-culture facets of cultural diversity by considering the two simultaneously, using methodology developed in the study of genetic diversity in population genetics (analysis of molecular variance, AMOVA). Using music samples from a number of Austronesian-speaking peoples in Taiwan and the Philippines, they find that

[t]he multi-dimensional scaling plot clearly demonstrates the high level of internal heterogeneity in each population’s musical repertoire and the high degree of overlap between populations … This validates and quantifies the critiques of ethnomusicologists that Cantometrics’ cross-cultural approach underestimated the diversity of musical repertoires within each culture.59
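
The logic of partitioning diversity into a between-population and a within-population component can be illustrated with a minimal sketch. It is not the implementation used by Savage and Brown; it merely follows the standard AMOVA sums-of-squares decomposition for a single grouping level. The function name `amova_phi_st`, the representation of each song as a numeric feature vector, the use of squared Euclidean distance, and the example data are all assumptions made for illustration only.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def amova_phi_st(groups):
    """One-level AMOVA on hypothetical song feature vectors.

    groups: dict mapping a population name to a list of feature vectors.
    Returns Phi_ST, the share of variance attributed to differences
    between populations (values at or below zero indicate no detectable
    between-population structure).
    """
    all_items = [v for items in groups.values() for v in items]
    n_total = len(all_items)
    n_groups = len(groups)

    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(sq_dist(a, b) for a, b in combinations(all_items, 2)) / n_total
    # Within-population sum of squares, computed per population.
    ss_within = sum(
        sum(sq_dist(a, b) for a, b in combinations(items, 2)) / len(items)
        for items in groups.values()
    )
    ss_among = ss_total - ss_within

    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective sample size per population (handles unequal group sizes).
    n0 = (n_total - sum(len(items) ** 2 for items in groups.values()) / n_total) / (
        n_groups - 1
    )
    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return var_among / (var_among + var_within)


# Hypothetical data: two populations whose repertoires overlap completely,
# and two whose repertoires are clearly separated.
overlapping = {
    "A": [[0, 0], [0, 1], [1, 0], [1, 1]],
    "B": [[0, 0], [0, 1], [1, 0], [1, 1]],
}
separated = {
    "A": [[0, 0], [0, 1], [1, 0], [1, 1]],
    "B": [[10, 10], [10, 11], [11, 10], [11, 11]],
}
print(amova_phi_st(overlapping), amova_phi_st(separated))
```

In such a sketch, a low value corresponds to the situation described in the quotation, in which most of the diversity lies within each population's repertoire rather than between populations.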

One of the critiques of Alan Lomax’s worldwide mapping of folk-song styles, carried out by way of the application of his Cantometrics system, has concerned a number of parameters that were to be coded by the researcher and therefore risked being subjective.60 The use of samples that were sometimes small and gathered under varying circumstances was criticized, too. While Cantometrics focused on the social setting of performances, the coded parameters in the new CantoCore focus totally on musical theory.61 In a sense, this limitation is a return to the old comparative musicology, though a much larger number of parameters are now involved. The problem with limited samples and subjectivity in the coding process remains; but another factor constitutes a new problem, namely that the musical transcriptions that play a key role were made by different researchers, at different times, and with different aims.62

The study of the music of Austronesian ethnic groups on Taiwan and in the Philippines may be regarded as a pilot study.63 The results do not contradict our two Seediq performances, and further research may make it clear whether the use of new methodologies for classifying and mapping intra- and cross-cultural variation will compensate for some of the limitations of the method. In another article, the speech/song continuum is discussed in relation to cross-cultural research:

A truly universal approach cannot exclude ‘nonmusical’ vocalizations but must accommodate any type of vocalization sitting along the musilinguistic spectrum of communicative forms from speech, to songs, to everything in between.64

In our research, we have attempted to steer clear of the dichotomy between music and language. The concepts ‘vocal expression’ and ‘performance-template methodology’ have proved useful in that respect. This approach should have the potential to contribute to comparative cross-cultural research by making the dichotomy of language/music less problematic, by increasing knowledge of the degree and nature of variation in vocal expressions, and by isolating further key parameters for cross-cultural study. The approach might, in turn, also be used for validation of the statistical or mathematical methodologies that are employed in research along such lines.

Universals and evolution

Five principal research issues are listed in the description of the new comparative musicology: classification, cultural evolution, human history, universals, and biological evolution.65 In our study, we have viewed the performances and vocal genres as the contemporary practices they are, and have not touched upon matters of origin or evolution. It could be noted, however, that some results – for example the image of a continuum from speech to song (Tables 4 and 16) – coincidentally show similarities to Steven Brown’s ‘Musilanguage’ model, which was used to illustrate a theory of common origins for language and song.66

In the field of (ethno)musicology, the attitude to evolution and universals has mainly been one of suspicion. This may to some extent be explained by the fact that these terms were burdened by the way they had been used in connection with the pre-Second World War comparative musicology described above. Steven Brown and Joseph Jordania take a different stance, pursuing a positive approach to possible universals rather than conducting a meta-critique of the concept as such. They provide a list of possible universals in music, divided into four main groups at a fairly high degree of generalization.

Regardless of the type of category or system analyzed, there will be varying degrees of generality for any component when performing a cross-cultural comparison, as based on the frequency of appearance of that trait in the world’s musics. In other words, there will be a gradient of universality for the family of components, some components being more prevalent than others. This gradient should vary from complete universality to complete culture-uniqueness.67

If vocal expressions in the borderland between song and speech are as universal as music and language, their characteristics are likely to contain traits that could add to the list of possible universals and, as a result, contribute to a wider base for cross-cultural or comparative studies. This is definitely the case if potential universals are viewed as factors that could vary from complete universality to complete culture-uniqueness. Some recurring aspects of our material could easily be listed: binary form, the existence of vocables, the existence of ‘song-words’, initial and final formulae, prolongation, contraction, melody-centration, and word-centration, to mention just a few.

The Music Lab at Harvard University conducts broadly interdisciplinary research on music within the ‘Natural History of Song’ (NHS) project, aiming at ‘a systematic investigation of the world’s vocal music’.68 Their comparative research is based on two representative corpora, one composed of ethnographic descriptions of song performances and one composed of field recordings. The aim is to conduct ‘systematic analysis of the features of musical behaviour and structure across cultures, using scientific standards of objectivity, representativeness, quantification of variability, and controls for data integrity’.69 The Music Lab scholars study four song types – dance, lullaby, healing, and love – through ‘automatic music information retrieval, annotations from expert listeners, annotations from naive listeners, and staff notation transcriptions (from which annotations are automatically generated)’ as well as a number of statistical operations, including reliability tests.

Our analyses of the NHS Discography show that four common song types, distinguished by their contexts and goals, have distinctive musical qualities worldwide. These results suggest that universal features of human psychology lead people to produce and enjoy songs with certain kinds of rhythmic or melodic patterning that naturally go with certain moods, desires, and themes. These patterns do not consist of concrete acoustic features, such as a specific melody or rhythm, but rather of relational properties like accent, meter, and interval structure.70

The results were clearest for the lullaby and dance categories. Inevitably, the research includes many approximations and generalizations. The approximations may be based on Western music theory (major and minor third as approximations of a number of intervals that in many cases are not stable, even if thus perceived) or on Western views, such as the approximation of ‘single vowel’ for ‘vocalization’ in the case of CantoCore, to mention a couple of examples. From the point of view of our research, which is basically intracultural, it is natural to question what cumulative effects a series of similar approximations in the basic data may have on the outcome.

In the material used for our present study, there are no lullabies, but it contains several vocal expressions that involve healing situations and dancing. These differ in character from the other vocal expressions. However, in Kammu tə́əm there are performances that deal with love, praising, criticizing, birds or nature, etc. All such different themes may occur in the same festive situations, although there are also situations specific to love themes.71 This example shows that the functions of vocal expressions are not necessarily linked with differences in the music or performance. Future research may show whether Kammu is an exception, or perhaps whether lullaby and dance are exceptions as ‘universal archetypes’.

Endangerment and transmission

Since vocal expressions in the borderland between song and speech demand language knowledge – in some cases very deep knowledge – they are endangered to the same degree as the language in question. This circumstance is expressed as follows by Allan Marett and Linda Barwick with regard to Australian native tradition:

It is widely reported in Australia and elsewhere that songs are considered by culture bearers to be the ‘crown jewels’ of endangered cultural heritages whose knowledge systems have hitherto been maintained without the aid of writing. It is precisely these specialised repertoires of our intangible cultural heritage that are most endangered, even in a comparatively healthy language. Only the older members of the community tend to have full command of the poetics of song, even in cases where the language continues to be spoken by younger people.72

The Athabascan situation in interior Alaska is an example of this. Vocal expressions occur mainly at special feasts, some of which are related to a funeral or memorial. Very few elders are still living who know how to compose, while younger persons who start composing must struggle with the language. This has actually led to a situation in which music supports language revitalization in the potlatch feast context.73 The music and dance revival has also included staged performances; and the Minto dancers have taken part in pow-wows, where native musics are staged in the context of music and dance contests in meetings organized and run by Native Americans, basically as an internal activity.

In a study from a sociolinguistic perspective of two Alaskan Eskimo communities, Hiroko Ikuta reports a situation similar to the Athabascan in that ‘the heritage languages – Yupik on St Lawrence Island and Iñupiaq in Barrow on the Northern shore of Alaska – have secured a continued existence in the context of song-and-dance performances’.74 Ikuta continues:

It is the withdrawal of language and cultural performance, away from globalisation processes that have moved English in a dominant position, that has created a safe space for the use of the heritage language, be it that in this process the heritage language has been reduced to emblematic forms … I suggest that practice of Eskimo dancing and singing that local people value as an important linguistic resource can be considered as a de-globalised sociolinguistic phenomenon, a process of performance and localisation in which people construct a particular linguistic repertoire withdrawn from globalisable circulation in multilingualism.75

This is an example in which music, dance, and language in combination give those who do not speak the ‘heritage language’ an ethnolinguistic identity and a feeling of cultural continuity. The places – the arenas for performance, the potlatch in the Athabascan case – are of central importance for this form of language and music continuity.

The close relationship between language and music has led Catherine Grant to suggest that approaches relating to the maintenance of endangered languages can inform ways of supporting endangered music genres. She does this by developing ‘the Music Vitality and Endangerment Framework (MVEF), for identifying and measuring music endangerment, based on a framework developed by UNESCO for identifying and measuring language endangerment’.76 Similarly, Neil R. Coulter has demonstrated that GIDS (Graded Intergenerational Disruption Scale), which he adapted into GMSS (Graded Music Shift Scale), can be used to give a nuanced picture of music shift with regard to genres and generations among the Alamblak in Papua New Guinea.77 This approach makes it possible to express the degree of endangerment of a kind of music, or of specific music genres, in processes of change.

In many cases, change is brought about by national cultural policies. A common method, especially in the socialist cultural policies of China and Vietnam, is the staging of certain music styles and dancing. Staging usually means recontextualization of music in the sense that the music has moved to another place or a new social or cultural setting and has taken new roots there.78 Certain vocal or instrumental music and/or dances then become representative of individual ethnic groups. The music is often rearranged to fit the current national style, and it may be performed by individuals who do not belong to the ethnic groups in question, as in the Vietnamese ‘neotraditional music’. There may be a whole spectrum of contexts, from performances in the ethnic village on the one hand, to the music becoming part of a totally different context as an aspect of a politically created repertoire at the other extreme. This principle has been described by Sue Tuohy in relation to China.79 The consequences for music genres and for the people who perform them have been described by Ó Briain concerning Hmong in Vietnam.80 Catherine Ingram presents an account of the staging of dage, ‘big song singing’, of the Kam (or Dongzu) population in the Guizhou Province in southern China:

[A]s a result of the social and political restrictions in Kam villages during the Cultural Revolution … the singers involved in staged big song performances from the 1980s onwards began to include many Kam people with no experience in village big song singing. The 1980s also marked the beginning of Kam song classes in tertiary institutes specifically for training Kam professional performers. In the 1990s, staged performances increased in popularity and began to feature in televised broadcasts … Kam people refer to the staged performance of Kam songs, including big song, as cha tai dor ga – literally, ‘going onstage to sing songs’. The various features of staged big song singing that allow Kam people to distinguish it from big song singing occurring in the village context have remained virtually static over the last sixty years.81

There are no reports as yet about Kammu in the Yùan area in northern Laos, or the Akha of northern Thailand, becoming involved in a similar change. Thomas Turino speaks of cultural nationalism in Latin America and in colonial and post-colonial Zimbabwe.82 Cultural nationalism occurs everywhere and these processes are not limited to socialist countries, though methods may vary. In Taiwan, there are archives containing recordings of the music of ethnic groups, and revival movements are also going on among the Seediq.83

Sustainability is a sensitive matter and researchers are divided in this respect: some want to preserve local traditions, while others are prepared to accept recontextualization as one way for a music and its function for individual cultural or social identity to survive. This view was taken in a project led by Huib Schippers, in which matters of endangerment and sustainability were studied in 11 different contexts in different parts of the world; Schippers employed the same set of parameters, which included relationships with national states and the media. The result was an overview of similarities and diversity in sustainability processes.84 The point is that the relationship or tension between village (or local) music culture and national cultural policy is present everywhere:

The Vietnamese example demonstrates that the music culture of an ethnic minority group or village cannot only be seen as a separate unit but also must be seen in relation to the national music culture of which it is a part and to the processes going on there. This is crucial when thinking of the future of this music.85

Likewise, the transmission of musical and linguistic knowledge depends on national educational policies. Kathryn Marsh has studied children’s game songs in a global perspective.86 She found that the formulaic character of some children’s game songs means that the children know how to use the formulae for varying the songs and for inventing new ones. This phenomenon could be described in terms of vocal expressions in the borderland between song and speech, and be studied by means of performance templates. Marsh points out that this fact is generally not recognized when children start school, which sometimes leads to ‘deskilling’.87

Teachers are thus in the position to show an acceptance of children’s musical traditions and the varied sources on which they draw. From this position of acceptance, they can then broaden children’s musical perspectives by providing a wide range of music for performance, listening and as a basis for creation … Classroom musical activities can thus contribute to, rather than be antithetical to, the continued flourishing of multiple traditions of children’s musical play.88

The role that music education in school plays for the transmission of cultural knowledge is often underestimated, especially in view of the fact that Western school-music education is one of the most globalized musical activities, a process that started in the mid-nineteenth century.89 Awareness of children’s knowledge in creation, re-creation, and transmission of vocal expressions has increased as a result of research in ethnomusicology and music education. The fact that children also learn a structure for language expression is of relevance for literacy education and the communication arts, as exemplified by Akosua Addo concerning Ghana.90 In her study of ‘Miskitu children’s speech and song on the Atlantic coast of Nicaragua’, Amanda Minks has shown how the communicative competencies children possess, and have developed, are used in an intercultural context to create social identity.91

Transmission of vocal expressions is not limited to words and melodies; above all, it is a matter of the ability to relate to vocal expressions by learning, adapting, and transmitting.92 When traditional vocal expressions are recontextualized in the case of stage performance, or when taught in school, they usually appear in a normalized stable form that is learnt and then repeated. If children – or adults, for that matter – are not ‘deskilled’ but retain their ability to create and re-create vocal expressions, they become more receptive to the transmission of vocal expressions. Our research has shown that it is not enough to transmit the performance: knowledge of how to re-create vocal expressions in performance needs to be transmitted, too. This, in turn, calls for a certain level of music and language proficiency.

In our research, we have combined ethnomusicology with different specializations within linguistics in order to study vocal expressions as neutral objects – neither song nor speech, but both. There is a tradition of combining methodologies of musicology and linguistics, more perhaps for the study of music than for the study of linguistics. The term musicolinguistics that has been used by Steven Feld and Aaron Fox – and even by some in our own group – has recently been incorporated in the title of a study by Morgan Sleeper.93 This term may be useful as a label of interdisciplinary research involving musicology and linguistics as a sub-area within the two disciplines.

Written ‘MusiCoLinguistics’, the same term is used for language, music, and cognition research by Rie Asano at the University of Cologne.94 Cognition is an area that we have barely touched upon in this discussion. While we have collaborated in trying to use our specializations in musicology and different fields of linguistics, and also in bridging the gaps between them, it may be observed that there is still a gap between researchers in the humanities and those who use scientific methods, partly because of different perspectives, methods, and languages. In Aniruddh Patel’s words:95

[The study of] music–language relations is one area in which scientific and humanistic studies can meaningfully intertwine, and in which interactions across traditional boundaries can bear fruit in the form of new ideas and discoveries that neither side can accomplish alone. Studies that unify scientific and humanistic knowledge are still uncommon …

Scientific and humanistic knowledge cannot be unified until both kinds of research – the scientific and the humanistic – can understand and evaluate each other’s results. There is a large gap between them in this respect – a gap that may be bridged only in interdisciplinary collaborations which provide contexts for learning to communicate these matters.

Our research is a step in that direction. While linguists often define music in terms of pitches and rhythm, most ethnomusicologists avoid definitions and go by feeling to search for, as Ian Cross puts it, ‘something like music’.96 In our case, we chose to focus on a borderland between song and speech. This choice has made definitions of music, language, or the difference between them unnecessary, thereby paving the way for collaboration.

We have found that it is possible to recognize forms of human communication that can be described as vocal expressions in the borderland between song and speech, and that it is also possible to design a method for studying them that leads to new knowledge of this borderland. It is a fairly distinct yet loosely defined category that – in addition to general knowledge of one’s own cultural context – calls for knowledge of the language and of the performance templates that are necessary for realizing these forms of human communication. We have also shown that it is possible to speak of a continuum from speech to song, even though it may not always be as clear cut as is often depicted.

It is not obvious where this borderland ends. A pragmatic view would be to say that it ends where the method does not produce results and cannot be replaced by a different method that works. While we have necessarily had to restrict our studies to a limited number of vocal expressions, they are widely disseminated and hence relevant to many – if not most – areas of musicological and linguistic research, as the preceding paragraphs have demonstrated. Examinations of these vocal expressions consequently add to the state of knowledge in all those areas.

1 In the following, references to analysis numbers are placed in square brackets.
2 Compare the wishlist presented by Barwick 2006.
3 This is our experience from the collaboration with the ongoing research project ‘Rwaai: Digital Multimedia Archive of Austroasiatic Intangible Heritage’ at Lund University, Sweden, led by Niclas Burenhult.
4 Banti and Giannattasio 2006: 196.
5 Ewald 2013 found similar templates in spontaneous narratives in the Austroasiatic languages Jahai and Mah Meri, Malaysia.
6 Chow and Brown 2018.
7 Patel 2008: 96.
8 Lomax 1968: 49.
9 Fabb 1997: 97.
10 The term ‘litany’ in connection with musical form was introduced and defined in Lomax 1968: 58–61.
11 Loh 1982: 230.
12 Schimmelpenninck 1997: 197.
13 Here, the typology of parallelism presented in Fabb 1997 and the parallelism of Toda songs, India, should be mentioned, as described together with differences between spoken and sung language in Emeneau 1966.
14 Compton 1979, Miller 1985.
15 Nguyen 1954.
16 Emeneau 1971: 15.
17 For basic forms and transcriptions of words in such combined performances, see Lundström and Tayanin 2006: 33–199 and 201–206.
18 Schimmelpenninck 1997: 192.
19 This is summarized in Lundström 2010: 76–77. The concepts of contraction and prolongation approximately correspond to the embedding and conjoining of Feld 1990: 253.
20 Yung 1989: 94–95.
21 Wichmann 1991: 34.
22 Yung 1989: 57.
23 Lomax 1968: 58.
24 Oesch 1979: 18–19.
25 Mulder 1994 and O’Keeffe 2007: 57.
26 Wichmann 1991: 151.
27 Loh 1982: 202–206.
28 Dixon and Koch 1996 quoted in Banti and Giannattasio 2006: 306.
29 Oesch 1979: 16–17.
30 This factor is so strong in Kammu hrlɨ̀ɨ that it was used as additional proof of the existence of schwa vowels in so-called minor syllables; Lundström and Svantesson 2008: 124–125.
31 Dell 2011: 182–183.
32 Smith 1979 and 1991: 27 ff.
33 Yung 1989: 94–98.
34 See Lundström 2010: 86, 213–17 for four other dialect templates and transcriptions.
35 Schimmelpenninck 1997: 195.
36 See Table 4 on p. 119.
37 See the Akha shaman performance template, p. 224.
38 Ladd and Kirby 2020: 678–679, note 1.
39 Dell 2011: 173.
40 Proto 2015: 126.
41 Tanese-Ito 1988.
42 List 1961 and 1963.
43 See for example Morey and Schöpf 2011, Crupi 2014, Schellenberg 2014, and Pooley 2018.
44 List 1963: 9.
45 This is summarized in Lundström and Tayanin 1982: 132–135, Table 1.
46 Blacking 1965: 30, Fig. 2.
47 Fürniss 2006: 166, Fig. 5.1.
48 Stobart 2006: 5.
49 Lundström and Tayanin 2006.
50 See Lundström 2018: 989–991.
51 Schimmelpenninck 1997: 224, 226, 267.
52 Yung 1989: 128–137, Pian 1993.
53 Stone and Lundström 2000.
54 Kneutgen 1970.
55 In Swedish: ‘Fiskeskärsmelodin’. Moberg 1950.
56 See Brabec de Mori 2017: 116–119 for a summary of this research method.
57 Merriam 1964.
58 Savage and Brown 2013: 148.
59 Rzeszutek, Savage, and Brown 2011: 1608.
60 Lomax 1968.
61 Savage, Merritt, Rzeszutek, and Brown 2012.
62 The method CantoCore and the practical coding are presented in Savage, Merritt, Rzeszutek, and Brown 2012. The limitations of the method are discussed in Savage and Brown 2013: 149–151.
63 Savage and Brown 2013.
64 Savage, Merritt, Rzeszutek, and Brown 2012: 89.
65 Savage and Brown 2013: 150.
66 See Brown 2000: 274 ff.
67 Brown and Jordania 2011: 233.
68 Department of Psychology, Harvard University, www.themusiclab.org/.
69 Mehr and Singh 2018: 2.
70 Mehr and Singh 2018: 2.
71 See Lundström and Tayanin 2006.
72 Marett and Barwick 2001: 144.
73 See Sleeper 2018: 83–88 for music in language revitalization.
74 Ikuta 2010: 172.
75 Ikuta 2010: 171–172.
76 Grant 2014.
77 Coulter 2011.
78 Schippers 2010: 121.
79 Tuohy 2001.
80 Ó Briain 2018.
81 Ingram 2012: 439–440.
82 Turino 2003 and 2008: 145 ff.
83 The movie Warriors of the Rainbow: Seediq Bale (2012) actually includes a Seediq canonic imitation performance. Internet reference: Seediq Bale song.
84 Schippers and Grant 2016.
85 Lundström 2018: 1001.
86 Marsh 2008.
87 Marsh 2008: 314.
88 Marsh 2008: 317.
89 See Cox and Stevens 2010.
90 Addo 2013.
91 Minks 2013.
92 See further Lundström 2012: 654–656.
93 Feld and Fox 1994, Lundström and Svantesson 1996, and Sleeper 2018.
94 Internet reference: MusiCoLinguistics.
95 Patel 2008: 417.
96 Cross 2012: 317.



In the borderland between song and speech

Vocal expressions in oral cultures

