Chapter marking within podcasts

Smartphones are facilitating our listenership to podcasts

As we listen to more spoken-word audio content in the form of podcasts and the like, we may want to see this kind of audio content easily delineated in a logical manner. For that matter, such content is being listened to as we drive or walk thanks to the existence of car and personal audio equipment including, nowadays, the “do-it-all” smartphones being connected to headphones or car stereos.

Such delineation would let us return to the start of a segment after an interruption so we really know where we are contextually. It would also let us go straight to a particular “article” in a magazine-style podcast if we are after just that article.

Prior attempts to delineate spoken-word content

In-band cue marking on cassette

Some people who distributed cassette-based magazine-style audio content, typically to vision-impaired people, used mixed-in audio marking recorded at high speed to allow a user to find articles on a tape.

This worked with tape players equipped with cue and review functionality, something that was inconsistently available. Such functionality, typically activated when you held down the fast-forward or rewind buttons while the tape player was in play mode, allowed the tape to be run forward or backward at high speed while you could still hear what was recorded, albeit as a high-pitched warble.

With this indexing approach, you would hear a reference tone that delineated the start of the segment in either direction. But if you used the “cue” button to seek through the tape, you would also hear an intelligible phrase that identified the segment so you knew where you were.

Here, this function was dependent on whether the tape player had cue and review operation and required the user to hold down the fast-wind buttons for it to be effective. This ruled out use within car-audio setups that required the use of locking fast-wind controls for safe operation.

Index marking on CDs

The original CD Audio standard had inherent support for index marking that was subordinate to the track markers typically used to delineate the different songs or pieces. This was to delineate segments within a track such as variations within a classical piece.

Most 1980s-era CD players of the type that connected to your hi-fi system supported this functionality, premium-level models more so, but how they treated it differed markedly. The most basic implementation of this feature was to show the index number on the display after the track number. CD players with eight-digit displays showed the index number as a smaller-sized number after the track number while those with a four-digit or six-digit display had you press the display button to show the track number and index number.

Better implementations had the ability to step between the index marks with this capability typically represented by an extra pair of buttons on the player’s control surface labelled “INDEX”. Some more sophisticated CD players even had direct access to particular index numbers within a track or could allow you to program an index number within a track as part of a user-programmed playlist.

As well, some CDs made use of this feature, usually classical-music discs featuring long instrumental works that are best referenced directly at significant points. Support for this feature died out by the 1990s, with index marking by then focused on marking the proper start of a song. This was considered important for live recordings or concept albums where a song or instrumental piece would segue into another one. It mattered for the proper implementation of repeat, random (shuffle) play or programmed-play modes so that the song or piece comes in at its proper start.

There was an interest in spoken-word material on CD through the late 1990s with the increase in the number of CD players installed in cars. This was typically in the form of popular audiobooks or foreign-language courseware, and car trips were considered a favourite setting for listening to such content. But these spoken-word CDs were limited to using tracks to delineate chapters in a book or lessons within a foreign-language course.

But CD-R with the ability to support on-site short-run replication of limited-appeal content opened the door for content like religious sermons or talks to appear on the CD format. This technology effectively “missed the boat” when it came to support for index marking and most CD-burning software didn’t allow you to place index marks within a track.

The podcast revolution

File-based digital audio and the Internet opened the door to regularly-delivered spoken-word audio content in the form of podcasts. Each is effectively a radio show delivered as an audio file available to download. They even use RSS Webfeeds to allow listeners to follow podcasts for newer episodes.

Here, podcast-management or media-management software automatically downloads or enqueues podcast episodes for subsequent listening, marking what is listened to as “listened”. Some NAS-based DLNA servers can be set up to follow podcasts and download them to the NAS hard disk as new content, creating a UPnP-AV/DLNA content tree out of these podcasts available to any DLNA-compliant media playback device.

The podcast has gained a strong appeal with small-time content creators who want to create what is effectively their own radio shows without being encumbered by the rules and regulations of broadcasting or having to see radio stations as content gatekeepers.

The podcast has also appealed to radio stations in two different ways. Firstly, it has allowed the station’s talent to make spoken-word content they previously broadcast available for listeners to hear again at a later time.

It also meant that the station’s talent could create supplementary audio content that isn’t normally broadcast but available for their audience, thus pushing their brand and that of the station further. This includes the creation of frequently-published short-form “snack-sized” content that may allow for listening during short journeys for example.

Secondly, a talk-based radio station could approach a podcaster and offer to syndicate their podcast, that is, to pay for the right to broadcast the podcast on their radio station into the station’s market. This would appeal to radio stations that need programming to fill schedule gaps like the overnight “graveyard shift”, weekends or summer holidays while their regular talent base isn’t available. But it can also be used as a way to put a rising podcast star “on the map” before considering whether to have them behind the station’s microphone.

Why chapter marking within podcasts?

A lot of podcast authors typically run their shows in a magazine form, perhaps with multiple articles or segments within the same podcast. As well, whenever someone gives a talk or sermon, they typically break it down into points so their audience knows where they are. But the idea of delineating segments within an audio file hasn’t been properly worked out.

This can benefit listeners who are after a particular segment, especially within a magazine-style podcast. Or a listener could head back to the start of a logical point in the podcast when they resume listening so they effectively know where they are contextually.

This can also appeal to ad-supported podcast directories like Spotify who use radio-style audio advertising and want to insert ads between articles or sections of a podcast. The same applies to radio stations who wish to syndicate podcasts. Here they would need to pause podcasts to insert local time and station-identity calls and, in some cases, local advertising spots or news bulletins.

Is this feasible?

The ID3v2 standard, which carries metadata in MP3 files and can be attached to some other audio formats, supports chapter marking within the audio file through its chapter-frame addendum. It is based around a file-level “table of contents” which determines each audio chapter and can even have textual and graphical descriptions for each chapter.

There is also support for hierarchical tables of contents, like a list of “points” within each content segment as well as an overall list of content segments. Each “table of contents” has a bit that can indicate whether the chapters in it are to be played in order or whether they can be played individually. That could be used by an ad-supported podcast directory or broadcast playout program to decide whether to insert local advertising between entries.
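The hierarchy described above can be pictured with a small in-memory model. This is a minimal sketch in Python, not a tag reader or writer: the class and function names (Chapter, TableOfContents, flatten, ctoc_flags) and the sample show are invented for illustration. The fields loosely follow the chapter (CHAP) and table-of-contents (CTOC) frames of the ID3v2 chapter-frame addendum, where chapter times are given in milliseconds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Chapter:
    """One chapter: an ID, a time span in milliseconds and a display title."""
    element_id: str
    start_ms: int
    end_ms: int
    title: str = ""


@dataclass
class TableOfContents:
    """A list of child IDs; each child is a chapter or a nested table of contents."""
    element_id: str
    children: List[str] = field(default_factory=list)
    ordered: bool = True      # children are meant to be played in sequence
    top_level: bool = False   # this is the root table of contents
    title: str = ""


Element = Union[Chapter, TableOfContents]


def ctoc_flags(toc: TableOfContents) -> int:
    """Pack the two flag bits the way the addendum lays them out:
    bit 1 is "top-level", bit 0 is "ordered"."""
    return (0x02 if toc.top_level else 0) | (0x01 if toc.ordered else 0)


def flatten(toc_id: str, elements: Dict[str, Element],
            depth: int = 0) -> List[Tuple[int, Chapter]]:
    """Walk the hierarchy and return (depth, chapter) pairs in listed order."""
    result: List[Tuple[int, Chapter]] = []
    for child_id in elements[toc_id].children:
        child = elements[child_id]
        if isinstance(child, TableOfContents):
            result.extend(flatten(child_id, elements, depth + 1))
        else:
            result.append((depth, child))
    return result


# A made-up magazine-style episode: an intro, then two segments.
elements: Dict[str, Element] = {
    "toc": TableOfContents("toc", ["intro", "seg1", "seg2"], top_level=True),
    "seg1": TableOfContents("seg1", ["ch1", "ch2"], title="News"),
    "seg2": TableOfContents("seg2", ["ch3"], ordered=False, title="Interview"),
    "intro": Chapter("intro", 0, 30_000, "Welcome"),
    "ch1": Chapter("ch1", 30_000, 120_000, "Headlines"),
    "ch2": Chapter("ch2", 120_000, 300_000, "Analysis"),
    "ch3": Chapter("ch3", 300_000, 900_000, "Guest interview"),
}

for depth, chapter in flatten("toc", elements):
    print("  " * depth + f"{chapter.start_ms // 1000:>4}s  {chapter.title}")
```

A player or playout program could use the flattened list for a chapter menu and consult the “ordered” flag of each table of contents when deciding where a break is acceptable. What policy to apply is left to the software; the flag only records the author’s intent.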

What is holding it back?

The main problem with utilising the chapter markers supported within ID3v2 is the lack of proper software support both at the authoring and playback ends of the equation.

Authoring software available to the average podcaster provides inconsistent and non-intuitive support for placing chapter markers within a podcast. This opens up room for errors when authoring that podcast and enabling chapter marking therein.

As well, very few podcast manager and media player programs recognise these chapter markers and provide the necessary navigation functionality. This could be offered at least by having chapter locations visible as tick marks on the seek-bar in the software’s user interface and, perhaps, by allowing you to hold down the cue and review buttons to skip to the previous or next chapter.

Better user interfaces could list out chapters within a podcast so users can know “what they are up to” while listening or to be able to head to the segment that matters in that magazine-style podcast.
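The navigation ideas above need very little logic once a player has the chapter start times. The following is a minimal Python sketch, not taken from any particular player: the function names and the three-second grace period on the “previous” action (borrowed from how CD players commonly behave) are assumptions made for illustration, and the start times are assumed to be sorted and given in milliseconds.

```python
from bisect import bisect_right
from typing import List, Optional


def tick_fractions(starts_ms: List[int], duration_ms: int) -> List[float]:
    """Positions of chapter ticks along the seek-bar, as fractions from 0 to 1."""
    return [start / duration_ms for start in starts_ms]


def current_chapter_index(starts_ms: List[int], position_ms: int) -> int:
    """Index of the chapter containing position_ms, or -1 before the first one."""
    return bisect_right(starts_ms, position_ms) - 1


def next_chapter(starts_ms: List[int], position_ms: int) -> Optional[int]:
    """Start of the chapter after the current position, or None in the last chapter."""
    index = bisect_right(starts_ms, position_ms)
    return starts_ms[index] if index < len(starts_ms) else None


def previous_chapter(starts_ms: List[int], position_ms: int,
                     grace_ms: int = 3000) -> int:
    """Start of the current chapter, or of the one before it if playback is
    still within grace_ms of the current chapter's start (assumed behaviour)."""
    index = current_chapter_index(starts_ms, position_ms)
    if index < 0:
        return 0  # before the first marker: go to the start of the file
    if index > 0 and position_ms - starts_ms[index] < grace_ms:
        index -= 1
    return starts_ms[index]


# Made-up episode: three chapters in a ten-minute file.
starts = [0, 90_000, 300_000]
print(tick_fractions(starts, 600_000))
print(next_chapter(starts, 100_000))
print(previous_chapter(starts, 100_000))
```

With this in place, a listener who was interrupted ten seconds into the second chapter is taken back to its start, while one who presses “previous” right after a chapter begins lands at the start of the chapter before it.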

Similarly, the podcast scene needs to know the benefits of chapter-marking a podcast. In an elementary form, marking out a TED Talk, church sermon or similar speech at each key point can be beneficial. For example, a listener could simply recap a point they missed due to being distracted thus getting more value out of that talk. If the podcast has a “magazine” approach with multiple segments, the listener may choose to head to a particular segment that interests them.

Conclusion

The use of chapter marking within podcasts and other spoken-word audio content could make this kind of content easier to deal with for most listeners. Here, it is more about searching for a particular segment within the podcast or heading back to the start of a significant point therein if you were interrupted so you can hear that point in context.