When considering any commercial video player, clients must ask the platform vendor for a statement of conformance with the applicable WCAG and/or Section 508 standards. This helps them better understand the risks involved in adopting the player.
Terms
Captions
Captions are intended for individuals who are deaf or hard of hearing; however, they have applications that benefit everyone. The example I like to give clients is that televisions in loud places like bars or airports typically have captions turned on so people can get information despite the noisy environment. Captions can be either open or closed.
Closed captions can have their visibility toggled by users. This is typically done via a [CC] button within player controls.
Open captions are always visible and can't be disabled. They are part of the video.
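Closed captions are commonly delivered as a timed-text file separate from the video, which is what lets a player toggle them on and off. As a rough sketch, assuming the WebVTT format and hypothetical names (`Cue`, `format_timestamp`, `to_webvtt`) that do not come from any particular player or captioning service, such a file could be generated like this:

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue (hypothetical structure for illustration)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # caption text, including non-speech cues like "[laughter]"


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Build the text of a WebVTT file from a list of cues."""
    blocks = ["WEBVTT"]  # required file header
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # The header and each cue are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


example = to_webvtt([
    Cue(0.0, 2.5, "[upbeat music]"),
    Cue(2.5, 6.0, "Welcome to the tutorial."),
])
print(example)
```

Open captions have no equivalent file at playback time because the text is rendered into the video frames themselves.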
I've been with organizations that have used 3Play Media for captioning and live transcription and have been impressed with their services.
Subtitles
Subtitles are a translation of the audio into another language. As with captions, they can be open or closed. They are not an accessibility feature because they are intended to translate rather than interpret.
Audio Description
Audio description is an audible narration that describes significant visual aspects of the content. It never interrupts dialog and is instead spoken within the gaps in dialog. The following video is an excellent example of audio description:
Self-described Videos
Self-described videos are those where the information is all conveyed or explained through both the audio and video. This can either be done by narration or it can be part of the dialog.
Narration example: The end of an advertisement video visually displays a website address. The audio track includes a narrator announcing "Go to the address www . some address . com for more information".
Dialog example: A tutorial video where the instructor verbally describes each step as they perform it.
Transcript
A transcript is a textual alternative to the multimedia content. Transcripts include the spoken dialog as well as descriptions of non-speech information such as sound effects or laughter. A transcript also identifies the speaker each time the speaker changes so users can tell who is speaking.
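The transcript conventions described here (spoken dialog, non-speech information, and a speaker label whenever the speaker changes) can be sketched in code. This is a minimal illustration only; `Segment` and `format_transcript` are hypothetical names, and the bracket style for non-speech information is an assumption rather than a requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One piece of a transcript (hypothetical structure for illustration)."""
    speaker: Optional[str]  # None marks non-speech information
    text: str


def format_transcript(segments: list[Segment]) -> str:
    """Render segments as text, labeling the speaker only when it changes."""
    lines = []
    previous_speaker = None
    for segment in segments:
        if segment.speaker is None:
            # Non-speech information (sound effects, laughter) goes in brackets.
            lines.append(f"[{segment.text}]")
        elif segment.speaker != previous_speaker:
            # Identify the speaker each time the speaker changes.
            lines.append(f"{segment.speaker}: {segment.text}")
            previous_speaker = segment.speaker
        else:
            lines.append(segment.text)
    return "\n".join(lines)


transcript = format_transcript([
    Segment("Host", "Welcome back."),
    Segment("Host", "Today we have a guest."),
    Segment(None, "applause"),
    Segment("Guest", "Thanks for having me."),
])
print(transcript)
```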
Audio-only content must provide an equivalent to auditory information. An audio-only podcast, for example, could include a text transcript of the dialog.
Video-only content must provide an alternative that presents an equivalent of the visual information. A short tutorial video with no dialog, for example, must provide a separate audio track or a textual description of the video content.
An audio description track OR a media alternative must be provided for multimedia. If a video is self-described, an audio description is not necessary.
A synchronized spoken description of visual content must be provided with videos. Note: This is not necessary if the video is self-described wherein the information conveyed visually is also conveyed in the dialog or other audio.
Any audio that plays automatically for more than 3 seconds must provide a method to pause or stop the audio, or a method to control the audio volume separately from the system volume. Basically, you must give the user an option to disable the audio. Note: It's never a good idea to auto-play audio. It can be disruptive and distracting and may interfere with assistive technology. For example, a user navigating with a screen reader may not be able to hear how to stop or pause the audio because the audio track conflicts with their screen reader output.
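The auto-play rule boils down to a small decision, sketched below. The `AutoplayAudio` fields and the function name are hypothetical, chosen only to mirror the wording above; a real audit would still involve testing the player by hand.

```python
from dataclasses import dataclass


@dataclass
class AutoplayAudio:
    """Facts gathered about audio that starts on its own (hypothetical fields)."""
    duration_seconds: float
    has_pause_or_stop: bool
    has_independent_volume: bool  # volume control separate from the system volume


def passes_audio_control(audio: AutoplayAudio) -> bool:
    """Return True if the auto-playing audio satisfies the rule described above."""
    if audio.duration_seconds <= 3:
        # Audio of 3 seconds or less is outside the scope of the rule.
        return True
    # Longer audio needs at least one way for the user to silence it.
    return audio.has_pause_or_stop or audio.has_independent_volume


short_chime = AutoplayAudio(2.0, False, False)
background_track = AutoplayAudio(45.0, False, False)
background_with_pause = AutoplayAudio(45.0, True, False)

print(passes_audio_control(short_chime))            # True
print(passes_audio_control(background_track))       # False
print(passes_audio_control(background_with_pause))  # True
```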
Blinking, scrolling, or auto-updating content that (1) starts automatically, (2) lasts more than 5 seconds, and (3) is presented with other content must provide a method for the user to pause, stop, or hide the information. Continuous movement can be distracting to those with cognitive disabilities and attention deficits.
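All three conditions have to hold before the requirement applies, which is easy to lose track of during a review. A minimal sketch of that logic follows, using hypothetical names (`MovingContent`, `requires_pause_control`, `passes_pause_stop_hide`) that simply restate the conditions above:

```python
from dataclasses import dataclass


@dataclass
class MovingContent:
    """Facts about blinking, scrolling, or auto-updating content (hypothetical fields)."""
    starts_automatically: bool
    duration_seconds: float
    presented_with_other_content: bool
    can_pause_stop_or_hide: bool


def requires_pause_control(content: MovingContent) -> bool:
    """True only when all three conditions from the text are met."""
    return (
        content.starts_automatically
        and content.duration_seconds > 5
        and content.presented_with_other_content
    )


def passes_pause_stop_hide(content: MovingContent) -> bool:
    """Content passes if the rule does not apply or a control is provided."""
    return (not requires_pause_control(content)) or content.can_pause_stop_or_hide


carousel = MovingContent(True, 30.0, True, False)
carousel_with_pause = MovingContent(True, 30.0, True, True)
brief_animation = MovingContent(True, 4.0, True, False)

print(passes_pause_stop_hide(carousel))             # False
print(passes_pause_stop_hide(carousel_with_pause))  # True
print(passes_pause_stop_hide(brief_animation))      # True
```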
Section 508
The Revised 508 Standards incorporate the WCAG 2.0 Level A and Level AA success criteria by reference.
What do I check for?
Does the multimedia play automatically? ❌
Does the player include controls for playback? Are they accessible with the mouse AND keyboard? Do they provide accessible names and roles? ✅
Are there captions for the audio? ✅
Is all the information conveyed visually also conveyed through the audio? ✅
If the information presented visually is NOT conveyed as a self-described video, is there narration or an audio description track for the missing information? ❌
Is a transcript required?
For audio-only content such as podcasts: Yes
For video-only content with no audio: Yes OR an audio description
For multimedia with video and audio content: No if other requirements are met
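The three cases above can be written as a small lookup. This is a sketch of the rule of thumb as stated here, not a formal conformance test, and `transcript_requirement` is a hypothetical helper name.

```python
def transcript_requirement(has_audio: bool, has_video: bool) -> str:
    """Summarize whether a transcript is required for a piece of content."""
    if has_audio and not has_video:
        # Audio-only content such as podcasts.
        return "required"
    if has_video and not has_audio:
        # Video-only content: a transcript or an audio description.
        return "required unless an audio description is provided"
    if has_audio and has_video:
        # Multimedia with captions and audio description (or self-described).
        return "not required if other requirements are met"
    raise ValueError("content has neither audio nor video")


print(transcript_requirement(has_audio=True, has_video=False))
print(transcript_requirement(has_audio=False, has_video=True))
print(transcript_requirement(has_audio=True, has_video=True))
```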
If I come across a video where I can gather the same information from the audio and the video, and the content includes accurate captions, I won't require a transcript. That said, a transcript is always going to be beneficial to many users for different reasons. For example, when the information is represented as text, users can style it with custom stylesheets and read it at their own speed.