What to Include

Focus on what the listener needs to know to understand the video’s message:

  • Who is speaking (names/roles when not already clear in audio)
  • Key actions (what is happening, movement, demonstrations)
  • Setting and scene details when they add context
  • Facial expressions or reactions when they affect meaning
  • Charts/graphics and on-screen text that communicate key information

Avoid describing decorative visuals that do not add meaning.

What Counts as Important Visual Information?

  • What Counts

    Important visual information is anything a viewer must understand in order to follow the message of the video.

    This typically includes:

    • Dates or deadlines
    • Names of buildings or locations
    • Program names or titles
    • Statistics or charts
    • Instructions or calls to action
    • Labels identifying people or speakers
    • Demonstrations of procedures
    • On-screen text containing key information
  • What Does Not Count

    Visual details that are decorative or atmospheric usually do not require description.

    Examples of decorative visuals:

    • Scenic campus footage
    • Crowd shots at events
    • Students walking through campus
    • Background imagery supporting narration

Writing Tips

Use these guidelines to keep audio descriptions clear and useful:

  • Describe what is observable (not assumptions about motives or feelings).
  • Use present tense and describe actions as they happen.
  • Keep it concise and focused on meaning.
  • Use plain language and avoid jargon when possible.
  • Place descriptions in natural pauses in the original audio so they fit smoothly and do not overlap dialogue.
  • If you cannot fit essential information into natural pauses, consider extended audio description, in which the video is paused or lengthened to make time for the description.

Standard and Extended Audio Description

For most social video, standard description (or scripting descriptions into the narration) is the most practical approach.

  • Standard audio description: Fits within natural pauses in existing audio.
  • Extended audio description: Adds time/space for description when the video has few pauses or dense visual content.
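
The choice between the two approaches comes down to timing. The sketch below is one rough way to check it, not an established method: it estimates how long each description takes to read aloud and compares that to the pause available. The function names and the speaking rate of 160 words per minute are assumptions made for illustration; adjust the rate to match your narrator.

```python
# Assumed narration speed for illustration only; measure your own narrator.
ASSUMED_WORDS_PER_MINUTE = 160.0


def fits_in_pause(description: str, pause_seconds: float,
                  words_per_minute: float = ASSUMED_WORDS_PER_MINUTE) -> bool:
    """Return True if the description can likely be read aloud within the pause."""
    word_count = len(description.split())
    estimated_seconds = word_count / words_per_minute * 60.0
    return estimated_seconds <= pause_seconds


def choose_description_type(descriptions_and_pauses) -> str:
    """Return 'standard' if every description fits its pause, else 'extended'.

    descriptions_and_pauses is a list of (description text, pause length in seconds).
    """
    for description, pause_seconds in descriptions_and_pauses:
        if not fits_in_pause(description, pause_seconds):
            return "extended"
    return "standard"


# Hypothetical example: two descriptions and the pauses available for them.
plan = [
    ("A student opens a laptop.", 3.0),
    ("A chart shows enrollment rising each year.", 4.0),
]
print(choose_description_type(plan))
```

If any essential description does not fit, shortening it or scripting the information into the narration are usually worth trying before moving to extended audio description.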

Note About Speaker Identification

If a speaker’s identity is important to the story and is conveyed only visually (e.g., through a lower-third name graphic), it must also be available in audio or captions, depending on the accessibility need.

Note About Using Closed Captions to Help Satisfy Accessibility

When a speaker is identified visually through a lower-third graphic, their name and title should also appear in the closed captions when they first speak so viewers who cannot hear the audio can identify who is talking.

This is especially important in news-style videos where speakers often do not introduce themselves verbally.

Including speaker identification in captions can also help satisfy WCAG 2.1 AA requirements when the identity of the speaker is conveyed visually but not spoken aloud, ensuring that critical information is still communicated in an accessible format.

While captions help students and members of the public who are deaf or hard of hearing, they do not replace audio description for viewers who cannot see the screen; when possible, speakers should still identify themselves verbally.

Be aware that YouTube may strip or ignore speaker labels in SRT caption files, which can remove the identification you intended to include.

If this occurs, consider using a different video player that preserves speaker labels, manually inserting the speaker’s name into the caption text for each line, or using an AI captioning tool to convert the SRT file into a format that keeps the speaker identification when uploaded to YouTube.
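
For the option of inserting the speaker’s name into the caption text, a small script can save manual editing. The following is a minimal sketch, assuming a well-formed SRT file (cue number, timing line, then caption text) and a mapping from cue number to speaker name that you prepare yourself. The function name, the "Name: " prefix style, and the sample speaker are hypothetical, and whether a given player displays the result as intended should be checked after upload.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,000".
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def add_speaker_names(srt_text: str, speakers: dict) -> str:
    """Prefix the first text line of selected cues with 'Name: '.

    speakers maps a cue number to a speaker name (hypothetical input
    prepared by the editor). Cues without an entry are left unchanged.
    """
    # Drop a byte-order mark if present, then split the file into cues
    # on blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.lstrip("\ufeff").strip())
    output = []
    for block in blocks:
        lines = block.splitlines()
        is_cue = (
            len(lines) >= 3
            and lines[0].strip().isdigit()
            and TIMING_LINE.match(lines[1].strip())
        )
        if is_cue:
            name = speakers.get(int(lines[0].strip()))
            # Skip cues that already start with the name, so the script
            # can be run more than once without doubling the prefix.
            if name and not lines[2].startswith(f"{name}:"):
                lines[2] = f"{name}: {lines[2]}"
        output.append("\n".join(lines))
    return "\n\n".join(output) + "\n"


# Hypothetical sample captions and speaker.
sample = (
    "1\n00:00:01,000 --> 00:00:03,000\nWelcome to campus.\n\n"
    "2\n00:00:03,500 --> 00:00:05,000\nThanks for having me.\n"
)
print(add_speaker_names(sample, {1: "Alex Lee, Host"}))
```

Because the name becomes part of the caption text itself, it does not depend on the player supporting a separate speaker-label feature.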


Share Your Success and Help Your Peers

To join the virtual Social Media Committee, email social@umaryland.edu.