In reading Joe Dolson’s recent piece on the intersection of AI and accessibility, I absolutely appreciated the skepticism that he has for AI in general as well as for the ways that many have been using it. In fact, I’m very skeptical of AI myself, despite my role at Microsoft as an accessibility innovation strategist who helps run the AI for Accessibility grant program. As with any tool, AI can be used in very constructive, inclusive, and accessible ways; and it can also be used in destructive, exclusive, and harmful ones. And there are a ton of uses somewhere in the mediocre middle as well.
I’d like you to consider this a “yes… and” piece to complement Joe’s post. I’m not trying to refute any of what he’s saying but rather provide some visibility to projects and opportunities where AI can make meaningful differences for people with disabilities. To be clear, I’m not saying that there aren’t real risks or pressing issues with AI that need to be addressed—there are, and we’ve needed to address them, like, yesterday—but I want to take a little time to talk about what’s possible in hopes that we’ll get there one day.
Alternative text
Joe’s piece spends a lot of time talking about computer-vision models generating alternative text. He highlights a ton of valid issues with the current state of things. And while computer-vision models continue to improve in the quality and richness of detail in their descriptions, their results aren’t great. As he rightly points out, the current state of image analysis is pretty poor—especially for certain image types—in large part because current AI systems examine images in isolation rather than within the contexts that they’re in (which is a consequence of having separate “foundation” models for text analysis and image analysis). Today’s models aren’t trained to distinguish between images that are contextually relevant (that should probably have descriptions) and those that are purely decorative (which might not need a description) either. Still, I still think there’s potential in this space.
As Joe mentions, human-in-the-loop authoring of alt text should absolutely be a thing. And if AI can pop in to offer a starting point for alt text—even if that starting point might be a prompt saying What is this BS? That’s not right at all… Let me try to offer a starting point—I think that’s a win.
Taking things a step further, if we can specifically train a model to analyze image usage in context, it could help us more quickly identify which images are likely to be decorative and which ones likely require a description. That will help reinforce which contexts call for image descriptions and it’ll improve authors’ efficiency toward making their pages more accessible.
While complex images—like graphs and charts—are challenging to describe in any sort of succinct way (even for humans), the image example shared in the GPT4 announcement points to an interesting opportunity as well. Let’s suppose that you came across a chart whose description was simply the title of the chart and the kind of visualization it was, such as: Pie chart comparing smartphone usage to feature phone usage among US households making under $30,000 a year. (That would be a pretty awful alt text for a chart since that would tend to leave many questions about the data unanswered, but then again, let’s suppose that that was the description that was in place.) If your browser knew that that image was a pie chart (because an onboard model concluded this), imagine a world where users could ask questions like these about the graphic:
- Do more people use smartphones or feature phones?
- How many more?
- Is there a group of people that don’t fall into either of these buckets?
- How many is that?
Setting aside the realities of large language model (LLM) hallucinations—where a model just makes up plausible-sounding “facts”—for a moment, the opportunity to learn more about images and data in this way could be revolutionary for blind and low-vision folks as well as for people with various forms of color blindness, cognitive disabilities, and so on. It could also be useful in educational contexts to help people who can see these charts, as is, to understand the data in the charts.
Taking things a step further: What if you could ask your browser to simplify a complex chart? What if you could ask it to isolate a single line on a line graph? What if you could ask your browser to transpose the colors of the different lines to work better for form of color blindness you have? What if you could ask it to swap colors for patterns? Given these tools’ chat-based interfaces and our existing ability to manipulate images in today’s AI tools, that seems like a possibility.
Now imagine a purpose-built model that could extract the information from that chart and convert it to another format. For example, perhaps it could turn that pie chart (or better yet, a series of pie charts) into more accessible (and useful) formats, like spreadsheet
Leave a Reply