Commercial AT Product Platforms
Prof Amit Prakash: Good morning everyone, welcome to this third day of EMPOWER 2021. It’s a pleasure to have with us today Dr. Ajit Narayanan, who leads teams at Google that work on accessibility for people with cognitive and motor disabilities. Ajit’s team has worked on several products and features within Google’s platforms, Chrome and Android. These include Switch Access, Voice Access, Select to Speak, Action Blocks and more. Before Google, Ajit was the co-founder and CEO of Avaz, a Chennai-based company that pioneered Augmentative and Alternative Communication in India and around the world. Ajit has been recognized as one of the top young innovators worldwide by MIT TR35. He is a TED speaker and has won the national award for the empowerment of people with disabilities from the President of India. Over to Ajit, for your talk.
Ajit Narayanan: Thank you very much. And thank you to everyone who’s attending today on the last day of the conference. I’m very glad to have an opportunity to represent Google and talk to you about some of the work that we’re doing at Google in the accessibility and assistive tech space. So I’m going to share my screen now and I will present. Okay, hopefully, you can hear me now. Okay. Yeah, sorry about that. So the objective of the talk is to give you a little bit of an overview of the accessibility work that’s happening at Google. My goal is not so much to showcase individual products or to talk about specific technologies. We did have a session a couple of days back where some of my colleagues demonstrated some of the work that is happening in applying machine learning to the area of assistive technology. Rather, I want to give more of an insight into the strategic and philosophical approach that we take to assistive technology at Google. This is a little different from the work that I did before Google, and this was a little bit of a revelation to me as well. So I want to share some of what I have learned as well. And I welcome questions and interactions as we talk about some of these things. So with that, let me get started. This is a little bit of an introduction about myself.
So I lead Google’s research efforts in platform accessibility. I focus on cognitive disabilities and motor impairments, and I’m based in Mountain View in California. For those who are visually impaired and are participating in today’s presentation, I can give you a short visual description of myself: I’m male, Indian, wearing glasses, with a beard and a mustache, and wearing a brown-colored sweater. Before I joined Google, I ran a company in Chennai; we were part of the IIT Madras startup ecosystem. And I was the creator of Avaz. Avaz was an AAC device. And today, it is one of the most widely used AAC aids in India and around the world for people with speech disabilities. I’ve been working in accessibility for a really long time. So I do want to start with this disclaimer. I am making this presentation in Google’s capacity. But this includes many personal insights that I’ve had, and personal opinions as well. So not everything that I’m going to be talking about necessarily represents Google’s official view here.
So there are two things I want to talk about. I’ll give you a brief overview of how Google views accessibility, and then I’ll talk a little bit about the work that my team does, exploring what I think of as the next frontiers of accessibility. So Google’s mission is to organize the world’s information and make it universally accessible and useful. The word accessible is right here in the mission statement. And we do take it very seriously. It has been a very successful investment for Google. Having invested in making our products work better for people with disabilities, and having invested in using our technology to create assistive technologies, has led to significant economic value for Google.
It has been very important for us to do this as we participate in various government contracts, and as we scale up a lot of our business areas. It’s also led to a significant boost for just how the world views Google. There’s a quote on this slide from a person who says, “Seeing these products and how they’re helping people with disabilities shows me that Google is for the people.” As a company that still makes a significant amount of money from directly working with end users, this sort of brand shine that we get is really important as well. But having said that, the people that work in the accessibility teams at Google are very passionate about this space. And we bring focused attention to this area with a very deep conviction that we need to make our technologies work better for people with disabilities. This starts all the way at the top. This slide shows a quote from our CEO, Sundar Pichai, who says, “As long as there are barriers for some, there’s still work to be done.” So we do view it as an area for continuous improvement from Google as well.
Now, a little bit about how accessibility engineering is organized within Google. Google has sort of a hub-and-spoke model of accessibility. I’m part of what’s called the central accessibility team. We often are the thought leaders within Google for accessibility work, and we try to understand various domains of accessibility very deeply. We have a pretty large UX research and UX design team. And we also have a pretty large engineering team. And we work on a number of different products. Some of our work is external facing. Some of our work is internal facing as well. So in addition to bringing out products that are directly used by end users, we also educate other people at Google about accessibility, and we provide technical and testing support. And we’re often the link, the glue, between Google and the community of people with disabilities around the world. It’s a really fantastic team. I’ve been at Google for about three years now. And I would highly recommend working at Google in the accessibility domain; it’s been a fantastic experience for me personally.
So I want to jump off here into how I view accessibility at Google. In my opinion, there are three different categories of products that Google makes in the accessibility space. So there are products, and this slide shows five examples of these, that use Google’s technologies to make the world more accessible for people with disabilities. We have an app called Lookout that converts what the phone sees through its camera into spoken feedback that describes the world around it, for people who are blind or have visual impairments. Euphonia is mentioned here. This is a project that converts dysarthric speech into text. And there’s a parallel project that converts dysarthric speech into speech that is more typical. Sound Amplifier amplifies speech sounds in conversation so that a person with a hearing disability might be able to hear much better. Live Transcribe provides a transcript of what’s being spoken. And Live Relay helps people who are hard of hearing to make phone calls. So these are examples where the interaction is happening between a person who has a disability and either another person or just the world around them. And technology is intermediating, to make that interaction more effective.
The second category is products that we build which target developers. So they’re not products that would typically be used by someone who has a disability. But because of these products, developers are able to create accessible products on their own. Accessibility Scanner, for example, is an app on Android. If you run this app, it will continuously check whatever other app you’re developing, and it’ll give you a list of various accessibility errors and accessibility bugs that you might then want to go and fix. The Play Store pre-launch report is a set of accessibility checks that are built into the Google Play Store. So whenever a developer uploads an app into the Google Play Store, it’s able to evaluate it and give you a report of any accessibility issues that might be present there. And then Material Design. This is Google’s design language, and it has accessibility at the heart of it. Every component in the Material Design catalog incorporates accessibility best practices. So this is a really important category.
But in the third category, you can see that there are actually a number of different products on this slide. I’m not going to go through all of them. There are about 10 of them shown here, and there are actually even more in this portfolio. These are the products and features that make Google’s platforms more accessible. So Google has a large number of very successful products. I think about 10 products, as of today, have more than a billion users. These are products like Gmail and Search and Assistant and Photos. What this set of features enables is that it makes these mainstream products, products that are meant for every person in the world, much easier to use. It makes them accessible for people that have disabilities.
This includes screen readers like TalkBack. It includes the ability to control your Android device with a switch. It includes apps that make a phone or a browser more accessible for someone with a cognitive disability. It includes captions within YouTube and within Google Meet. So it takes in an entire array of different products. I mentioned the hub-and-spoke model that Google accessibility is organized as. The central accessibility team does the thought leadership, but really every product team within Google, whether it’s Drive, or YouTube, or the Play Store, or Android, has its own accessibility engineers and its own accessibility-focused designers and researchers. If you want more details about all the different accessibility features, google.com/accessibility is where you want to go. But I want to talk a little bit about why there were so many more icons on the slide that spoke about Google’s internal platform accessibility features. This is also the area that I lead within research.
And the insight is that there are many, many activities that used to be performed in person, even 15 years ago, that are today being performed on digital surfaces, right? I think the pandemic has brought this into very sharp contrast. But this was a trend that was happening anyway, whether it was access to education, access to employment, people working from home, people being able to call a cab using their phone, or people doing much of their social interaction today through the medium of a mobile phone instead of face to face. The fact that many of these actions are now being moved to digital surfaces means that if we make these digital surfaces accessible, this is an enormous opportunity for inclusion of people with disabilities. When you have to access education through the physical channel, if you have to go to a classroom and sit in a classroom, there are many other factors, unrelated to the actual imparting of knowledge, that might inhibit the access to education that a person might have. They may not have accessible transportation. The school building itself may not be accessible. If the teacher is writing on a blackboard, a person with a visual impairment may not be able to see it. The school may not have the infrastructure to be able to provide sign language interpreters.
So it’s not just the last mile that becomes inaccessible; it’s any of the steps in the whole journey before it that could create an accessibility barrier. So if we move this to the digital domain, it doesn’t make the problem go away. In fact, in some cases, making things digital can actually accentuate accessibility barriers and can make things even harder. But it’s an opportunity where, if we’re able to get it right, it can be incredibly empowering. The thing that’s exciting about the opportunity for Google here is that we are the stewards of two platforms which each reach nearly half of the world’s population: Chrome, which is the world’s most popular browser, and Android, which is the world’s most popular operating system. And if we are able to make these platforms accessible, not just for our own content, not just for our own Google apps, but also for all of the other content providers, all of the other developers who are creating content that’s being accessed through these platforms, I think these represent really our longest levers for impact. So this is really what my presentation is going to be mostly about: achieving accessibility on digital surfaces, especially on Google surfaces, which are Android and Chrome, across all disabilities. So now we go into the second half of the presentation. And this is a little more, I think, philosophical, and maybe a little more scientific speculation. So bear with me while I share some of my own ideas here.
So there are so many different accessibility domains. On this slide I’ve got icons that represent four of those accessibility personas: blind or low vision, deaf or hard of hearing, motor and mobility impairments, and cognitive and learning disabilities. And it’s not always obvious how we can make a phone, or how we can make the Internet, accessible to people that have these sorts of disabilities. Some of these have been studied over several years. So for example, in the blindness space, we do have screen readers, we have support for Braille, and support for large print. Similarly, in the area of learning disabilities or dyslexia, people do use text to speech pretty extensively. I think there’s quite a lot of opportunity for research-driven innovation in any of these spaces as well. For example, a screen reader works by reading out the content that’s on the screen. And if the screen is well annotated, you can access all of the information that is shown on the screen through the audio medium. However, this does increase the amount of working memory that a user needs to bring in order to be able to absorb information and to find the information that they want. So it comes at a cost. Similarly, if someone is doing switch scanning, if they’re using a single switch, someone with cerebral palsy, for example, that’s an incredible drain on their time. It’s much slower, and therefore also carries a greater cognitive burden.
TTS is great as an accommodation, but there are many affordances that reading has. You can quickly go back and forth, for example, if you’re reading, which is not that easy to do with TTS. And we don’t even understand how to build accessibility for people that might have conditions like dementia. If someone has difficulty learning, if someone has difficulty with memory, how can you even achieve accessibility?
So this is actually the core focus of my team at Google, where we’re trying to understand new forms of accessibility. But we have found a lot of success, I think, historically, in driving accessibility through one approach, and that approach is creating accessibility guidelines. These guidelines are not necessarily created by any individual company. They are created by standards bodies like the W3C, which has created the Web Content Accessibility Guidelines, also called WCAG. These are standards for how to make your website more accessible. And because you make your website more accessible, the idea is that someone with a visual impairment, or someone with a different disability, might still be able to access the content on these websites, and equivalently on apps as well. And these are codified as legislation in several countries. So these accessibility guidelines actually play an incredible role. I mean, Google is one company, but there are millions of companies that are creating software products. And these standards play a huge role in driving accessibility. So if you introspect about these guidelines a little more, what is it about accessibility guidelines that actually enables accessibility? I think there are three categories of guidelines, right. And in this slide, I call them don’ts, settings, and services.
Let me talk about each of these very briefly. So when I say don’ts, what I mean is inclusive design. In the context of visual impairment, for example, you might have a guideline that says you always have to ensure a minimum contrast ratio of 4.5 to 1. This is actually the contrast that is enforced on Android. The reason you do this is that, without that much degradation for a user, a designer or whoever, you’re able to prevent something from becoming inaccessible for a person that might have a visual impairment. So this is just enforcing a certain design pattern to prevent inadvertent inaccessibility.
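To make the contrast guideline concrete, here is a minimal sketch of how a contrast check might be computed. It uses the relative luminance and contrast ratio definitions published in WCAG 2.x; the function names and the default threshold parameter are illustrative choices, not part of any Google or Android API mentioned in the talk.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_a, lum_b = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum_contrast(fg, bg, minimum: float = 4.5) -> bool:
    """True if the color pair reaches the given minimum ratio (4.5:1 by default)."""
    return contrast_ratio(fg, bg) >= minimum


if __name__ == "__main__":
    white = (255, 255, 255)
    for name, color in [("black", (0, 0, 0)), ("mid gray", (118, 118, 118))]:
        ratio = contrast_ratio(color, white)
        print(f"{name} on white: {ratio:.2f}:1, passes={meets_minimum_contrast(color, white)}")
```

A check like this is cheap to run over every text and background pair in a design, which is part of why a "don't" guideline of this kind is practical to enforce automatically.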
The second category of guidelines is settings, right. So for example, you might say, always make sure that text displays correctly even if the user has selected large fonts, so you make sure that the text never clips or gets cropped. That’s basically giving a little bit of control to the user in adjusting their settings or adjusting their controls to make their device more accessible.
And the third category of guidelines is the guidelines that ensure interoperability with things like screen readers. So a screen reader is a completely separate product. It’s not part of your mail app, it’s not part of the game that you play, or your calendar app or whatever, right? It’s a different app, or a different accessibility service, that’s able to interact with the app that you’re actually using, and makes that app accessible. So a standard or guideline in this particular domain might be to ensure that every single element on your screen, or in your UI, has a content descriptor. So those are the three major categories as I see them. You have these three different buckets: contrast, on the one hand, as an inclusive design pattern; support for settings like large fonts as a control that users can tweak; and content descriptors as a hook for a service that enables accessibility.
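The "every element has a content descriptor" guideline lends itself to an automated check. The sketch below walks a UI tree and reports focusable elements that expose neither a description nor visible text. The `UiNode` class and its fields are hypothetical stand-ins invented for this illustration; they are not the Android accessibility API or the logic of any Google tool.

```python
from dataclasses import dataclass, field


@dataclass
class UiNode:
    """Hypothetical stand-in for one element in an app's UI tree."""
    node_id: str
    content_description: str = ""
    text: str = ""
    focusable: bool = True
    children: list["UiNode"] = field(default_factory=list)


def find_unlabeled(node: UiNode) -> list[str]:
    """Return ids of focusable nodes a screen reader would have nothing to announce for."""
    missing = []
    has_label = bool(node.content_description.strip() or node.text.strip())
    if node.focusable and not has_label:
        missing.append(node.node_id)
    for child in node.children:
        missing.extend(find_unlabeled(child))
    return missing


if __name__ == "__main__":
    screen = UiNode("root", focusable=False, children=[
        UiNode("send_button", content_description="Send"),
        UiNode("attach_icon"),              # icon-only button with no label
        UiNode("subject", text="Subject"),
    ])
    print(find_unlabeled(screen))  # ['attach_icon']
```

The point of the sketch is the division of labor: the app only has to supply labels, and a separate service such as a screen reader can then do the rest.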
So it does require innovation, I think, on all three of these fronts to be able to support a wide range of disabilities. For don’ts, we need to do user research to find out where the accessibility pitfalls are. So for example, on my team, we conduct research with people with dementia to see, when they use a phone, what are the areas that trip them up? Why can’t they complete certain tasks? When we think about settings, we think about what sort of UX designs we can create. What sort of alternative interfaces, what sort of tweakable controls can we create within our UX that avoid pitfalls for users? And services is mostly engineering and engineering innovation. Can we use technology to transform interfaces into different modalities? Can we convert visual interfaces into auditory interfaces, can we go the other way around, and therefore work across the accessibility barrier? So I work on cognitive disabilities, and here’s an example of how these three categories might apply in the area of dyslexia and ADHD. For inclusive design, you might want to advise developers to keep text at a grade seven reading level or lower. This is actually part of the W3C standards for cognitive accessibility. For settings, you might say, okay, someone with ADHD maybe has a setting in the browser that converts their content into a grayscale palette. So maybe the settings guideline you have is to make sure that your UI works with a grayscale palette. And in terms of accessibility services, maybe a reader mode might help simplify text presentation. So you ask the developer to label nonessential content so that it’s able to interoperate with a reader mode that might be present in the browser.
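A reading-level guideline like the one above can be approximated in code. The sketch below uses the Flesch-Kincaid grade level formula with a very rough vowel-group syllable counter. The choice of Flesch-Kincaid, the syllable heuristic, and the function names are all assumptions made for illustration; the talk does not say how reading level is measured, and real readability tools use more careful syllable counting.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def within_reading_level(text: str, max_grade: float = 7.0) -> bool:
    """True if the estimated grade level is at or below max_grade."""
    return flesch_kincaid_grade(text) <= max_grade


if __name__ == "__main__":
    simple = "The cat sat on the mat. The dog ran to the door."
    dense = ("Institutional accessibility considerations necessitate "
             "comprehensive organizational evaluation.")
    for sample in (simple, dense):
        print(round(flesch_kincaid_grade(sample), 1), within_reading_level(sample))
```

Because the syllable count is only an estimate, a score near the threshold is best treated as a prompt for a human to review the text rather than as a pass or fail verdict.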
So that’s how we think about innovation that takes accessibility to a vast spectrum of disabilities. If we want to take this to an entire continuum, not just how people are bucketed into vision, or hearing, or cognitive disabilities, but really adapting this to every individual that might have unique ways of accessing information and unique preferences for interacting with devices, I think there is some formalization of how users and devices interact that might help us put a little bit of mathematical structure and a little bit of rigor into designing for accessibility. And this is an important part of the research that we do. So I am leaving you with this final slide. I think the ultimate vision here is really personalization. So imagine if you’re able to understand a user quite deeply. This could be information that they provide, this could be patterns of use, things that they find easy to use or difficult to use. You then combine that with multiple UIs provided by developers, or multiple annotations that developers provide, plus services that are created by engineers, either working at companies like Google, Microsoft, and Apple, or even by third parties. These are services like screen readers. And you also combine it with settings and knobs which users can tweak on their own. And maybe the combination of all of these different things results in a user experience that has input and output modalities that are very specifically tuned for one individual user, right. So it’s not that every blind user has the same user experience. It could be personalized a lot more accurately than that. And I think that is the end goal of what accessibility might mean. And that’s the vision that we’re trying to work towards in the long term. So with that, I’ll stop here. And I’m happy to take a couple of questions.
Prof Amit: There’s a question in the chat from Rahim.
Ajit Narayanan: Thanks for the question. I think his question is, what are your thoughts on a purely sound-based UI and UX? This is a great question, and I think a question of immediate and urgent commercial interest to many of the large companies in the world. And maybe the thing that makes this particular question very important is the emergence of things like smart speakers and smart earbuds, right? You have Google’s Pixel Buds, for example. And you have Google’s speakers, the Nest Mini, the Google Home and Google Home Mini. So we do want purely audio-based UI and UX for these sorts of devices. I think it’s possible. One project that I am currently deeply involved in, and particularly intellectually stimulated by, is how you can make learning materials accessible through an audio medium. Many of us have probably listened to audiobooks; many of us have probably experienced novels and things like that through audio. However, I don’t think too many of us have actually experienced reading a textbook through the audio medium. And that’s often because there are several affordances provided by the reading medium that don’t necessarily have an audio equivalent. For example, you can move your eye quickly back and forth between different parts of a page. There’s no equivalent to that, right. So I think those are very important and interesting challenges. And the good thing about them is that there’s also a very direct commercial application for it.
Akila: Hi Ajit. Thanks for the great talk. Enjoyed it. And yeah, I love Avaz and FreeSpeech. And it’s very inspiring to see you apply the learnings with Google now. So my question is, I think, similar to a question from Manohar in the chat box, but I am asking from a different perspective. Smartphones are ubiquitous now; like you said, half the world’s population has one. But other than some features, like for example closed captions, which are obviously available up front, right? Like here, live transcript, it’s near the main features, I can see it. But other features, like for example Switch Control or a screen reader, these require training. It’s not intuitive. People need to be trained. And so not many know about it. It’s frustrating, right? Everyone has this in their hand, but they don’t know about it, and they’re not using it. So what do you think about that?
Ajit Narayanan: And this is especially true for Android, because Android is available for free from Google. So everyone that has an Android device actually has many advanced features. There is a need for training, and there is a need for an ecosystem that supports skilling of people with disabilities to use assistive technology better. You’re absolutely right, Akila, and I know that you have also been at the forefront of thinking about some of these things over the last several years. We have a lot more work to do. And there is a lot of global inequality in how that kind of training is actually taken to the end recipient. So I agree that it’s necessary, and I think we have a long way to go.
Prof Amit Prakash: So thank you very much, Saqib, for joining us today. Saqib Shaikh is at Microsoft, where he leads teams of engineers to blend emerging technologies with natural user experiences, to empower people with disabilities to achieve more, and thus to create a more inclusive world for all. The Seeing AI project that he leads enables someone who is visually impaired to hold up their phone and hear more about the text, people and objects in their surroundings. It has won multiple awards and has been called life changing by users. Saqib has demonstrated his work to the UK Prime Minister and to the House of Lords. Saqib holds a BSc in computer science, graduating at the top of his class, and an MSc in Artificial Intelligence. He has been recognized by the British Computer Society as the Young IT Practitioner of the Year. Thank you very much for joining, Saqib.
Saqib Shaikh: Well, thank you very much for the introduction. It’s truly a pleasure to be here. As you said, I work at Microsoft, where I lead a team working on using emerging technologies to empower people with disabilities. And this is close to my own heart. I am blind myself. And I’ve worked for many years in the technology industry, doing a wide variety of mainstream products across Microsoft. But some years back, I really started to think, what is the thing that I really want to work on? And as someone who’s blind, and who has studied artificial intelligence, it was this personal goal of mine: how can we enable people of all abilities to do more? I think everyone here knows technology has this great ability to level the playing field. So often, you might see someone with a disability who has challenges, there is some barrier, but they have so much capability. And the power that we have in our hands with technology is this ability to break down those barriers, to be able to say that, okay, through some assistive technology, we can level the playing field. And for sure, there’s so much further to go. But this is something that’s stayed with me through my career and through my studies, since I was young: that technology has this ability to improve people’s lives, to close the gap between what someone is capable of doing and what they are able to do here and now. So that’s sort of my background, my inspiration. I’ve been at Microsoft for many, many years now. And I got the chance to start working on AI for people with disabilities. In particular, the Seeing AI project is a mobile app, available on iPhone today, which helps someone who’s blind know more about who and what is around them. As you hold up your camera, we analyze the world around you, and we describe the text, people and objects around you. So if the slides are presenting, we can see a list of the different features of this app.
So I’m going to go through the things that, you know, the app can do today, a bit about the future. And then where we hope to take this technology beyond that.
So what is this Seeing AI app? In many ways, I consider this a conversation between the blind community and scientists at Microsoft and in academia. What I mean by this is, we went and talked to the community, actively through our user researchers, through phone calls and interviews, but also through blogs, podcasts and conferences. So that’s the one side of the conversation. But the other side is with scientists, both at Microsoft Research and at universities around the world, thinking: what are the problems that people are facing? What is the technology that we have available to us today? And how do we close that gap to create these brand new solutions?
So for example, reading, we realized, was one of the big requests people had: everything from just very quickly scanning text without having to do a full scan, through to having very high quality OCR that will take some time. But then you need help lining up the phone, because we found people might not be familiar with using a camera. So we give guidance to be able to line up a piece of paper. These were just some of the early learnings we had, that it is not just the AI and the emerging tech, but it’s also the user experience. How do you enable someone to line up the phone and give them real-time guidance? An example of this was, we thought, great, we’ll recognize products based on their barcode. We did deals with a number of the large companies, and we crawled the web to be able to get product information. But then we found, wait, a barcode is really hard to find on a package. So then we created a machine learning model which will detect how close you are to the barcode and play audio cues to guide you to line things up. And in that conversation with this community, in the beginning, we had face recognition, so you can recognize the people around you. For example, we heard from a teacher who has a Braille display connected to his phone, and now he silently reads the names of people entering the classroom. And that’s just really empowering. But then we also heard from users who were using face recognition to recognize the faces on banknotes, which I thought was very creative. That was the prompt for us: okay, we’re going to create custom image recognition for recognizing currencies, not something we planned on doing. We started recognizing documents, and then heard from consumers who wanted to be able to recognize handwriting, for example, to read greeting cards, or to read a child’s homework. And so that is a feature that we brought on, and we’re expanding on it as well.
There’s so much more that I could talk about. Because really, when you hold up your phone, you want to know about everything in the world. That’s not really practical or possible. So we’re really just taking these slices one at a time, and trying to create these solutions in Seeing AI. One of the more recent ones was helping someone relive their memories. So you can hold up your phone and know what’s in front of you right now. But what about the memories that are in your photo gallery? We can recognize images that are shared from other apps like WhatsApp or Twitter, but also, from your photo gallery, we’ll let you scroll through and hear the descriptions as you go. And the technology for captioning images is really quite remarkable right now. When we started this a few years ago, it took many seconds. Now you can run this in a fraction of a second. And the descriptions will include not only the people but their activities and colors, in quite some detail. And we took this one step further by asking, how do you then get more information about the photo? So we allow someone to run their finger over the flat glass touchscreen and feel the different elements of the photo: here’s this person, here’s my mom, here’s my dad, this is how they’re laid out, this is the food on the table. And that’s just by running your finger over it, and haptics will let you know as you enter and exit the boundaries, with audio cues as well. So again, I’m just going through some of the examples that we’ve worked on recently, where we’re talking to the consumers, to the customers, understanding their challenges, and bringing machine learning, AI and other technologies to bear.
So I’m going to pause for a moment and we can play a video from one customer about how they incorporate Seeing AI in their life.
I love the video, just because it really humanizes the impact of what we do in the lab. And that takes me on to some of the latest research and work that our group is doing around audio augmented reality. So okay, we can do image recognition, we can describe what’s in front of you. But our latest work is on taking 2D into 3D. Powered by technologies such as LiDAR, which can tell you how far away surfaces are, and augmented reality, and spatial audio, we’ve been creating new experiences to understand the world around you, to build up this model of the world, and then help someone with a disability, in this case someone who’s blind, query that world, interrogate and interact with that world. So let me give you some examples that have already shipped. But there’s a lot more coming here. As we think about the 3D model of the world, we let someone get a spatial summary of all the things around them. For example, as you point the phone, it will describe the things it can see, with the audio emanating from that location. When it says chair, you’ll hear the word chair coming from the chair, which is really quite a new experience when you have the headphones on doing head tracking, and really quite immersive. So I’ve just got a short clip to show what that can be like, but again, the audio is not going to be spatial in this case.
So that is just a short clip in the office where we’re panning across the lobby, and it’s describing the different things you can see. And you’ll see that there are visuals, but then also the audio is spatial, coming from those objects. So another example of this is, what if we want to put an audio beacon on something? So if someone wants to find the door, how do we do that? They could pan around using 2D; with 3D, they could get a summary of the room to understand this unfamiliar environment. But we also built a feature that would let you place this audio beacon, and you’ll hear the sound coming from the door. So as you move, you can go towards the door and know how far it is and in which direction. So let’s just take a look at that.
That’s just a short clip. But you’re sort of seeing that someone places the beacon. As you turn left and right, you’ll hear it moving from left to right. And as you get closer, it’ll get louder. So we’re doing a lot more work in this area, to try and think: once you’ve got a model of the world, how do you build on that? How do you let people navigate through the world, understand the world, interact with the world? We can already recognize who’s in the world, again in 3D space. But how do we enable more natural interactions, and empower the customer, empower someone who’s blind, to be in the driver’s seat? Really, there are so many situations, like in a conference room where you don’t know who’s around you, or waiting for someone in a lobby. Can we actually give them the autonomy to be the person not waiting for someone to approach them, for example, but to know what’s around them and to be the first person to interact? So I could talk so much more about the ideas and potential for this World channel. But that’s sort of a glimpse at what we have already and where we might be going.
And really, it’s combining spatial audio with haptics; for example, you can feel how far away things are, because we can sense that with the LiDAR. So these are just some ideas. And our customers have been giving us, you know, a lot more ideas about what they are looking for, and we’re working on building that into the product in the upcoming months. But I want to maybe flip a bit to the next section, on a bigger vision. So where is all this going? This is fine: this is an app, this is stuff we’ve built for the blind community. Now, for me, though, the big-picture vision is something I call assistive agents. This is this idea that each and every one of us is different. And that could be people with disabilities, but it’s really everyone. When you have a billion people in the world with disabilities, it’s really not a niche anymore. Everyone is different. And how do we get these little agents, as I imagine them, to disintermediate the world, to close the gap between what you’re capable of doing and what you’re able to do in this environment? So for myself, as someone who’s blind, I can imagine a wearable that lets me know who and what is around me like a sighted guide, personalized to my interests, personalized to my capabilities. Oh, there’s a new shop over there; it wasn’t there yesterday, it must have just opened up. Or, oh look, there’s a notice that looks interesting in that window over there. Or, your friend’s walking towards you. But someone who’s hard of hearing might choose a different output medium; they might choose a heads-up display. And they might want different information, for example live transcription of speech, or visualizations of audio cues. But also just someone with different types of neurodiversity, someone not able to speak a particular language, or someone who wants help remembering something. There’s a whole range of things that AI can do today, and more it will be able to do tomorrow.
And I think if we can personalize those capabilities to each of our needs, this idea of assistive agents is really powerful. This idea that the agent will be able to detect the world, understand the world, understand us, and present us with the information we want, when we need it, while keeping the human in control. And I must close with that, you know, the key point that at the end of the day, all of this technology should never be technology for its own sake. When computers and humans come together, the combination is so much more powerful. So it’s all about enabling and empowering. And you’ve seen a little bit about what we’ve done, and where we’re going with this world understanding and with these agents.
And really, as we work with our customers around the world, across different disabilities, across the real world, but then also the virtual world in VR, or the onscreen world, through different apps on your phone or on your computer, this idea of a personalized agent who’s on your side, leveling the playing field, is really what gets me up every day, and something I look forward to working on in the years to come. So we just have a few minutes, and I’d love to open it up for conversation and discussion. And I should also just say that our team is hiring. We’re hiring in Hyderabad and Bangalore, so please reach out. My email address is up. So yeah, reach out if you have any questions, or want to collaborate, or want to join the team. Thank you.
Prof Amit: Thank you. Thank you very much, Saqib. I think the hiring offer is something that would definitely be music to many people’s ears. Thank you very much for the wonderful work you’ve been doing. And I’ll open it up to questions. Raheel, you’ve raised your hand, please.
Raheel: Good morning Saqib, sir. It was wonderful to listen to the amazing technology you guys are working on. So my question is, what is the future of Braille? Will Braille be continued, with Braille-based UI and Braille-based user experiences? Or will it be replaced eventually by AI and audio-based experiences? Should we head in that direction of replacing Braille?
Saqib Shaikh: Absolutely not. I think Braille has its place. And I hope that Braille technology keeps advancing. So, you know, in this personalized agent world, I really see that the AI will understand all the things around you, and understand what you need to know when you need to know it. But for some people, Braille will be the output modality of choice, maybe someone who is deafblind. And there’s a lot of research that shows that in education, it’s really, really critical that people learn literacy through Braille above speech. Again, this is a matter of opinion. So I could talk so much about this, but with Braille, my hope is that it’s here to stay, and then we should have other options as well.
Prof Amit: There was a question on language from Nibin: How do you ensure language needs are taken care of, especially in countries like India, various other countries, there are different languages, different dialects, different ways in which the same language is spoken? How do you bring that into your technologies?
Saqib Shaikh: Yeah, languages, and then also cultural differences. So maybe I’ll touch on the second first, and then go to language. I think cultural differences are equally important, and that’s where the personalization comes in. Because if you take a particular object, let’s just say a chair, a small plant, a tree, whatever you go for, it’s going to look very different in different parts of the world. When I talk about this, one of my favorite images is, you know, there are some parts of the world where we might have a more typical toy car, but then you have a toy car made out of wood and scrap pieces from another part of the world. And it’s really, how do you get computers not to generalize? And this also applies to languages. The question with the other dialects is, how do we get the training data? How do we cope with having a less available internet connection? You know, we, at the moment, do not collect any customer data, but could we get the customers to donate their data to enable us to support more languages? So these are just some of the factors that we consider as we continue. And it’s definitely an important part as we scale out this technology to more and more people in the world.
Prof Amit: Thank you. Ajit, would you like to add anything?
Ajit: I think Saqib hit the nail on the head. You know, I think certainly Google has a very intuitive approach to language, which is that we do try to make all of our products and features available in all of the languages that our users would like to see. That has become a lot more challenging. It used to be a question of localization, of replacing strings, essentially. So you’d have tables of strings that you could then convert into other languages. So it was a very static problem. As soon as, you know, we start applying machine learning and doing things like generating language, understanding language, and things like that, the question of training data, the question of collecting enough high-quality data for these purposes, and therefore also controlling for bias and controlling for safety and trust, become important. Some of these things are addressed by things like what Saqib described, having people donate data and stuff like that. But there are also machine learning advances now being discovered in the areas of natural language processing and speech processing that allow us to develop really good systems for low-resource languages as well. I don’t consider Indian languages, or at least the major Indian languages, to be low-resource. I do think we have very sophisticated models for many of these languages. And I think it’s only a question of, you know, time, and maybe not even that much time, before we start seeing all of these technologies available in major Indian languages.
Prof Amit: Thanks Saqib and Ajit. I’m sure Hindi, Tamil, and Telugu would be taken care of. But then the question is KKhundi and Santhali and all that, where they probably don’t even have a script. It would be great to see all these applications going to those places. Professor Bala has raised his hand, please.
Prof Bala: Can you say something about the audio beacons? How do they work? Is it the user who triggers them? And what is the implication for other people who can hear them? Do they get disturbed?
Saqib Shaikh: Yes. Right now, the audio beacons require wearing headphones, so you get spatial audio. And of course, we recommend headphones that do not cover your ears, for example bone conduction, or some of the speakers that go near your ears. But with those, you’re the only one who can hear them. And using augmented reality, we can pin the sound to a point in 3D space. So we know where we saw the door as, you know, an XYZ coordinate, and we can actually make the sound emanate from there in terms of the stereo balance, using an HRTF (head-related transfer function), and so forth.
It runs on devices with LiDAR. So again, we’re using the LiDAR to now have more precise measurements of how far away different things are.
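[Editor's illustration] The beacon behavior described here (sound pinned to a point, shifting left and right as you turn, louder as you approach) can be sketched roughly as below. This is a simplified stand-in, not the app's implementation: it uses plain constant-power stereo panning and inverse-distance attenuation instead of a real HRTF, works in a flat 2D plane, and the function name, the coordinate convention (x to the right, z forward, heading measured clockwise from +z), and the `ref_distance` parameter are all assumptions made for the example.

```python
import math

def beacon_stereo_gains(listener_xz, heading_rad, beacon_xz, ref_distance=1.0):
    """Return (left_gain, right_gain, distance) for a beacon pinned in space.

    Hypothetical helper: constant-power panning plus inverse-distance
    attenuation. A real spatial audio engine would apply an HRTF instead.
    """
    dx = beacon_xz[0] - listener_xz[0]
    dz = beacon_xz[1] - listener_xz[1]
    distance = math.hypot(dx, dz)

    # Angle of the beacon relative to where the listener is facing.
    # Positive azimuth means the beacon is to the listener's right.
    azimuth = math.atan2(dx, dz) - heading_rad
    azimuth = math.atan2(math.sin(azimuth), math.cos(azimuth))  # wrap to [-pi, pi]

    # Clamp to the frontal half-plane and map to a pan angle in [0, pi/2]:
    # 0 is hard left, pi/4 is centered, pi/2 is hard right.
    clamped = max(-math.pi / 2, min(math.pi / 2, azimuth))
    pan = (clamped + math.pi / 2) / 2

    # The beacon gets louder as the listener gets closer (full volume
    # inside ref_distance).
    attenuation = ref_distance / max(distance, ref_distance)

    return math.cos(pan) * attenuation, math.sin(pan) * attenuation, distance


# A door 5 m straight ahead: equal volume in both ears.
left, right, dist = beacon_stereo_gains((0.0, 0.0), 0.0, (0.0, 5.0))
```

As the listener turns or walks, the gains would be recomputed every frame from the tracked pose, which is what makes the sound appear to stay fixed at the door.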
Prof Amit: Thank you, Akila has a question, I think for Saqib. Would you want to pose the question?
Akila: My question was about using technologies like Seeing AI to increase environmental awareness. There may be information overload, right? Some piece of information which is more important may get lost among other things. When I’m walking on the road with Seeing AI, there are things that are more important to me, like an approaching car, rather than something else, like object identification. So is there a way that the app works to deal with that? And with something like object identification, how would the user know where he is?
Saqib Shaikh: I think you’re very much going into the navigation realm. But for the most part, a lot of our users, we hear, are interested in a specific task. So we use this metaphor of changing channels on the TV.
So someone can tell us what they want. If we just took a photo or video stream and tried to guess, using AI, what was important in the image, it might not be what’s important to the user. Do you want to know who the person is, or what it says on the t-shirt, or what color the t-shirt is? So instead, we have a blind person who can use the phone with a screen reader. Through that, they are able to select the thing of interest, whether it’s identifying the person, or reading the text, or just knowing the object.
Prof Amit: Any more questions for Saqib or Ajit? No? We are eating into the break time, but I think, unless Saqib and Ajit have any other pressing commitments, maybe we can take a question or two.
I have one question, if you would allow me. When we’re looking at AI and the way AI is being practiced today, it requires a lot of data, and it requires a lot of data concentrated in big corporations like yours. Unless you have that kind of data, and that kind of training data, the AI will probably not be as accurate. How do we address the insecurity that comes to a user because of this concentration of data? I know there are regulations coming in. But is there a way we can still do AI, and still do it for localized, personalized usages, without the data getting concentrated in big tech operations?
Saqib Shaikh: I think of this in two senses: you’ve got big data, and personalized models. In terms of the big datasets that you have typically needed for machine learning, there are two key points. I am a big fan of open datasets. But you also need diversity in the data to make sure that it is representative of people with disabilities, because there are all the standard datasets from academia, like ImageNet or COCO, but they’re not real world. They’re not from the perspective of someone who’s blind, and they’re not of the things someone would want. So you really need diversity and true representation in the big datasets. And I do passionately believe that open data is a good thing to push the state of the art forward. But then, what if you didn’t need the big datasets? That’s the other part of this. We work with researchers in the field of meta-learning, where you can start with a big model trained on lots of data, but then you can personalize it to the things of interest to you without sending any data to the cloud, you know, doing it just on the device. So that’s an active area of research that we’re working with scientists on.
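[Editor's illustration] One way to picture this kind of on-device personalization is a prototype-based few-shot classifier, an approach from the meta-learning literature. The sketch below is an assumption made for illustration, not the team's method: it supposes that a large pretrained model has already turned each photo into an embedding vector (the embeddings here are invented two-dimensional lists), and the function names are hypothetical. Only a handful of the user's own labeled examples are needed, and nothing leaves the device.

```python
import math
from collections import defaultdict

def mean_vector(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def build_prototypes(labeled_embeddings):
    """Average the few examples the user provides for each personal label.

    labeled_embeddings: list of (label, embedding) pairs kept on the device.
    """
    groups = defaultdict(list)
    for label, embedding in labeled_embeddings:
        groups[label].append(embedding)
    return {label: mean_vector(embs) for label, embs in groups.items()}

def classify(embedding, prototypes):
    """Pick the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(embedding, prototypes[label]))


# Invented embeddings standing in for the output of a frozen pretrained model.
examples = [
    ("my keys", [0.9, 0.1]), ("my keys", [1.0, 0.0]),
    ("my mug", [0.1, 0.9]), ("my mug", [0.0, 1.0]),
]
prototypes = build_prototypes(examples)
prediction = classify([0.8, 0.2], prototypes)
```

The large model stays fixed; only the small table of prototypes is specific to the user, which is why this style of personalization can run locally.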
Ajit: I would echo the same thing, in terms of access to data being very important. But also, at least from my vantage point, I do see a trend towards more on-device machine learning. So if you look at, you know, Google’s equivalent to Seeing AI, which is called Lookout, that is on-device; you know, the data from the user doesn’t leave the device. Live Transcribe, again, is entirely on-device. And so ASR models, TTS models, handwriting recognition, all of these things are now small enough that they can run on the user’s device. There’s an entirely new class of algorithms that are now being investigated for these sorts of things, right? If you think about federated learning, if you think about ways that we can train models in a distributed fashion without necessarily having access to sensitive data, I think those things are going to become mainstream even as early as the next two or three years. So the field itself is changing. And maybe this question will change over the next couple of years, as we get more familiar with these various nuances.
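[Editor's illustration] The federated learning idea mentioned here, training in a distributed fashion without collecting the raw data centrally, can be sketched with a toy version of federated averaging. Everything in the example is an assumption made for illustration: the model is a one-variable linear fit, the two "devices" hold invented data, and the function names and hyperparameters are hypothetical. Production systems add much more (for instance secure aggregation and client sampling).

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run gradient descent on one device's private data; return new weights."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Combine client weights, weighting each by its number of examples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def train(clients, rounds=50):
    """Each round, only model weights travel; the raw data stays with clients."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two devices whose private data both follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (1.0, 3.0)],
]
w, b = train(clients)
```

The server in this sketch only ever sees the pairs of weights returned by `local_update`, never the `(x, y)` examples themselves.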
Prof Amit: Thank you. We have two hands raised, maybe quick questions Nibin and Aarti.
Nibin: Thank you so much. My question is to Saqib. I’ve been a user of Seeing AI for the last year. It really gives a very good user experience, especially in very critical situations. My question is this: right now, it’s available only on the iPhone, on the iOS platform. So when we consider a country like India, where Android is the dominant operating system, why are we still waiting? I mean, is there a challenge in replicating the same thing on Android? I would like to get the answer to this.
Saqib Shaikh: Thank you. And also feel free to, you know, email for follow-ups. But ultimately, it’s been prioritization, we have a team of researchers, we’re trying to push the state of the art forwards. And it’s always a case of do that on one platform or two. And we’re always reevaluating that, you know, can we go deeper on one platform or broader for two? And, you know, we are on one right now, but that doesn’t mean that’s always the case.
Arti: Hi Saqib. Thanks, interesting talk. I just had a question similar to what Nathan had asked. As the products move towards user context, I just want to understand: since this product is being used in different countries, were there particular use cases or interactions specific to India that you had to consider? Did the product have to evolve to accommodate different behaviors from users in India?
Saqib Shaikh: Yeah, that’s something we’re continuing to hear about from the users, and we’re also doing the research. And we’re finding that, without going into specifics, many of the features are just common. But then certain aspects of life also vary, and it’s something we’re continuing to look at. We discussed languages previously, and so forth. But I’m very excited about this idea of how you train the system to do better in more diverse environments. But then also, there are potentially brand new areas where there are tasks and challenges which are unique to certain parts of the world. And in many ways, it’s even more exciting to think about how we bring technology to bear on solving some of those problems. So it’s a really good point, and something that’s definitely on our radar.
Ajit: Yeah, we see some interesting things happening in India that, frankly, we don’t completely understand. In terms of some of our products, for example, with Live Transcribe, which is one of our most popular accessibility apps, the number two language after English is Malayalam. And we really don’t know why it became that popular in Kerala. Maybe some YouTubers picked it up or something like that. But worldwide it’s Malayalam, and it turns out that we had to optimize for that. So we learn all kinds of interesting things from the Indian context.
Prof Amit: So thank you. Thank you very much, Saqib and Ajit, for joining us today. It was great to learn about your work, and all the best. We hope both of you will continue to support us here at EMPOWER. Thank you. Thank you very much.