Wearable & IoT Technologies in AT
Anil Prabhakar: And I take the opportunity to welcome our next speaker. This is a session on IoT and wearables for assistive technology, and I am thrilled to have Dr. Arun Jayaraman joining us here from Chicago. He's a faculty member at Northwestern University's Feinberg School of Medicine, but he's also the executive director of the technology and innovation hub at the AbilityLab. So lots of work, and I'm sure he's going to tell us a lot about his work on prosthetics and rehabilitation, but more importantly also about wearable sensors and machine learning and their impact on rehabilitation medicine. So welcome, Dr. Jayaraman. Thank you for joining us at this very early hour for you, and we look forward to your talk.
Dr. Arun Jayaraman: Thank you so much for letting me be part of this really nice event, and apologies if I look or sound a little bewildered: I was trying to set up this system quickly in my house. Even though you would expect most people to be working from home, I've been working on-site through the pandemic, so setting it up at home was a unique experience, and I kind of enjoyed it.
So today I will focus on wearable sensors, smartphones, and machine learning, and their impact on rehabilitation medicine. My name is Arun Jayaraman. As already mentioned, I'm a researcher at the Shirley Ryan AbilityLab, which is a large rehabilitation hospital inside Northwestern University's medical school. I'll try to focus on how we can use sensors, which have now become ubiquitous. If you notice, a lot of us tend to have some kind of wearable on our wrist, or use our own smartphone to monitor ourselves, so we're kind of stuck to these technologies and we obsess over them a lot. How we can use that as part of a clinical care model is what I'm trying to pitch today.
So if you look at current healthcare systems, whether you want to try new interventions, be part of a clinical trial, or know when to discharge a patient, send them home, and monitor them at home, we struggle with the same standard issues: we have infrequent assessments. We don't take clinical assessments consistently, so we don't know if our patients are doing fine, getting worse, or getting better, and titrating care models has been tough. A lot of times we use performance-based measures that were trained on a specific population while we're testing a different population, and a lot of times they have nominal or ordinal scores that struggle with ceiling effects and things like that, so progression tracking is distorted and lacks sensitivity. There's also bias from the patient and the clinician: because we're so involved in the care, we think, oh, he's gotten better, she's gotten better; and when we ask them, a lot of times our patients don't want to hurt our feelings because they know we're so involved with them, and they say, oh yeah, I feel great, even though they might not. So a lot of us as clinicians and researchers would love to get unobtrusive real-time data on patients' naturalistic real-world behavior, especially if you're looking at rehabilitation technologies, assistive or adaptive technologies. We don't know if a prosthesis fits them well. It looks great when you put it on in the clinic, but in real-life use, are they feeling better? Is the wheelchair working fine? Does it need modifications? So, continuous monitoring: when should a Parkinson's patient come in to tune their deep brain stimulator or change the dosage of their levodopa drug? Questions like this are critical. So a future vision, and what we're already attempting at our hospital, is that when a patient gets admitted to the hospital, we choose a certain selective number of sensors.
I'm not saying sensors over the whole body, just a few wearables. Then we can monitor physiological information in real time, like heart rate, blood pressure, EKG, and use the other sensors to look at muscle activity and activity recognition. Are they sleeping, are they walking, how's the quality of their gait and arm movement, and things like that? We provide a patient dashboard for the clinicians and researchers to look at progression in real time, and also track them in the home and community, and this will inform clinical care.
So it's not that easy; you can't just use a sensor and have it work magically. If you have a wearable, I highly recommend you try this at home. The easiest test is to just sit on a couch while you watch your favorite TV show and shake the sensor in your hand: you'll see that it starts counting steps. A lot of these sensors use what are known as predicate custom algorithms, which have been trained on certain acceleration profiles, and it's very easy to cheat. You'll feel really great about it, but it doesn't mean you're actually doing the work. So that's something to think about if you want to do higher-resolution sensor monitoring in a clinical setting, which is why, if you look at the Apple Watch or a Fitbit, a lot of them haven't gotten FDA approval for clinical diagnostics or clinical monitoring. It's always home-and-community, fun, recreational monitoring, because they don't want to go through the process of being stringent in their algorithm detection.
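To illustrate the couch-shaking point: many consumer step counters boil down to threshold crossing on the acceleration magnitude, so any motion with the right peak profile registers as steps. The sketch below is a minimal, hypothetical version of such an algorithm; the threshold and signal values are invented, not any vendor's actual implementation.

```python
# A naive step counter of the kind many consumer wearables use:
# count a "step" whenever the acceleration magnitude crosses a
# threshold on a rising edge. Threshold and data are made up.

def count_steps(accel_magnitudes, threshold=1.2):
    """Count rising-edge threshold crossings in an acceleration trace.

    accel_magnitudes: acceleration magnitude samples, in g.
    """
    steps = 0
    above = False
    for a in accel_magnitudes:
        if a > threshold and not above:
            steps += 1
            above = True
        elif a <= threshold:
            above = False
    return steps

# Genuine walking: periodic ~1.5 g peaks at heel strike.
walking = [1.0, 1.5, 1.0, 0.8, 1.0, 1.5, 1.0, 0.8] * 5
# Shaking the sensor on the couch: similar peaks, no walking at all.
shaking = [1.0, 1.6, 0.9, 1.0, 1.6, 0.9] * 5

print(count_steps(walking))  # 10
print(count_steps(shaking))  # 10
```

Both traces produce the same count, which is exactly why a raw step count alone is not clinical-grade evidence of activity.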
So extracting useful information from sensors is not as simple as plug and play. You've got to put in a little more effort if you don't want to under-predict or over-predict a clinical diagnosis. Some practical considerations for the students, researchers, and clinicians who want to use this: you need to know what the target outcome is. You want to understand device configuration: where do you place the sensor? You've got to cross-validate against gold-standard metrics. Then you need to know how you collect the data and the clinical application, and also look at alternative tools; we don't always have to use this. I'll try to touch on all these topics pretty quickly, and if I'm getting slow or running behind, please give me a warning so I know I need to speed up.
So let's start with the target outcome. Let me put it in simple three-word language for everybody to think about this question or this problem. For my purposes, I always call it automation, illumination, and prediction. We'll touch on what automation is, what illumination is, and what prediction is. That's how I usually look at any sensor, whether you take a smartwatch, a phone, or any kind of sensor technology for monitoring. It's a little overarching, but it helps, at least in the field of rehabilitation, to keep things in perspective.
So automation is automating the signals to create validated analogs of standardized clinical outcome measures. Let's say you really want to know how fast your patient is walking, what their balance is, how many ectopic beats they're throwing. You want to automate this, which means a simple sensor, say a single-lead EKG sensor, should give you the same resolution as a five-lead EKG Holter monitor or something like that. So you're automating this using different methods of signal processing and machine learning, and what you will then be able to do is look discretely at: oh, this is how fast my patient is walking, or is their balance improving, getting better or worse. Then you can choose whether you want to see it every five minutes, every five days, or once a month. This allows you to figure out what you want to see. That's automation.
Illumination is the next step. You want to understand: if somebody is walking faster, or they have ectopic beats, what's the underlying physiology? What's the biomechanics behind it? What are the impairments? It's looking under the hood. This is important because somebody can walk really fast but be using pathological gait compensations. They can be in pain and still try to walk fast; it doesn't mean they've gotten better. They've just tricked themselves and the system. So one way to understand that is to illuminate yourself with what's under the hood: joint angles, spatiotemporal parameters, co-activation patterns, looking at the whole EKG or postural sway. This is illumination.
And the third part, which is the biggest holy grail for a lot of us, is prediction. Do you want a diagnosis? Do you want to understand when you should discharge somebody from inpatient care to home? When does somebody in a home environment get to a stage where it's a red flag and I want to bring them in for therapy? Who will respond to a certain medication and who will not, like in a pharma trial? And keeping an eye on people at home or in skilled nursing facilities, things like that. So this is prediction, and the best version is to predict who is going to get cardiac disease, or when somebody is going to have early onset of Alzheimer's or Parkinson's: prediction modeling. Prediction modeling is much harder because you're collecting a ton of data from healthy people and from people at the early stage of a disease through to the end stage, and you're creating a model that will predict when something is going to happen. But these are three ways you can look at the question.
And the applications of these can go in different directions. One is symptom detection, as I told you; another is activity classification: is somebody walking, is somebody wheeling their wheelchair, how fast are they going? Community mobility and social interaction are critical, right? Just because somebody walks fast or far doesn't mean they walk a lot at home and in the community, that they're going back to work, or going out with their families. Fall detection is another. So these are things you want to keep in mind as applications.
Device configuration is another important thing. There are two types of devices. A very simple one is called a wearable, which all of us have access to, right? Any of these you can buy in a store: a simple accelerometer that gives you a step count. The next generation are stickables. The difference between a wearable and a stickable is that stickables are body-conforming. You can pack quite a bit of sensor tooling into them, you can make them pretty dense, and they use microelectronics. They can then send data over Bluetooth, 5G, 4G, or Wi-Fi to a receiver, such as a smartphone, and then on to the cloud, or directly, and these come in different forms. The next generation of sensors, which I work on with my collaborator Dr. John Rogers, a brilliant materials science engineer and a member of the National Academy of Sciences, is epidermal electronics, where you adhere the electronics to your skin. It's pretty cool if you see John talk. He has a new electronic sensor system that even dissolves in your skin over time, and he's testing for toxicity and things like that, so you don't even have to take it off. It just dissolves in your skin.
The other brilliant sensor, which we're all stuck to 24/7, is our smartphone. An average smartphone, anything you buy off the shelf, has about 14 sensors, which we can all access if we need to, to create activity monitoring tools and other tools. And beyond the main sensors themselves, you can create about 64 sensor probes. That's something you want to keep in mind when you're doing patient monitoring and things like that. Your smartphone is a brilliant sensor, and you can do a ton of research and clinical care with it.
So, different types of device outputs. This again is something clinicians, researchers, students, and patients all want to understand: a lot of these commercial sensors give pre-computed measures of activity, locomotion, vitals, sleep, things like that. If that's all you want, and your question doesn't require high-resolution accuracy, you just want to see a change: how am I sleeping? Am I sleeping better after taking a medication? Maybe the sleep hours are not perfect, but overall you're looking for a change, and it's okay to use pre-computed measures.
But let's say you're trying to look at REM sleep versus non-REM sleep, or sleep and wake cycles, and things like that. Then you want sensors that give you raw data, and you can customize the information. So again, there are sensors with mechanical outputs, thermal outputs, electrical outputs, microfluidic outputs, optical outputs, and so on. Based on the question you're asking, these raw data signals can be processed, and then you can use different methods of signal processing and machine learning techniques to create your own algorithms and classifiers.
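The raw-data route just described, process the signal, extract features, classify, can be sketched minimally as follows. The window data, the features (mean and standard deviation), and the nearest-centroid classifier are all illustrative assumptions, not a specific published pipeline.

```python
# Minimal raw-signal pipeline sketch: window -> features -> classifier.
# All windows and labels are toy data invented for illustration.
import math

def features(window):
    """Mean and standard deviation of one window of raw samples."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    return (mean, math.sqrt(var))

def nearest_centroid(x, centroids):
    """Label x with the class whose feature centroid is closest."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))

# Toy training windows: resting is flat, walking oscillates.
rest = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.1, 1.0, 0.9]]
walk = [[0.5, 1.6, 0.4, 1.5], [0.6, 1.5, 0.5, 1.6]]

centroids = {
    "rest": [sum(v) / len(v) for v in zip(*map(features, rest))],
    "walk": [sum(v) / len(v) for v in zip(*map(features, walk))],
}

print(nearest_centroid(features([1.0, 1.0, 0.9, 1.1]), centroids))  # rest
print(nearest_centroid(features([0.5, 1.5, 0.4, 1.6]), centroids))  # walk
```

A real activity-recognition classifier would use many more features (frequency content, axis correlations) and a trained model, but the structure is the same.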
This is another very important question everybody needs to keep in mind: the location of the sensors. You just can't put sensors all over the body; it's impractical, whether it's a grad student doing it or a patient taking it home. It just doesn't help. So knowing exactly where the sensors go and how many sensors should be placed is critical to capturing the signal. It's a very simple question and should be pretty obvious, but a lot of times it's not followed through, so that's something you want to keep in mind. The attachment matters too: high-quality data monitoring means attaching the sensor comfortably to the body so there's no movement of the sensor against the skin, and not on the clothing, which itself will cause noise. This signal-to-noise issue can impact your overall outcomes. So the quality of data collection determines the output and the resolution of your data.
Measurement validation: this is a very simple study we did. This is an ActiGraph, which is commercially very well known, a research-grade sensor actually used in a ton of studies. You simply put the device on either your arm, waist, or ankle, and it counts steps. We had somebody walk a pre-counted 50 steps and then collected the data. In a healthy person, as you can see, it's not perfect, but it's almost there.
It's doing a decent job. But the moment we look at a person with spinal cord injury, depending on where the sensor is, you can see the numbers are quite off. They're not even close. So again, plug-and-play sensors might give you a complete over-prediction or under-prediction. That's something you want to keep in mind when you're doing this.
So what's important when you do this is validation against gold standards. For example, if you're using raw signals: this is an example of an MC10 stickable, where we validated the EMG signal against a Delsys system, which is a surface EMG system. There's a ton of studies, IEEE studies and biomechanics society studies, which have already shown that the Delsys system is pretty good, so you know that's the gold standard. You want to cross-validate your signal fidelity, the raw signal from both systems, so that you can say, okay, are they both measuring the same thing, and look at signal-to-noise issues. But if you have pre-computed measures, you then have to validate against devices that give the same computed information. Here we're checking the BioStamp data against that of a reference device, and you can see the heart-rate correlation is 0.92, which means there's a strong correlation between the device and the gold-standard method. So even for pre-computed outputs, you can figure out what can be done.
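The cross-validation step described here, comparing a device's pre-computed output against a gold-standard reading, often reduces to a correlation over paired measurements. Below is a minimal sketch with invented heart-rate values; the 0.92 figure in the talk comes from the actual study, not from this toy data.

```python
# Pearson correlation between paired device and gold-standard readings.
# The heart-rate values below are made up for illustration.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Paired heart-rate readings (bpm): wearable vs gold-standard monitor.
device = [62, 70, 75, 88, 95, 110]
gold = [60, 72, 74, 90, 93, 112]

print(round(pearson_r(device, gold), 3))  # close to 1.0: strong agreement
```

A high correlation like this supports using the wearable's pre-computed output as a proxy; in practice you would also check agreement (e.g. bias between devices), not just correlation.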
So now we'll get to clinical applications, and I'll quickly touch on them. One of the things we did, even though we're not an infectious disease group, was a fun collaboration with a group called Bionic Yantra in Bangalore. We did a COVID monitoring trial, and it's actually ongoing in India right now, using a noninvasive sensor worn on the suprasternal notch, which can record cardiac output, breathing, cough, other respiratory and lung features, temperature, and pulse ox. We validated against COVID-positive and COVID-negative patients using a PCR test, and we created a two-minute snapshot test. In two minutes, we asked a person to do a sequence of activities: breathing deeply, coughing a few times, walking around for 30 seconds, or arm cycling for 30 seconds using just the arms. And we would predict whether a person has COVID-like symptoms or not. The reason we did this was that there were a lot of asymptomatic people walking around everywhere in the world, happily spreading COVID-19 to each other, and we wanted to catch it with a noninvasive sensor, because it's impractical to have PCRs done on everybody every day. You could be negative today and positive tomorrow or three days later; you just can't keep doing PCRs. So we thought: it's a two-minute noninvasive test, and if it tells you that you're having COVID-like symptoms, then you can go get a PCR. This is another classic example of a simple mechano-acoustic sensor and how it can be utilized in a remote model to pick up COVID-like symptoms.
This is another example of how we use sensors to monitor stroke. It's a slightly dramatized video, because we did it for a TV series, for the BBC I think, but it's an example of how you can use different sensors on different body parts for different questions.
If you see, the sensors are placed on different parts of the body, and this is just the raw signal from the accelerometer and gyro. Depending on the question you ask, you can look at speech problems like aphasia, swallowing problems like dysphagia, sleep monitoring, or cardiac function. Here a therapist is doing a manual muscle test on the arm, but we're automating it using sensors. Again, this is an example of home monitoring. It's obviously dramatized, but here we're showing the same mechano-acoustic sensors we used for COVID monitoring, used for speech monitoring.
So that's another example of how you can use sensors. Here's another: if you look at this test, this is somebody with Parkinson's disease doing a clinical test. Visually, unless you're a superstar neurologist, a neuromuscular disease expert, or a therapist who has worked with patients with Parkinson's disease, you really can't see whether this person's deep brain stimulator is off or whether they're off medication like levodopa. But if you look at the right side, at the very simple raw acceleration signals from the sensor on the dorsum of the hand, you can see that when the medication is on, or the DBS is on, the signal is so clean, and the moment it's off, you can see the change in the acceleration signals. As simply as that, you can pick it up. You don't need a quantitative measure saying this is tremor, or this is my MDS-UPDRS score. You can simply use this change in peak acceleration to say whether somebody is getting worse or better, just to give you some kind of warning system to work with.
This is another one we're working on, in babies in the neonatal ICU. The idea is to look at high-risk births and see if a child is developing typically or atypically, so that if there is a motor delay we're worried about, we can offer early intervention to these babies. Either you can use these miniature sensors, which are strapped to their bodies, or you can even use what is now a cutting-edge method called pose estimation, where you simply use your smartphone camera and do real-time motion tracking with pose-estimation algorithms, and see whether this is a fidgety movement or a writhing movement, whether the baby is moving atypically or typically. All this data keeps accumulating, and the more data you collect, the better the algorithm accuracy gets.
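Once a pose-estimation model has produced per-frame keypoints, the downstream movement features can be very simple. This toy sketch, with invented wrist trajectories, computes total frame-to-frame displacement of one keypoint; real infant-movement analysis uses far richer features than this.

```python
# Summarize movement of one pose-estimation keypoint across video frames.
# The trajectories are invented toy data, not real model output.
import math

def displacement_per_frame(track):
    """track: list of (x, y) keypoint positions, one per video frame.
    Returns the distance moved between each pair of consecutive frames."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]

# Toy wrist trajectories (pixels): one active limb, one nearly still.
active_wrist = [(0, 0), (3, 4), (0, 0), (3, 4), (0, 0)]
still_wrist = [(10, 10), (10, 10), (10, 11), (10, 10), (10, 10)]

print(sum(displacement_per_frame(active_wrist)))  # 20.0
print(sum(displacement_per_frame(still_wrist)))   # 2.0
```

Aggregating features like this per limb, per session, over many labeled infants is what lets an algorithm start separating typical from atypical movement patterns.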
Again, with smartphones, we do a lot of home and community monitoring of patients. Once somebody is sent home, we really want to know: if they have a wheelchair, are they using it? If they have a walker, cane, crutches, prosthesis, or orthosis, how much are they using it? Are they taking public transport? Are they using it only to go to the hospital, or are they going to their religious places of choice, or to friends' houses? If you see the person in the middle with the pink dots, you can clearly see the person is not moving outside the house. Sometimes you can have a ton of step counts, but all it is is like me just walking from the couch to the kitchen: watching TV, getting food, couch to the kitchen, couch to the kitchen. You can have 10,000-20,000 steps in a day but you really haven't moved outside.
The other thing we've done really successfully is fall detection, which is a huge thing. There are many methods, but everybody is stuck to their smartphone, so one way is to use the smartphone system to detect whether somebody is slipping or tripping. We're able to immediately get a text saying your patient so-and-so fell at this GPS location. If it's outdoors, tell me the weather, so we know if they slipped in the rain or the snow. And then we can also follow them: are they lying there long? Did they just faint? Are they lying down, or are they getting up and moving? So we can track them and automatically call them. That's one way to use smartphone technology for things like that.
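A common fall-detection heuristic of the kind described is an impact spike followed by a period of stillness. The sketch below uses invented thresholds and signals; a deployed detector would be trained and validated on labeled falls, and would then trigger the text message, GPS, and weather lookup described above.

```python
# Minimal fall-detection heuristic: a high-impact acceleration spike
# followed by sustained stillness. Thresholds and traces are made up.

def detect_fall(accel, impact_g=2.5, still_g=0.2, still_samples=5):
    """Return the index of a suspected fall, or None.

    accel: acceleration-magnitude deviation from 1 g, per sample.
    A fall = one sample above impact_g, then still_samples in a row
    below still_g (the person not moving after the impact).
    """
    for i, a in enumerate(accel):
        if a > impact_g:
            after = accel[i + 1:i + 1 + still_samples]
            if len(after) == still_samples and all(x < still_g for x in after):
                return i
    return None

walking = [0.5, 0.6, 0.5, 0.7, 0.5, 0.6, 0.5, 0.6, 0.5, 0.6]
fall = [0.5, 0.6, 3.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.5]

print(detect_fall(walking))  # None: no impact spike
print(detect_fall(fall))     # 2: impact at sample 2, then stillness
```

The stillness check is what separates a fall from, say, dropping the phone: the phone keeps moving after a drop but a fallen person often does not.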
The last one, which I take a lot of pride in, and we do this quite a bit: assume all these are your patients in the greater Chicagoland area. You can use their smartphones to track whether they're going to the parks, exercising, or going to the grocery store or the fitness center, because people say, oh yes, I am going and working out; oh, I'm walking every morning in the park. But you want to verify that, and we're able to track them using their smartphones. Are they taking public transport, the bus or the trains? That's a huge improvement, right? If your patient, who has a prosthesis or a disability, or a stroke patient, has somehow figured out how to use public transport, in whichever country, doing that is a significant improvement in community mobility. So it's important to keep an eye on the community aspect, not just the physiology. If you look at the ICF disability model, we have to cover the whole of it, and that's why we try to track patients using these sensors and keep an eye on them, so we can intervene. If you intervene at the right time, not only can you help the patient, but you can also keep healthcare expenses low, because you're not wasting healthcare resources; you're getting to them at the right time.
There are obviously alternate ways to do this. Motion capture is the gold standard; everybody uses it, but only big labs and big groups have it. There are force plates, obviously, instrumented mats, and so on. And now the big thing, which even our team focuses on, is markerless motion tracking. This is an early-stage Parkinson's example where, in a clinic, we're just putting up a camera and letting it record everybody. There's nothing special about the setup; as you can see, it's all over the place and the quality is a little bad, but that's the whole idea: in a real Parkinson's clinic, where they're doing their UPDRS, we're just trying to track and see if we can predict bradykinesia, dyskinesia, and tremors, whether we can catch them and give automated scores. In an ideal world, with one of our faculty, Professor James Garden, we're trying to access all the cameras in the hospital and monitor patients, and predict whether somebody is getting into fall risk, whether somebody looks worse, how their gait looks, how their movement looks. Everything has camera systems nowadays, and we're trying to use that to do markerless pose estimation.
So in summary: it's important to know what your target outcome is, what you are trying to measure. Then you should configure a device for it and place the sensor exactly where it needs to be. If you're using a commercial-grade sensor, make sure you validate it against the gold standard; if it's not matching, see if you can create a consistent conversion factor, or create your own algorithm from the raw data. Understand data collection methods, use cases, and clinical applications. Always consider alternate tools: if there is a cheaper way, or a more complex way depending on your question, and it achieves your target goal, then use it. I do some of the work, but a lot of it is done by my team and collaborators, with different funding sources, and I would like to thank them all. Thank you all for the opportunity to talk today, and I'm happy to take questions.
Anil Prabhakar: Thank you, Arun. That was a very informative talk. We do have time for a few questions. I see this one in the chat.
Arun Jayaraman: Yes, it says: is there a noninvasive type of sensor methodology to capture fetal parameters? Yes. One of the sensors we showed you, for the fetal case, we actually put on the mother's womb itself while she is pregnant, and we start to monitor fetal behaviors, heart rate, and things like that. The resolution obviously is not perfect, but I think we get pretty good data from it, and it's a noninvasive sensor. Again, with the right sensor location, you'd be surprised how much signal it can pick up. It's all in the signal processing.
Anil Prabhakar: I guess as a follow-up, would an accelerometer that tracks how often the baby kicks be something that could be used to predict when labor might happen?
Arun Jayaraman: Yeah, absolutely. The secret is in the data and the labeling, right? If you had the capability to label accelerometer movements as fetal kicking for, let's say, 100 pregnant women, and then find out when they all delivered, it's easy to create a model that will say: okay, based on these kinds of kicking signals, the majority of women delivered within 30 days or 15 days. Then you can create a predictive model. So labeling the data is the critical part. Let's say you have additional sensor features, like heart rate and something else, added to the same accelerometer; then maybe 50 babies are enough, because you will now see more features, and the more features you have, the better the resolution. Now you'll say, oh, not only did the kicking go up by a certain percentage, but I also saw the cardiac output changing a little. So you can combine the features and predict at a higher resolution: within seven days or five days, I can predict when they're going to deliver. It's the feature quality and richness that helps you predict with a smaller sample size.
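The labeling idea in this answer can be sketched as a tiny supervised-learning loop: collect a kick-rate feature per subject, label each subject by whether delivery happened within 30 days, and learn a decision threshold. All numbers below are invented for illustration; a real model would use many more features and subjects.

```python
# Learn a decision threshold from labeled kick-rate data.
# Samples and labels are invented toy data.

def learn_threshold(samples):
    """samples: list of (kick_rate, delivered_within_30_days).

    Try the midpoint between every adjacent pair of sorted kick rates
    and keep the threshold with the highest training accuracy.
    """
    rates = sorted(r for r, _ in samples)
    best_t, best_acc = rates[0], 0.0
    for lo, hi in zip(rates, rates[1:]):
        t = (lo + hi) / 2
        acc = sum((r >= t) == label for r, label in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [(3, False), (4, False), (5, False), (8, True), (9, True), (10, True)]
t = learn_threshold(data)
print(t)       # 6.5: the midpoint separating the two labels
print(9 >= t)  # True: predict delivery within 30 days for a rate of 9
```

Adding a second feature (say, a cardiac-output change) would turn this one-dimensional threshold into a multi-feature classifier, which is exactly why richer features allow smaller sample sizes.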
Attendee: We are currently working on an ECG sensor, a chest-based ECG wearable, especially for people in rural areas in India where healthcare is not present in the villages, for telemonitoring, post-op care, etc., for those at risk of stroke or other conditions. So my question is, firstly: if you're developing a chest-based wearable, how feasible is it with the patient? From a signal point of view you would get good leads, but will the patient wear it, and what are the various issues you have seen in your experience with chest-based wearables? And secondly, is there a way we could collaborate with the AbilityLab?
Arun Jayaraman: Yeah, hey, thanks. I'm assuming that was Raheel who was talking? Yes, yes, thank you. So one of the biggest things you want to do when you do community monitoring is to make sure the materials are robust; dust and water especially are the things that can mess up your signal, right? So whenever you do it in lower-income regions of the country and you want long-term monitoring, you want to check the robustness of the sensor materials: waterproofing, dust-proofing, and things like that. That's one. Second is the question you're asking: a single-lead EKG system might be enough to give you warning signals. You're not trying to diagnose whether somebody is going from one level of congestive cardiac failure to another; you're trying to create an early warning system or a post-op monitoring system, and for that a single-lead EKG should be fine, as long as you play around with the lead location and the signal-to-noise issues. And again, it all depends on labeling the data. Let's say you start by putting multiple leads on the body and collecting data firsthand from many patients with cardiac disease, or whatever you're studying; then you can see which lead gives you the best resolution relative to the overall signal quality. That way it's a reduction in the number of sensors: take out one lead at a time and see how the signal does, and eventually choose the lead that gives the best signal compared to the full multi-lead EKG system. Then you want to pilot it in a home and community setting. And obviously, everybody now has a smartphone or some phone; you can use 5G or 4G or 3G, whatever you have, to transfer the data to a central server, and you can set it up so it only dumps the data when they plug in the phone at night, and then the data gets transferred.
Otherwise, what will happen is it'll clog up the download system, and then the phone companies will start throttling the data download and things like that. So you want to just dump it at night or something. Those are all very feasible to do.
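The lead-reduction approach described in this answer, record with a full montage first and then find the single lead that best tracks it, can be sketched as a simple scoring loop. The lead names, signals, and mean-absolute-error score below are illustrative assumptions, not real ECG data.

```python
# Pick the single lead whose trace best matches a full-montage
# reference signal. All traces here are invented toy data.

def mean_abs_error(xs, ys):
    """Average absolute difference between two equal-length traces."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def best_single_lead(leads, reference):
    """leads: dict of lead-name -> samples. Keep the lead whose trace
    is closest to the full multi-lead reference."""
    return min(leads, key=lambda name: mean_abs_error(leads[name], reference))

reference = [0.0, 1.0, 0.2, 0.9, 0.1]   # full-montage composite trace
leads = {
    "II": [0.0, 0.9, 0.2, 0.8, 0.1],    # tracks the reference closely
    "V5": [0.5, 0.5, 0.5, 0.5, 0.5],    # flat, poor match
}
print(best_single_lead(leads, reference))  # II
```

In practice the score would be a proper signal-quality metric computed over many labeled patients, but the "drop one lead at a time and compare" loop has the same shape.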
As far as collaborating with us, absolutely. John and I collaborate on numerous projects across the world; like I said, this whole COVID project is in Bangalore, actually. So absolutely, shoot me an email and we'll look at how we can help you.
Attendee: When it comes to stickables, how far along is the technology? Or is it just in the initial research phases?
Arun Jayaraman: No, actually, it's pretty well developed. There are a ton of companies; materials science engineering is a stunning field, and people have gotten to miniaturize stickables nowadays and package them really well. Especially with 3D printing and things like that coming along, you can easily package all this in casings. The secret sauce is obviously the battery, the electronics, and the encasing, so that it doesn't rip off and come off. A lot of the work, like when we do the neonatal studies, is making sure the sticker doesn't cause skin irritation or hurt the baby's chest and things like that; a lot goes into that. But if you're doing it in adults, we usually find even a bandage sticker, or anything that keeps it in place, is enough. If you're working with babies, though, you need to put a lot of effort into making sure you're not harming the child.
Anil Prabhakar: Is the sensor for COVID detection in the market, and when can it be expected?
Arun Jayaraman: So the sensor is ready, and so is the algorithm. I think it all comes down to a commercial vendor or a government ready to pick it up. In fact, we applied to many government authorities in India at one stage. Unfortunately, or fortunately, at that time India was on the downtrend of COVID, and they said, oh, we don't need a sensor, we're fine. Then the uptrend happened and they said they were interested, but by the time we figured out all the bureaucracy, they said, oh, it's all gone now, we don't need a sensor. So we need to be faster; yeah, I think it was my fault.
Otherwise, we would have done it. But the good news about the sensor, right, is you don't even have to use it for COVID. You can use it for flu, you can use it for anything; it's all labeling, right? The features are almost the same for any cardiorespiratory disease. You just have to create an algorithm: not a COVID algorithm, but now a flu or a tuberculosis algorithm. Then it's plug and play; in fact, it's pretty much straightforward.
Anil Prabhakar: All right. Thanks a lot, Arun, and thank you once again for joining us so early in the morning for you. We look forward to engaging with you; I'm sure lots of participants will reach out to you. And we look forward to seeing you in person next year, hopefully at Empower. Thank you so much.
Arun Jayaraman: Absolutely. Thank you so much for having me. Take care everyone. Bye bye.
Anil Prabhakar: All right, our next speaker is Madhav, and his is an amazing story. I was introduced to him about a year, year and a half ago. You wouldn't believe that he had not yet started college at that point; I think right now he's gone to MIT on the East Coast. Madhav is going to tell us about his journey with TranscribeGlass. I think it's an amazing story of how an individual with passion can bring together people to solve an existing problem. So without further ado, Madhav, the stage is all yours.
Madhav: Thanks a lot Dr. Prabhakar. Just to introduce myself: I was born and brought up in New Delhi, and then my family lived in the US on the West Coast for a couple of years, and then we came back to India. I started this when I was still completing my high school degree, but I've always kind of been building things and making a lot of stuff since I was a child, and I've been wanting to solve problems. And this is what I started with TranscribeGlass; it's now been over four years of working on this and trying to get a product out there.
And it's been an exciting journey, but I've been passionate about it because I found a problem that I felt was very impactful, but also very personal to me. And so my motivation behind this is just to solve this problem and have something out there that people can use to improve the quality of their life, you know, whether that's one person or a million people.
So I'll start with kind of my personal story behind this. I have been exposed to a lot of people with hearing loss. My grandmother on my mother's side had hearing loss and was diagnosed with it, but unfortunately she denied that she had hearing loss. My grandfather on my father's side has hearing loss and uses hearing aids. But I think my biggest exposure was when I was in high school: one of my friends had significant hearing loss, and it was very difficult to communicate with him, even for me. He was also finding it very difficult to understand what the teacher said. And in India today, a lot of classrooms don't provide accessibility or accommodation services for most people with disabilities; there is no sign language interpreter, there is no closed captioning. And so he found it very difficult to really understand what's going on in the classroom. And I remember one day I just kind of stopped seeing him in school. At that point I started wondering, well, you know, it's 2017, is there no solution that could have helped him understand what people are saying around him? And he told me, well, you know, what's the point of coming to school? I can't really understand; I'm kind of clueless as to what's going on. And I started asking him, okay, but what about hearing aids? And he said, well, it's very, very expensive. My family would end up spending lakhs on a hearing aid, and even then it's not a golden bullet. Just wearing the hearing aid isn't going to help me in all these situations and solve all my problems.
And then at that time, I started looking for other solutions. I looked at cochlear implants and asked him, okay, what about this? Is that any different from a hearing aid? And he said, yeah, well, the thing is, it's even more expensive. It's going to cost 10-15 lakhs for the device. It's a medical procedure and I would have to, you know, get this implant, and I'm not really comfortable with something that's so intrusive and so expensive. But also, again, I'm not sure it's going to be a 100% solution.
And I would recommend for you to watch an amazing movie called Sound of Metal. It's about someone who is a drummer in a rock band and starts losing his hearing, essentially becomes profoundly deaf, and slowly adapts to this new life in the deaf community. He initially saves up a bunch of money and is very keen on getting a cochlear implant, but once he gets a cochlear implant, you kind of see his disillusionment with, you know, all of these medical devices and solutions that exist out there.
And so that's kind of what he was saying about a cochlear implant, and then I started looking at other solutions. One that I found interesting was closed captioning. You know, it's free in general; you can use ASR, automatic speech recognition, like we are using right now at the conference. I think we're using Otter AI, which is a pretty popular caption API in the deaf and hard of hearing community. And you can just, you know, see what people are saying on a phone screen. So I asked my friend, well, what about this? Why don't you just use an app that can recognize speech and then just read it on the phone? And at that time, he said, well, it's very inconvenient because I'm constantly doing this back and forth between the person speaking and the captions. It's like I'm watching a tennis match where the ball is going back and forth. It just doesn't work for me, it's very inconvenient, and I really miss out on what's going on. At that time, I wasn't very satisfied with that answer, but I was trying to figure out, okay, why do these closed captions not work? And so I started building this hypothesis: okay, right now closed captions are on a screen, and when you're looking at the screen, you're not looking at the speaker at the same time.
But people with auditory loss depend very heavily on visual cues for communication. For example, lip reading, or speech reading, is a very important part of communication for someone who's deaf or hard of hearing. Similarly, looking at facial expressions or body language, because sometimes tone of voice isn't accessible. And having all these additional visual cues to supplement communication is really important for someone with hearing loss. So then I realized: okay, if you're doing this back and forth, and someone really depends on looking at the speaker, maybe looking at multiple speakers, then you're not able to see what's going on, and that might impair your ability to understand spoken communication.
So you know, that's where I started getting some of these ideas, and I started doing some research, and this is the problem statement that I came across. This was my original hypothesis. And in October 2020, Google Research released a paper on the benefits of a head-worn display for closed captions for people with hearing loss. Essentially, in that eight-page, thorough research paper, they validated the exact hypothesis that I came up with about 3-4 years ago.
And that is: closed captions are a good solution in themselves, but the experience of consuming those closed captions, of using them, is inconvenient, and that leads to problems in comprehending spoken communication. The first reason is, of course, visual cues, like I talked about: lip reading, gestures, facial expressions. But then there's also engagement with the speaker. If you're in a classroom and you're focusing on the teacher's face, you're reading their lips, you're looking them in the eye, and perhaps you're looking at some presentation material, you're absorbing more information than if you were cognitively loaded with looking between multiple sources. In that case there's a lot of visual dispersion and not a lot of focus, and you couldn't really see what the teacher is drawing on the board.
And so engagement is also a crucial aspect. There's also environmental awareness. For people with hearing loss, speaker identification and sound localization are often challenging with purely auditory cues, and even with closed captions. Speaker identification is, if there are multiple people talking, you want to know which person is talking. Sound localization is, if a sound or a speech signal is there, where in your space, in your environment, is it coming from? Often people with hearing loss have difficulty with these things. And closed captions tell you what is being spoken; they sometimes don't tell you who is speaking, and they definitely don't tell you, in general, where the sound is coming from. For that, you need environmental awareness. You need the ability to look at something that's changing in your environment, or look at whose lips are moving in a multiple-speaker situation.
And so these are three of the main benefits, but there's also mobility. If a person with hearing loss is driving a car, or walking down a busy road, they don't want to be stuck with having a phone in their hand and having to read that phone, because it's dangerous in those situations. And then there's the social acceptability of the solution. If I am a person with hearing loss, and I'm talking to someone who doesn't have hearing loss at a cafe, and I stick a phone in the middle of the conversation and start staring at it, that is not a traditionally natural social interaction. So what happens is, not only does the person who's talking to me feel conscious and become less free in the conversation, they may withhold certain information because it's very obvious it's being recorded and transcribed. But then even me as the user, I will start to feel conscious, I'll start to feel as if I'm being rude, and therefore I might just put away my phone and not use the transcription, even though that's what's helping me comprehend the conversation. And so this was my hypothesis of what's going on: because of all of these different things, the user's comprehension of spoken conversations suffers significantly. That's when I had this eureka moment: okay, it seems like all of these problems are being caused because closed captions are traditionally being used on phones or screens, you know, a tablet, a laptop; if you're lucky, you might get a projector screen, but you're not projecting it on the face of the speaker, you're projecting it somewhere on the side.
And so all of this is leading to visual dispersion. But what if we could just take these closed captions (the technology exists, people are already viewing them on their phones), and instead of seeing them on a screen, we make them heads-up instead of heads-down? We bring them into your field of vision and just have them in your environment in real time. So that was my idea: okay, that seems like it could potentially solve these problems behind why closed captions are still not a great solution. The idea is that, by letting you use visual cues, maintain environmental awareness, have mobility, and have higher social acceptability, this solution is going to be more effective than simply viewing closed captions on a screen.
And then, of course, as everyone knows, especially in India, the purchasing power of people in the deaf and hard of hearing community is lower than that of the hearing community. So affordability needs to be key. We can't have devices like hearing aids and cochlear implants that cost lakhs and lakhs of rupees; we need to build something that everyone can afford. And what is affordable? Well, a very, very low-end smartphone is probably affordable to a lot of people, at least in urban areas. So something that's even less expensive than that is probably what we're going for. And then, a lot of the time I would hear that, oh, hearing aids need some fitting, or I have to go get it customized, and cochlear implants, for sure, need surgery. So I wanted to build a solution that is non-intrusive, is comfortable and convenient, doesn't require customizations, and doesn't require fittings.
And so those are the three key tenets of what I started building. Today, for TranscribeGlass, the elevator pitch that we tell most people, that tries to communicate what we're doing in 60 seconds, goes like this. The target audience for TranscribeGlass is anybody who uses closed captions, or could benefit from using closed captions if they started to, and that's not just people who are deaf or hard of hearing. It could also be people who don't speak the native language, people with mental disabilities who find it difficult to focus on verbal conversation, or people who just want closed captions to supplement auditory communication. So anybody who uses closed captions can use TranscribeGlass. The why behind it is, like I said, captions on a heads-down screen are inconvenient and inefficient; users miss out on visual communication cues, engagement with the speaker, environmental awareness, etc. Existing solutions are really expensive, inaccessible, uncomfortable, and sometimes ineffective. TranscribeGlass is an affordable wearable that displays closed captions from any source on a heads-up display in the user's field of vision in real time, thereby greatly enhancing their ability to understand spoken communication.
And so this is a render of the product, if you can see my screen, and what I have here in my video is our fifth iteration of the prototype. Four years ago we started with something that was four or five times the size, a big, bulky black device, and all it did was take some text and project it into your field of vision. Today we have a very refined device: this weighs 11 grams, which has actually blown away our target weight of less than 25 grams, and most users we've tested with have not noticed the weight of the device. It's also much smaller, it works smoother, and it works with different caption sources. So yeah, these are some features of the device. We're trying to make the battery last all day. I'll also go on a bit of a side note here: as you can see, what we're developing is not a headset or glasses. We're not making glasses that have some display inside them, and that was actually a very conscious decision.
Number one, I think all of the wearables that exist out there for heads-up displays are integrated spectacle frames, or some sort of headset; you don't really see retrofit devices out there. So it's unique. But more importantly, what I started hearing from a lot of people who use devices like this is, one, if they wear spectacles with a prescription, with certain power in the lenses, then it becomes very difficult: you need some customization in the smart glasses, you need some fitting, you have to go to the store, maybe you have to pay more money to buy the device. It's just a more complicated process. Number two is, if I actually just want to read something using my glasses, and I want to look around me but I don't want the smart-glass functionality all the time, I can't really do that if the glasses and the device are the same thing and it's binary, on or off. So what I decided is: okay, why not build an attachment? The wearable just attaches to existing glasses, and if you don't normally wear glasses, you can get an empty frame and snap it onto that.
And so that's what we're building. As part of that, one of the challenges in the design process of a wearable like this has been the sheer complexity of designing a mechanism that retrofits to existing spectacle frames, because there's a huge variety of spectacle frames. You have thick glasses like this that are plastic and really bulky, and your device may be able to snap onto the side of that and you can then wear it like that. But you also have very thin aviator frames that are made out of metal and cylindrical. So how do you develop something that works for that? What I'll share here is something we worked on in the design, this retrofit mechanism: how do we develop a mechanism that not only attaches onto the side of a variety of different styles, shapes, and sizes of glasses, but also stays on, doesn't rotate, isn't loose, and is reliable? First, we evaluated simple clips, and added some rubber onto the metal to improve the friction. Then we looked at a snap-fit mechanism where we have some memory foam inside that adapts to different shapes of spectacle frames, and then you can lock it in, so that the foam adapts to the shape but also holds on tight to it. Then we looked at a magnetic option: what if we could put a magnet on the temple of the glasses and a magnet on the device itself, and snap them together to make sure it's tight?
Here the problem is, we also have some electronics inside the body of the device, and we would probably need some strong magnets, so that may interfere with some of the stuff going on in the electronics. Then we looked at a spring-based option, where it's applying pressure and there are some rubber grips here; when you slide the spectacle frames in, it pushes the spring back and creates enough space to put in the glasses, but also applies a lot of pressure to the glasses.
And then we also had a screw-based option, where, okay, you slap it on and then you screw it tight to the glasses. So this was a design process that was very intensive. We got a lot of feedback from users, we tried out a bunch of different options, and this is what we did for every feature in this product.
And then one thing we said is: okay, it looks like it's going to be very difficult to accommodate all sorts of spectacle frames. So why don't we have a separate attachment for the thin frames, these metallic aviator glasses, that stays on these thin frames, and then you can retrofit the device onto this attachment.
And then one thing that was very important for us was the evaluation process: in this wearable device, what are some of the most important criteria, and how do we mark certain criteria as being more important than others? What we landed on was that usability was one of the most important criteria, because we've been keeping users at the center of everything we're doing. So that has the highest importance.
And then also, we don't want the retrofit to be something that's very bulky or very visible, that doesn't go with the design and the aesthetics of the device. So the integration is important as well. Maintenance is really important, because if you have a device that you constantly need to get fixed, or you need to get it tuned, or you need to maintain it, that's not a good solution.
You know, with hearing aids or cochlear implants, you do sometimes have to go back and get them checked up. Then longevity: we want this device to last for a long period of time; we don't want it to slowly deteriorate. We have some material here that is constantly being bent when it's being attached to the device, and maybe over 1,000 or 1,500 cycles of usage it's going to get weaker and weaker, and that's not desirable. So we want to define a life cycle for this product: how long is it going to last, and for how many cycles of usage should it be 100% or 99% reliable? Then, of course, manufacturability is a big aspect: can we manufacture this on a large scale? Is it going to be reliable? Are we going to produce the same thing every time? And then costing, because, like I said, affordability is key to everything we're doing, so we're limited to using materials that don't cost a lot and are not very expensive to manufacture. So then we rated all of these, and this is just to give an example, for one specific feature of the device, of all the ideas and the brainstorming, and then how we finally choose one of those for the final product.

And then there's another thing we call the XY mechanism. What I started realizing, talking to people and seeing the way they're wearing this, is that everybody has different interpupillary distances. Their face shapes are different, the location of the eyes is different on their face, and the glasses are different as well. So when you retrofit it onto the glasses, the position of the device is always going to vary; for one person it may be here, for another up here, for another here. But because of the optics of the device, you have to get this display part centered with your pupil; if it's not centered, you won't be seeing the captions.
And so what we realized is we need some sort of mechanism to manually calibrate and move the device so that it's aligned with the center of the eye. So again, we had many different options: some screws where you can pull it here and push it there, and all of that. So those are some of the things that went into the design process.
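The criteria-rating exercise described above can be sketched as a simple weighted decision matrix. The criteria follow the talk (usability weighted highest, then integration, maintenance, longevity, manufacturability, and cost), but every weight and score below is an illustrative assumption, not the team's actual numbers:

```python
# Hypothetical weighted decision matrix for comparing retrofit mechanisms.
# Weights and per-option scores are made up for illustration only.

CRITERIA_WEIGHTS = {
    "usability": 0.30,          # highest importance: users at the center
    "integration": 0.15,        # fits the device's aesthetics
    "maintenance": 0.15,        # should not need frequent servicing
    "longevity": 0.15,          # survives many attach/detach cycles
    "manufacturability": 0.15,  # reproducible at scale
    "cost": 0.10,               # affordability is key
}

# Each option scored 1-5 per criterion (illustrative numbers).
OPTIONS = {
    "rubber clip": {"usability": 4, "integration": 4, "maintenance": 4,
                    "longevity": 3, "manufacturability": 5, "cost": 5},
    "memory-foam snap fit": {"usability": 5, "integration": 4, "maintenance": 3,
                             "longevity": 3, "manufacturability": 3, "cost": 3},
    "magnetic": {"usability": 5, "integration": 5, "maintenance": 3,
                 "longevity": 4, "manufacturability": 3, "cost": 2},
    "screw-based": {"usability": 2, "integration": 3, "maintenance": 4,
                    "longevity": 5, "manufacturability": 4, "cost": 4},
}

def weighted_score(scores: dict) -> float:
    """Sum of (criterion score * criterion weight) for one option."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())

def rank_options(options: dict) -> list:
    """Return (name, score) pairs, best option first."""
    return sorted(((name, weighted_score(s)) for name, s in options.items()),
                  key=lambda pair: pair[1], reverse=True)
```

With these made-up numbers, `rank_options(OPTIONS)` would put the rubber clip first; the point is only that raising or lowering a criterion's weight changes which mechanism wins, which is why the team fixed the weights before scoring.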
And then, how the device works. What we mean by connecting with any caption source is that there are a lot of different ways a person can get captions. One of those is automatic speech recognition, like I said, what we're using with Otter AI. But within ASR there exist several different caption APIs, and an API is, you know, an open programming interface that I can use to access work that has already been done. So within ASR there is Google ASR, Microsoft Azure has their speech-to-text, Amazon Transcribe is Amazon's own, Apple has their own, Android has one; all the biggest companies in the world have their own ASR APIs. And that's just within automatic speech recognition. Then there's CART, which is Communication Access Realtime Translation, which is usually human captioning: a human with a steno machine that allows them to type very fast types verbatim what they're hearing from the person, and that gets streamed to a web server, typically something like StreamText. The users can just log on to the web server and see the captions in real time. CART is much more expensive, because the human capital costs much more, but it's also better in professional settings, in conferences and meetings, because it's typically much more accurate than ASR. You can also train the CART typist to know certain things, like my name. Automatic speech recognition always renders my name as "Mother," because it doesn't know what "Madhav" is, but a steno typist can be trained for that. And then there are pre-formatted digital subtitles. When I watch a movie, the subtitles are 100% perfect, and they're also formatted very well. Those are typically digital files.
So one thing that we are doing (and all of these features have come from conversations with the deaf and hard of hearing community) is this: there is no universal caption source that works best in every situation. In some situations ASR may work best, because I can just pull out my phone and start using it, and it's free or not very expensive. But in other situations I need that 100% reliability of captions, because it's crucial, and if I miss out there are going to be a lot of problems, so I'd use CART. And if I'm watching a movie, I don't want to use my phone and ASR; I want to use the captions that the movie is providing. So what we're doing is aggregating all these different caption sources, and serving as a delivery channel between these existing third-party APIs and the heads-up display that the user is going to be wearing. In different situations, a user can use whatever caption source they prefer, whether that's ASR, CART, or a digital format.
And so our app starts transcribing from whatever caption source (we don't do any speech recognition ourselves), sends the text to our hardware device, which is retrofitted onto your glasses, and it projects the captions into your field of vision.
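The aggregation idea just described, the app doing no recognition itself and simply adapting many third-party sources to one display, can be sketched as a common interface plus a delivery loop. All class and method names here are hypothetical illustrations, not TranscribeGlass's actual API:

```python
# Sketch of caption-source aggregation: every provider (ASR, CART stream,
# subtitle file) is adapted to one interface, and a delivery loop forwards
# lines to the heads-up display. Names are hypothetical.
from abc import ABC, abstractmethod
from typing import Callable, Iterator

class CaptionSource(ABC):
    """Any provider of caption text: ASR API, CART web stream, subtitle file."""
    @abstractmethod
    def lines(self) -> Iterator[str]:
        """Yield caption lines as they become available."""

class SubtitleFileSource(CaptionSource):
    """Pre-formatted digital subtitles read from a plain-text file."""
    def __init__(self, path: str):
        self.path = path
    def lines(self) -> Iterator[str]:
        with open(self.path) as f:
            for line in f:
                if line.strip():
                    yield line.strip()

class ListSource(CaptionSource):
    """Stand-in for a live source (ASR or CART) in this sketch."""
    def __init__(self, items):
        self.items = items
    def lines(self) -> Iterator[str]:
        yield from self.items

def deliver(source: CaptionSource, display: Callable[[str], None]) -> int:
    """Forward every caption line to the display; return how many were sent."""
    count = 0
    for line in source.lines():
        display(line)  # e.g. send over Bluetooth to the wearable
        count += 1
    return count
```

The design choice this illustrates is that swapping caption sources never touches the display path: the app only ever calls `lines()`, so adding a new provider means writing one more adapter class.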
And so finally, I think the most important thing in this whole process has been talking to the end users and getting feedback from people who are actually wearing this device. In the last four years, we've probably had trials with 200-plus deaf and hard of hearing people in India, the US, and the UK, and got a lot of feedback that has been very positive. This is an example of one trial; we've had probably six or seven of these. Most recently, we developed a new prototype and had a trial with 14 users in Ahmedabad, Mumbai, and Delhi.
And the gist of the feedback is that having conversations captioned on a heads-up display is much more useful than having them on a phone, and users are able to comprehend the conversation better. We've also been able to validate the price point.
And so all of these features, whether that's the retrofit or having multiple caption sources, have come from conversations with deaf and hard of hearing people. For example, I used to attend a meetup every other Sunday in Delhi at the Free Church, where the National Association of the Deaf used to have a deaf meetup, and there would be 100 deaf people just signing there. I would go to that and just talk to people, show them the device, and tell them: these are some of the ideas I have, but what are the ideas that you have? I want to build something that is useful for you. Since I'm not a part of the community, I need to try to understand your perspectives better rather than imposing my ideas on you. And so a lot of the ideas have not been mine; they've actually been the users', and I am simply serving as someone who's trying to translate those ideas into a product.
So yeah, the last slide is just the overall journey, and this was as of one year ago. It started off with an idea; we were able to file a patent and do some early user trials that were very positive. We launched a crowdfunding campaign in India and got about 50 backers across India who wanted to see this product out there on the market. Then we had demos and user trials at the National Association of the Deaf, at Gallaudet University here in Washington, DC, at MIT, and with a lot of deaf and hard of hearing users in the US and UK. We've also talked to educational institutions that have deaf and hard of hearing students, to caption service providers, to corporate accessibility leaders, and to theaters and cinema halls that provide closed caption services. And we've been able to raise some funding to build this product. The next step is, early next year, we are going to launch the first public version of the device, which we're calling TranscribeGlass Beta. We've got about 220 pre-orders for the device from across India, the US, and the UK, and we'll be shipping it out to the first 100 users across the globe. This is going to be the first thing that goes out; we're going to have longitudinal trials, and people are going to give us feedback on how it's working. And then from there, hopefully, we'll be able to go into mass production. But yeah, this has been my journey, and I'm grateful for the opportunity to share it with all of you.
Anil Prabhakar: Thanks a lot Madhav. It's been an exciting journey for you, obviously, and we are looking forward to having a product of this kind come out of India. I've been following your journey over the past year, year and a half, and it's very rewarding to see the progress that you're making. We have time for maybe one question, very quickly, because we are running a little late on schedule. Anyone in the audience have a question for Madhav?
Well, maybe I can ask this question. I understand that a lot of the captioning you're going to be using will use available closed captioning methods. And even on this conference call, we are using Otter, and if someone turns on the live transcript, you'll see that it does a fairly poor job with an Indian accent.
Right. So I was following it when we had Indian speakers and I was following it when we had our foreign speakers, and clearly it's not doing as good a job with the Indian speakers. Do you have any ideas on how you might tackle that problem for an Indian user base for your glasses?
Madhav: Yeah, that's a very good question. I think the reason that is generally the case is because most of these companies, when they start out with building ASR APIs and ASR models for speech recognition, the data that they use is typically American or British accents. These models are trained on that early on and do much better on those accents, and so an Indian accent kind of throws off these machine learning models.
But here's what I'm seeing. Otter is a relatively smaller company; it's not, you know, the oldest thing that's been around. But, for example, if you do a comparative study of what Google ASR was like two years ago and what it is today, it has significantly improved, given that Sundar Pichai has been pushing for the inclusion of countries like India and Asia, and I also know that Google has been focusing on having more Indian speech data in the training of their speech recognition models. So now, if you use Google Meet and turn on closed captioning, the performance is much, much better than it used to be two years ago. It's still not perfect, and ASR will never be perfect, you can quote me on that, but it essentially comes down to the data you train it with, and it'll become better. So I think the solution is for companies who are doing closed captioning, and who have got it down for American and English accents, to actually start using Indian data and other accents to train their machine learning models. They need to include that data and be more inclusive of different accents in order to improve the accuracy. And I know, for example, Airtel in India, Airtel Labs, is creating speech recognition for Indian accents, and not just Indian accents but also Indian regional languages, and a bunch of things like that. So I think the point is, it comes down to inclusion by the biggest companies that make these automatic speech recognition models. It's their responsibility to include a variety of data and be inclusive while they're training their models.
Anil Prabhakar: Right. Thanks a lot, Madhav. There's a quick question from Rita about the patent. Rita, I'll reply to you: it's not necessarily only an Indian patent; there is protection worldwide if you file what is called a PCT and then do international filings in each country. I don't know, but perhaps Madhav has gone through something like that. Since we are running out of time right now, I will request you all to thank Madhav for letting us hear about his journey, and we look forward to more successes. Thank you very much.
Our next speakers have two contributed papers. The first is on a brain-computer interface, from TCS.
Reshmi Ravindranathan: I also have my colleague Robin Tommy with me, and we'll be presenting together. Greetings of the day, and thank you for giving us the opportunity to present our paper at Empower 2021. I hope my screen is visible and that I'm clear on the audio.
So what we will present today is the experimental validation of a brain-computer interface for autistic children. Before I come to the crux of what we have done, which Robin will present, I want to throw light on the two main aspects of this journey: the human brain and brain-computer interfaces on one side, and autism on the other. Now, it is a no-brainer that the brain is the most complex part of our nervous system. It is composed of around 100 billion neurons, it controls all human activity, and it helps us interpret data and perform various other processing. The organ is mainly divided into three parts: the forebrain, the midbrain, and the hindbrain. The forebrain is the largest and most developed part of the human brain, and it is where the cognitive capabilities of a human being reside; the midbrain is associated with movement and other voluntary actions; and the hindbrain houses the cerebellum, which controls the body's vital functions.
Now let's go a little deeper into the waves emitted by the brain. Neurons inside the brain communicate with each other by creating potential differences known as action potentials. During this neural activity, a lot of oscillations are induced, known as brainwaves or EEG (electroencephalogram) waves. Based on the activity the person is performing, the strength of the EEG signals varies. The various kinds of waves emitted by the brain are alpha, beta, gamma, theta, and delta waves. Alpha waves are considered the strongest and are observed during wakeful relaxation, when we are awake but have our eyes closed. Then we have beta waves, the normal rhythm observed predominantly when we are active or busy, or during anxious or alert states of mind. Gamma is observed during meditation and blissful activity, when intellectual perceptions are being processed. Delta and theta are more towards the sleep side. So these are the various kinds of EEG signals emitted by the brain.
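The band decomposition described above can be sketched in a few lines of Python. The band edges below are the commonly cited ranges and the FFT periodogram is only one simple way to estimate band power; the speakers do not specify their exact pipeline, so treat this as an illustrative sketch:

```python
import numpy as np

# Commonly cited frequency ranges (Hz) for the EEG bands described above.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}

def band_powers(signal, fs):
    """Return the average spectral power in each EEG band for a 1-D
    signal sampled at fs Hz, using a simple FFT periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = psd[mask].mean() if mask.any() else 0.0
    return powers

# Example: a 10 Hz sinusoid (inside the alpha range) plus a little noise,
# standing in for "wakeful relaxation with eyes closed".
fs = 256  # a typical consumer-EEG sampling rate
t = np.arange(0, 4, 1.0 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(len(t))
powers = band_powers(eeg, fs)
print(max(powers, key=powers.get))  # alpha dominates for this signal
```

Real headsets report similar per-band values, but after hardware filtering and artifact rejection that this sketch omits.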
So, keeping the brain aside, we come to the next part of our research, which is autism. You are all aware of autism spectrum disorder. We are not really sure about the cause of autism, but affected children and adults normally show a different brain structure compared to neurotypical children. In learning, thinking, and problem-solving capabilities, people on the autism spectrum can range from severely challenged to exceptionally gifted, with really high intellectual capability. Some people with ASD need a lot of help in their day-to-day lives, whereas some may not need that much. People with autism spectrum disorder also face problems with social, emotional, and communication skills: they have difficulty expressing their emotions, and the way they react to things can be different. So there are different kinds of behavioural differences that we normally see in children with autism.
Now, what was the motivation for us to get into this research? We have been involved in the research and development of innovative solutions using the latest technology for this neurodivergent population for the last several years, six years to be precise. We know that technology has been creating a lot of disruption in the way the world runs, including in general healthcare. When we visited special schools and came across the experiences of the children and their caregivers, we realised that technology and the benefits it provides have a very direct relevance in the lives of these children. From there on, we have been involved in creating various kinds of solutions for them, including gamified solutions for their physical rehabilitation and cognitive improvement, and intelligent AAC devices using the latest speech technologies, to make sure we give them a voice for their learning and bring transformation in the way they do things.
Now, the solutions are not just aimed at making the current situation better, but also at what could be done for a hopeful and independent future. Since we work from a technology perspective, the question is: what can we do to make their lives better? How did this research come into the picture? While we were doing our research, we were involved in a series of interactions and brainstorming sessions with parents and other stakeholders like the school authorities. We came to understand that skill assessment, or interest assessment from an academic perspective, for these children is completely dependent on manual intervention. How do you know that a child is interested in something, or where their skill lies? It depends mainly on the observations of the parents and of the teachers at school, and on questionnaires filled in by the teachers. Overall, it is completely dependent on a human being assessing the skill or the interest of the child.
Now, this identification process took around three to six months, or even longer in certain cases. So we thought: with technology at hand, why not create a research programme to determine whether there is a well-defined technological system through which we could understand the interests of the child, and then guide them in a defined manner towards a better future with respect to a career or vocation? As of today, the training these children receive is very generic, and many of them end up doing very general things in their lives, often quite menial vocations. Instead of that, why not steer them towards something based on their interests and their skills, and how do we establish those in an accurate way? That is where this research with brain-computer interfaces started. The brain-computer interface is still an area of active research, and that is why we call this paper an experimental validation of a brain-computer interface for autistic children. It is an electromechanical device that helps analyse brainwaves using EEG sensors. These are available in different form factors, like headbands and headsets; the one you see here on the screen is what we used in our research.
Over the last couple of decades, BCI-based research has contributed a lot to society, for example in neurofeedback for rehabilitation: how are patients reacting to certain kinds of treatment? When you give certain games to children to play, how do they react, that is, what kind of brain signals is their brain emitting? Is it positive or negative? All those kinds of inferences can be made just from the brain signals. That is why we thought this research could also be used to gather insights about a person's brain activity when they are involved in certain activities like reading, writing, drawing, and so on. So that is the background of our research, and I will now request Robin to take you through the journey of what we did with the children. We collaborated with an NGO to study how a brain-computer interface can be used to assess the interests of the children. Over to Robin.
Robin Tommy: Thank you. As Reshmi was mentioning, for the last five years we have been working with children with cerebral palsy and autism. One of the burning problems in autism is the way in which they learn. You may have seen why we, and many other researchers, have been working with robotics, especially social robots and socially assistive robots. The key thing is that these children have very good intellectual capability; the question is how to tap into that capability and provide the right training for children on the ASD spectrum. That is how we started this journey. We identified a few tasks, and we identified the stakeholders: the parents of the children on the spectrum, the teachers, the physiotherapists, and the cognitive training mentors, the coaches or teachers. We also brought in occupational therapists to make sure we were doing things in the right way. The occupational therapists, along with the teachers, provided us with a set of tasks. For example, the tasks could be mathematical or habit-based. When I say habit-based, I mean things that come from the basal ganglia: what do they remember, and how do they attach the factors of remembrance, for example how a favourite comes to be in place? So the tasks included subtraction, multiplication, and addition, which come from a logical and analytical perspective; writing an essay, or reading and writing; and a third part that was more creative, along with skill training and those kinds of things.
You can think of these as the three dimensions of a triangle along which the different activities, or tasks, were formed. We continued these tasks for three months. Even before reading the brains of the autistic children, we took a few neurotypical students of the same age to understand how a typical brain responds to these kinds of activities. The final experiment, as you can see, was done on six students on the autism spectrum across multiple ranges: two in the low range, two in the medium, and two in the high. The device we used followed open BCI standards, and you can see some of the major activities given and the comprehension involved: reading, writing, and the samples the children produced. The study was done in two phases: a manual interpretation phase and a BCI interpretation phase.
First, we manually observed how each child performed an activity; then we moved to the BCI. The experiment started, and we gave a variety of tasks. We observed how the brain was performing based on the alpha and beta waves Reshmi mentioned, mainly from the frontal lobe, parietal lobe, and occipital lobe: how the retina feeds information to the back part of the brain, how language information from the parietal region and what I would call the thalamus region responds, and how the waves come out. We were also observing the frontal and prefrontal lobes. We obtained the signals and used statistical analysis, including logistic regression, to understand the level of attention and how the different values performed: how much alpha, low beta, and high beta were present.
So, as you see here, what we are trying to do is make sure all the data we get is processed properly. As you know, these devices are mainly used for measuring attention and meditation. We took the alpha and delta signals and applied our own filters over them to remove the noise, then we set threshold attention levels from which the percentages were calculated. Then we went into a quadrant analysis across the four quadrants of the XY plane; that is how the recordings were made, analysed, and processed. You can see what the results look like and how each child performed. Reshmi, you can step in here and interpret the results.
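The threshold-and-percentage step described above can be illustrated with a minimal sketch. The speakers do not publish their exact filters, thresholds, or features, so the beta/alpha ratio and the threshold value below are assumptions chosen only to show the shape of the computation:

```python
import numpy as np

def attention_percentage(beta_power, alpha_power, threshold=1.0):
    """Crude attention proxy: the percentage of time windows in which
    the beta/alpha power ratio exceeds a chosen threshold.
    (Illustrative only; the exact pipeline in the talk is not published.)"""
    beta_power = np.asarray(beta_power, dtype=float)
    alpha_power = np.asarray(alpha_power, dtype=float)
    ratio = beta_power / np.maximum(alpha_power, 1e-9)  # avoid divide-by-zero
    return 100.0 * np.mean(ratio > threshold)

# Per-window band powers for one activity session (synthetic numbers):
# attention rises as the session progresses.
alpha = [4.0, 3.5, 2.0, 1.0, 1.2, 0.9]
beta  = [1.0, 1.5, 2.5, 3.0, 2.8, 3.1]
score = attention_percentage(beta, alpha)
print(score)  # percentage of windows judged "attentive"
```

Comparing such percentages across activities is what lets one activity be called the child's area of maximum attention, as in the results that follow.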
Reshmi Ravindranathan: Thank you, Robin. What we have done here, if you look at the six samples: on this side you see what the parents and teachers have manually told us each child is interested in. These are the activities that were given to each of them, and this is the percentage of attention that was recorded. I don't have time to go through each of them, but look at sample four, where the manual assessment says the child's interest lies in stringing beads.
Across all the activities, sample four showed maximum attention in exactly that area. So for this sample, we were able to clearly say that what had been manually interpreted is exactly what was found from the BCI, because this particular child, who had been observed to be interested in stringing beads, showed a high level of attention and meditation while doing that particular activity.
Similarly, there were certain students for whom there was a difference between what was manually observed and their attention and meditation values, which suggested there was another activity they may have been more interested in. So what we are trying to show is that BCI can certainly be used as a tool, a fairly accurate platform through which these levels of assessment can be done using brain signals, which could help in the long journey of the child. These are the conclusions I was just talking about, and these are the various references we have cited. That's it. Thank you so much.
Anil Prabhakar: Let me thank Reshmi and Robin. And I request the last speakers of the session, presenting on the assistive smart glass: another team from TCS, with Charudatta, Vijay and Kunal.
Kunal Shrivastava: Good evening everyone. Along with Charudatta Jadhav and Vijay Raut, I am Kunal Shrivastava from TCS Research and Innovation, Accessibility COE. We will be presenting our paper on the assistive smart glass, a wearable device, which is about enhancing abilities and empowering people with visual challenges by leveraging AI.
So what is our research focus? We understand that technology interventions are available to give the visually challenged equal digital access, but quality of life is still a struggle for many, and they often depend on others for day-to-day activities. Due to visual impairment, about 48% of people are completely or moderately cut off from the people and things around them, as they are not able to establish a proper connection and cannot experience the world the way the mainstream does. So our research focuses on how we can empower the visually challenged and provide them an equal opportunity to live a dignified life. Leveraging artificial intelligence and machine learning can help overcome physical limitations and address the problems they face in day-to-day life.
So what are the challenges? People with visual challenges, or challenges due to ageing, encounter several problems in their daily lives that restrict their personal freedom. It is a big challenge for them to deal with routine tasks in a physical environment independently, such as recognizing people and surroundings, reading printed or handwritten text, identifying currency, controlling home appliances, or even monitoring their health in this COVID era, where physical assistance is very limited. We will say more about why these challenges are critical to resolve during our demonstration of the solution. Let's see how we solve these problems.
So we have developed an assistive smart glass, a wearable device attached to eyeglasses that can aid the visually challenged, or the elderly, in daily activities. It is an innovative way to empower the visually challenged and has the potential to create a larger social impact. This is an image of how it looks.
Let me help you understand our solution and our approach. The solution comprises hardware devices and cloud AI services. The entire solution is multimodal: the user can operate its functionalities in efficient and effective ways in various situations and surrounding conditions by using buttons, voice, or hand gestures. In outdoor, crowded, or noisy surroundings, they can use buttons or hand gestures instead of voice commands. The hardware consists of a processing unit, a Raspberry Pi Zero, which is a low-cost, small single-board computer, small enough to be easily mounted on the glasses. The other input devices are sensors such as a camera and a gesture sensor, plus Bluetooth earphones with a microphone, and push buttons. The earphones also act as an output device, letting the user listen to the speech output of the smart glass. On the cloud side, various AI and ML services and algorithms are used to address the problems in an intelligent manner. The device is connected to a power bank for an uninterrupted power supply. In the picture, you can see how the complete assembly is mounted on a glass frame and worn; the camera is placed so that the pictures taken match the wearer's line of sight, capturing whatever the user is currently viewing. The architecture diagram shows how the multimodal input and output devices are connected to the processing unit. Upon a command given through any modality, the image captured by the camera is sent to the cloud and analysed for the requested use case. The response is sent back as text, which is then converted into speech using a text-to-speech program, and the user listens to it through the earphones. Let's take a deep dive and show how it actually works with the help of these demos.
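The multimodal dispatch described above, where a button press, a voice command, or a hand gesture all resolve to the same set of actions, can be sketched as a simple lookup. All the trigger phrases, gesture names, and pipeline descriptions below are hypothetical placeholders, not the actual device's vocabulary:

```python
# Hypothetical sketch of multimodal command dispatch for a smart-glass device.
# Action names and pipelines are illustrative, not from the TCS implementation.

ACTIONS = {
    "read_text": "capture image -> cloud OCR -> text-to-speech",
    "recognize_face": "capture image -> face encoder -> name lookup -> speech",
    "describe_scene": "capture image -> object detection -> speech",
    "detect_currency": "capture image -> currency classifier -> speech",
}

# Each input modality has its own trigger vocabulary mapping to one action.
TRIGGERS = {
    ("voice", "read this"): "read_text",
    ("voice", "who is this"): "recognize_face",
    ("gesture", "double_tap"): "detect_currency",
    ("button", "button_1"): "describe_scene",
}

def dispatch(modality, trigger):
    """Resolve a raw input event to an action pipeline, or None if unmapped."""
    action = TRIGGERS.get((modality, trigger))
    return ACTIONS.get(action) if action else None

print(dispatch("gesture", "double_tap"))
print(dispatch("voice", "unknown phrase"))
```

Keeping the trigger table separate from the actions is what lets noisy environments fall back from voice to buttons or gestures without duplicating any pipeline logic.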
So we'll be demonstrating each use case. But I need confirmation that you are able to hear the audio of the video once I start it.
The first is reading handwritten text. The importance of reading in our lives cannot be overstated, as it impacts every aspect of life, from education to employment. During school or college we come across various handwritten notes, and even during employment, or in documentation, a person encounters meeting notes, applications, and letters that are handwritten. How can a visually impaired person read these independently, without taking help from others? There might be cases where you don't want to show a personal note or letter to others; it might be private. So let's see how we solve the problem. The camera captures an image of the handwritten text and sends it to the OCR engine on an AI cloud to detect and extract the text from the image. The response is then sent back to the smart glass in the form of audio. Let's see.
You might not be able to hear it, but the assistive smart glass is currently reading the handwritten text aloud, reading out whatever is written.
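The capture-OCR-speak round trip described above can be sketched with the cloud OCR and the speech engine injected as callables, so the sketch stays testable without the real services. Both stand-ins below are placeholders, not the actual TCS cloud endpoints:

```python
def read_aloud(image_bytes, ocr, tts):
    """Send a captured image to an OCR backend and speak the result.
    `ocr` and `tts` are injected so the cloud service and the speech
    engine can be swapped out (both are placeholders here)."""
    text = ocr(image_bytes)
    if not text.strip():
        return "No text detected"
    tts(text)
    return text

spoken = []
fake_ocr = lambda img: "Assistive Smart Glass"  # stand-in for the cloud OCR
fake_tts = spoken.append                        # stand-in for text-to-speech
result = read_aloud(b"...jpeg bytes...", fake_ocr, fake_tts)
print(result)
```

On the real device, `ocr` would POST the JPEG to the cloud service and `tts` would stream audio to the Bluetooth earphones; the control flow stays the same.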
Next, how do we read printed text? Similar to handwritten text, we can also read printed text, and the reading experience itself is very important. It's not only about reading: you need to be able to jump to the front of a line or of a paragraph, and read line by line, in the same manner the mainstream does. This particular demo shows how gestures can be used to play and pause the reading, to change the reading mode to line-by-line or paragraph mode, and to move backward and forward. Since you may not be able to hear the audio, I will quickly show some of the gestures.
You can see the gestures here: up, down, and so on.
And you can play and pause the reading. So these are the kinds of gestures you can use. Apart from reading text, we understand there are many visually impaired people who did their schooling in a native language and are not familiar with English. So it is very important for them to translate whatever they are reading into their native language, even outdoors, where it is necessary to understand the language of printed content on signboards, receipts, tickets, or restaurant menus. So the device also has the capability to translate what it reads; in this particular video, which may not have audio, it is reading the same English text out in Hindi.
Next is face recognition. The inability to see and recognize a person can hinder social engagement, and it also affects privacy and security for the visually challenged. It is difficult for them to perceive non-verbal communication such as expressions, body language, and mood, and to comprehend what others are currently feeling in order to initiate and establish communication. For the facial analysis, we trained a FaceNet model using internet datasets and used it to recognize faces. The MTCNN face detection algorithm is used to detect faces in an image; the detected faces are then passed to the trained FaceNet model, which gives a face encoding. A face is identified when the encoding of that person is already available in the database. The first time, the device learns and stores the name; the next time the person appears, it recognizes them, and it also recognizes the mood of the person, which is very important. Another use case is understanding the surroundings. It is very important for a visually impaired person to understand the surroundings and identify nearby objects without touching them or needing external assistance. It is important for them to have a clear mental image so that they can easily navigate and find what they are looking for. In this particular video, when the command is given, the device identifies the surrounding objects and reads them out: it identifies a person sitting at a computer, a chair, a table, and an indoor environment. Similarly, outdoors it reads out the outdoor surroundings.
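The final matching step, comparing a fresh face encoding against stored ones, is typically a nearest-neighbour search with a distance threshold. The three-dimensional vectors and the threshold below are toy values for illustration; real FaceNet-style encodings are high-dimensional:

```python
import numpy as np

def identify(encoding, database, threshold=0.8):
    """Return the name whose stored encoding is nearest (Euclidean
    distance) to the query encoding, or None if nothing is within
    the threshold. Encodings would come from a FaceNet-style embedder;
    the vectors and threshold here are illustrative only."""
    best_name, best_dist = None, threshold
    for name, stored in database.items():
        dist = float(np.linalg.norm(np.asarray(encoding) - np.asarray(stored)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Toy database of previously enrolled faces (hypothetical names).
db = {"Asha": [0.1, 0.9, 0.2], "Ravi": [0.8, 0.1, 0.5]}
print(identify([0.12, 0.88, 0.19], db))  # close to Asha's stored encoding
print(identify([5.0, 5.0, 5.0], db))     # unknown face, nothing in range
```

The "unknown face" branch is what lets the device enrol a new person the first time and recognize them on later encounters, as described above.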
Then there is currency detection. We also have a model trained to identify currency. Among the different gestures and voice commands, you can double-tap, and it identifies the note, for example a 500-rupee note, as well as other denominations.
Kunal: The next use case...
Anil Prabhakar: We are going to have to stop in two minutes. Would you like to wrap it up and take some questions?
Kunal: Yeah, sure. Just give me two minutes; I'm just about to conclude. Another use case is giving a gesture or voice command to control home appliances: a tube light or a fan can be turned on and off. So, to conclude: this is our humble effort, and it is still under research. There are many more use cases we have identified to be worked on. Due to COVID, we could not conduct a comprehensive user study; however, in a controlled environment we could successfully complete the study and get feedback from the associates who were involved. There are certain limitations: for an aged person, the model sometimes identifies the gender as female for a male; currency detection when the notes are folded can be improved; and object detection in low-light conditions also has to be improved. One more point to conclude: what we have demonstrated are just indicative use cases. Our vision is to add many more scenarios to empower users to lead an independent and dignified life. This solution will not only resolve the challenges faced by the visually impaired but will also help enhance the experience of mainstream users. Thank you so much.
Attendee: I just have a couple of questions. One is: what is the approximate weight of this device? And whenever you put it out in the market, what would the pricing be like?
Anil Prabhakar: Okay, let me add a third question to that, which is: where is the processing being done? Is everything connected only to the Raspberry Pi Zero, with everything happening on that, or are you actually connecting to something else to do your recognition?
Kunal: That is the uniqueness: the processing is done on the Raspberry Pi itself. We are not connecting to, or doing any processing on, a laptop or other machines. To answer the first question, the device is very light, about the size and weight of a credit card, so it can be easily mounted on the glasses without affecting the user experience. As for how we bring it to market: that is something TCS is working on; we need to see how the organization wants to approach it. Currently it is in the research phase, and we are working to improve many more things in it.
Anil Prabhakar: So the battery pack is kept separately is that right?
Kunal: The battery pack is attached to the Raspberry Pi. If you have seen my screen, you can see this wire hanging; the power bank can be put in the pocket.
Attendee: And just one more thing: the glasses you mention, are they special glasses for this device, or can any normal spectacles be used?
Kunal: As you can see, these are very normal glasses, and we have attached the processing unit with a magnetic clip. Our vision is to also procure actual smart glasses and see if we can port our algorithms onto them, but that could be a costly affair. Currently, this can be fitted onto any glasses.
Attendee 2: You said that all processing is done on the Raspberry Pi Zero. But in your earlier slides, you said you are connecting to the internet and getting data.
Vijay: The processing on the device covers capturing the image and processing the voice command; then we send the image to a cloud server to access the AI/ML services.
Attendee 2: And you expect to connect to the web through your smartphone? So it should always be connected to the internet?
Vijay: Yes, it has to be connected to the internet; you cannot use it offline. You can connect it through your mobile, for example.
Anil Prabhakar: Okay, I am definitely going to stop here, although it's very interesting. Thank you. Also, we can just pretend we are hearing impaired.