The Government of India has launched an app, Aarogya Setu, that sits on your smartphone if you download it, then shares data about you via Bluetooth and GPS to a central server, which in turn allows you to know if you have been in contact with a COVID-infected person. An app like this could be very useful at a time like this, but it also raises questions about the way the data are gathered and stored, and what happens with them in the longer term.
We speak to Lalitesh Katragadda, the founder of Indihood, formerly with Google, and a Fellow at iSPIRIT, an organisation of IT tech companies, who has worked on this app; and to Raman Jit Singh Chima, senior international counsel with the non-profit Access Now, co-founder of the Internet Freedom Foundation, and former India Policy Lead at Google.
Lalitesh, tell us about this app: how does it work and how is it designed? I know you are a maps expert--you have done a lot of work in mapping, building the mapping architecture at Google. So that is your specialisation and obviously some of that has found its way into this app as well.
LK: The app is very straightforward; it is not a very sophisticated app. There is a level of sophistication that will develop over time. It does three things. One is it allows people to register themselves and answer questions, when they choose to, about how they are feeling, whether they were in touch with someone who was infected [known to them] or whether they have travelled internationally. For this we take the very complicated chart that ICMR [Indian Council of Medical Research] has created about assessing people and assessing yourself, whether you are at risk or not, and, using very simple chat-like questions, the app determines that and tells you immediately whether you are fine, whether you are at risk and need to take some precautions, or whether you are at a high degree of risk and need to get tested immediately. That is one thing it does.
The other thing it does in the background, as soon as you register, is it starts tracking two pieces of information. One of them is whoever you come in touch with, who is also using the Aarogya Setu app, by examining Bluetooth interaction. The actual user information is not shared; a de-identified ID (DID) is created to notify you if you have come in contact with an infected person.
Also, every 30 minutes, it measures your latitude-longitude, which is stored on the phone. The only information that is going to the server is your self-assessment, specifically when the assessment is showing you are unwell or at risk, along with the latitude-longitude of where you took the test, so that if testing or some other intervention is required, the authorities potentially know where you are.
What is the use of all this? If it so happens that one of the persons you came across and spent significant time with turns out to be infected, the app will assess that you are at risk, and allow you to know it much earlier than any PCR [polymerase chain reaction] test may; it will allow you to quarantine yourself and keep your family safe; and if you were close enough to need a test, it will advise you to get tested and the health authorities will consequently help you get tested much earlier.
If you are like me, staying up late nights and reading research papers about this infection, there are two things that are material about what we are discussing. It is all about the biology. One is that this is contagious when you are asymptomatic--a lot of people who are infected never become symptomatic, but they are still contagious. The other one is that when you get symptomatic or when you get infected, the sooner you get treatment the better, before you have this seizure of the lungs (some people are calling it the cytokine storm; some people are calling it haemoglobin reaction). The sooner you get the treatment, the more likely you are to recover and get better. So this is all a race against time and the whole idea of the app is not that we will immediately see benefits when we install it; the idea is that if a large number of people install it and we all use it for the next 15, 30 or 40 days (however long this crisis takes), then whenever the pandemic peaks, the app will allow us to trace and quarantine people much earlier and contain the infection earlier than we would otherwise. That is what it is doing at both the back-end and the front-end.
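Editor's note: the data flow described above can be sketched roughly as follows. This is only an illustration of the description given in the interview, not the app's actual code. The class and function names, the payload fields and the risk labels ("fine", "at_risk", "high_risk") are all hypothetical; only the 30-minute location interval and the rule that a self-assessment goes to the server when it shows the user is unwell or at risk come from the interview.

```python
from dataclasses import dataclass, field
from typing import Optional

LOCATION_INTERVAL_MIN = 30  # sampling interval stated in the interview


@dataclass
class PhoneStore:
    """Hypothetical on-device store; nothing here is sent anywhere by itself."""
    contacts: list = field(default_factory=list)   # (minute, other_did)
    locations: list = field(default_factory=list)  # (minute, lat, lon)

    def record_contact(self, minute: int, other_did: str) -> None:
        # Only the other phone's de-identified ID (DID) is kept, not user details.
        self.contacts.append((minute, other_did))

    def record_location(self, minute: int, lat: float, lon: float) -> None:
        # Keep a sample only if 30 minutes have passed since the last one.
        if not self.locations or minute - self.locations[-1][0] >= LOCATION_INTERVAL_MIN:
            self.locations.append((minute, lat, lon))


def assessment_upload(did: str, risk: str, lat: float, lon: float) -> Optional[dict]:
    """Return the payload for the server, or None when nothing should be sent."""
    if risk == "fine":
        return None  # a clean self-assessment stays on the phone
    # Unwell or at risk: send the result with the location where the test was taken.
    return {"did": did, "risk": risk, "lat": lat, "lon": lon}
```

In this sketch the contact and location histories never appear in the upload payload, which mirrors the claim that they stay on the phone unless a later step pulls them out.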
As an architecture, was there a specific intent to combine the medical part with the geolocation and the location part? Following and tracking the disease is a more complex phenomenon than just knowing whether there is an infected person around you.
LK: There are three parts. Let me first answer your last observation. The moment somebody has a disease, unless they are taking the phone and running away out of quarantine, they are not going to be “in the field”. So, detecting someone near you having the disease is unlikely; what is more likely is that you came in contact with that person before it was discovered that they have the disease. That is the reason we are using Bluetooth.
The reason why we are doing GPS is: if we have sufficient data from multiple people, who were later diagnosed as having the disease (who were all using the Aarogya Setu app), it allows us to very rapidly identify hotspots. Whether this infection happened in a coffee store or near a kirana store or some other place where people were working, it will become evident much more rapidly within hours of the disease being detected in the people rather than within days of tracing work that health authorities and surveillance workers are doing now. We have the capacity to do it today when a few thousand are detected, but the capacity will disappear, if something like what is happening in Europe (God forbid) happens.
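Editor's note: one simple way to picture the hotspot idea is to count how many distinct diagnosed users passed through the same small area. The sketch below is an assumption-laden illustration, not the method the app uses: the grid size, the threshold of three cases and all names are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.001  # assumed cell size; about 100 metres of latitude
MIN_CASES = 3     # assumed number of distinct diagnosed users that flags a cell


def cell_of(lat: float, lon: float) -> tuple:
    """Map a latitude-longitude pair to a coarse grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def find_hotspots(traces: dict, min_cases: int = MIN_CASES) -> dict:
    """traces maps a de-identified ID to the (lat, lon) points of a diagnosed user.

    Returns cell -> number of distinct diagnosed users seen in that cell,
    keeping only cells that reach the threshold.
    """
    visitors = defaultdict(set)
    for did, points in traces.items():
        for lat, lon in points:
            visitors[cell_of(lat, lon)].add(did)
    return {cell: len(dids) for cell, dids in visitors.items() if len(dids) >= min_cases}
```

Because each user is counted once per cell, one person who lingers in a place does not create a hotspot; several diagnosed people overlapping in the same cell do.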
Raman, what are your first thoughts on this app? Applicability, the way it is positioned and then rolled out.
RC: Whether you are a technologist, state government or federal government, you are trying to take urgent action. However, the adage that works in the tech space--move fast, break things and then maybe patch them later, launch and iterate--can be more problematic when it comes to the public health space.
More specifically, there are a couple of different things. Remember that we have had a very specific contact-tracing app in Singapore’s TraceTogether, which earlier this week was completely open sourced. You can review and see the code base. But it has a very specific purpose: for users to know if people nearby have self-declared to be infected, and to be cautious about that. We already have learnings: only a small percentage of Singapore's population uses that app.
In India’s case, what we are doing right now is somewhat unprecedented, perhaps more equivalent to the Chinese intervention in terms of COVID tracing, where there is ratification from mobile devices, sharing with the central backend, which is also collecting location data, and not using it for a user to just know what is happening, but the government or rather the public health authorities can look at potential hotspot tracking or location tracking. There is much more data that they collect, perhaps than in many other places.
Why I am cautious about this is that, one, it does have implications. For example, even currently, it is not just about the data stored on your phone, it is about the data stored centrally at the government's end, whether it is the NITI Aayog or some other agency.
(Editor’s note: It is the National Informatics Centre.)
Who is keeping that data, who has access to it, who will be hosting it later? Right now it is hosted on AWS [Amazon Web Services], and I am assuming that if they are porting it later, the question is who is actually in control of that and what has been kept there. Right now there is no clarity on what data is kept after the pandemic finishes.
But let us talk about during the pandemic itself. There have been a lot of useful previous examples. For example, ‘tracking Ebola’, from West Africa, using a data-driven approach--using call data records, mass geolocation of cell phones--where in fact the postscript, after-action analysis was that it was not very helpful. In fact, it may have been counterproductive, may have confused things more. Therefore, we must be more cautious here.
Contact tracing is an experimental step in terms of the app-based model; it is being tried for the first time; I sometimes think it is worthwhile to look at what public health people are saying. One fact that Lalitesh himself said is the app is only effective if 50% or more of India’s total population uses this. So, one must see how that happens. And if that sort of urgent step is taken where, basically the majority of our population is being asked to install this, we need to be cautious. Would it leak data in other ways, would it be subject to malware, what happens to the data afterwards. More importantly, even today, it is said that it will let you know about your possibility of being infected. Is that a human-based intervention coming from the ICMR-[run] state health agencies? Is it an algorithm coded between NITI Aayog, NIC and volunteers? These are important questions.
Essentially, what I am saying is independent of this app, you need to be a bit cautious of tech-solutions when it comes to public health. While it can be a useful assistant, I am still not sure that it can do all sorts of different things we plan to do, and most troubling in India’s case is that it is trying to do a lot in one single app or one single dashboard. And that makes me generally wary.
Lalitesh, is India trying to do too much with one app?
LK: Well, we are trying to do whatever is necessary. Too much or not, I think time will tell. I think the more important thing I am focused on is whether this is going to be effective. On that I completely agree with Raman--digital for the sake of digital is pointless unless you make it effective.
See, there are a few checks and balances that are in the system. One is, the information you are storing on the phone is only pulled out when it is determined that you are infected using a virology test, and in rare cases when it is determined that you are at very high risk because of [deep] proximity. So take that percentage of the entire population of people who are registered. Suppose tomorrow we have 400-500 million people registered. Presently we have 6,000 [infected people], and even if that number goes to 100,000, we add their cohorts; presently, we are seeing a cohort ratio of about 3-4. At that point, we will have about half a million records uploaded to the server, and the rest of them will remain on the phone. We are not downloading everybody's information and we cannot.
You do not know this, but the team inside, the volunteer team, has been going at it day in and day out--battling it out, figuring what is the privacy edge we can walk and what we cannot walk. I think we have spent more than 40% of the effort battling privacy and less fighting the code and the app, because if we downloaded all the data to the server and ran this, we would be able to do a much easier job. The algorithm is very very complex because we are minimising the amount of data that we are downloading.
And the other thing is--to Raman’s point--I am old enough to have made a lot of mistakes in data science. If you write enough algorithms, you will realise that most of your algorithms do not work the first time. So none of this is rolled out--we have written the algorithms, but we have not rolled them out.
One of the reasons we have not taken the Singapore approach, one of the reasons why we are very wary of the Apple and Google approach, is that it directly informs the user and tells them that they are at risk. In a country like India, that can cause mass panic. So, we are not doing that. If we detect that there is a potential set of people who might be in proximity, we are going to run that information by the health authorities. Possibly even, in the first few days when we get enough information, we do on-ground testing to see if this algorithm is doing anything useful or not. If it is not doing anything useful, then we will discard the data.
And what about the longevity of the data? Do you know or do we know what is going to happen to this data July-August onwards?
LK: And the other three constraints are: if you are never shown to be at risk through self-assessment or coming in close contact, we throw away that data on a running basis in 30 days. Even on your phone it is wiped out within 30 days. If you are determined to be at risk, that window is expanded to 45 days. If you are virologically proved to be infected--where the virology test says you are infected--then after you are cured, we keep the data for 60 days for post-analysis, and then all the non-anonymous data is thrown away. And to ensure anonymisation (and not shallow anonymisation because anonymised data can be de-anonymised, right?), we are mixing up data of multiple people, in the space of 50-100 people, and bucketing it in large geo-fences like 100- to 200-metre grids in dense urban areas and even larger grids in rural areas. So, there is actually data of 50-100 people in each of these buckets, and we then anonymise it and keep that for research. The rest of the data is wiped out.
So, either way, there is a window after which the data is gone.
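Editor's note: the retention windows and the grid bucketing described above can be sketched as below. This is an illustration under stated assumptions, not the app's implementation. The 30-, 45- and 60-day windows come from the interview; the status labels, the function names, the 0.002-degree cell (roughly 200 metres of latitude) and the choice to drop any bucket holding fewer than 50 distinct people are assumptions made for the example.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

# Retention windows stated in the interview, in days.
RETENTION_DAYS = {"not_at_risk": 30, "at_risk": 45, "infected": 60}

CELL_DEG = 0.002  # assumed cell size; about 200 metres of latitude
MIN_PEOPLE = 50   # assumed lower bound, taken from the "50-100 people" per bucket


def is_expired(status: str, reference_date: date, today: date) -> bool:
    """True once a record has outlived its window.

    reference_date is the collection date, or the date of cure for "infected".
    """
    return today - reference_date > timedelta(days=RETENTION_DAYS[status])


def anonymised_buckets(records, min_people: int = MIN_PEOPLE) -> dict:
    """records is an iterable of (did, lat, lon).

    Returns cell -> number of distinct people. IDs are dropped, and cells with
    too few people are suppressed so that no small group can be singled out.
    """
    people = defaultdict(set)
    for did, lat, lon in records:
        cell = (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
        people[cell].add(did)
    return {cell: len(dids) for cell, dids in people.items() if len(dids) >= min_people}
```

Suppressing sparse cells is one common way to make coarse location counts harder to de-anonymise; whether the app does exactly this is not stated in the interview.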
LK: Yes, and there is one more thing, and I think this nuance is being lost. Most of the data never makes it to the government.
Right, what is another app that one could compare it with, Lalitesh? For people to understand what this is like? Where the data is primarily sitting on the phone and the exchange is only happening with the server.
LK: Most of the apps I know, like Google Maps and Facebook and so on--all the data flows to the server side. I do not know of many apps, of this scale, where the data sits on the client side [on the phone].
Raman, two points. Lalitesh’s assertion that most of the data is going to be sitting on your phone and not getting transferred to the server. And the second, that there is a window for all this data to disappear. Your comments?
RC: The point to remember is: who is running this? Who is accountable for this and how it works? The reality of course with the government or say, with the CEO in a company, is that they can set their own rules, unless they are controlled by something else--like you can say the Board in the case of a company, or in the case of the government, the Parliament or something else. Why I mention this is, it is an important point. There has already been tension between the state governments and the Centre. Even within the central agencies--is the National Disaster Management Authority in charge of this, is it the NITI Aayog, is it the NIC? Is it governed by the Epidemic Diseases Act or the National Disaster Management Authority Act? These are the important questions the government needs to answer.
The second thing: some of the data that is being collected--and again I put that caveat, I am a lawyer who spent a lot of time with product engineers and others--I would be cautious about collecting large amounts of location data for predictive or later trend tracking, because you may potentially want to use it for other things. Even the Singapore TraceTogether app has had that criticism, where people have said that it is very clear that the app will not continue after the pandemic, that you can make a data deletion request. But there have been points where even the Singaporeans have raised concerns saying, well, what if the Government put a retention order on it, how do I challenge it? Just imagine they are trying to do everything genuinely, but some officer for some reason decides to mess up... How do you escalate that?
In Aarogya Setu, it is not fully clear if you can make a data deletion request. But also, to what Lalit has just said, it is hard to find a parallel about what is going on here--the amount of data collection that is happening, what is being shared, what is not. You need to have an open conversation; you need to ask other people to have an open review and understand what the processes are.
More importantly, if something goes wrong with this--my worry is not even the civil liberties consequence, but the public trust consequence. In a large diverse country like ours, if something goes wrong with this, you will see so much skepticism and worry from the general population across different states, about working with the state governments or with the ICMR, that we should be very wary about it. Globally, there is a sense that the governments are jumping ahead too much and focusing on these app-based solutions and perhaps not focusing on other elements, other key public health interventions. The government must open up this process a bit more, get some of these questions answered not just by the engineers but by the government itself.
Raman, so are you OK with the concept, OK with what seems to be the execution at this point, as long as the process going forward is made sufficiently open and there are opportunities for people like you to jump in and ask these questions and get them answered?
RC: Noting that if it requires 50% of the population to use it and therefore requires a lot of government resource and political energy to get everyone behind it, I would question if that is the most important thing to do right now, when we are having other data concerns with the government itself, right? For example, release of COVID data and whether it is accurate? From a public health policy perspective, if this is what you want to double down on right now, I would be wary and note that there are alternative approaches. [You] need to be sure what you need to build because that is going to take time and energy on behalf of the government, which at this point of time is perhaps the [most] precious commodity of them all.
Lalitesh, can I give you the last word to respond on these two points: One, who owns the data today and likely tomorrow? And from the point of data deletion, how do I as a citizen ensure that once things are OK or I am better, I want to get out of the system in a way I understand it across this country and it is easy to do.
LK: I think the data deletion policies are very clear. If you uninstall the app, the data is gone, because data is sitting on [the user’s phone]. The other point that is being raised, that this app is useful only if 50% of the population installs it, is slightly untrue. The app is immediately useful to you the day you install it because you can report your symptoms and get help. This app is useful for figuring out where the hotspots in the city might be, completely anonymously, even if only 10% of the people install it, because that is what it takes statistically to tell where people might be or locations where the infection may be spreading. For contact tracing, you need to get to 40-50%, but again, that data is on your [user’s phone] and is as safe as the phone [with the user]. Having said that, there is a fear of Big Brother and the fear of the government changing things later and that is a question I am not qualified to answer.
To answer one specific question, this data is currently under the control of the NIC. Even though the servers are not in the NIC, the data is being controlled entirely by the NIC. This is a Ministry of Electronics and Information Technology (MeitY) app (NIC works under the MeitY).
This app reports to someone...is it the Ministry of Health?
LK: Yes, the Ministry of Health is a stakeholder, but the entity responsible for maintaining the app and making sure that it works the way it is supposed to is the NIC, inside the MeitY.