Bengaluru: The Union government has reportedly formed a Standing Committee on Statistics (SCoS) to review and improve data collection.

Preceding its formation, in March and July 2023, members of the Economic Advisory Council to the Prime Minister (EAC-PM) published two reports: one examining the quality of the National Sample Survey (NSS) estimates, and another concluding that “domestic data agencies and statistical apparatus need to be overhauled in order to provide better feedback for policy-making as well as narrow the space for skewed estimation by international agencies”.

The National Sample Survey (NSS), the National Family Health Survey (NFHS) and the Periodic Labour Force Survey (PLFS) need a “major sampling overhaul to reflect the true status of India’s real economy”, wrote Shamika Ravi, a member of the EAC-PM.

Amidst the debate on changes, the former acting chairman of the National Statistical Commission (NSC), P.C. Mohanan, tells us that data quality is an issue in all surveys and censuses, and it has to be addressed through training, field supervision, awareness campaigns, working in a team, among others. “Intuitively I feel that the conclusion [in the July 2023 report] is not relevant in our context,” he said.

In January 2019, Mohanan, then acting chairman of the NSC, and another member, J.V. Meenakshi, resigned in protest against the government’s refusal to make the 2017-18 employment survey public. The situation has not changed much since his resignation, he feels. “The NSC is a body that has very limited say in statistical reforms and no resources have been provided to it for its effective functioning.”

Mohanan, who is now heading the state statistical commission in Kerala, talks about the changes that need to be made to improve India’s statistical systems, data quality, and the need for critical evaluation of certain statistical operations.

There has been criticism from members of the EAC-PM of India’s statistical system. The government has set up a new panel to review all National Statistical Office data, replacing the Standing Committee on Economic Statistics (SCES) formed in 2019. You had said in an interview with us in May 2019, soon after you quit as acting chairman of the NSC, that “it was the feeling in the commission that it was being side-lined in key statistical matters, including the launching of major statistical initiatives”. How do you assess the latest attempt to improve the statistical system, given that there have been previous reports and commissions for reforms in the statistical architecture? Also, there was an expectation, in 2005, that the NSC would be given statutory backing within one year, which did not happen.

There is nothing new in the constitution of the present SCoS by the MoSPI. Such committees have always been there. Since the formation of the NSSO, its affairs were looked after by a Governing Council with an independent non-official chairman and official and non-official members. The Advisory Committee on National Accounts has closely monitored national income-related work since the beginning. In the past, recommendations of such committees were taken very seriously and the ministry [MoSPI] rarely interfered with them. Once the NSC came up in 2005, the setting up of such committees was supposed to be its remit and they were supposed to report to the NSC. However, the committees are now appointed by the ministry and are to report to the ministry.

From its Terms of Reference, the present Committee, appointed by the Ministry, is seen to be more for the conduct of surveys; a role that NSC used to perform. So this committee has a very significant role in conducting surveys, but is unlikely to make substantial reforms in official statistics.

The situation has not changed much since my resignation [in 2019]. The NSC is a body that has very limited say in statistical reforms and no resources have been provided to it for its effective functioning.

A July 2023 paper from the EAC-PM tries to highlight the problem of data quality in the NSS in three estimates: the proportion of the rural population, the proportion of the Scheduled Caste (SC) population, and the proportion of the working-age (aged 15 to 59 years) population. It argues that an increased sample size or “the bigness of the data cannot address issues related to data quality”. How significant and acute is the problem of data quality in NSS surveys?

Data quality is an issue in all surveys and censuses. Issues in survey responses are very real and the changing lifestyle, family settings and economic activities are all likely to make it more difficult. These are addressed in several conventional ways like training, field supervision, awareness campaigns, working in a team, back checking etc. The paper from EAC-PM approaches this issue in a theoretical context developed in an entirely different setting. The work on which their paper is based was developed in the context of the 2016 US presidential election and the emerging big data scenario. The NSSO surveys are not designed to directly estimate the percentage of rural population or the number of SCs/STs or the proportion of the working-age population. These are indirectly derived from the survey data. Users are well aware of the limitations while using it.

To ensure that a survey has adequate coverage of the topic of interest and is broadly representative, the population is stratified at the level of districts, villages and households before the sample is selected. This approach is followed in all national surveys like the NFHS. These are multi-subject surveys that cover a large number of variables, unlike an election survey. Sample size does play an important role in ensuring that most of the key variables can be estimated at state or even district levels. NSSO publishes the survey-based margins of error for key indicators, which the data users should refer to before commenting on the survey estimate.

One would usually expect such research [referring to the July 2023 paper from the EAC-PM] with debatable findings in an academic setting.

What would be your specific comment on the overestimation of the SC population in the survey compared to the population census? The paper says that household consumption expenditure might be exaggerated, given that it over represents the rural and the SC population. What is the concern in comparing the census and the NSS?

The NSSO does not draw a sample at the national level, but at a stratum (district) level separately for rural and urban areas. The estimates are made primarily at the stratum level and aggregated upwards. To me, the EAC-PM paper seems to assume that one sample is drawn for all aspects.
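The stratum-level approach described above can be illustrated with a minimal Python sketch. It is not the NSSO's actual estimation procedure: the strata, frame counts and sampled values are hypothetical, and weighting each stratum mean by its frame size is a simplifying assumption made for illustration. It shows how an estimate built up from strata can differ from one that pools every sampled household as if a single national sample had been drawn.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stratum:
    name: str
    households_in_frame: int      # hypothetical frame size for the stratum
    sampled_values: List[float]   # hypothetical values from sampled households


def stratum_mean(stratum: Stratum) -> float:
    """Estimate made within a single stratum."""
    return sum(stratum.sampled_values) / len(stratum.sampled_values)


def aggregate_mean(strata: List[Stratum]) -> float:
    """Aggregate stratum estimates upwards, weighting by frame size."""
    total = sum(s.households_in_frame for s in strata)
    return sum(s.households_in_frame * stratum_mean(s) for s in strata) / total


def pooled_mean(strata: List[Stratum]) -> float:
    """Naive alternative: treat all sampled households as one sample."""
    values = [v for s in strata for v in s.sampled_values]
    return sum(values) / len(values)


# Hypothetical rural and urban strata of one district.
strata = [
    Stratum("District A, rural", 8000, [10.0, 12.0, 14.0]),
    Stratum("District A, urban", 2000, [20.0, 22.0]),
]

stratified_estimate = aggregate_mean(strata)
naive_estimate = pooled_mean(strata)
print(f"stratified: {stratified_estimate:.1f}, pooled: {naive_estimate:.1f}")
```

In this made-up case the urban stratum is a small share of the frame but a large share of the sample, so the pooled figure is pulled towards the urban values while the stratified figure is not.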

The NSS is not meant to estimate the urban or rural population. The urban and rural areas are based on the demarcation available on the date of the survey. The estimates are [in the form of] percentages [of households] or ratios of various variables in NFHS and NSS. Absolute numbers [of the rural and urban population] are not available. Users apply these estimates based on the numbers available from projections for the urban and rural population.

But for the SC and ST population there are other issues. The census uses a legal and administrative approach. They check if the caste or tribe is listed in the state and if it is not listed you are not enumerated as SC or ST. But the NSS records it based plainly on the response from the respondent. So the census may give a slightly lower estimate compared to NSS. This has nothing to do with the sampling process.

Are there definitional challenges in categorising urban and rural areas in the census and statutory towns?

Urban and rural areas have a standard definition and specific geographical demarcation. The state government may notify an area as a municipality, corporation or town. The census will categorise an area as urban based on population, density, non-agricultural employment of males etc. Those areas meeting the criteria will be declared a census town and will be stated as an urban area for all surveys. There is no confusion between urban and non-urban. The issue comes when, say, census 2011 declares [a new or altered] census town; it is not immediately possible for NSS to identify these regions. It will take a year or so to prepare boundaries and urban frame maps. The statutory towns are easy to identify.

The 2011 NSSO data that the EAC-PM paper compared, from the 68th round, used the urban delineation from the 2001 census. That is another reason why urban data will be underestimated (the underestimation is around 18% for urban and 6% for rural). Further, the population is estimated as the product of the average household size and the number of [total] households. Census and NSS estimates of the number of households are generally similar. But the average household size is lower in NSS, so the population will be estimated to be lower.
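The arithmetic in the last point can be sketched as follows. All figures are hypothetical and chosen only to show the direction of the effect: with the same number of households, a lower average household size yields a lower population estimate.

```python
def estimate_population(households: float, avg_household_size: float) -> float:
    """Population estimated as number of households times average household size."""
    return households * avg_household_size


# Hypothetical inputs, not taken from the census or any NSS round.
households = 1000            # assumed equal in both sources
census_avg_size = 5.0        # assumed census average household size
survey_avg_size = 4.6        # assumed (lower) survey average household size

census_population = estimate_population(households, census_avg_size)
survey_population = estimate_population(households, survey_avg_size)

# Relative shortfall of the survey-based estimate against the census-based one.
shortfall = (census_population - survey_population) / census_population
print(f"survey estimate is {shortfall:.0%} below the census estimate")
```

Because the household count is held fixed, the relative shortfall equals the relative gap in average household size, independent of how the sample was drawn.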

Intuitively I feel that the conclusion is not relevant in our context. NSS, as I said, samples at the district level and there are many variables at the household level. It is not possible to have a sample number determined for any one variable. A sample size for employment may not be applicable for unemployment. The kind of simplification in the paper does not look alright to me.

Bibek Debroy, chairman, EAC-PM, had written that “every questionnaire (both census and survey) can be streamlined and simplified so that it is completed in 20 minutes. How many people have the time and inclination to respond to a questionnaire that takes two hours to complete?” Is this possible, or is there a disconnect about what can be achieved on the ground?

It usually takes 20 minutes to explain your [survey] purpose to a household and identify resident members and their basic data. I do not understand [how this can be done]. There seems to be a disconnect between what is being said and what happens at the field level. No doubt respondents do not give time for surveys and their cooperation cannot be taken for granted. A balancing of survey cost and items of data to be collected is necessary. Technology can support transmission and processing, but data has to be ascertained first through personal contact and dialogue. Building rapport with respondents before data collection, which is important, takes time and cannot be done in 20 minutes. And then most surveys in rural India are not done in the privacy of a home. Investigators need tact and experience to deal with different survey environments. Statements like this do not do justice to the professionalism of investigators.

The July 2023 Niti Aayog National Multidimensional Poverty Index (MPI) finds that nearly 136 million people had exited poverty between 2015-16 and 2019-21, using NFHS data. It added that this was the “fastest decline in percentage of multidimensional poor in rural areas” from 32.6% to 19.3%, based on data. Your perspectives on the report’s findings?

It is interesting that the Niti Aayog finds the fastest decline in multidimensional poverty using the very survey data that the EAC-PM finds fault with. The multidimensional poverty index depends to an extent on the choice of indicators and there is no universally accepted basket of indicators. There have been great improvements in indicators related to banking, drinking water, sanitation, housing etc in recent years and their presence in the index positively reflects in the overall index.

The advantage of NFHS is that all relevant information is collected from the same household. Clubbing data from two or three surveys will make MPI computation difficult. MPI looks at characteristics of each individual using different variables. There is no alternative to using NFHS [for MPI]. The irony is that the NFHS is criticised.

We have been using income-based or expenditure-based poverty measurements. MPI is an innovation of the United Nations Development Programme. It is easier to collect this data compared to income. The actual poverty indicator should be income- or expenditure-based, but I do not see a major issue with using MPI. However, one may question the methodology used in MPI, like assuming no deprivation in households where the specific indicator does not apply.

While the government says it is concerned about the NSO data, there is no headway in conducting the population census. Legally the government can decide when it wants to conduct the census, but practically, administrative surveys depend on the outdated 2011 census and this lack of data has serious implications on policy making and welfare access. Your comments?

In India, the conduct of the population census is an administrative exercise under the control of the Ministry of Home Affairs (MHA). In most countries, the census is the responsibility of the national statistical offices. Lack of recent census data has severe implications for policy and other data collection systems. Most of the claims of beneficiary coverage of several programs are based on projected population. Large-scale human migration and urbanisation are now unaccounted for. As we move ahead, most of our debates and policy would be based on uncertain supporting evidence.

Census data also has obvious political uses. The idea of using the population census to prepare administrative registers like the National Population Register or National Register of Citizens is also a matter of concern as this could impact the census data quality. However, the government should clear the air around the conduct of the next census at the earliest.

In response to the March 2023 EAC-PM paper reexamining estimates of development indicators by international agencies, you wrote that “more objective assessment of the methodology and database would, however, help improve the statistical system,” instead of tailoring of data to reflect a specific narrative. There has been a suggestion for having an independent office and regulator like the CAG so that the apex body of the statistical system is not dependent on the government and affected by narratives and politics. How feasible is it in your opinion to create a statistical body with statutory backing given that a Standing Committee on Statistics has been formed and NSC does not have statutory backing?

A recent study by Pramit Bhattacharya for the Carnegie Endowment for International Peace has dealt with this subject in detail and talked about the need for a statistical reform commission for India. The reforms till now, including the formation of the NSC, were half-hearted. The system needs institutional restructuring and resource induction both in terms of manpower and funding. One would expect the EAC-PM to understand and address these issues, which could help the statistical system. Unfortunately, their efforts were more to fault the national data systems for their failure to highlight the all-round progress in recent years.

We have a highly decentralised statistical system in the country. A strong and effective technical apex body alone can control the statistical operations now spread across ministries and the states. Certain statistical operations, including many national level censuses, have actually outlived their utility. The different types of censuses like the agriculture or economic census are expensive operations. We have to evaluate if we need to conduct these in the present context. The agriculture census started in the 1950s when there was hardly any data relating to agricultural holdings or land distribution. We need to decide whether we need a census or whether a large sample survey would suffice. There needs to be a critical evaluation of the utility of these operations.

Large administrative databases need to be integrated and made part of the statistical system. This requires better meta-data management practices. An empowered apex body with adequate resources and expertise can provide the right environment for change. Piecemeal reforms are unlikely to help. The standing committee or the NSC in its present form is not enough to meet the challenge.

An apex body can also help resolve criticisms on data more objectively.

Do you feel that all these operations should be merged under one statutory body for statistical operations? Or should respective departments be handling them?

Presently, the population census is the responsibility of the MHA while respective departments, like the agriculture department, handle their own censuses, such as the agriculture census. If the requirement is for an efficient and economical statistical system, all operations must be under the control of a nodal statistical agency. In most parts of the world, the population census is conducted by the statistical office.

If we have an empowered statistical commission with statutory backing, including financial and human resources, most operations can be brought under its purview. It can be decided if the design is like the CAG’s or like the Election Commission’s. [Otherwise] the ministries will continue operations as they are because they have the systems and staff in place.

You are heading the state statistical commission in Kerala. Based on your NSC experience and dealing with controversies around surveys and data, what is the focus at the state level to make the state statistical system robust? What role must states play at a time when there is more and more data from various sources including non-administrative sources?

The statistical system in most states has suffered from neglect and lack of resources. The state statistical systems only play a secondary role and are dependent on central agencies for guidance and in most cases funding. Some of the statistical activities that states do on behalf of central agencies are not particularly useful to states.

To cite a specific example, the compilation of Gross State Domestic Product (GSDP) is extremely important for a state since, apart from measuring the state’s economic growth, the state’s financial performance, especially borrowings, is linked to it. Successive finance commissions have stressed the need to improve the GSDP estimation. A large portion of the data used by states for GSDP compilation is actually provided by the NSO. Many of the indicators supplied by the NSO are not necessarily based on data collected by states and quite often are outdated. Such indicators can be updated through state-level surveys. But the states do not have sufficient resources for conducting state-level surveys.

Unfortunately, data produced at the state level do not have much visibility unlike that of say NSSO or CSO data or any central ministry. Not much research is done using state data. This means there is not much pressure on state agencies to improve the quality of data they collect or publish.

One would normally expect the national data to be an aggregation of data generated at lower levels, like the panchayat or urban body, which are aggregated to block, district, state and national level. But this is not happening. Even where the states are fully responsible, like for the agricultural production data, the approval of the central agencies is essential. To an extent this is necessary to ensure that quality standards are met, and that data undergo some external scrutiny, as the states do not have any autonomous agency to screen their data. But all this makes the state agency wholly dependent on central agencies for data in most sectors.

The state systems need a complete overhaul and their priorities need to be recast. In this process some of the existing activities can be curtailed and new activities taken up in their place. Though most states have good IT capability, the availability of hardware and software is inadequate due to lack of funds. In my opinion, the areas where the states should concentrate are (1) improving GSDP estimation, (2) strengthening survey capacity, and (3) better use of administrative data.
