Swachh Data Mission
The defining feature of NITI Aayog's artificial intelligence (AI) policy is its focus on the social sector and on empowering citizens, underlined by the motto "AI for All". The policy aims to focus on "prosperity for all … for the greater good". It identifies five focus areas: healthcare, agriculture, education, urban/smart-city infrastructure, and transportation and mobility. Its thrust is on the use of AI by government departments to help improve citizens' quality of life and to solve complex problems in the Indian context at scale. Several pilot deployments and examples were cited at the recent global summit RAISE 2020 (Responsible AI for Social Empowerment), inaugurated by the Prime Minister.
The AI policy was enunciated in the June 2018 working paper "National Strategy for Artificial Intelligence #AIforAll". NITI Aayog had an allocation of Rs 7,000 crore until 2024-25, and MeitY had allocated Rs 400 crore for the same. The subsequent working document "Towards Responsible #AI for All" (https://niti.gov.in/sites/default/files/2020-07/Responsible-AI.pdf) focused on the design implications of the algorithmic aspects of AI, identifying the challenges, principles, enforcement mechanisms, standards, and structures for implementation.
While algorithms are integral to AI applications, an equally important part is data, because AI applications require large, complex datasets consisting of both structured and unstructured data across several domains. Large datasets are split into training data and testing data. Using the training data, the software "learns" to detect patterns in the underlying data. Subsequently, the testing data is used to validate the learned algorithms or pattern-matching techniques. These are then fine-tuned to develop more robust AI solutions for prediction, planning, and allocation of resources.
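As a minimal illustration of this workflow, the sketch below splits a hypothetical tabular dataset into training and testing sets with scikit-learn; the file name and column names are invented for the example.

```python
# A minimal sketch of the train/test workflow described above, using
# scikit-learn. The CSV file and its column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("district_education.csv")   # hypothetical dataset
X = df[["teachers_per_school", "parent_education_years", "household_income"]]
y = df["dropout_flag"]                        # 1 = student dropped out

# Hold out 20% of the data for testing; train only on the remainder.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)                   # the model "learns" patterns

# Validate the learned patterns on data the model has never seen.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```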
Datasets that may be used for public policy purposes are available with government departments/ministries, as they typically undertake the implementation of public services. Since the government is getting increasingly digitized, not only through new initiatives such as Smart Cities and Digital Health but also in other sectors, large amounts of digital data are available. However, to give a fillip to socially relevant AI app development, tap expertise available across sectors, and support start-ups, such data should be made available publicly, with due focus on the privacy of individuals as per the Personal Data Protection Bill. The existing National Data Sharing and Accessibility Policy (NDSAP), which facilitates access to GoI data and data created with public funds, can be the basis for access to government datasets. As per NDSAP, a variety of data such as raw, derived, spatial, and non-spatial data is mandated to be in a machine-readable form, periodically updated, and proactively available. Proposals for AI applications in the relevant domains can then be sought through hackathons, regular processes, or contests, using the available datasets.
Of course, to be able to use this data for AI, the data not only needs to be accurate and properly formatted, it also needs to be representative of the underlying context. For example, it has been found that AI applications tested on datasets that predominantly contained facial data of white people did not recognize black people. Similarly, in our context, if training datasets have data only from a particular region, then predictions for another region may not be accurate. Such requirements are even more critical when dealing with different socio-economic categories. Thus, having representative and inclusive datasets is important. However, the collection of data in the government often does not consider this dimension explicitly. The data.gov.in website, set up consequent to the NDSAP, has fragmented data: some states are represented, others are not, and data is available for some years and not for others.
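One way to surface such skew before training is to compare the share of records per state in a dataset against each state's share of the population. The sketch below uses entirely hypothetical figures and an arbitrary threshold.

```python
# A rough sketch of a representativeness check: compare the share of
# training records per state against each state's population share.
# All figures and the flagging threshold are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "state": ["UP", "UP", "Kerala", "UP", "Bihar", "UP", "UP", "Kerala"],
})
population_share = {"UP": 0.40, "Kerala": 0.10, "Bihar": 0.30, "Assam": 0.20}

dataset_share = records["state"].value_counts(normalize=True)

for state, pop_share in population_share.items():
    data_share = dataset_share.get(state, 0.0)
    # Flag states badly under-represented relative to their population.
    if data_share < 0.5 * pop_share:
        print(f"{state}: {data_share:.0%} of records vs "
              f"{pop_share:.0%} of population -- under-represented")
```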
Further, NDSAP data is available at different levels of aggregation. Some of the data is at the district level, other data at the state level, with the latter not available at a lower level of aggregation. Data is also available for varying parameters across different states/districts. Most of the data is dated, as few, if any, datasets are updated. Some datasets do not indicate the year of the data, essentially rendering them useless for any decision-making. Others do not have raw data, just percentages (as a number, not as a formula in Excel), limiting flexibility in analysis. Moreover, due to the fragmented nature of the data and differing formats across departments, it is not possible to link such data for analytics purposes without a lot of heavy lifting to make it consistent. For example, to frame a policy for decreasing drop-outs or increasing gross enrolment, policy makers may need to examine not only the availability of teachers and schools, the education level of parents, income, etc., but possibly link these to the availability of nutrition, teacher incentives, or the mother being part of a self-help group. Thus, to leverage AI to understand the underlying parameters or patterns that contribute to the required policy outcomes, analysis has to be done across various ministries. But doing this kind of analysis is often not possible, due to the reasons cited above. Since there is little focus on data quality at the stage of capture, its quality is suspect, as data is often first captured manually and then transcribed into a digital format.
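To make the linking problem concrete, the sketch below joins two hypothetical ministry extracts that disagree on district naming, one of which reports only percentages; the file names and columns are invented.

```python
# A sketch of the "heavy lifting" needed to link data across ministries.
# Both CSV files, their columns, and the name fixes are hypothetical.
import pandas as pd

education = pd.read_csv("education_district.csv")   # district, enrolment_pct
nutrition = pd.read_csv("nutrition_district.csv")   # District Name, children_covered, children_total

# Departments spell district names differently; normalise before joining.
education["district"] = education["district"].str.strip().str.title()
nutrition["district"] = nutrition["District Name"].str.strip().str.title()

# The nutrition file has raw counts, so a comparable percentage can be
# derived; the education file ships only a percentage, so its raw counts
# are lost and cannot be re-weighted.
nutrition["nutrition_pct"] = (
    100 * nutrition["children_covered"] / nutrition["children_total"]
)

linked = education.merge(
    nutrition[["district", "nutrition_pct"]], on="district", how="inner"
)
print(linked[["district", "enrolment_pct", "nutrition_pct"]].head())
```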
Thus, using AI for policy-making requires an overhaul of the processes for data collection, storage, standards, formatting, and dissemination. Although the initiatives of NITI Aayog for AI are commendable, it also needs to design a framework for the management of data that could facilitate its usage for AI. For example, the only data available under Farmer Welfare for a particular state is the district- and month-wise queries of farmers at the Kisan Call Centre (KCC) from 2010-2018, spread across nine different catalogs. This kind of data by itself does not support policy-making, and it is not clear why it is not available year-wise in a single catalog. The above exemplifies the gaps in a holistic approach to the importance of relevant and high-quality data.
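Mechanically, consolidating such per-year catalogs is straightforward once the formats agree, as in this sketch; the file-naming pattern is assumed.

```python
# A sketch of consolidating per-year catalog files (as with the KCC data)
# into one year-wise dataset. The file-naming pattern is hypothetical.
import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("kcc_queries_*.csv")):  # e.g. kcc_queries_2010.csv
    df = pd.read_csv(path)
    # Recover the year from the file name and carry it as a column.
    df["year"] = int(path.split("_")[-1].removesuffix(".csv"))
    frames.append(df)

kcc = pd.concat(frames, ignore_index=True)
kcc.to_csv("kcc_queries_2010_2018.csv", index=False)
```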
Since data is the basic building block for AI, the focus of the AI for All policy should also be to identify mechanisms for developing schemas for the metadata, identifying the interlinkages between data elements, and collecting, verifying, and updating the data in all ministries and government departments. For this, a review of the existing processes for these aspects has to be undertaken. And, given the cascading effect that the resultant clean, properly formatted, maintained, and updated data would have on being able to use AI for societal development and empowerment, this mission should be named the Swachh Data Mission. It should have the same rigor and focus as the Swachh Bharat Mission, if not more.
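As one illustration of what such a metadata schema could record for every published dataset, here is a minimal sketch; the fields are hypothetical and simply encode the requirements discussed above (years covered, aggregation level, linkage keys, update cadence).

```python
# A minimal, hypothetical sketch of per-dataset metadata that would address
# the gaps discussed above (missing years, unknown aggregation level,
# no linkage keys). Field names are invented for illustration.
from dataclasses import dataclass

@dataclass
class DatasetMetadata:
    title: str
    ministry: str
    years_covered: list[int]     # explicit, so stale data is visible
    aggregation_level: str       # e.g. "district" or "state"
    linkage_keys: list[str]      # columns usable to join across ministries
    has_raw_counts: bool         # percentages alone limit analysis
    last_updated: str            # ISO date of the latest refresh
    update_frequency: str = "yearly"

kcc = DatasetMetadata(
    title="Kisan Call Centre farmer queries",
    ministry="Agriculture and Farmers Welfare",
    years_covered=list(range(2010, 2019)),
    aggregation_level="district",
    linkage_keys=["state", "district", "year"],
    has_raw_counts=True,
    last_updated="2018-12-31",
)
print(kcc)
```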
Disclaimer: The views expressed are solely those of the author, and ETTelecom.com does not necessarily subscribe to them. ETTelecom.com shall not be responsible for any damage caused directly or indirectly to any person/organisation.