This is a project conducted at the IAM Lab at the University of Maryland, College Park. I work with my supervisor, Hernisa Kacorri, and my fellow lab member, Utkarsh Dwivedi.
AI and data-driven systems offer the possibility of removing many accessibility barriers. To develop and deploy such technologies, datasets and data sharing play an important role in training and evaluating machine learning models, as well as in benchmarking, mitigating bias, and understanding the complexity of real-world AI-infused applications for people with disabilities. However, population data in wellness, accessibility, and aging is scarce due to smaller populations, disparate characteristics, a lack of expertise for data annotation, and privacy concerns. This scarcity is problematic, as it creates practical challenges in developing technologies that work for diverse user populations.
The mission of this project is to address the issues regarding inclusivity of data-driven methods and technologies and promote effective data sharing practices that support the accessibility community.
As a first step, we have released a data surfacing repository called IncluSet (Official Site) for the accessibility community to easily discover and link to datasets that include data sourced from people with disabilities and older adults. The datasets were manually located by our research team at IAM Lab. The repository stores metadata about where the datasets can be found, the populations represented, the data types, and the technology used. None of the datasets is stored on our servers. A download link is included only for datasets that are publicly available. Datasets that are available upon request include an email address that the data creators have indicated and, when available, a link to a webpage describing the data. The repository also links to datasets whose creators have not stated any sharing intent or information, by pointing to the related publications where the data collection is described.
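To make the three access cases above concrete, here is a minimal sketch of how one metadata record might be modeled. It is not IncluSet's actual schema: the class names, field names, and the `access_pointers` helper are all hypothetical, and the sketch only assumes what is described above (a download link for public datasets, a contact email plus an optional info page for datasets shared upon request, and a publication link otherwise).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Sharing(Enum):
    """Hypothetical labels for the three sharing cases described above."""
    PUBLIC = "public"            # direct download link
    ON_REQUEST = "on_request"    # contact email, info page when available
    UNSPECIFIED = "unspecified"  # no sharing information from the creators


@dataclass
class DatasetEntry:
    """Metadata only; the dataset itself is never stored in the record."""
    name: str
    populations: List[str]
    data_types: List[str]
    sharing: Sharing
    publication_url: Optional[str] = None
    download_url: Optional[str] = None
    contact_email: Optional[str] = None
    info_url: Optional[str] = None


def access_pointers(entry: DatasetEntry) -> Dict[str, str]:
    """Return the pointers that match the entry's sharing case."""
    if entry.sharing is Sharing.PUBLIC:
        candidates = {"download": entry.download_url}
    elif entry.sharing is Sharing.ON_REQUEST:
        candidates = {"email": entry.contact_email, "info": entry.info_url}
    else:
        candidates = {"publication": entry.publication_url}
    # Drop pointers that are not available for this entry.
    return {kind: value for kind, value in candidates.items() if value}
```

Keeping the sharing case as an explicit field, rather than inferring it from which links happen to be present, is one possible design choice; it makes the later analysis of sharing practices a simple group-by.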
Our next step is to deepen our understanding of data sharing practices, as well as the perspectives of dataset creators and users that are unique to the accessibility community. We are currently analyzing the datasets in our IncluSet repository by their metadata, including the populations represented, data types, project source/funding, and data sharing methods. We aim to identify patterns in existing data sharing practices, such as whether dataset creators' choices (providing a direct download link, sharing upon request, or providing no information on dataset sharing) vary by target population group.
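The kind of comparison described above can be sketched as a simple cross-tabulation of sharing method by population group. This is only an illustration of the idea, not our analysis code: the record layout, the function names, and the three toy records are made up for the example and do not describe any dataset in IncluSet.

```python
from collections import Counter, defaultdict


def sharing_by_population(entries):
    """Count sharing methods per population group.

    Each entry is a dict with 'populations' (list of str) and
    'sharing' (str). A dataset that covers several populations is
    counted once under each of them.
    """
    table = defaultdict(Counter)
    for entry in entries:
        for population in entry["populations"]:
            table[population][entry["sharing"]] += 1
    return table


def sharing_proportions(table):
    """Convert the counts for each population into proportions."""
    proportions = {}
    for population, counts in table.items():
        total = sum(counts.values())
        proportions[population] = {
            method: count / total for method, count in counts.items()
        }
    return proportions


# Toy records for illustration only (not real IncluSet metadata).
toy_entries = [
    {"populations": ["vision"], "sharing": "public"},
    {"populations": ["vision", "older adults"], "sharing": "on_request"},
    {"populations": ["older adults"], "sharing": "unspecified"},
]

toy_table = sharing_by_population(toy_entries)
toy_proportions = sharing_proportions(toy_table)
```

Comparing proportions rather than raw counts matters here, because the number of located datasets can differ widely between population groups.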
Hernisa Kacorri, Utkarsh Dwivedi, Rie Kamikubo, “Data Sharing in Wellness, Accessibility, and Aging”, in NeurIPS Workshop on Dataset Curation and Security, December 2020.
This project is supported by the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR, ACL, HHS #90REGE0008).