Recommendations for Good Practice Using AI in Learning
The application of AI should be based on a human-centered approach. This means that the design should be based on actual human use, and not an idealized model. For example, not everybody is going to be able to ask exactly the right question using the right syntax; a human-centered approach would use prompts and encourage users to clarify their request.
Design for Bad Actors
We’ve already seen cases where well-intentioned AI assistants have adopted patterns of objectionable behaviour, for example, Microsoft’s Tay bot. This can happen when the method to collect training data does not take into account people who insert false, misleading or objectionable data. Care needs to be taken to ensure that training data is developed in controlled and reliable environments.
Beware Unintended Consequences
AI can stray from intended design outcomes in two ways: it can detect patterns in data that were not anticipated by its designers, and it can be put to unintended uses. For example, an AI built to tailor responses for different communities based on their average income may thereby also be differentiating by race, with an outcome that may be used not only to allocate greater social services, but also more policing and enforcement (this specific example is known as digital redlining). Therefore, applications of AI should be evaluated not only for specific intended outcomes, but against a wider range of metrics and from a wide range of perspectives.
Inspect the Data
Training data is the crucial component of any AI system, and often the area where errors are most easily overlooked. Data (or at least, samples of the data) should be studied for potential errors in labeling, data recording, and other systemic errors. Proxies (cases where one type of data is used to evaluate another type of phenomenon, such as taking temperatures to detect disease) should be clear and well-understood. Data should be assessed not only for quantity but also for representativeness; an AI used to train public service workers that was based entirely on military training data would be inadequate, for example, because of differences in job function and outcomes.
Resist the Temptation to Transfer
In AI the concept of ‘transfer’ means taking AI models that were trained in one domain and applying them to another domain. Humans have ‘general intelligence’ and transfer knowledge naturally, but AI systems are trained in specific domains and cannot be expected to work outside those domains. For example, “a shoe detector trained with stock photos can work best with stock photos but has limited capability when tested with user-generated cellphone photos.”
Diversity, Equity and Inclusion
Ensure Fairness
AI applications in education need to be fair, and need to be seen as fair. This means more than ‘treating everyone the same’, because AI data will easily capture inequities and injustices without flagging them as such. Thus in designing and deploying AI applications, users and their clients should be informed of measures taken to ensure fairness, and of the precise understanding of fairness being applied.
Engage for Inclusion
AI researchers and developers should adhere to the principle of “nothing about me without me”. This means that AI tools for learning and development should be developed and applied with effective input from both teaching staff and learners themselves. Feedback should inform decisions about appropriateness of method, desirability of outcomes, reliability of data and effectiveness of measures. In all cases, the application of AI (or any) learning technology should be based on voluntary agreement, just as we would expect (for example) for any medical treatment.
Define Community Broadly
Many AI and learning applications are developed with a specific community in mind. However, their use and influence on further development may extend well beyond this intended audience. Therefore it is recommended that a broad definition of user community be anticipated, one that extends beyond a specific demographic, domain or department. Managers should ask, for example, “what if this tool were used with children?” Defining community broadly allows us to foresee uses, even within the public service, that we might otherwise miss, and to ensure that the needs of under-represented groups are taken into account.
Monitor Use and Benefits
The fairness and equity of a tool should be assessed not only in the design phase but also as the tool is being used. In particular, patterns of use can reveal potential problems with the tool. For example, we would question the design of our tool if it were being used exclusively by Anglophones, and not at all by Francophones. Managers should also monitor for the expected benefits from the use of a tool, to ensure that people who should be included are being included, and that the tool does not exclude certain demographics or user groups. There should be clear guidelines and standards for the evaluation of AI tools in learning.
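One way to make this kind of monitoring concrete is to compare each group's share of actual use with its share of the people the tool is meant to serve. The sketch below is illustrative only: the function name, the group labels, the counts and the 0.8 tolerance are all assumptions chosen for the example, not values drawn from any standard.

```python
def underrepresented_groups(usage, eligible, tolerance=0.8):
    """Flag groups whose share of use falls well below their share of eligible users.

    usage and eligible map a group label to a count. The tolerance of 0.8 is an
    arbitrary illustrative threshold, not a recommended standard.
    """
    total_usage = sum(usage.values())
    total_eligible = sum(eligible.values())
    flagged = {}
    for group, eligible_count in eligible.items():
        eligible_share = eligible_count / total_eligible
        usage_share = usage.get(group, 0) / total_usage if total_usage else 0.0
        if usage_share < tolerance * eligible_share:
            flagged[group] = (usage_share, eligible_share)
    return flagged


# Hypothetical counts: sessions logged by the tool, and staff eligible to use it.
usage = {"anglophone": 940, "francophone": 60}
eligible = {"anglophone": 700, "francophone": 300}

flagged = underrepresented_groups(usage, eligible)
for group, (usage_share, eligible_share) in flagged.items():
    print(f"{group}: {usage_share:.0%} of use vs {eligible_share:.0%} of eligible users")
```

A flag raised this way is a prompt for a manager to investigate, not a finding in itself; low uptake may have causes unrelated to the tool.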
Ensure Representative Data
As mentioned above, data should be inspected for quality and representativeness. There should be clearly defined criteria for types and proportions of data employed representing different demographics and especially under-represented groups. Tools that make predictions should be based on real-world frequencies of events, and concordance with real-world data should be measured and monitored on an ongoing basis.
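As a rough illustration of what measuring concordance with real-world data could look like, the sketch below compares the average predicted probability of an event with the frequency at which the event actually occurred, group by group. The record layout, the group labels and the 0.05 gap threshold are assumptions made for the example.

```python
def concordance_gaps(records, max_gap=0.05):
    """Compare mean predicted probability with observed frequency for each group.

    records is an iterable of (group, predicted_probability, actual_outcome),
    where actual_outcome is 0 or 1. Groups whose gap exceeds max_gap are returned.
    The threshold is illustrative only.
    """
    totals = {}
    for group, predicted, actual in records:
        count, predicted_sum, actual_sum = totals.get(group, (0, 0.0, 0))
        totals[group] = (count + 1, predicted_sum + predicted, actual_sum + actual)

    gaps = {}
    for group, (count, predicted_sum, actual_sum) in totals.items():
        gap = predicted_sum / count - actual_sum / count
        if abs(gap) > max_gap:
            gaps[group] = round(gap, 3)
    return gaps


# Hypothetical predictions of course completion, with what actually happened.
records = [
    ("group_a", 0.8, 1), ("group_a", 0.7, 1), ("group_a", 0.7, 0), ("group_a", 0.8, 1),
    ("group_b", 0.3, 1), ("group_b", 0.2, 1), ("group_b", 0.4, 0), ("group_b", 0.3, 1),
]

gaps = concordance_gaps(records)
print(gaps)  # a negative gap means the tool under-predicts for that group
```

Running a check like this on a schedule, rather than once at launch, is what turns it into ongoing monitoring.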
Limit the Possibility of Bias
When creating data sets, it is recommended that researchers take proactive steps to limit the potential for bias. Vocabularies and labels should be reviewed to ensure they do not perpetuate stereotypes or prejudicial generalizations, and people creating data (especially about themselves) should have the ability to go beyond predefined taxonomies. Where possible, data should be generated in neutral environments where individuals are motivated to employ accurate and inclusive annotations. Small biases in data collection can be magnified by AI, a process called bias amplification.
Learn About Your AI
Managers and users should not regard an AI system as a black box or oracle. They should be able to study the input data to see the bases on which the AI draws its conclusions. They should be able to see, and test for, different sorts of results from different inputs. AI users should have an understanding of what would count as an appropriate response from an AI and learn how to predict what an AI would produce given different inputs. Where possible, use explainable AI that produces feature tables along with results.
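Testing for different results from different inputs can be as simple as changing one input at a time and recording how the output moves. The sketch below assumes the AI can be called as a function on a dictionary of inputs; `toy_model` is a hypothetical stand-in written for this example, not a real system, and its weights are invented.

```python
def probe_inputs(model, baseline, variations):
    """Change one input at a time and record how far the output shifts."""
    base_output = model(baseline)
    report = []
    for feature, values in variations.items():
        for value in values:
            changed = {**baseline, feature: value}
            report.append((feature, value, model(changed) - base_output))
    return report


def toy_model(inputs):
    """Hypothetical stand-in for an AI that predicts a test score."""
    return 0.6 * inputs["previous_grade"] + 30 * inputs["attendance_rate"]


baseline = {"previous_grade": 70, "attendance_rate": 0.9, "height_cm": 170}
report = probe_inputs(
    toy_model,
    baseline,
    {"attendance_rate": [0.5], "height_cm": [150, 190]},
)
for feature, value, shift in report:
    print(f"{feature}={value}: output shifts by {shift:+.1f}")
```

In this toy case attendance moves the prediction and height does not, which is the behaviour we would hope for; a real system showing a shift on an unrelated input would deserve a closer look.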
Design for Interpretation
Where possible, use the smallest number of inputs possible to achieve reliable results. This makes it easier to narrow down the factors having an impact on the outcome. Choose metrics that have an impact on the task, and where reasonably possible, do not employ unrelated metrics. For example, a system predicting a student’s performance on a test should include data such as previous grades and attendance during the class, but not their height or street address.
Explain AI-Based Decisions
For many people, AI is a black box, and when an AI returns a response (a grade on an essay, for example) it is natural for people to want to know why such a decision was made. However, an AI system might not provide a reasonable answer. The grade may be based on hundreds or thousands of different factors, rather than simple principles or explanations. This makes AI-based decisions difficult to understand, which points to the need for a person who is able to look at the input and say what would have made a difference in the result. Thus including such a person in any application of AI in learning is recommended.
Because the development and use of AI requires the collection of individual data, AI has significant implications for personal privacy. Most jurisdictions recognize that individuals have a reasonable expectation of personal privacy, and most research ethics processes require that privacy be respected in the collection and use of data. This requires not only that data be anonymized before it is used to train an AI, but also that de-identification be complete, so that identities cannot be reconstructed after the fact.
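Removing names is not always enough, because a combination of ordinary attributes can single a person out. One common check, introduced here as an illustration rather than as a method prescribed above, is to count how many records share each combination of such attributes (the idea behind k-anonymity). The field names, the sample rows and the choice of k = 3 are assumptions for the example.

```python
from collections import Counter


def rows_below_k(rows, quasi_identifiers, k=3):
    """Return rows whose combination of quasi-identifiers is shared by fewer than k rows.

    Such rows are the ones most at risk of re-identification. The value of k is
    a policy choice; 3 is used here purely for illustration.
    """
    def key(row):
        return tuple(row[field] for field in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]


# Hypothetical learner records with names already removed.
rows = [
    {"department": "Finance", "age_band": "30-39", "score": 71},
    {"department": "Finance", "age_band": "30-39", "score": 64},
    {"department": "Finance", "age_band": "30-39", "score": 88},
    {"department": "Legal", "age_band": "60-69", "score": 92},
]

risky = rows_below_k(rows, ["department", "age_band"], k=3)
print(risky)  # the single Legal record could be traced back to one person
```

Passing a check like this does not prove that data is safe to release; it only catches one well-known way in which de-identification can fail.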
Avoid Unnecessary Surveillance
There has been a significant negative response to the use of AI-assisted surveillance technologies in learning environments. Do not secretly collect data for the purpose of training AI. In secure government workplaces a certain degree of surveillance is required and expected; however, surveillance should be limited to cases where it is needed. It is generally agreed that learning and development applications are not among those cases, and that the use of surveillance technologies to monitor compliance is an excessive use of AI in education.
Don’t Deceive Users
AI should not be used to deceive users. For example, people should not be led to believe that an AI chatbot, advisor or interactive tool is actually a person. Nor should AI tools be used to emulate humans on social media or in learning environments. When AI tools are used to create or manipulate content, this use should be declared. Deceptive use of AI, and of technology generally, is widely disapproved of and in some jurisdictions subject to legal action or sanction for ethics violations.
Secure AI Technology
Managers should ensure AI applications for learning and development are secure. This is especially the case if learning technology is being used to collect data for future use by AI. Ideally, most personally identifying data should be removed before the data leaves the individual learner’s machine, and where possible, on-device processing should be employed. AI data and algorithm systems should undergo a regular security audit to ensure system integrity.
There should be a clear line of accountability for any decision made by an AI such that it is not acceptable to declare that ‘the AI decided’. Accountability is especially important for diversity, equity and inclusion, such that patterns of decisions or recommendations made by the system can be determined to be not the result of bias or prejudice. Similarly, persons need to be able to take responsibility for the reliability of the data and algorithms employed, with the tools and processes clearly documented. Where a decision or recommendation is determined to be ‘not explainable’, it should not be accepted. There should always be the possibility of appealing and overturning AI-based decisions or recommendations. Technology is not yet ready to produce an AI with a skill-set that is broad enough to make human judgements.
Limit Social Harm
The impact of AI isn’t limited to your own use. That creates a responsibility to reduce the likelihood, severity, and scale of harm caused by use of your application generally. OpenAI provides a set of safety guidelines to help with this. It recommends practices such as size limits on data processing, rate limits on API calls, content filters for topic and unsafe content, and collecting user feedback.
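To show what size limits and rate limits might look like in an application of our own, here is a minimal sketch of a guard placed in front of an AI service. It is not OpenAI’s API or any vendor’s implementation; the class name, the limits and the return messages are all hypothetical.

```python
import time
from collections import deque


class RequestGuard:
    """Apply a size limit and a per-user rate limit before a request reaches the AI."""

    def __init__(self, max_calls, window_seconds, max_chars, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.max_chars = max_chars
        self.clock = clock      # injectable so the guard can be tested without waiting
        self._calls = {}        # user id -> timestamps of recently accepted calls

    def check(self, user_id, text):
        if len(text) > self.max_chars:
            return "rejected: input too long"
        now = self.clock()
        recent = self._calls.setdefault(user_id, deque())
        # Forget calls that have aged out of the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return "rejected: rate limit reached"
        recent.append(now)
        return "accepted"


# Illustrative limits: 2 calls per minute per user, at most 500 characters each.
guard = RequestGuard(max_calls=2, window_seconds=60, max_chars=500)
print(guard.check("learner-1", "Summarize module 3 for me."))
```

Content filtering and user feedback need more than a few lines of code, but the same principle applies: the checks sit between the user and the AI, where they can be logged and reviewed.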
Even if the limitations of a specific application of AI are well understood by the developers and the researchers, they need to be communicated clearly to end users and their clients. This is important because the results of a specific AI application may lead people to draw conclusions beyond what is warranted by the data and the algorithm. For example, an AI-generated recommendation should be depicted as an ‘AI-suggested option’, and not the ‘best option’.
Developers of AI should anticipate and minimize the possibility of harm resulting from use of the application. Harm could include physical injury, psychological or social harm, damage to or loss of property, or harm to the environment. It includes risks created as a result of distress, misinformation or radicalization. Again, this may require imposing constraints on the type of data used as input as well as ongoing monitoring not only of output but also of the effects that output is having on wider society.
Limit Harmful Values
An AI may create safety concerns when harmful values are embedded into the application or the data used to train the application. This includes not only obvious concerns about bias, prejudice and stereotypes, but also more generic sets of values that can cause harm, such as bellicosity and a tendency toward confrontation, incitement to violence, greed and a self-serving economic focus, xenophobia and intolerance. While it is impossible to completely eliminate harmful values, or even to concretely define which values are harmful, developers and users of AI should nonetheless be aware of this risk.
Test for Reliability
AI, like any technology, should be tested for reliability, to ensure that the application is performing as intended, is unlikely to stop or cease operating without notice, won’t experience degradations in performance (for example, slowing down because of a filter bottleneck) or won’t produce random or unpredictable output. There should be a maintenance schedule (for example, to ensure domain registrations and security certificates are updated) and performance testing of software updates.
Ethics, Analytics and the Duty of Care. https://docs.google.com/document/d/1I-9SUZbSTfZbOuYeGm5JIMV3ilw8vPIT1l67jteiF8w/edit?usp=sharing
FTC. Using Artificial Intelligence and Algorithms. https://www.ftc.gov/news-events/blogs/business-blog/2020/04/using-artificial-intelligence-algorithms
Google AI. Responsible AI Practices. https://ai.google/responsibilities/responsible-ai-practices/
OpenAI. Safety best practices. https://beta.openai.com/docs/safety-best-practices/recommendations
Tony Hirst. Terms of Engagement With the OpenAI API. https://blog.ouseful.info/2021/05/20/terms-of-engagement-with-the-openai-api/