Which steps to take if you want to address fairness in your AI system development

Technical processes

Introduction

The current process model was developed by the research consortium fAIr by design. It is intended for anybody who is interested in improving the fairness of AI systems. It offers support in developing, deploying and maintaining fair AI systems, and can be used in different contexts and for different technologies and uses. Adhering to the process model should help you prepare for future standards and regulations and facilitate eventual third-party audits, but it does not give any guarantee of compliance.

The process model includes steps for all stages of AI system development and deployment, for an interdisciplinary team.

The Technical Processes relate to technical actions that need to be taken at particular stages of the AI system life cycle. They provide specific outputs that are used to build, evaluate, or maintain the AI system, and consist of concrete, practical steps to be taken in order to develop and use the AI system as planned. For better orientation, the steps of the technical processes are identical to the processes defined in ISO 5338.

Please note that the process model is regularly updated.

  1. Inception

The Inception phase is the first phase of the project. The objectives and features of the AI system are defined, and a roadmap for the further development process is put in place. In order to ensure efficient and effective development, and to provide a proper basis for the analysis of the AI system’s fairness aspects, it is crucial that a complete understanding of the project and its objectives is developed during this phase.

2. Design And Development

In the Design And Development stage of the project, the AI system, including all its components and features, is developed to meet the requirements defined in 1. Inception.

Please note that all processes in this stage may be carried out simultaneously and do not have to follow any specific chronological order. It is expected that development teams may jump between processes.

  • General Purpose:

    In order to be able to develop the AI system, an approach to designing, testing, deploying and monitoring it needs to be agreed upon. Version control and documentation systems also need to be defined. First choices about the hardware and software components to be used for the development process are made, decisions on sourcing the requisite resources are taken, and the User Experience Design is planned. The choices made in this process should conform to documentation, versioning, monitoring and other needs. It may be necessary to iterate several times between this process and other processes in the Design And Development stage, in particular the Architecture definition, AI data engineering and Implementation processes.

    Fairness Purpose:

    The design of the AI system needs to be aligned with the previously defined and use-case specific fairness requirements. This means in particular that all technologies and methods adopted should be assessed to ascertain that they allow for the implementation of the desired fairness measures. If data labelling is required, the general strategy for data labelling needs to be addressed at this stage, and if software is acquired externally, it should be assessed regarding its suitability to meet the fairness goals of the AI system. The user interface should be designed with the needs and requirements of the intended users in mind, and mechanisms for user and other stakeholder feedback defined. Stakeholders should be included in the decision-making process at all times.

  • General Purpose:

    Based on the selected design approach, the requirements and the acquired data, the possible system architectures and algorithms that will be explored in the development phase are defined in this process. This can be one architecture, or several that will be developed and compared.

    Fairness Purpose:

    Before building an AI system, a procedure or mechanism to test the AI system quality and validity needs to be determined. For example, in supervised machine learning tasks such as classification, it is common to use error rates as quality measures for trained AI systems. To align the choice of AI system architecture with fairness goals, test designs should include tests for the selected fairness measures and de-biasing measures.

    To be able to test and validate your chosen AI system(s), it is important to ensure that available datasets are suitable and reflect the context in which the AI system will be deployed. In the case of ML-based systems, ensure that training, validation and test datasets are completely separate, with no possibility of leakage from one dataset to another, as this could invalidate any evaluations and lead to performance failures, as well as unintentional bias and possible fairness degradation. If insufficient data is available for separate training and validation datasets, designate an alternative validation scheme, such as cross-validation, to establish the optimal AI system in terms of architecture and hyperparameters (see the sketch after the tools list below). In addition, it may be necessary to generate new data or augment existing datasets in order to obtain data of sufficient quality. Any such choices and procedures should be assessed for their fairness impact, and documented.

    Tools:

    data-science-ethics-checklist
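
    A minimal sketch of the dataset handling described above, in Python with scikit-learn (the file name and the "label" column are assumptions): the data is split into fully disjoint training, validation and test sets, with stratification to preserve class balance, and cross-validation is shown as the fallback when the data is too small for a separate validation set.

        # Disjoint train/validation/test split with a cross-validation fallback.
        import pandas as pd
        from sklearn.model_selection import train_test_split, cross_val_score
        from sklearn.linear_model import LogisticRegression

        df = pd.read_csv("data.csv")                      # hypothetical dataset
        X, y = df.drop(columns=["label"]), df["label"]    # "label" is assumed

        # First split off a held-out test set, then split the remainder,
        # so no record can leak between the three datasets.
        X_tmp, X_test, y_tmp, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=42
        )
        X_train, X_val, y_train, y_val = train_test_split(
            X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42
        )

        # Fallback if the data is too small for a separate validation set:
        # cross-validate on the training portion instead.
        scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
        print(f"5-fold CV accuracy: {scores.mean():.3f}")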

  • General Purpose:

    The system analysis process provides evidence and information to support technical decision-making throughout the AI system life cycle, and is often used together with the decision management process.

    System analysis includes a wide range of methodologies, from mathematical modelling to simulation and experimentation, with a level of formality that is adaptable to suit the context of the decision to be made. Examples of decisions that can be supported with system analysis include definition of operational concepts, resolution of requirements conflicts, and evaluation of engineering strategies.

    Fairness Purpose:

    The system analysis process includes bias detection and analysis of bias mitigation strategies. This involves the precise definition of testing strategies for detecting anomalies, bias and discrimination, as well as the analysis of system behaviour and test results in order to assess whether fairness requirements are met.

    The test strategy should include the definition of bias detection metrics and methods, acceptance thresholds, and the precise placement of each measure along the AI system life cycle (see the sketch after the tools list below for a simple threshold check). The test strategy needs to be implemented, and quality control measures put in place to ensure the validity of the analyses. Additionally, the intervals at which particular fairness analyses are to be performed should be established, and traceable documentation implemented. The relevant stakeholders should be identified and included in the entire fairness analysis process.

    Tools:

    Aequitas
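
    As a minimal illustration of a bias detection metric with an acceptance threshold, the sketch below computes the demographic parity gap (the largest difference in positive-prediction rates between groups) by hand with pandas; the column names, toy values and the 0.05 threshold are assumptions to be replaced by the project's own requirements.

        # Demographic parity gap with an acceptance threshold.
        import pandas as pd

        predictions = pd.DataFrame({                 # toy logged predictions
            "y_pred": [1, 0, 1, 1, 0, 1, 0, 0],
            "group":  ["a", "a", "a", "a", "b", "b", "b", "b"],
        })

        def demographic_parity_gap(y_pred: pd.Series, group: pd.Series) -> float:
            """Largest difference in positive-prediction rates between groups."""
            rates = y_pred.groupby(group).mean()
            return float(rates.max() - rates.min())

        ACCEPTANCE_THRESHOLD = 0.05                  # assumed; set per requirements

        gap = demographic_parity_gap(predictions["y_pred"], predictions["group"])
        print(f"Demographic parity gap: {gap:.3f}")
        if gap > ACCEPTANCE_THRESHOLD:
            print("Fairness acceptance threshold exceeded: investigate bias.")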

  • General Purpose:

    Knowledge and resources that are not yet available but are necessary to create the AI system are acquired during the knowledge acquisition process. This includes third-party software (including pre-trained models and expert systems), as well as knowledge acquisition in the form of hiring experts, conducting research and interviews, etc. Proper documentation of the process is necessary in order to understand why certain decisions were taken.

    Fairness Purpose:

    During this process, it is important to ensure that all acquired knowledge adheres to the predetermined fairness requirements. This includes, for example, ensuring that any pre-trained models satisfy appropriate fairness tests, that results are available for verification, and that the necessary access to the models for further fairness verification and validation is obtained. It is also necessary to ensure fairness and diversity in the hiring of experts, and in the selection of interviewees.

  • General Purpose:

    The purpose of the AI Data Engineering process is to obtain, understand and prepare data for use in developing and in evaluating the AI system.

    This AI system life cycle process is often broken down into a data understanding and a data preparation phase in other AI system life cycle models such as CRISP-DM. In this AI system life cycle process, we also include data acquisition: data needs to be collected and its quality assured, in particular with respect to bias and representativeness. In the case of automated data acquisition, the data collection tools themselves must be evaluated in addition to the data quality. Similarly, synthetic data generation tools must be evaluated for bias and discrimination.

    The data understanding phase follows the initial data collection, and includes activities that enable the system developers and data scientists to identify data quality problems and gain first insights into the data. The data preparation phase covers all activities needed to construct the final datasets in a form that can be used by the AI system. Data preparation tasks are likely to be performed multiple times, and can occur at several different points in the data engineering process.

    Fairness Purpose:

    It is necessary to check the quality of the data and determine whether it represents reality. Additionally, tests for all types of bias and discrimination should be performed (see the sketch after the tools list below for a simple representativeness check). If the bias affects the protected groups defined in 1.2. Risk Analysis, the appropriate mitigation measures must be identified in 1.3. Requirement Definition, and the project plan from 1.4. Project Planning needs to be adapted accordingly. This should be done iteratively with 1.2. Risk Analysis. The resulting insights might need to be considered in 2.4. Data Preparation and 2.9. Definition of Monitoring Plan.

    Feature selection, and any transformations to the data, should be documented and checked for fairness. If automated tools are in use, a post-process evaluation must take place, and the tools and the settings used in the specific use case must be documented in detail. If necessary, the bias mitigation measures identified during data understanding are applied.

    Tools:

    Fairlearn
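
    A minimal sketch of a representativeness check during data understanding, comparing group shares in the collected data against reference shares for the deployment context; the file name, column name, reference figures and tolerance are all assumptions.

        # Compare observed group shares against reference population shares.
        import pandas as pd

        data = pd.read_csv("training_data.csv")            # hypothetical dataset
        reference_shares = {"female": 0.51, "male": 0.49}  # assumed reference figures

        observed = data["gender"].value_counts(normalize=True)
        for group, expected in reference_shares.items():
            share = float(observed.get(group, 0.0))
            print(f"{group}: observed {share:.2%}, expected {expected:.2%}")
            if abs(share - expected) > 0.10:               # assumed tolerance
                print(f"  possible representation bias for '{group}': consider "
                      "mitigation (re-sampling, re-weighting) and document it")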

  • General Purpose:

    The purpose of the implementation process is to realize AI system elements by translating the predefined requirements, architectures and designs into actionable steps. During this process, the various modelling techniques that were selected in the definition of the model architecture are applied, their parameters are tuned to optimal values, and the resulting models are thoroughly tested. Typically, there are several techniques for the same problem type, and some techniques have specific requirements on the form of the data, so going back to the data engineering process is often necessary. Upon completion of this process, the optimal model or models for the given use case and success criteria have been identified and selected.

    Fairness Purpose:

    As the first step in modelling, select the actual modelling techniques that are to be used. Although you may have already selected a tool during the Business Understanding phase, this task refers to the specific modelling techniques and the hyperparameters that will be tested (e.g., decision-tree building with different depths and pruning parameters, or neural network generation using different architectures, activation functions and learning rates). To ensure that possible model architectures align with fairness goals, fairness metrics should be defined for each candidate model architecture and evaluated alongside quality metrics (see the sketch at the end of this process).

    Many modelling techniques make specific assumptions about the data: for example, that all attributes have uniform distributions, that no missing values are allowed, or that the class attribute must be symbolic. Those assumptions may have an impact on the ability of a model to meet fairness needs, and they should be documented in an accountable way.

    Before building a model, a procedure or mechanism to test the model quality and validity needs to be defined. For example, in supervised machine learning tasks such as classification, it is common to use error rates as quality measures for trained models. To align the choice of model architecture with fairness goals, test designs should include tests for the selected fairness measures and de-biasing measures.

    To be able to test and validate your chosen model(s), split up any existing datasets into training, validation and test datasets (2.5.) to avoid unintentional bias and possible fairness degradation. If no sufficiently large dataset is available, designate an alternative testing scheme, such as cross-validation, to establish the optimal model in terms of architecture and hyperparameters.
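
    A minimal sketch of the selection step above: candidate configurations (here, decision trees of different depths on synthetic data) are compared on a quality metric and a fairness metric side by side, so the trade-off is explicit; all data and names are illustrative assumptions.

        # Evaluate each candidate architecture on accuracy and a parity gap.
        import numpy as np
        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(0)
        X = pd.DataFrame(rng.normal(size=(400, 4)), columns=list("abcd"))
        group = pd.Series(rng.choice(["g1", "g2"], size=400))   # sensitive attribute
        y = (X["a"] + rng.normal(scale=0.5, size=400) > 0).astype(int)

        X_train, X_val, y_train, y_val, g_train, g_val = train_test_split(
            X, y, group, test_size=0.3, random_state=0
        )

        def parity_gap(y_pred, groups) -> float:
            # Largest difference in positive-prediction rates between groups.
            rates = pd.Series(y_pred).groupby(groups.reset_index(drop=True)).mean()
            return float(rates.max() - rates.min())

        for depth in (3, 5, 10):
            model = DecisionTreeClassifier(max_depth=depth, random_state=0)
            y_pred = model.fit(X_train, y_train).predict(X_val)
            print(f"max_depth={depth}: accuracy={accuracy_score(y_val, y_pred):.3f}, "
                  f"parity gap={parity_gap(y_pred, g_val):.3f}")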

  • General Purpose:

    The Integration process aims to combine the implemented system elements into a functional system that meets the specified requirements, architecture and design; in this process model, particular attention is paid to ensuring fairness during integration.

    Fairness Purpose:

    Validate that the integration meets fairness criteria, as well as other requirements such as support, security and accessibility. This process needs to be documented thoroughly to maintain traceability of each element and step.

3. Verification and Validation

During this stage, the AI system is thoroughly tested to ensure it fulfils all requirements, before it is finally released and made available to users.

  • General Purpose:

    Through the use of appropriate techniques, standards and metrics, this process finds anomalies (errors, shortcomings) and presents evidence that the system or system element satisfies its requirements. It answers the question of whether the system was built right, according to the system requirements.

    Fairness Purpose:

    Fairness metrics and requirements must be met, including system characteristics such as security, interpretability and explainability of the model. At the end of this stage, a decision on the use of the model or the AI system as a whole should be reached (a test sketch follows the tools list below).

    Tools:

    Google What-If Tool

    responsible-ai-toolbox
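
    A minimal sketch of automating this verification step as a test (runnable with pytest), using Fairlearn's demographic_parity_difference; the toy data and the 0.05 threshold are assumptions standing in for the project's verification dataset and acceptance criteria.

        # Fairness requirement expressed as an automated verification test.
        from fairlearn.metrics import demographic_parity_difference

        # Toy verification data; in practice, load the verification dataset
        # defined in the test strategy.
        y_true = [1, 0, 1, 1, 0, 1, 0, 0]
        y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
        sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

        def test_fairness_requirement():
            dpd = demographic_parity_difference(
                y_true, y_pred, sensitive_features=sensitive
            )
            assert dpd <= 0.05, f"parity difference {dpd:.3f} exceeds assumed threshold"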

  • General Purpose:

    Once the AI system is in its service environment and each component has been individually tested, it needs to be tested as a whole, including non-technical requirements. This process asks the question: did we build the right system? That is, did we write down the correct system requirements, and does the system work in the given usage context? This also includes User Experience Testing to ensure understandability and usability.

    Fairness Purpose:

    Understanding the stakeholder needs and translating them into requirements for the AI system and its development is a crucial step for fairness. Before being deployed in the target market, the AI system as a whole needs to be tested for meeting fairness objectives, requirements and metrics.

    Validate that the fairness needs of all stakeholders are met. This includes validation of fairness characteristics in the given usage context, and for varying AI user and AI subject profiles (see the sketch below).

    Test the UX design with the target audience and the final features to ensure it meets its objectives, promoting fairness, comprehension and usability for diverse user groups.
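
    A minimal sketch of disaggregated validation across AI user and AI subject profiles using Fairlearn's MetricFrame; the toy data and profile labels are assumptions, and a real run would use validation data from the intended usage context.

        # Per-profile validation of quality metrics with MetricFrame.
        from sklearn.metrics import accuracy_score, recall_score
        from fairlearn.metrics import MetricFrame

        y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
        y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
        profile = ["young", "young", "old", "old", "young", "old", "young", "old"]

        mf = MetricFrame(
            metrics={"accuracy": accuracy_score, "recall": recall_score},
            y_true=y_true, y_pred=y_pred, sensitive_features=profile,
        )
        print(mf.by_group)      # metric values per profile
        print(mf.difference())  # largest gap between profiles, per metric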

4. Deployment

The deployment stage covers all steps necessary to progress the AI system from the development stage to the operation stage. Depending on the context, the number and type of models that compose the AI system, and the extent to which the operational environment differs from the development environment, this stage can be more or less complex. The deployment stage consists of just one process: the transition process, described below.

  • General Purpose:

    The transition process progresses the AI system from the development to the operational stage. The transition process is required in order to ensure that only a verified and validated AI system is put into operation, and that all supporting processes, infrastructure and functions are in place. This can include particular hardware requirements, cybersecurity measures, risk management and configuration management processes, as well as appropriate instruction and training for eventual AI system users and operators.

    Fairness Purpose:

    The transition process needs to ensure that all fairness requirements have been verified and validated for the AI system in its operational environment. Additionally, all measures necessary to monitor and respond to system bias need to be in place. This can include appropriate risk management processes, system logging (see the sketch below), as well as data protection measures and appropriate user/operator training. Furthermore, where deemed necessary (as per project requirements), system decision explainability needs to be ensured, and opportunities for stakeholder feedback put into place. Finally, this step should also ensure adequate accessibility of the system for all possible users.
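
    A minimal sketch of structured decision logging put in place during transition, so that later bias monitoring and stakeholder feedback can be traced back to individual decisions; all field names are assumptions, and data protection rules apply to whatever is logged.

        # Structured, machine-readable logging of individual system decisions.
        import json
        import logging
        from datetime import datetime, timezone

        logging.basicConfig(level=logging.INFO)
        logger = logging.getLogger("ai_system.decisions")

        def log_decision(model_version: str, inputs: dict, output, explanation: str):
            record = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "model_version": model_version,
                "inputs": inputs,  # apply data protection rules before logging
                "output": output,
                "explanation": explanation,
            }
            logger.info(json.dumps(record))

        log_decision("1.0.3", {"age_band": "30-39"}, "approved", "score above cutoff")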

5. Operation And Monitoring

After its release, the AI system is operated and monitored. While it is in use, several fairness issues might arise that need to be carefully monitored and mitigated.

  • General Purpose:

    In the Operation process, the AI system is used for its intended purpose, using operational input and delivering intended results.

    Personnel requirements and assignments are established, and adequate personnel operate and monitor the AI system, as well as the operator-system performance. Personnel (with the optional support of monitoring systems) detect and analyse operational deviations from predefined, AI-system-specific thresholds, stakeholder requirements, organizational policies or other applicable agreements.

    Elements of the operation process also include preparation for operation, performance of operation, management of the results of operation, and customer support.

    Data required for the operation of the system needs to be checked for quality and pre-processed to conform to the format required by the model. Additionally, the data may need to be logged.

    Use the model to produce the required output. Ensure suitable safeguards (hardware and software) for the appropriate, safe and privacy-preserving operation of the system.

    Fairness Purpose:

    During the operation process, the AI system is monitored to ensure its compliance with predefined fairness requirements. Adequate human autonomy and oversight must be maintained during operation, and appropriate controls and intervention mechanisms must be in place. Any unintended outcomes regarding fairness, or deviations from predefined fairness thresholds, are detected and analysed by adequately trained personnel (with optional support from monitoring systems); a minimal monitoring sketch follows the tools list below.

    Data quality issues may be associated with different groups of users, and may have different impacts on different groups of people.

    Ensure that those who work with the model are not subject to poor working conditions, that adequate controls for humans are in place, and that model operation does not reduce human autonomy; for example, an AI system embedded in social media should not lead users to adopt unintended behaviour.

    Tools:

    Evidently AI
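
    A minimal sketch of the operational fairness check described above: positive-outcome rates per group are computed from logged decisions and compared against a predefined threshold from the monitoring plan (data, names and threshold are assumptions).

        # Group-wise outcome-rate monitoring against a fairness threshold.
        import pandas as pd

        logged = pd.DataFrame({                    # toy operational decision log
            "group":   ["a", "a", "b", "b", "b", "a", "b", "a"],
            "outcome": [1, 1, 0, 1, 0, 1, 0, 0],
        })

        rates = logged.groupby("group")["outcome"].mean()
        gap = float(rates.max() - rates.min())
        print(rates.to_string(), f"\ngap: {gap:.3f}")
        if gap > 0.20:                             # assumed operational threshold
            print("ALERT: fairness threshold exceeded, escalate to trained personnel")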

  • to be defined

6. Continuous Validation

This stage is necessary to ensure that the system meets performance criteria throughout its operation, and consists of only one process: the continuous validation process.

  • General Purpose:

    The purpose of the continuous validation process is to ensure that the AI system is continuously monitored in relation to the desired and expected performance during its operation. This includes monitoring for data drift (input deviation) and concept drift (output deviation), and extends the quality assurance process. If the AI system is a continuously learning system, continuous learning needs to be planned and carefully monitored using test data.

    If considerable deviation is detected, the AI system should undergo maintenance (see maintenance process). In the case of automated continuous-learning AI systems, deviation thresholds for an automated rollback process should be implemented (see the drift-check sketch below). Versions of the AI system need to be documented; this is especially important for continuously learning AI systems. It must be possible to explain why a certain outcome was generated in a particular version of the AI system.

    Fairness Purpose:

    Continuous validation should also include continuous monitoring for potential bias during AI system operation. Especially for continuously learning systems, which change often, traceability is crucial. To ensure accountability and make decisions and outcomes explainable and traceable, continuous learning systems need to be carefully updated and all changes documented.

    Data collected and logged for continuous validation purposes should follow criteria set by the data governance plan and should be as inclusive as possible.
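
    A minimal sketch of input-drift detection for continuous validation, comparing a live feature sample against its training distribution with a two-sample Kolmogorov-Smirnov test; the synthetic data and the significance threshold are assumptions.

        # Data drift check that can trigger the maintenance/rollback process.
        import numpy as np
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(1)
        training_feature = rng.normal(loc=0.0, size=1000)  # stand-in for training data
        live_feature = rng.normal(loc=0.4, size=1000)      # stand-in for live data

        stat, p_value = ks_2samp(training_feature, live_feature)
        print(f"KS statistic={stat:.3f}, p={p_value:.4f}")
        if p_value < 0.01:                                 # assumed deviation threshold
            print("Data drift detected: trigger maintenance / rollback process")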

7. Retirement

AI systems, or particular elements of an AI system, reach the retirement stage when their period of use comes to an end.

  • General Purpose:

    The purpose of the disposal process is to terminate the use of an AI system or an AI system element for a given dedicated purpose. This includes adequately retiring or replacing AI system elements, and effectively addressing critical disposal requirements; such requirements could, for example, arise from prior agreements, corporate guidelines, or concerns related to safety, security, legal or environmental issues.

    Upon termination of AI system use, residual data needs to be disposed of, or archived in accordance with the data governance plan (as defined in the project planning process).

    Regardless of whether the AI system or AI system element is replaced by an updated version, or whether the AI system or AI system element becomes entirely obsolete, decommissioning must be documented in a transparent and traceable manner. For future reference, an archived version of the system or model code, and where applicable, of the trained AI model components should be maintained. This provides the possibility for retrospective evaluations.

    Fairness Purpose:

    To ensure that future fairness evaluations can be carried out accordingly, residual data that may be required for prospective fairness testing should be logged in an accountable and secure manner, in accordance with national and international data retention regulations.

    For future fairness reference and retroactive fairness examinations, the source code of the retired AI system, as well as trained model components where applicable, needs to be archived in an accountable and secure manner (see the archiving sketch at the end of this section). This enables AI subjects and AI system producers to challenge or prove that fairness goals and metrics were in place in the case of future allegations of discrimination.

    While not strictly fitting into the scheme of algorithmic fairness, environmental impact can be considered an issue of generational fairness, or as a risk that can affect certain regions more than others. In this context, fairness objectives can include the disposal of the system hardware in a way that minimizes negative environmental effects, and meets recycling requirements.
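
    A minimal sketch of accountable archiving at retirement: the trained model artifact is copied to an archive together with a checksum and a manifest, so retrospective fairness evaluations can verify exactly what was deployed; all paths and file names are assumptions.

        # Archive a model artifact with checksum and manifest for later audits.
        import hashlib
        import json
        import shutil
        from datetime import datetime, timezone
        from pathlib import Path

        archive = Path("archive/ai_system_v1.0.3")         # assumed archive location
        archive.mkdir(parents=True, exist_ok=True)

        model_path = Path("model.pkl")                     # hypothetical artifact
        shutil.copy(model_path, archive / model_path.name)

        manifest = {
            "retired_at": datetime.now(timezone.utc).isoformat(),
            "model_sha256": hashlib.sha256(model_path.read_bytes()).hexdigest(),
            "fairness_reports": ["reports/fairness_eval.pdf"],  # assumed reference
        }
        (archive / "manifest.json").write_text(json.dumps(manifest, indent=2))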
