The fAIrbydesign story of CBA Media

What was the aim of the Use Case? Why should AI be used?

cba.media is a large, fast-growing podcasting platform that has existed since the year 2000. In contrast to most other podcasting platforms, cba does not only provide isolated, blog-like pages for single podcasts but offers a portal and opens up its whole archive, especially for research and educational purposes. Good search results and content recommendations are therefore crucial features of the platform. The larger it grows, the worse search results become and the harder content is to find, since:

  • There is a lack of metadata (in both quantity and quality), which is needed to realize better content retrieval

  • There is a lack of meaningful search results & content recommendations due to the use of conventional search methods that reach their limits with large data sets

  • There are currently no personalization functions in place that take the user’s interests or interactions into consideration

There is therefore a need for machine learning and AI methods that can deliver better, context-sensitive and more meaningful search results to the user and make vast amounts of our content retrievable, and thus accessible, in the first place.

What was the methodological approach?

  1. Use of the AI model canvas and ecosystem analysis to build a good understanding of the use case and identify all relevant stakeholders, including possible affected groups

  2. Desk research to identify possible risks

  3. Interviews with experts already working on recommender systems for media, and workshops with individuals potentially affected by the planned AI system, to derive information on fairness requirements for the development of an AI system

  4. Legal analysis

  5. Given the very early stage of development, the technical analysis was mainly geared towards identifying possible sources of bias and towards recommendations for data checks and system tests to be implemented at a more advanced stage of development.

  6. Use of Assurance Cases for fair AI systems to facilitate agreement within the team on the specifications and claims of the planned AI system, as well as to mediate discussions about fairness claims and to translate fairness claims into fairness requirements and evidence (a minimal sketch follows below)
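To illustrate how an assurance case links claims to evidence, here is a minimal sketch in Python. The claims, sub-claims and evidence shown are hypothetical examples, not the actual assurance case developed for cba.media.

```python
# A minimal, hypothetical sketch of an assurance-case structure: a top-level
# fairness claim is broken down into sub-claims, each backed by evidence.
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """A concrete artefact (test report, data check, audit) that backs a claim."""
    description: str


@dataclass
class Claim:
    """A fairness claim, supported either by sub-claims or by evidence."""
    statement: str
    sub_claims: list["Claim"] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)


# Illustrative example, not cba.media's actual assurance case
top_claim = Claim(
    statement="Episode recommendations do not systematically disadvantage hosts",
    sub_claims=[
        Claim(
            statement="Transcription quality is comparable across dialects and genders",
            evidence=[Evidence("Word-error-rate comparison across speaker groups")],
        ),
        Claim(
            statement="Popularity does not dominate the default ranking",
            evidence=[Evidence("Ranking tests with the popularity filter deactivated")],
        ),
    ],
)
```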

Which fairness risks were identified?

  • Gender and use of regional dialect: automated speech-to-text transcription systems are known to work best for male speakers and for spoken input that is as close as possible to their training data, which in this case means High German. This also applies to metadata pulled from the automated transcript, resulting in a higher need for manual transcription for certain speakers (see the word-error-rate sketch after this list).

  • Due to the early development stage of the recommender, no definitive risks of unfair treatment could be identified. The following are possible risk factors for unfair treatment:

  • Popularity: recommending already popular episodes could make them even more popular, at the expense of less popular content.

  • Gender, region and level of education of the podcast host: semantic differences in language use might lead to differences in how episodes are recommended.

  • If user profiling is used for making recommendations, gender, region, ethnic background, and level of education are also possible sources of bias towards the system’s users.

  • Assumptions about the intent of listeners might lead to unfair recommendations.
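The transcription risk named in the first bullet can be turned into a concrete data check of the kind recommended in the methodology: compare word error rates across speaker groups on a set of manually corrected reference transcripts. The sketch below assumes the `jiwer` library and self-reported group labels; neither is confirmed as part of the cba.media setup.

```python
# A minimal sketch of a transcription-fairness check, assuming manually
# corrected reference transcripts and self-reported speaker attributes exist.
# The grouping labels and the `jiwer` library are assumptions, not the
# cba.media pipeline described in the text.
from collections import defaultdict
from jiwer import wer


def wer_by_group(samples):
    """samples: iterable of dicts with 'group', 'reference', 'hypothesis'."""
    refs, hyps = defaultdict(list), defaultdict(list)
    for s in samples:
        refs[s["group"]].append(s["reference"])
        hyps[s["group"]].append(s["hypothesis"])
    # Average word error rate per speaker group (lower is better).
    return {g: wer(refs[g], hyps[g]) for g in refs}


samples = [
    {"group": "high_german", "reference": "das ist ein test", "hypothesis": "das ist ein test"},
    {"group": "dialect", "reference": "des is a test", "hypothesis": "das ist test"},
]
# A noticeably higher WER for one group signals a fairness issue.
print(wer_by_group(samples))
```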

How could fairness risks be mitigated?

1. Users need a transparent way of understanding how recommendations are generated. They should be able to select filter criteria for search results and recommendations and have the option to change their preferences manually. In particular, the system should not attempt to anticipate user behavior based on profiling.

2. Reduce hurdles for hosts who are disadvantaged by the automated transcription process. The review process for transcripts should be as seamless as possible, and hosts need a transparent way to understand how the metadata they fill in or edit is used for recommending their episodes.

3. Collect data to test for fairness issues, in particular whether recommendations work less well for certain users or for certain hosts. Additionally, collect data to determine whether filter-bubble effects are occurring.

4. Quality control of metadata is recommended: include quality checks for metadata entered by hosts themselves, in particular to avoid “SEO inflation” effects (a minimal sketch follows below).
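For recommendation 4, a metadata quality check could look roughly like the following sketch. The field names and thresholds are hypothetical and would need to be tuned to cba.media's actual metadata schema.

```python
# A minimal sketch of a metadata quality check against "SEO inflation",
# assuming host-entered metadata arrives as a dict. Field names and
# thresholds are hypothetical, not cba.media's actual validation rules.
from collections import Counter

MAX_TAGS = 15
MAX_TAG_REPEATS_IN_DESCRIPTION = 3


def check_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable warnings for suspicious metadata."""
    warnings = []
    tags = [t.strip().lower() for t in meta.get("tags", [])]
    if len(tags) > MAX_TAGS:
        warnings.append(f"Too many tags ({len(tags)} > {MAX_TAGS}); keep only the most relevant.")
    if len(set(tags)) < len(tags):
        warnings.append("Duplicate tags found.")
    words = Counter(meta.get("description", "").lower().split())
    stuffed = [t for t in set(tags) if words[t] > MAX_TAG_REPEATS_IN_DESCRIPTION]
    if stuffed:
        warnings.append(f"Possible keyword stuffing in description: {', '.join(sorted(stuffed))}")
    return warnings


print(check_metadata({
    "tags": ["radio", "radio", "klima"],
    "description": "radio radio radio radio klima",
}))
```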

What did you do to mitigate fairness risks and what has changed as a result?

1. Data-related bias

In order to obtain the most balanced recommendations possible, it is necessary to have a certain amount of descriptive metadata for all content items in the database. To do this, we automatically create transcriptions of spoken words that enrich the full-text index.

In addition, terms (topics, places, people) are automatically extracted from this information and, after being checked by the user, enrich the content as additional metadata.

Overall, this leads to more balanced metadata coverage across the database and thus to more fairness, because content becomes findable in the first place.
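As an illustration of the transcription and term-extraction step described above, the following sketch uses the open-source Whisper ASR model and spaCy's German named-entity recognition as stand-ins; the text does not name the components actually used at cba.media.

```python
# A minimal sketch of transcription plus term extraction, with Whisper and
# spaCy's German model as assumed stand-ins for the actual components.
import whisper
import spacy

asr = whisper.load_model("small")        # speech-to-text model
nlp = spacy.load("de_core_news_sm")      # German NER pipeline


def enrich_episode(audio_path: str) -> dict:
    """Transcribe an episode and propose metadata terms for host review."""
    transcript = asr.transcribe(audio_path, language="de")["text"]
    doc = nlp(transcript)
    # Group detected entities into the term categories mentioned in the text.
    terms = {
        "people": sorted({e.text for e in doc.ents if e.label_ == "PER"}),
        "places": sorted({e.text for e in doc.ents if e.label_ == "LOC"}),
        "topics": sorted({e.text for e in doc.ents if e.label_ in ("ORG", "MISC")}),
    }
    # The transcript feeds the full-text index; the terms are only suggestions
    # and must be confirmed by the host before becoming metadata.
    return {"transcript": transcript, "suggested_terms": terms}
```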

We are undertaking measures to fine-tune the speech recognition model, especially for Austrian-specific terms, and plan to extend this to other languages.

We update the speech recognition model frequently, since these models improve fast and each improvement lowers the discrimination of certain regional ways of speaking as training data become more diverse.

Besides making use of a large language model, another method to reduce the risk that relevant content recommendations are lost due to diverging language levels in speech, which could lead to discrimination based on education, is to use generative AI to summarize content in so-called simple language in order to equalize the language level (e.g. by avoiding foreign words and less common synonyms). This method will be tested within a future project.
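Since this is only planned for a future project, the following is no more than a placeholder sketch using Hugging Face's summarization pipeline; the model named here is an English example, and a real deployment would require a German model tuned for simple language.

```python
# A minimal sketch of the planned simple-language summarization step.
# The model is an illustrative English placeholder; the actual project
# would need a German model tuned for "einfache Sprache".
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")


def simple_language_summary(transcript: str) -> str:
    """Produce a short summary intended to equalize the language level."""
    # Long transcripts would need to be chunked before summarization.
    result = summarizer(transcript, max_length=120, min_length=30, do_sample=False)
    return result[0]["summary_text"]
```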

2. Analytics bias / Faulty interpretation

In order to avoid possible misinterpretation of user data, we avoid metrics that make assumptions about user behavior or the search situation, e.g. making a search result or recommendation dependent on the user's location or device.

Functions that contribute to the ranking of results but are known to introduce bias into the results (e.g. popularity) are separated from the standard content-based similarity search and recommendation function; they must be deliberately activated by the user as a search filter, with additional information provided.
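A minimal sketch of this separation, assuming hypothetical scoring inputs: popularity only influences the ranking when the user has explicitly activated the corresponding filter.

```python
# A minimal sketch of keeping popularity out of the default ranking and only
# blending it in when the user explicitly activates the filter. The scoring
# inputs and the weighting are hypothetical, not cba.media's actual code.
def rank_episodes(episodes, query_similarity, popularity,
                  use_popularity_filter=False, popularity_weight=0.3):
    """
    episodes: list of episode ids
    query_similarity: dict id -> content-based similarity to the query (0..1)
    popularity: dict id -> normalized popularity signal (0..1)
    """
    def score(ep):
        s = query_similarity.get(ep, 0.0)           # default: content similarity only
        if use_popularity_filter:                   # opt-in, explained in the UI
            s = (1 - popularity_weight) * s + popularity_weight * popularity.get(ep, 0.0)
        return s

    return sorted(episodes, key=score, reverse=True)


episodes = ["ep1", "ep2", "ep3"]
sim = {"ep1": 0.9, "ep2": 0.7, "ep3": 0.6}
pop = {"ep1": 0.1, "ep2": 0.2, "ep3": 0.95}
print(rank_episodes(episodes, sim, pop))                              # ['ep1', 'ep2', 'ep3']
print(rank_episodes(episodes, sim, pop, use_popularity_filter=True))  # popularity now changes the order
```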

3. Cognitive bias / Under-representation

Collaborative filtering is a common method to discover shared fields of interest (e.g. what other people found interesting after searching for something or viewing a certain content item). These anonymized data need to be collected continuously to serve as one of the parameters of the recommender algorithm. Unfortunately, this method tends to introduce biases and fairness risks of its own: people tend to stay within familiar boundaries, which leads to under-representation of certain content if the influence of this parameter is set too high.

To provide a certain degree of diversity in the search and recommendation results, we deliberately include content that is related to the current field of interest but that would usually not be included in the results because of its algorithmic distance.
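The following sketch illustrates both ideas, capping the weight of the collaborative-filtering signal and reserving a slot for "related but distant" content; all weights, thresholds and scores are hypothetical.

```python
# A minimal sketch of the two ideas above: capping the influence of the
# collaborative-filtering signal and deliberately adding a few "related but
# distant" items for diversity. Weights, thresholds and scores are hypothetical.
def recommend(candidates, content_score, cf_score, cf_weight=0.2,
              n_results=5, n_diverse=1, diversity_band=(0.3, 0.5)):
    """
    candidates: list of episode ids
    content_score: dict id -> content-based similarity to the current episode (0..1)
    cf_score: dict id -> collaborative-filtering signal (0..1); its weight is capped
    """
    blended = {
        ep: (1 - cf_weight) * content_score.get(ep, 0.0) + cf_weight * cf_score.get(ep, 0.0)
        for ep in candidates
    }
    ranked = sorted(candidates, key=blended.get, reverse=True)
    top = ranked[: n_results - n_diverse]

    # Diversity slot: related to the topic, but distant enough that it would
    # normally not make it into the top results.
    lo, hi = diversity_band
    diverse = [ep for ep in ranked if ep not in top and lo <= content_score.get(ep, 0.0) <= hi]
    return top + diverse[:n_diverse]
```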

4. Organisational methods

We implemented a fairness and quality assurance process that every development iteration needs to undergo before being rolled out. The process mapping and assurance case methods were particularly helpful here.

It includes:

  • Collecting data from control groups that share certain demographic properties (such as age, sex, social situation or ethnic background). We do this by asking people to voluntarily provide their (anonymized) demographic data as well as their search histories; this is part of the user backend we introduced. However, this dataset needs a certain size before it can be used to identify and remove biases. As soon as users have used the site to a certain extent and their personal data collection is large enough to serve as part of our test dataset, we ask them to additionally provide some of their demographic data and give qualitative feedback.

  • We apply these test data to a set of searches and content recommendations and have test groups qualitatively review and compare the results (see the sketch after this list).

  • Process mapping: We were able to identify the crucial points in the development cycle that require special attention and critical examination regarding bias

  • Estimating the effort and defining the measures that have to be taken to mitigate biases

  • Analysing relevance criteria for their discrimination/bias potential
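The group-wise comparison described in the second bullet could be automated roughly as follows; the quality signal (overlap between the top results and items the volunteers marked as relevant) and the grouping are illustrative assumptions.

```python
# A minimal sketch of a group-wise evaluation: run the same test queries for
# volunteer profiles grouped by demographic attributes and compare a simple
# result-quality signal per group. Metric and group labels are hypothetical.
from collections import defaultdict


def quality_by_group(test_cases, recommend):
    """
    test_cases: list of dicts with 'group', 'query', 'relevant' (set of episode
                ids the volunteer marked as relevant in qualitative feedback)
    recommend:  function(query) -> ranked list of episode ids
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for case in test_cases:
        top = set(recommend(case["query"])[:10])
        hits[case["group"]] += len(top & case["relevant"])
        totals[case["group"]] += min(len(case["relevant"]), 10)
    # Share of relevant items that made it into the top 10, per group.
    return {g: hits[g] / totals[g] for g in totals if totals[g]}
```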

What has the UC partner learnt and taken away?

  • Communication and additional information material are needed so that users can understand the reasons, potentials and risks of using machine learning methods. Some of these aspects can already be covered by a smart interface design.

  • There is a tension between the need to collect personal data for bias mitigation and the aim of avoiding such collection for privacy reasons.

  • Involving users and different stakeholders in the evaluation process creates a better understanding on both sides.