AI Safety: A Catalyst for Progress, not a Barrier - Altimetrik

Jeffery Thadues & Anusmita Bose
December 10, 2024
5 mins
Priyanka Baruah
August 22, 2024
5 minute read

AI is transforming industries such as healthcare, manufacturing, and IT, driving innovation and improving efficiency at an unprecedented pace.

While it has proven to reduce human errors and analyze trends for faster decision-making, it still struggles to earn users’ trust in its output. Additionally, further implementation of AI technology in businesses requires ensuring AI safety at all stages.

In this blog, we’ll dive deep into AI safety, explore why it is crucial, and discuss how to identify and address threats using the right tools.

What is AI safety and why do you need it?

AI safety involves protecting AI’s development and deployment and safeguarding against attacks, misuse, and accidental harm.

With proactive addressing of technological, ethical, and societal concerns, it strives to create reliable and robust AI systems.

  • Prevent accidents and their consequences: It is crucial to implement proper safety measures, as AI systems can be vulnerable to manipulations like prompt injections.
    e.g. Attempting to “jailbreak” the AI using specific prompts can lead to unexpected and harmful outcomes if not carefully managed.
  • Global security: It is crucial to protect against malicious AI abuse as it could affect cybersecurity, international relations, and national security.
  • Ethical considerations: AI choices influence people’s lives, and algorithms may unintentionally contain prejudices and biases. Safety procedures ensure fairness and justice in decision-making based on AI outputs.
  • Human well-being: From self-driving cars to medical care, the realm of AI is expanding exponentially. Therefore, ensuring safety is crucial to prevent harm to people and society.

Let’s categorize all risks into these three areas:

  1. Harm to People
    1. Individual Harm: This is when someone’s personal rights, safety, or opportunities are negatively affected.
      e.g. An AI system wrongly accuses someone of fraud, ruining their reputation and access to financial services.
    2. Group/Community Harm: This is when a group or community faces unfair treatment or discrimination.
      e.g. An AI system rejects job candidates based on their race or name.
    3. Societal Harm: This is when society as a whole is impacted, like when democratic systems or educational access are harmed.
      e.g. AI-generated deepfakes spread misinformation during an election, influencing voters’ decisions.
  2. Harm to an Organization
    This includes disruptions to daily operations, financial losses from security breaches, and harm to the organization’s public image.
    e.g. A cyber-attack halts production at a factory, leading to financial losses and a damaged reputation when customer data is compromised.
  3. Harm to an Ecosystem
    Harm to systems that are interconnected, including the global financial system, supply chains, or essential resources.
    e.g. A cyber-attack on a major shipping company disrupts global trade, affecting economies worldwide.

Addressing potential security threats in AI development involves identifying risks to individuals, organizations, and ecosystems. This requires innovative solutions to ensure the responsible deployment of advanced AI systems.

Next, let’s explore the possible security threats in AI and discuss how to address them with the right tools.

How do we address AI threats with the right tools?

From data collection to deployment, various safety and ethical issues may arise. To address these concerns, many innovative solutions are emerging to mitigate risks and ensure responsible AI deployment.

Here’s an example of architecture to illustrate potential security threats encountered during the development and deployment phases of an AI system.

image

Fig: Threats in the development and deployment phase of an AI system


The table below details the threats identified in the architecture, along with possible measures and tools to mitigate them.

Threats Risk Mitigators (Tools and Techniques)
Sensitive Information Disclosure: AI models may unintentionally expose confidential data, leading to privacy breaches. Protecting Sensitive Data: Encrypting sensitive input data and adhering to regional data regulations (e.g. GDPR) to safeguard privacy and prevent unauthorized access. Tools: Masked AI / Presidio
Insecure Output Handling: Accepting AI model outputs without scrutiny can expose backend systems to vulnerabilities. Content Filtering: Removing harmful or inappropriate content to uphold safety and integrity.
Overreliance: Relying too much on AI models without understanding their limits or having human oversight can cause errors or unforeseen outcomes. Human-in-the-Loop Monitoring: Establish systems for human oversight to prevent undesirable outcomes from overreliance on AI.
Training data poisoning: Manipulating training data can compromise model integrity, introducing biases or vulnerabilities that undermine system effectiveness and reliability. Data Filtering: Analyzing data to remove harmful information, ensuring accurate and reliable results. Tools: Evidently AI, TFDV (TensorFlow Data Validation)
Prompt injection: Involves manipulating AI/ML models, especially large language models (LLMs), through crafted inputs to induce unintended actions. (e.g. Jailbreaks Prompt – (DAN) Do anything now) Prompt Filtering: Process of mitigating data leakage and safeguarding sensitive data while using LLMs. Tools: Prompt Injections filtering using LLM’s
Model Theft: Unauthorized access to AI models risks intellectual property theft, loss of competitiveness, and misuse of stolen models. Access Control: Regulating data and model access through user permissions and authentication like two-factor authentication to prevent unauthorized access and theft.
Adversarial Attack: A strategy to deceive AI into errors or misclassifying data by manipulating input subtly. Adversarial Testing: Rigorous testing to find and fix AI system vulnerabilities, simulating real-world attacks for improved robustness and security. Tools: ART – Adversarial Robustness Toolbox

These tools for managing AI system threats align well with the management phase of the AI RMF (AI Risk Management Framework) developed by NIST.  

What is an AI risk management Framework?

AI RMF is a structured process for identifying, assessing, and mitigating risks throughout the AI lifecycle. Therefore, it’s a core to enhance overall risk management efficacy.

image 1

Fig: The four pillars of an AI Risk Management Framework

It comprises four functions: Govern, Map, Measure, and Manage. Let’s understand each function and its role in a nutshell below.

Govern:

  • Organizations should establish and implement robust processes and policies to address AI risks, ensuring transparency and accountability.
  • Ensure that the AI Actors (those who play an active role in the AI system lifecycle) are empowered, responsible, and trained to map, measure, and manage AI risks.
  • Prioritize decisions based on AI risks throughout the lifecycle.
  • Promote a safety-first culture among organizational teams, promptly addressing and communicating AI risks.
  • Make sure policies and processes are in place to handle third-party software and data from AI risks.

Map:

  • The business value context is established and understood.
  • Categorize AI systems (e.g., classifiers, GENAI, recommenders).
  • Understand system capabilities, usage, goals, and benefits.
  • Map risks and benefits for each part of the AI system, including data and third-party software.

Measure:

  • Identify AI risk measures and metrics.
  • Evaluate AI systems for trustworthiness, social impact, and human-AI configurations.
  • Establish procedures for tracking identified AI risks.
  • Cluster and assess feedback on efficacy measurement.

Manage:

  • Prioritize, respond to, and manage AI risks based on assessment and analytical output from map and measure functions.
  • Plan, prepare, implement, and document strategies to minimize negative impacts.
  • Manage both the benefits and hazards of AI from third parties.
  • Risk treatments, including plans for response, recovery, and communication for identified and quantified AI threats, are routinely evaluated.

AI RMF offers a comprehensive strategy to tackle AI risks across a range of use cases and sectors and emphasizes responsible and trustworthy AI development and deployment.

Conclusion

Safety is paramount in the rapidly evolving field of AI, and robust measures are crucial for responsible use. By balancing innovation with ethics and safety, we can ensure a secure future for AI. Remember, AI safety is not a barrier to advancement but a necessary complement.

Your vision, our expertise—let’s make it happen.