GenAI Capabilities and Challenges

Capabilities of Generative AI

Toxicity refers to AI-generated content that is offensive, disturbing, abusive, hateful, or inappropriate.
Defining what qualifies as “toxic” is challenging because it often depends on: Cultural context, Audience, Intent
There is a thin boundary between filtering toxic content and censorship, especially when free expression is involved.
Also, questions to be considered: what about quotations of someone that can be considered toxic? Should they be included?
Mitigation Strategies
- Training data curation
  - Identify and remove toxic or offensive phrases before model training
  - Balance datasets to reduce bias amplification
- Guardrails and moderation models
  - Automatically detect and block harmful content
  - Filter outputs based on predefined safety categories
- Human review
  - Use human-in-the-loop workflows for ambiguous or borderline cases

Hallucinations occur when a model generates confident-sounding but factually incorrect information.
This happens because large language models:
- Predict the next most likely word
- Do not truly understand factual correctness
As a result, models may:
- Invent non-existent facts
- Cite fake sources
- Provide incorrect explanations that appear plausible
Mitigation Strategies
- User education: Inform users that generated content is not guaranteed to be correct
- Verification requirements: Cross-check outputs against trusted or authoritative sources
- Output labeling: Clearly mark AI-generated content as unverified or machine-generated
- Retrieval-based grounding: Use external knowledge sources (e.g., search engines, databases) to anchor responses

Worries that Gen AI can be used to write college essays, writing samples for job applications, and other forms of cheating or illicit copying
Debates on this topic are actively happening
Some are saying the new technologies should be accepted and others are saying it should be prohibited
Difficulties in tracing the source of a specific output of an LLM
Rise of technologies to detect if text or image have been generated by with AI

Poisoning:
- Poisoning involves intentionally injecting malicious, biased, or misleading data into training datasets.
- This can cause the model to: Produce biased outputs, Generate harmful or offensive content
- Often difficult to detect without careful data governance.
Hijacking and Prompt Injection:
- Prompt injection embeds hidden or manipulative instructions inside user prompts.
- The goal is to Override system instructions and Alter model behavior
- This can hijack model’s behavior to produce outputs that align with the attacker’s intentions (e.g., generate misinformation or run malicious code).
Exposure of Sensitive Information:
- The risk of exposing sensitive and confidential information to a model during training or inference
- The model can then reveal this sensitive data from their training corpus, leading to potential data leaks or privacy violations
Prompt Leaking:
- The unintentional disclosure or leakage of the prompts or inputs used within a model
- It can expose protected data or other data used by the model, such as how the model works
Jailbreaking:
- AI models are typically trained with certain ethical and safety constraints in place to prevent misuse or harmful output
- Jailbreaking is a way to circumvent the constraints and safety measures implemented in a generative model to gain unauthorized access or functionality