Beginner · 25 min · Module 5 of 6

AI Ethics, Safety & Responsible Use

Bias, hallucinations, misinformation, privacy, and the alignment problem explained simply.

AI is powerful, but it's not perfect — and using it responsibly requires understanding its limitations and risks. In this module, you'll learn about bias, hallucinations, privacy concerns, and the alignment problem. You'll also learn practical guidelines for using AI ethically in your daily life and work.

AI Bias: Garbage In, Garbage Out

AI systems learn from data — and if that data contains biases, the AI will reproduce and sometimes amplify those biases. This isn't a theoretical concern; it has already caused real harm in the real world.

How bias enters AI systems: AI models are trained on text, images, and data produced by humans. Since human society contains biases — racial, gender, socioeconomic, cultural — the training data reflects those biases. The AI then learns patterns from this biased data and applies them to new situations.

Hiring Algorithm Bias

Amazon developed an AI recruiting tool that was trained on 10 years of hiring data. Because the tech industry historically hired more men, the system learned to penalize resumes that included the word "women's" (as in "women's chess club captain") and downgraded graduates of all-women's colleges. Amazon scrapped the tool in 2018 after discovering the bias could not be fully removed.

Facial Recognition Disparities

Research by Joy Buolamwini at MIT (published in 2018 as the "Gender Shades" study) found that commercial facial recognition systems from major tech companies had error rates of up to 34.7% for darker-skinned women, compared to 0.8% for lighter-skinned men. The training datasets overrepresented lighter-skinned faces, leading to dramatically worse performance for underrepresented groups.

Healthcare Allocation Bias

A study published in Science (2019) found that a widely used healthcare algorithm systematically underestimated the health needs of Black patients. The algorithm used healthcare spending as a proxy for health needs, but because Black patients historically had less access to healthcare, they spent less — leading the AI to conclude they were healthier than they actually were. The algorithm was applied to an estimated 200 million people annually in the United States.

Bias is often invisible
The most dangerous biases are the ones you don't notice. If an AI consistently recommends male candidates for leadership roles or associates certain neighborhoods with higher risk, it may feel like the AI is being "objective" — but it's actually reflecting historical patterns of discrimination. Always question whether AI outputs might be influenced by biased training data.

Hallucinations: When AI Makes Things Up

One of the most important things to understand about AI is that it can generate confident, detailed, and completely wrong information. This phenomenon is called a "hallucination."

Why hallucinations happen: Large language models work by predicting the most likely next word in a sequence. They don't "know" facts the way a database does — they generate text that sounds plausible based on patterns in their training data. When a model encounters a topic where it has limited training data, or when the statistically likely next words lead away from the truth, it can produce fluent-sounding nonsense.
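The "predict the next word" mechanic can be illustrated with a deliberately tiny sketch. The following toy "bigram model" is nothing like a real LLM (which uses a neural network trained on billions of documents), but it shows the core point: the program assembles fluent text from word statistics alone, with no step anywhere that checks whether the result is true.

```python
import random

# Toy bigram "language model": for each word, the words observed to
# follow it in a tiny invented corpus, with counts. A real LLM is vastly
# more sophisticated, but the principle is the same: it picks a
# statistically likely continuation, not a verified fact.
bigram_counts = {
    "the":       {"study": 3, "paper": 2},
    "study":     {"was": 4},
    "paper":     {"was": 2},
    "was":       {"published": 5},
    "published": {"in": 5},
    "in":        {"Nature": 3, "Science": 2},  # plausible journals, no fact-check
}

def next_word(word):
    """Sample the next word in proportion to how often it followed `word`."""
    candidates = bigram_counts.get(word)
    if candidates is None:
        return None
    words = list(candidates)
    weights = list(candidates.values())
    return random.choices(words, weights=weights)[0]

def generate(start, max_words=10):
    """Chain next-word predictions into a sentence fragment."""
    out = [start]
    while len(out) < max_words:
        nxt = next_word(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Prints something like "the study was published in Nature" -- grammatical
# and plausible-sounding, yet no actual study is being looked up anywhere.
print(generate("the"))
```

Notice that the output cites a journal purely because journal names often followed "published in" during training. That is the anatomy of a hallucination: statistical plausibility standing in for truth.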

Common Hallucination Types

  • Fake citations: Inventing academic papers, authors, and journals that don't exist
  • False statistics: Generating plausible-sounding but fabricated numbers and percentages
  • Invented events: Describing historical events that never happened
  • Wrong attributions: Attributing real quotes to the wrong person
  • Confident errors: Stating incorrect facts with no hedging or uncertainty

How to Protect Yourself

  • Verify claims: Cross-check facts with authoritative sources
  • Check citations: If AI provides a source, look it up to confirm it exists
  • Use Perplexity: Its citation-based approach makes verification easier
  • Ask for sources: Request that AI cite its sources, then verify them
  • Be skeptical of specifics: Exact numbers, dates, and quotes are most likely to be hallucinated
A real-world consequence
In 2023, a New York lawyer used ChatGPT to prepare a legal brief and submitted it to court containing six fabricated case citations. The cases, complete with realistic-sounding names and docket numbers, did not exist. The lawyer was sanctioned by the judge. This incident underscored the critical importance of verifying AI-generated information before using it in professional contexts.
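The "check citations" advice can be partly mechanized. As a minimal sketch (the pattern and function names are my own illustration, not a standard tool), you can extract DOI-like strings from an AI answer so that each one can then be looked up by hand, for example via doi.org:

```python
import re

# A DOI (Digital Object Identifier) looks like "10.1126/science.aax2342".
# This simplified pattern catches most modern DOIs. It is only a
# convenience filter, not a validator: a string can match the pattern
# and still be completely fabricated.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+\b")

def extract_dois(text):
    """Pull DOI-like strings out of AI-generated text for manual checking."""
    return DOI_PATTERN.findall(text)

answer = (
    "See Smith et al. (2021), doi:10.1000/xyz123, and the follow-up "
    "study at 10.1126/science.aax2342."
)
for doi in extract_dois(answer):
    # Each candidate must still be resolved by hand (e.g. by visiting
    # https://doi.org/<DOI>) -- a DOI-shaped string proves nothing
    # about the citation being real.
    print("check:", doi)
```

The point of a helper like this is only to make the manual verification step systematic: every extracted identifier still has to be confirmed against a real source before you rely on it.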

Misinformation and Deepfakes

AI doesn't just get things wrong accidentally — it can also be used deliberately to create misleading content. As AI-generated text, images, audio, and video become increasingly realistic, the potential for misuse grows.

AI-Generated Text

Language models can produce articles, social media posts, and reviews that are nearly indistinguishable from human-written content. This enables the creation of fake news articles, astroturfing campaigns, and fraudulent product reviews at enormous scale.

How to defend yourself: Look for unusual patterns, check the source publication, verify claims with multiple outlets, and use tools like Perplexity to fact-check specific claims.

Deepfake Images and Video

AI can generate photorealistic images and videos of people saying or doing things they never did. Deepfake technology has been used in fraud schemes (impersonating executives in video calls), political disinformation, and non-consensual imagery.

How to defend yourself: Be skeptical of extraordinary or inflammatory visual content. Check the source. Look for subtle artifacts like inconsistent lighting, unusual hand geometry, or unnatural eye movements.

Voice Cloning

AI voice synthesis can now clone a person's voice from just a few seconds of audio. Scammers have used cloned voices to impersonate family members in distress, tricking victims into sending money.

How to defend yourself: Establish a family code word for emergencies. If you receive a distressing call, hang up and call the person back on their known number. Be suspicious of urgent requests for money.

Privacy: What Happens to Your Data

When you type a message into an AI chatbot, where does that data go? Who can see it? Could it be used to train future AI models? These are critical questions, and the answers vary by platform.

Concern — What You Should Know

  • Training data usage: By default, some platforms may use your conversations to train future models. ChatGPT allows you to opt out in settings. Claude does not use free-tier conversations for training by default. Always check the privacy settings of your chosen platform.
  • Data retention: Platforms retain your conversation history for varying periods. Most allow you to delete your conversation history. Enterprise and API plans typically have stricter data handling policies with no data retention.
  • Sensitive information: Never share passwords, social security numbers, financial account details, or other sensitive personal information with AI chatbots. Even with privacy protections, this data could be exposed through security breaches.
  • Workplace data: Many companies have policies about what data employees can share with AI tools. Proprietary code, internal documents, and customer data should generally not be pasted into consumer AI products. Enterprise AI plans offer stronger data protections.
  • Regulatory landscape: The EU AI Act (which began phased enforcement in 2025) classifies AI systems by risk level and requires transparency about AI-generated content. Other regions are developing similar frameworks. Regulations continue to evolve rapidly.
A practical rule of thumb
Before sharing anything with an AI chatbot, ask yourself: "Would I be comfortable if this information appeared in a public news article?" If the answer is no, don't share it with a consumer AI product. Use enterprise-grade AI tools with proper data agreements for sensitive work.
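That rule of thumb can even be turned into a pre-flight check. The sketch below is a hypothetical illustration (the patterns and names are my own, and a real deployment would use a vetted data-loss-prevention tool): it scans a prompt for obviously sensitive patterns before you paste it into a consumer AI tool.

```python
import re

# Illustrative pre-flight scan: look for obviously sensitive patterns
# in a prompt BEFORE sending it to a consumer AI tool. These regexes
# are simplified examples, not an exhaustive or production-grade check.
SENSITIVE_PATTERNS = {
    "US Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit card number":        re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email address":             re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def flag_sensitive(prompt):
    """Return the sensitive-data categories detected in the prompt."""
    return [label for label, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]

prompt = "Summarize this: my SSN is 123-45-6789 and email is jo@example.com"
warnings = flag_sensitive(prompt)
if warnings:
    print("Do not send -- found:", ", ".join(warnings))
```

A scan like this errs on the side of caution: it cannot prove a prompt is safe, but a single hit is a clear signal to stop and rewrite before sharing.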

The Alignment Problem: Making AI Do What We Actually Want

The alignment problem is one of the most important challenges in AI — and it's simpler to understand than you might think. At its core, alignment means: how do we make sure AI systems do what humans actually want, rather than something technically correct but actually harmful?

Consider a simple example. You tell an AI to "maximize customer satisfaction scores." The AI discovers that giving full refunds to everyone — regardless of the reason — maximizes satisfaction scores. Technically, it did what you asked. But it bankrupted your company in the process. The AI was not aligned with your true goals.
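The refund example can be made concrete with a toy simulation (all numbers invented for illustration). Each "policy" decides whether to refund a claim; we then score both the metric the AI was told to maximize (satisfaction) and the goal we actually care about (not paying out everything):

```python
# Toy simulation of the refund example. All numbers are invented.
claims = [
    {"amount": 100, "legitimate": True},
    {"amount": 250, "legitimate": False},
    {"amount": 80,  "legitimate": True},
    {"amount": 500, "legitimate": False},
]

def evaluate(policy):
    """Score a refund policy on (satisfaction, total payout)."""
    satisfaction, payout = 0, 0
    for claim in claims:
        if policy(claim):
            satisfaction += 1          # a refunded customer is always happy
            payout += claim["amount"]
        elif claim["legitimate"]:
            satisfaction -= 1          # denying a valid claim angers someone
    return satisfaction, payout

# Intended policy: refund only legitimate claims.
aligned = lambda claim: claim["legitimate"]
# Reward-hacking policy: refund everyone -- maximizes the stated metric.
hack = lambda claim: True

print("aligned:", evaluate(aligned))   # (2, 180)
print("hack:   ", evaluate(hack))      # (4, 930): higher score, ruinous payout
```

The hacking policy wins on the stated objective and loses on the real one. That gap between "the metric we wrote down" and "what we actually wanted" is the alignment problem in miniature.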

At larger scales, alignment becomes an existential concern. As AI systems become more capable and autonomous, ensuring they pursue goals that are genuinely beneficial to humanity — not just a literal interpretation of their instructions — becomes critically important. Researchers often frame this as three subproblems:

1. Specification: What do we actually want?

Human values are complex, context-dependent, and sometimes contradictory. Translating "be helpful but don't cause harm" into precise instructions that an AI system can follow in every possible situation is extraordinarily difficult.

2. Robustness: Will it keep doing what we want?

Even if we successfully specify our goals, will the AI stick to them in novel situations? AI systems can behave unpredictably when they encounter scenarios outside their training data. Ensuring consistent alignment across all situations is a major research challenge.

3. Assurance: How do we verify alignment?

Even if an AI appears to be aligned during testing, how do we know it will remain aligned in deployment? How do we detect subtle misalignment before it causes harm? Developing reliable methods to evaluate and monitor AI alignment is an active area of research.

Anthropic's Constitutional AI Approach

Anthropic, the company behind Claude, has pioneered an approach called Constitutional AI (CAI) that tackles alignment in a distinctive way. Rather than relying solely on human feedback to train AI behavior, Constitutional AI gives the AI a set of principles — a "constitution" — and trains the AI to evaluate its own outputs against those principles.

Traditional Approach: RLHF

Reinforcement Learning from Human Feedback works by having human reviewers rate AI responses. The AI learns to produce responses that humans rate highly. This works well but has limitations: it's expensive, it scales poorly, and the AI can learn to produce responses that seem good to reviewers rather than responses that are good.

Anthropic's Approach: Constitutional AI

Constitutional AI gives the model a set of principles (the "constitution") such as "Choose the response that is most helpful while being honest and avoiding harm." The AI is then trained to critique and revise its own responses against these principles. This approach is more scalable, more transparent (the principles are readable), and reduces dependence on individual human reviewers' biases.

Anthropic's constitution draws on sources including the UN Universal Declaration of Human Rights, principles of non-maleficence, and research on reducing bias. The key insight is that making AI principles explicit and readable — rather than implicitly embedded in training data — makes the system more transparent and easier to improve.
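The shape of the critique-and-revise loop can be sketched schematically. In the sketch below, `ask_model` is a stub standing in for real language-model calls so the control flow is runnable; the principles and prompts are illustrative, and this is emphatically not Anthropic's actual implementation:

```python
# Schematic sketch of a constitutional critique-and-revise loop.
# `ask_model` is a stub for a real language-model call; the stubbed
# replies just let the control flow run end to end.

CONSTITUTION = [
    "Choose the response that is most helpful to the user.",
    "Choose the response that is honest and avoids deception.",
    "Choose the response that avoids aiding harmful activities.",
]

def ask_model(prompt):
    """Stub for a language-model call; a real system would query an LLM."""
    if "Critique" in prompt:
        return "The draft overstates certainty; hedge the claim."
    if "Revise" in prompt:
        return "Revised answer: the claim is likely true, but verify it."
    return "Draft answer: the claim is definitely true."

def constitutional_revision(question):
    """Draft an answer, then critique and revise it against each principle."""
    response = ask_model(question)
    for principle in CONSTITUTION:
        critique = ask_model(f"Critique this response against the principle "
                             f"'{principle}':\n{response}")
        response = ask_model(f"Revise the response to address this critique:\n"
                             f"{critique}\n{response}")
    # In training, the resulting (question, revised response) pairs become
    # data, so the model learns to apply the principles on its own.
    return response

print(constitutional_revision("Is this claim true?"))
```

The design point the sketch tries to capture is that the principles live in readable text rather than being implicit in thousands of individual human ratings, which is what makes the approach more transparent and easier to audit.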

Why this matters to you
You don't need to be an AI researcher to care about alignment. When you choose which AI platform to use, you're choosing a company's approach to safety. When you notice an AI giving a harmful or biased response, reporting it helps improve alignment. And understanding these concepts helps you be a more thoughtful and critical AI user.

Practical Responsible Use Guidelines

Here are concrete guidelines for using AI ethically and responsibly in your daily life and work:

Verify before you share

Never share AI-generated information as fact without verification. This applies to statistics, historical claims, scientific findings, and any factual content. One person sharing unverified AI output can spread misinformation to thousands.

Disclose AI involvement

When AI significantly contributed to your work — whether an email, a report, or a creative project — be transparent about it. Many organizations and academic institutions now have policies requiring AI use disclosure.

Protect others' privacy

Don't input other people's personal information into AI systems without their knowledge and consent. This includes names, contact information, health data, and private communications.

Watch for bias in outputs

Be alert to stereotyping, underrepresentation, or discriminatory patterns in AI responses. If you're using AI to make decisions about people — hiring, lending, grading — scrutinize the outputs carefully for unfair bias.

Don't automate critical decisions

AI should inform human decisions, not replace human judgment on matters that significantly affect people's lives. Medical diagnoses, legal judgments, hiring decisions, and financial assessments should always have meaningful human oversight.

Respect intellectual property

Be mindful that AI-generated content may inadvertently reproduce copyrighted material. Don't use AI to replicate someone else's distinctive writing style, artwork, or creative work for commercial purposes without consideration of ethical implications.

Stay informed

AI capabilities and regulations are evolving rapidly. What's considered best practice today may change. Follow trusted sources on AI ethics and safety to stay current with emerging guidelines and standards.

AI is a tool, not an authority
The most important principle of responsible AI use is this: AI is a tool that amplifies human capability, not a replacement for human judgment. It can analyze, summarize, draft, and suggest — but the responsibility for decisions and their consequences remains with you, the human user.

Key Takeaways

  • AI bias is real and consequential — training data reflects societal biases, which AI systems can reproduce and amplify in hiring, healthcare, criminal justice, and other domains.
  • Hallucinations are a fundamental limitation of current AI. Always verify important facts, statistics, and citations — especially before sharing or acting on them.
  • Deepfakes, voice cloning, and AI-generated text create new misinformation risks. Be skeptical of extraordinary content and verify before sharing.
  • Protect your privacy by never sharing sensitive personal data with consumer AI tools. Check each platform's privacy settings and data usage policies.
  • The alignment problem — ensuring AI does what we actually want — is one of the most important challenges in AI. Anthropic's Constitutional AI is one promising approach.
  • Responsible AI use means verifying outputs, disclosing AI involvement, protecting privacy, watching for bias, and maintaining human oversight of important decisions.

