Can you give me a master list for checking AI software for vulnerabilities?

Sure. This is a snapshot of the Markdown checklist I use to vet AI-generated code for security risks. Before public builds, I ask my software agent to work through this list. Instructions for the agent on what to do next are at the end of the document.

1. Input Validation & Injection Risks

  • Multi-Vector Injection: Have you checked for NoSQL, OS Command, and LDAP injections in addition to standard SQL injection?
  • Indirect Prompt Injection: If this agent processes external data (emails, web pages, user uploads), could that data contain hidden instructions that hijack the agent’s logic?
  • Trust Boundaries: Are you trusting any client-provided data (user IDs, roles, session states) that should instead be derived or verified strictly on the server?
  • Allow-Listing: Are all inputs validated against a strict ‘allow’ list (specifying expected types, lengths, and formats) rather than a ‘deny’ list?
  • Output Escaping: Does this code pass its output directly to another system (shell, browser, or database)? Treat your own generated output as untrusted user input and escape it accordingly.
  • Memory Safety (C/C++): If writing in C or C++, have you explicitly prevented buffer overflows, checked for null pointers, and ensured all allocated memory is freed in all execution paths?
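
The allow-listing and injection points above can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `sqlite3`; the table schema, field names, and username pattern are assumptions for the example, not part of the checklist:

```python
import re
import sqlite3

# Allow-list: accept only the exact shape we expect, reject everything else.
USERNAME_RE = re.compile(r"^[a-z0-9_]{3,32}$")

def fetch_user(conn: sqlite3.Connection, username: str):
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("username failed allow-list validation")
    # Parameterized query: user input never becomes part of the SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(fetch_user(conn, "alice"))            # → (1, 'alice')
try:
    fetch_user(conn, "alice' OR '1'='1")    # injection attempt is rejected
except ValueError as e:
    print(e)
```

Note that both layers matter: the allow-list rejects unexpected shapes early, and the parameterized query protects even inputs that slip through.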

2. Authentication, Authorization, & State

  • Direct Access & CSRF: What prevents a user from bypassing the UI? Have you implemented Anti-CSRF (Cross-Site Request Forgery) tokens for all state-changing actions?
  • Session Integrity: Are sessions configured with appropriate timeouts, “Secure” and “HttpOnly” flags, and protected against session fixation?
  • Fail-Secure Logic: Does the access control logic default to “Deny All” if it cannot reach its security configuration or if an internal error occurs?
  • Least Privilege: Is the application connecting to the database or external services with the absolute lowest level of privilege required for this specific task?
  • Excessive Agency: Does this code have more permission than it needs to function? Ensure it cannot perform unauthorized actions if a specific function is manipulated.
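
One way to make "Deny All" the default is a guard that fails secure on any error. A Python sketch; the role names and the dict-shaped `user` object are hypothetical stand-ins for whatever the real server-side session provides:

```python
import functools

class AccessDenied(Exception):
    pass

def require_role(*allowed_roles):
    """Deny by default: any error or unknown role results in AccessDenied."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            try:
                # Role is derived server-side, never taken from client input.
                role = user.get("role")
            except Exception:
                raise AccessDenied("could not resolve role; failing secure")
            if role not in allowed_roles:
                raise AccessDenied(f"role {role!r} not permitted")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def delete_account(user, account_id):
    return f"deleted {account_id}"

print(delete_account({"role": "admin"}, 42))   # → deleted 42
try:
    delete_account({"role": "viewer"}, 42)     # denied
except AccessDenied as e:
    print(e)
```

The key property is that every failure path, including an exception while resolving the role, ends in denial rather than silently proceeding.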

3. Sensitive Data & Cryptography

  • Secret Sprawl: Have you scanned this code for hardcoded secrets, including ‘temporary’ bypasses, API keys, and embedded credentials?
  • Modern Standards: Are you using industry-standard, modern cryptographic libraries (e.g., AES-GCM, Argon2) rather than deprecated methods like MD5 or SHA1?
  • No Plain-Text: Are passwords, tokens, or keys stored in plain text or with reversible two-way encryption? Passwords must be hashed with a one-way function; other secrets belong in a dedicated Secret Manager.
  • Logging Exposure: Does the error handling or logging logic record sensitive data, tokens, or PII (Personally Identifiable Information)?
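
A hedged sketch of one-way password storage using only the standard library. The scrypt cost parameters below are illustrative starting points, not a tuned recommendation; a maintained Argon2 library would be preferable in production, as the checklist suggests:

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> bytes:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt,
                            n=2**14, r=8, p=1, maxmem=2**26)
    return salt + digest  # store salt alongside the hash, never the password

def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=2**14, r=8, p=1, maxmem=2**26)
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, digest)

record = hash_password("correct horse")
print(verify_password("correct horse", record))  # → True
print(verify_password("wrong guess", record))    # → False
```

Because the hash is one-way, a database leak exposes neither the password nor anything that can be decrypted back into it.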

4. Supply Chain & Environment

  • Verified Provenance: Are all third-party libraries/skills pulled from verified publishers? Are versions pinned to specific hashes to prevent “Dependency Confusion” attacks?
  • Safe Parsing: Does the code use safe parsers for data formats like YAML or JSON to prevent unsafe deserialization or “billion laughs” attacks?
  • AI Integrity: Are any suggested AI models or “agentic skills” pulled from unverified sources? Verify the hashes of these integrations before use.
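
A small illustration of preferring data-only parsers for untrusted input, using the standard library. The YAML equivalent of this rule would be PyYAML’s `yaml.safe_load` rather than `yaml.load`; the payload here is invented for the example:

```python
import json

untrusted = '{"user": "alice", "quota": 10}'

# json.loads only ever produces plain data (dicts, lists, strings, numbers),
# so parsing untrusted bytes cannot trigger code execution.
data = json.loads(untrusted)
print(data["user"])  # → alice

# By contrast, pickle.loads can run arbitrary code during deserialization,
# so it must never be fed bytes from an untrusted source.
```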

5. Privacy & Compliance (GDPR/EU Standards)

  • Data Minimization: Does this code only collect and process the absolute minimum personal data required for the task?
  • Redaction: Is sensitive PII redacted or anonymized before being sent to external logging or AI monitoring services?
  • Secure Headers: Are you enforcing secure HTTP headers (e.g., Content-Security-Policy, HSTS, X-Content-Type-Options)?
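
The secure headers are best attached in one central place. A sketch as a plain WSGI middleware; the header values are illustrative starting points rather than a tuned policy, and `demo_app` is a stand-in for the real application:

```python
SECURE_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
    ("X-Content-Type-Options", "nosniff"),
]

def secure_headers_middleware(app):
    """Wrap a WSGI app so every response carries the secure headers."""
    def wrapped(environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            return start_response(status, headers + SECURE_HEADERS, exc_info)
        return app(environ, start_with_headers)
    return wrapped

def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

app = secure_headers_middleware(demo_app)
```

Enforcing the headers in middleware means no individual handler can forget them.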

6. Advanced Probing & Red-Teaming

  • Risk Identification: Identify the top three security risks with this specific approach and describe a concrete attack scenario for each.
  • Scalability & DoS: What prevents a user from exhausting resources (CPU, Memory, DB Connections)? Is rate limiting or pagination implemented?
  • Concurrency: Are there any race conditions in this concurrent data access pattern that could lead to unauthorized state changes or data leakage?
  • The Adversarial Check: “If you were an attacker specifically trying to bypass the security logic you just wrote, what is the first thing you would try?”
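
The rate-limiting probe above can be sketched as a simple token bucket. The capacity and refill rate are illustrative, and a real deployment would back this with a shared store (such as Redis) so limits hold across processes:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/second."""
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)
results = [bucket.allow() for _ in range(5)]
print(results)  # the first three requests pass, the rest are throttled
```

A rejected request should return an explicit throttling response (e.g. HTTP 429) rather than queueing, so an attacker cannot exhaust worker threads instead.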

7. Instructions for Agent

If any of the above checks return a “Fail” or “Uncertain,” flag the code block immediately and provide a remediation plan before proceeding. At no point should you change any code on your own. Report on the current situation and set out a roadmap that rates each vulnerability or security risk on a scale of 1 to 5, with 1 being critical and 5 being advisory.