mple production at scale. Define construction methodologies, tooling requirements, and quality standards. Ensure sample coverage systematically maps to the Abuse × Evasion attack surface.
- Adversarial Evaluation & Remediation: Design and execute adversarial evaluation programs that measure the platform's true defensive capability under realistic attack scenarios. Establish evaluation cadences, coverage metrics, and reporting mechanisms. Drive the feedback loop between Red Team findings and downstream policy, model, and operations improvements.
- AI Agent Development: Lead the design and deployment of LLM-powered agent systems to automate and scale Red Team operations, including signal triage, evasion pattern matching, abuse pattern analysis, and sample construction guidance. Define agent workflows, human-in-the-loop review protocols, and automation quality benchmarks. Continuously evaluate and expand the boundaries of AI-driven adversarial analysis.
Qualifications
Minimum Qualifications
- Bachelor's degree or above in Computer Science, Information Security, or related fields.
- 5+ years of experience in content security, trust & safety, or adversarial testing, including experience on large-scale consumer platforms.
- Strong understanding of content moderation systems.
- Hands-on experience with AI/LLM applications: agent workflow design, prompt engineering, or building LLM-powered automation tools.
- Proven ability to manage complex, multi-workstream programs with cross-functional dependencies.
Preferred Qualifications
- Experience building AI agent pipelines for operational automation (signal triage, classification, structured output generation).
- Familiarity with adversarial attack techniques against ML systems (evasion, prompt injection, multimodal attacks).
- Experience with threat intelligence, OSINT, or dark web monitoring in a trust & safety context.
- Knowledge of international content security regulations and cross-regional policy compliance.