#23084
rdrails.
Plan and execute capacity, scaling, and resource management for multi-tenant environments.
Develop and enforce backup, disaster recovery, and upgrade strategies with minimal downtime.
Implement security best practices: AuthN/AuthZ, encryption, auditing, and data masking.
Collaborate with engineering and SRE teams to define SLOs, observability dashboards, and alerting for StarRocks infrastructure.
Proven experience architecting and operating StarRocks (or similar OLAP) clusters at scale (10B+ events/day, TB to PB of data).
Deep understanding of StarRocks FE/BE architecture, storage engines, compaction, and the cost-based optimizer.
Expertise in schema design, partitioning, bucketing, and index/encoding strategies for hot/cold data tiers.
Strong troubleshooting skills: root cause analysis, performance profiling, and incident response.
Experience with ingestion pipelines (Kafka/Kinesis), exactly-once semantics, and CDC-based upserts.
Familiarity with materialized view design, refresh policies, and query rewrite mechanics.
Operational excellence: backup/restore, rolling upgrades, resource governance, and multi-tenant management.
Security and compliance: LDAP/OIDC integration, RBAC, encryption, and sensitive data handling.
Excellent communication skills; ability to diagram architectures and explain trade-offs to technical and non-technical stakeholders.
Nice to Have