Most roles are based in San Francisco, on-site.
Own the DevOps and hardware side of our inference stack: GPU/ASIC load balancing, model placement across racks driven by live utilization, and end-to-end latency.
Build the serving runtime on top of our ASIC hardware: batching, KV caching, scheduling, and the OpenAI-compatible API surface.
Run our $200M asset-backed equipment facility end-to-end — model, data room, lender process, term sheets, close — and own the capital stack from here.