SanJoseCARecruiter Since 2001
the smart solution for San Jose jobs

Principal Site Reliability Engineer

Company: OKX
Location: San Jose
Posted on: May 26, 2023

Job Description:

Who We Are

OKX is one of the world's largest and fastest-growing cryptocurrency exchanges. We help millions of people buy and sell bitcoin and over 30 other crypto assets every day, but our work is a whole lot more than that. We're building an inclusive future of finance, one that opens new opportunities to learn financial literacy, store value, and build wealth for everyone.

Ready to help the next billion people experience the future of finance with us? Come on board.

About the Team

Site Reliability Engineering is a critical engineering discipline and a job function in the company. Its charter is to:

  • Triage all production issues and devise remedies to stabilize the system until engineering can roll out a proper remediation
  • Devise metrics that provide baselines against which engineering can work to ensure that our platform achieves 95% uptime
  • Build tools and infrastructure that promote early detection of production failures, leading to a stellar customer experience

Our work is to drive the safety, health and uptime of our platform, and the ability to remedy unforeseen problems. By removing some of the complex burdens of scaling and maintaining uptime in distributed systems, SRE allows development teams to focus on feature development instead of the nuances of achieving and maintaining service level commitments.

About the Opportunity

We're looking for a creative and driven individual who can spearhead our effort to push "outside the box" infrastructure implementations that will have a tremendous impact on our platform's stability and scalability.

What You'll Be Doing:
  • Responsible for the maintenance, configuration and delivery of public cloud (AWS/AliCloud) products and services;
  • Responsible for the deployment, configuration and maintenance of microservices in the online system;
  • Responsible for the design, development, upgrading and maintenance of a stable operation and maintenance platform;
  • Responsible for the construction and architecture design of the maintenance system;
  • Responsible for building and rolling out the gateway and containerization;
  • Responsible for preparing operation and maintenance standardization documents and formulating operation and maintenance specifications.

What We Look For In You:
  • 10+ years of experience as a site reliability engineer, working with Java-based cloud applications
  • Strong knowledge of an object-oriented language (preferably Java) in order to improve monitoring and debug reported issues
  • Knowledge of cloud providers (AWS, GCP, Azure)
  • Knowledge of containerization (Docker, Kubernetes)
  • Knowledge of IaC (Terraform, Vault)
  • Knowledge of Continuous Integration/Delivery to production environments
  • Knowledge of Pub/Sub technology (Kafka, Amazon MSK)
  • Knowledge of database administration
  • Good understanding of network technology (CDN, DNS, TCP/IP, and other networking protocols)
  • Understands cloud security and is comfortable implementing a least-privilege access strategy
  • Experience working with teams across offices and time zones
  • Fluent in Mandarin and English

Nice to Haves:
  • Knowledge of deploying applications on AliCloud

Highlights of Perks and Benefits:
  • Market competitive total compensation package
  • Comprehensive insurance package including medical, dental, vision, disability & life insurance (Company pays 100% for employee/80% for dependents)
  • 401(k) with company contribution
  • Paid Parental Leave
  • Employee Referral Bonus Program paid in BTC
  • Company Donation Match
  • More surprises when you join!

Okcoin Statement:

OKX is committed to equal employment opportunities regardless of race, color, genetic information, creed, religion, sex, sexual orientation, gender identity, lawful alien status, national origin, age, marital status, non-job-related physical or mental disability, or protected veteran status. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

The salary range for this position is $230,000 to $340,000.

The salary offered depends on a variety of factors, including job-related knowledge, skills, experience, and market location. In addition to the salary, a performance bonus and long-term incentives may be provided as part of the compensation package, as well as a full range of medical, financial, and/or other benefits, dependent on the position offered. Applicants should apply via the OKX internal or external careers site.
