We're OneText, a YC-backed (Winter '23) startup in the Bay Area, and we're looking for a DevOps/DBA Engineer.
We're growing faster than we can manage! Since raising our seed round, we've:
Come up with more ideas for features than we could build in a lifetime, and shipped a ton of them anyway
Solved our fair share of scaling issues, allowing us to process tens of millions of webhooks and outbound messages per hour
Built up some huge revenue streams from our dedicated customers, who want even more of what we have to offer, helping us gear up towards our targets for raising our Series A
Had a lot of fun integrating AI into every part of our product
So: join us if you like the idea of a startup environment that is fast-paced, but in a sustainable way. There is no shortage of fun engineering challenges and new things to learn. But we always want to be deliberate and smart about what we decide to build, and not just race from one thing to the next.
Email any questions to: engjobs@onetext.com
Click here to see the general job description for all engineers at OneText. This link has all the information about what it's like to interview for and work here!
But for this role specifically, here's what we're looking for:
We use a mixture of:
Postgres: our core database, used for everything that we want to stick around forever and be highly durable/consistent. Accounts, payments, billing, configuration, and so on.
Mongo: Used for tracking and scheduled tasks. We are also planning to migrate very write-heavy tables like messages, fees, and events to Mongo.
Redis: Used for caching, and for data that only needs a finite TTL, like temporary short-links.
AWS SQS & EventBridge: Used for scheduling and queuing Lambdas for high-throughput tasks like campaigns.
ClickHouse: Used for analytics.
We need you to be experienced in:
Writing efficient, well-formed queries
Picking the right database type for the right job
Scaling all of these vertically or horizontally
Building in indexes, partitions, etc.
Warehousing older data for larger tables
Monitoring the performance and health of these databases
Right now we primarily use DigitalOcean, but we have started to use AWS for some new services. We feel there is a strong case to move over completely to AWS or GCP as we scale.
We want you to have strong opinions on which cloud providers are great and which are not. We also want you to come up with a plan for where OneText should live and how it should be deployed in the future, and for how we migrate to get there.
We have two kinds of traffic:
Steady traffic (usually initiated via webhook events from shopper actions on our customers' stores)
Burst traffic from our customers scheduling campaigns to hundreds of thousands or millions of customers at once (especially during holiday periods like Black Friday)
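To make the burst case concrete, here is a minimal sketch of one common way to break a large campaign audience into queue-sized batches. It is an illustration only, not a description of our campaign engine: the `chunk` helper and the sample recipient IDs are made up for this example. The one hard fact it leans on is that an SQS `SendMessageBatch` request accepts at most 10 entries.

```typescript
// Hypothetical helper (not OneText code): split a list into fixed-size batches.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// SQS SendMessageBatch accepts at most 10 entries per request.
const SQS_MAX_BATCH = 10;

// Made-up recipient IDs standing in for a campaign audience.
const recipients = Array.from({ length: 25 }, (_, i) => `user-${i}`);
const batches = chunk(recipients, SQS_MAX_BATCH);

console.log(batches.map((b) => b.length)); // [ 10, 10, 5 ]
```

Each batch could then be handed to whatever queue client is in use. How batches are retried, rate limited, or spread across workers is exactly the kind of design question this role would own.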
Right now we have:
A front-end written in React
An API layer written in Node
A worker layer written in Node
Postgres/Mongo/Redis databases
A campaign engine written using AWS EventBridge, SQS and Lambdas
We want you to help us optimize these for scale. That could mean figuring out a good strategy to help our workers churn through as many tasks as possible, as fast as possible, or migrating certain operations from one database type to another, or delegating tasks from our API to our workers, or anything else that helps us be fast when we really need to be.
Find a problem with how we manage concurrency in accepting new tasks? Notice our database connections aren't being pooled properly? We want you to be able to jump into our app and make any fixes you need to. So some knowledge of Node/TypeScript would be very helpful, but a willingness to learn and ask questions is the most important thing here.
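As a rough illustration of the kind of concurrency fix we mean, here is a minimal sketch of a worker loop that never runs more than a fixed number of tasks at once. It is not taken from our codebase: `runWithLimit`, the demo tasks, and the limit of 2 are all hypothetical, and a production version would also need error handling and backpressure suited to the real workload.

```typescript
// Hypothetical sketch (not OneText code): run async tasks with at most
// `limit` of them in flight at any moment, preserving result order.
async function runWithLimit<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each runner claims the next unclaimed task index until none remain.
  // If any task rejects, the returned promise rejects with that error.
  async function runner(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const runnerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Demo with made-up tasks: six short jobs, at most two running at once.
const demoTasks = Array.from({ length: 6 }, (_, i) => async () => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return i * 2;
});

runWithLimit(demoTasks, 2).then((out) => console.log(out));
```

The same idea applies to database connections: a bounded pool keeps a burst of tasks from opening more connections than the database can serve.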
We have two major use cases in mind here:
We want to be able to segment users based on properties or events they have attached to their account. This is mainly used to be able to schedule campaigns to the correct set of users.
We want to be able to get really good reporting for messages, revenue, clicks, fees, ROI and so on for all of the SMS-based flows and automations we run.
We think an OLAP database would be a good fit for these two problems. We've been relying too much on our production database for analytical tasks like these, which is why we've started building on ClickHouse.
We would like you to be familiar enough with ClickHouse or other OLAP databases, or willing enough to learn them, to start solving these problems.
What happens when we get errors, when databases go down or run out of space, when our cloud provider fails to deploy our code?
We want to have good contingencies and backups for all of these cases, and enough redundancy that we can keep our app highly available and able to deploy at any time.
We want to go as fast as possible from merging in new code (once it's tested and guaranteed to be stable) to having that code build and hit the production site, with any tests run and database migrations performed, and so on.
We also want to make sure we have good testing environments to give us as much confidence as possible before deploying new code.