Startup DevOps is hard

There are a lot of things to worry about as a startup: marketing, product development, keeping your team together. Everything tends to follow the “minimum viable” pattern of getting the bare minimum up so you don’t crash and burn.

As an enterprise cloud architect, I know firsthand how much work can go into DevOps. As a startup founder, I also know how little time you have to spend on any one thing; it’s more like you have to spend time on all the things at once.

Cloud Infrastructure unfortunately also tends to follow this rule, and all the “best practices” in the field tend to follow patterns that require a large amount of time investment, something startups definitely don’t have.


With this guide, I hope to give you an overview of what a “minimum viable cloud infrastructure” can look like, with a focus on stability, security, and scalability.


Stability

When developing minimum viable cloud infrastructure, there are three key points to focus on for stability: restoring from catastrophic failure, automatic restart, and making sure there are enough resources available. If you focus on these three things, you should be in a pretty good place in terms of uptime.

Restoring from catastrophic failure (Automatic Backups)

You know the worst-case scenario: you bricked your server and disk. The minimum viable solution is to take scheduled, automated backups so you prevent data loss.

Depending on your cloud provider, there are a few different options. Snapshotting disks is generally the simplest way to do a minimum viable backup process, but more advanced (and more stable) methods include database-specific backups (dumping the database) and distributed systems.

  • AWS


    If you are using Amazon, I would recommend using CloudWatch. It lets you create scheduled jobs (such as automatic snapshots).

  • GCP


    Google allows you to schedule snapshots as well.


  • Cloud Agnostic


    Don’t want to lock your backup process to your cloud provider? Your most important data will be the database and any uploads that may be provided. For a database, you should write a script that periodically dumps the database and sends the data to a secure location (a private S3 bucket, a distributed file system, etc.). This will be more prone to error than a platform-specific method, however, so be wary.

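Make sure you test your backup restore method; otherwise you risk a situation where all five of your backup methods fail because the restores were never tested.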
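As a rough illustration of the cloud-agnostic approach, here is a minimal sketch of a dump-and-prune script. It assumes a PostgreSQL database with `pg_dump` on the PATH; the function names, the file naming scheme, and the default of keeping seven dumps are all hypothetical choices, not something the original prescribes. Uploading the dump to a secure location is left out and would be a separate step.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_path(backup_dir: Path, db_name: str, now: datetime) -> Path:
    """Build a timestamped file name so each dump is kept separately."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return backup_dir / f"{db_name}-{stamp}.dump"


def dump_command(db_name: str, out_file: Path) -> list[str]:
    """Command line for a custom-format dump (assumes PostgreSQL's pg_dump)."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", db_name]


def prune_old_backups(backup_dir: Path, db_name: str, keep: int) -> list[Path]:
    """Delete all but the newest `keep` dumps and return the removed files."""
    # Timestamped names sort oldest-first, so the newest dumps are at the end.
    dumps = sorted(backup_dir.glob(f"{db_name}-*.dump"))
    stale = dumps[:-keep] if keep > 0 else dumps
    for path in stale:
        path.unlink()
    return stale


def run_backup(db_name: str, backup_dir: Path, keep: int = 7) -> Path:
    """Dump the database, then prune old dumps.

    Raises subprocess.CalledProcessError if pg_dump fails, so a scheduler
    such as cron can report the failure instead of silently skipping it.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    out_file = backup_path(backup_dir, db_name, datetime.now(timezone.utc))
    subprocess.run(dump_command(db_name, out_file), check=True)
    prune_old_backups(backup_dir, db_name, keep)
    return out_file
```

A scheduled job could call `run_backup("app", Path("/var/backups/app"))`, where both the database name and the directory are placeholders. Restoring one of these dumps into a scratch database on a regular basis is what actually proves the backups work.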

Automatic service restarting in case of server reboot

There are two parts to automatic restarting. One, when your app crashes, does it start up again? And two, when your server reboots, does your app start up automatically?

Crontab: a useful tool that lets you schedule jobs easily. Perhaps the simplest way to auto-start your stack is to create a crontab job that runs on reboot (an @reboot entry).

/etc/init.d: Many Linux systems support init.d scripts. With init.d you can define scripts that start at boot and also support stop, start, and status commands (e.g. service myscript start) to give you more control over your applications. It’s a bit more complex than a crontab, but it gives you more features.

Automatic service restarting in case of application crash

Applications are not always stable and can crash at awkward times. A good way to maintain stability is to run them under a tool that automatically restarts them when they crash.

  • NodeJS — or

  • General —
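To make the restart-on-crash idea concrete, here is a minimal sketch of what such a tool does internally. It is not a replacement for a real process manager; the `supervise` function, its restart limit, and its fixed delay are hypothetical choices made for illustration.

```python
import subprocess
import sys
import time


def supervise(command: list[str], max_restarts: int = 5, delay_seconds: float = 1.0) -> int:
    """Run `command` and restart it after each non-zero exit.

    Returns the number of restarts performed. Stops when the command exits
    cleanly (exit code 0) or when `max_restarts` has been reached.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            return restarts  # clean shutdown, nothing to restart
        if restarts >= max_restarts:
            return restarts  # give up rather than loop forever on a broken app
        restarts += 1
        print(f"exited with {exit_code}, restart {restarts}/{max_restarts}", file=sys.stderr)
        time.sleep(delay_seconds)  # short pause so a crash loop does not spin the CPU
```

For example, `supervise(["node", "server.js"])` would keep a hypothetical Node app running through up to five crashes. Production tools add logging, backoff, and start-at-boot integration on top of this loop.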


Always ensure there are enough resources available

One of the most common reasons for server downtime is servers running out of resources. I’ve had SQL servers die from running out of disk space and production applications die from running out of memory. Setting up monitoring of resources is a good way to mitigate this risk.

  • AWS — is a good tool for monitoring. You can set up email alerts on specific events.

  • GCP — provides similar functionality, and also integrates with messaging systems like Slack.

  • Cloud Agnostic — Crontab is good again for this kind of task, but you will need to write a script which will check system resources and send emails when they reach your threshold.
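A cloud-agnostic check of this kind might look like the sketch below, which covers disk space only. The function names, the email addresses, and the assumption of an SMTP relay listening on localhost are all hypothetical; memory checks are platform-specific and left out.

```python
import shutil
import smtplib
from email.message import EmailMessage
from typing import Optional


def disk_used_fraction(path: str = "/") -> float:
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def build_alert(path: str, used: float, threshold: float,
                sender: str, recipient: str) -> EmailMessage:
    """Compose the alert email without sending it."""
    message = EmailMessage()
    message["Subject"] = f"Disk usage on {path} at {used:.0%} (threshold {threshold:.0%})"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(f"{path} is {used:.1%} full.")
    return message


def check_disk(path: str, threshold: float, sender: str, recipient: str,
               smtp_host: str = "localhost") -> Optional[EmailMessage]:
    """Send an alert if usage is at or above `threshold`; return it, else None."""
    used = disk_used_fraction(path)
    if used < threshold:
        return None
    alert = build_alert(path, used, threshold, sender, recipient)
    # Assumes an SMTP relay is reachable at smtp_host without authentication.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(alert)
    return alert
```

Run from a crontab entry every few minutes, a call such as `check_disk("/", 0.9, "alerts@example.com", "ops@example.com")` would email once usage passes 90 percent.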


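Make sure you document your auto-start methods and startup scripts. Keep the code in version control, or forgotten mystery code will cause you trouble when you scale.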

Security

Security is unfortunately overlooked when it comes to MVP philosophy. People just don’t see the value gained for the time investment needed. This is a dangerous gamble, as a security breach could cause severe loss of data, customer trust, and time. Here are some basic things you can do to get started with a security mindset.


Nowadays, SSL is basically a requirement for a modern SaaS app, with many users refusing to use applications without HTTPS support. There are tools that make getting certificates easy and free.

Server Security

One of the most important things when it comes to security is managing servers properly. Here are a few basic tips to keep in mind.

  • Databases should not be accessible to the open internet.

  • Keep applications and operating system up to date. There are often security updates which protect your server from new vulnerabilities.

  • Close all ports except those that are absolutely necessary.

  • Do not use username/password logins; SSH keys are much safer.

  • Do not give people the root key when they need access to your server. Make new accounts and have them give you their public key.
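One way to check the "close all ports" tip is to probe your own server from outside and compare what answers against the ports you meant to expose. The sketch below is a hypothetical helper, not a full port scanner: it only tries IPv4 TCP connections, and the function names and timeout are illustrative assumptions.

```python
import socket


def open_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def unexpected_ports(host: str, ports: list[int], allowed: set[int]) -> list[int]:
    """Open ports that are not on the allow-list, e.g. an exposed database."""
    return [port for port in open_ports(host, ports) if port not in allowed]
```

For instance, `unexpected_ports("203.0.113.10", [22, 80, 443, 3306, 5432], allowed={22, 443})` would flag a database port that is reachable from the open internet. The address here is a documentation placeholder; only probe hosts you own.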

Secret Management

API keys, credentials, configurations, and all other sensitive data need to be managed. I’m always hesitant to place this kind of data in the cloud, not only because I don’t know what the cloud provider can look at, but also because if someone gets into my account, all my secrets are exposed.

  • Keep as many secrets local as possible.

  • Don’t hardcode secrets into your application — create configuration files you can store outside of app code.

  • Don’t store secrets in a public GitHub repo (be wary of the cloud in general).

  • Avoid plaintext when storing user passwords and your own secrets.
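Two of the tips above can be sketched in a few lines: loading secrets from a file kept outside the app code, and storing a salted hash instead of a plaintext password. This is a minimal illustration using only the Python standard library; the function names, the JSON file format, and the iteration count are assumptions, and a dedicated password-hashing library may be a better fit in production.

```python
import hashlib
import hmac
import json
import os
from pathlib import Path
from typing import Optional

ITERATIONS = 200_000  # assumed work factor; raise it as far as your hardware allows


def load_secrets(path: Path) -> dict:
    """Read secrets from a JSON file kept outside the application code and repo."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

At sign-up you would store the salt and digest from `hash_password`, and at login call `verify_password` with the stored values. The secrets file itself should be readable only by the account that runs the app.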


Scalability

In most cases, scalability is not something you need to worry about at the start.

If you have the time, the will, and the skills (or money), putting some effort into scalability could give you future benefits. If not, I’d recommend ignoring it and focusing on the previous two points.

Focus on delivering your product to your first 5 customers, not your first 1,000. The best you can do when it comes to building scalable infrastructure is to think about design principles while building your app, so it won’t be too much work to get going when it’s finally time to scale. I should know: I’ve fallen for the over-engineering trap many, many times.

Containerization

An easy win when it comes to scaling is to containerize your application. Check out Docker for a good guide. Here are some tips:

  • Allow configuration of your app via environment variables. Exposing things like database info and the initial admin username/password this way will go a long way when it comes to building a CI/CD pipeline and automating your app deployment.

  • Keep as much state out of your container as possible. This will allow for stateless deployments via tools like Kubernetes.

  • Install your modules as part of the build process to reduce dependencies and image size.
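The environment-variable tip could be implemented along these lines. The variable names (`DATABASE_URL`, `ADMIN_USERNAME`, `ADMIN_PASSWORD`, `PORT`) and the defaults are hypothetical; the point of the sketch is that the container reads everything it needs at startup and fails fast if something required is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    admin_username: str
    admin_password: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the app configuration from environment variables.

    Required values raise immediately when absent so a misconfigured
    deployment fails at startup rather than at the first request.
    """
    required = ("DATABASE_URL", "ADMIN_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return AppConfig(
        database_url=env["DATABASE_URL"],
        admin_username=env.get("ADMIN_USERNAME", "admin"),  # assumed default
        admin_password=env["ADMIN_PASSWORD"],
        port=int(env.get("PORT", "8080")),  # assumed default
    )
```

Passing the mapping as a parameter keeps the function easy to test, and the same image can then be deployed to staging and production with different variables and no rebuild.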


Keep your servers’ configurations well documented

Store everything in version control: configurations, scripts, and procedures to prepare servers. This will save you when it comes to scaling. I’ve had to deal with scaling apps that require servers configured in a very particular way, and if the documentation is lacking you will be in for a hell of a time.

Conclusion

There is a lot of work involved in standing up and maintaining cloud infrastructure. Startups have it hardest because they have no time and often lack DevOps skills. What you can do is focus on the essentials: security, stability, and, if you have the time, scalability.





