The problem with both is that you quickly accumulate weeks or months of coding time that costs a pretty penny at market freelance rates. Spending a few hundred K on devops is routine for a lot of companies.
My main issue with this is that a lot of that coding is just reinventing the same wheels over and over again. Jumping through many hoops along the way like a trained poodle.
It's stupid. Why is this so hard & tedious? I've seen enough infrastructure-as-code projects over the last fifteen years to know that actually very little has changed in terms of the type of thing people build with these tools. There's always a VPC with a bunch of subnets for the different AZs. Inside those go VMs that run whatever (typically dockerized these days) with a load balancer in front and some cruft on the side for managed services like databases, queues, etc. The LB needs a certificate. I've seen some minor variations of this setup, but it basically boils down to that plus maybe some misc cruft in the form of lambdas, buckets, etc.
So why is it so hard to get all of that orchestrated? Why do we have to boil the oceans to get some sane defaults for all of this? Why do we need to micromanage absolutely everything here? Monitoring, health checks, logging systems, backups, security & permissions. All of this needs to be micromanaged. All of it is a disaster waiting to happen if you get it wrong. All of it is bespoke cruft that every project is reinventing from scratch.
All I want is "hey AWS, Google, MS, ...: go run this thing over there and tell me when it's done. This is the domain it needs to run over. It's a bog-standard web server expecting to serve traffic via HTTP/websockets." Give me something with sane defaults, not a disassembled jigsaw puzzle with thousands of pieces. This stuff should not be this hard in 2021.
PaaS has existed since the mid-2000s. It turns out people don't want it -- none of those offerings ever got more than a minuscule fraction of the market for the boring workloads that 90% of companies use IaaS for. People want knobs & levers. Just look at the popularity of Kubernetes; it is nothing but knobs & levers.
Kubernetes, once deployed, has surprisingly few knobs to tweak for the end user. You might have to pick between a StatefulSet and a Deployment depending on your workload, but that's about it.
Kubernetes cleanly separates responsibility between maintainers of the platform (who have to make decisions on how to deploy it; in cloud environments that's the cloud provider's job) and users of the platform (who get a fairly high-level API that's universal across clusters and cluster flavours). It's usually the former that people complain about being difficult and complex: picking the networking stack, the storage stack, implementing cluster updates, ... That matters if you, for some reason, want to run a k8s cluster from scratch. But given something like a GKE cluster and a locally running kubectl pointing to it, it takes much less effort to deploy a production workload there than on a newly created AWS account. And there are far fewer individual resources and less state involved.
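To make the "high-level API" point concrete, here is a minimal sketch of the user-facing side of a workload, assuming the official kubernetes Python client and a kubeconfig already pointing at some cluster (GKE or otherwise); the names and image are placeholders. The same few objects are accepted by any conformant cluster:

    # Sketch: describe a Deployment and create it through the cluster's API.
    from kubernetes import client, config

    config.load_kube_config()  # reuse whatever cluster kubectl currently points at

    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="web"),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "web"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "web"}),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(
                        name="web",
                        image="nginx:1.21",
                        ports=[client.V1ContainerPort(container_port=80)],
                    ),
                ]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)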
Aren't Kubernetes's capabilities something that cloud providers should have made available from the beginning? Meaning, its only possible future is no future at all: those capabilities either should have been there from the start, or will be built into the platforms in the (very near) future.
Different cloud providers did different things. Google's cloud offerings started with things like GAE, a very high-level service that many people ignored because it was too alien. AWS, on the other hand, provided fairly low-level abstractions like VMs (with some aaS services sprinkled in, but distinctly still 'a thing you access over HTTP over the network'). Both offerings reflect the companies' internal engineering culture, and AWS's was much less alien and more understandable to the outside. Now every other provider basically clones AWS's paradigm, as that's where the big enterprise contract money is, not in doing things well but differently.
With Kubernetes we actually have something close to a high-level API for provisioning cloud workloads (it's still no GAE, as the networking and authentication problems are still there, but they can be solved in the long term), and the hope is that cloud providers will implement the Kubernetes APIs as a first-class service that allows people to truly not care about the underlying compute resources. Automatically managed workloads from container images are effectively the middle ground between the 'I want a Linux VM' pragmatists and the 'this shouldn't be this much work' idealists.
With GKE you can spin up a production HA cluster in a click [1], but you still have to think about how many machines you want (there's also Autopilot, but it's expensive and I have my problems with it). AWS's EKS is a shitshow though; it basically requires interacting with the same networking/IAM/instance boilerplate as any other AWS setup [2].
It might also be the wrong incentives being passed around. I mean, if you're hired and paid to push knobs and levers, you'll choose a tool with knobs and levers. Even with more of them.
GAE did this way back when. It gives you a high-level Python/Java API that works both locally and on prod, and you just push a newer version of your codebase with a command line tool - no need for containers, build steps, dealing with machines and operating systems, configurable databases, certificates, setting up traces or monitoring... No need to set anything up for that particular app: just create a new project, point a domain to it if you're feeling fancy, and off you go.
But in the end, the industry apparently prefers low-level AWS bullshit, where you have to manually manage networks, virtual machines, NAT gateways, firewall rules, load balancers, individually poke eight different subsystems just to get the basic thing running … It’s just like dealing with physical hardware, just 10x more expensive and exploiting the FOMO of ‘but what happens if we need to scale’.
I've been working with AWS CDK for a little while now, and it kind of has some of what you want.
In my case, I wanted a scheduled container to run once per day. Setting it up manually with CF or Terraform would have been a lot of work defining various resources, but CDK comes with a higher-level construct[1] that can be parameterized and which will assemble a bunch of lower level resources to do what I want. It was a pretty small amount of Python code to make it happen.
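For a sense of scale, here's a rough sketch of that kind of setup, assuming CDK v2 and its aws_ecs_patterns module (the schedule, image name and construct IDs are made up). One high-level construct that CDK expands into the task definition, EventBridge rule, IAM roles, log group and so on:

    # Sketch: a container that runs once a day on Fargate.
    from aws_cdk import Stack, aws_ecs as ecs, aws_ecs_patterns as ecs_patterns
    from aws_cdk import aws_applicationautoscaling as appscaling
    from constructs import Construct

    class DailyJobStack(Stack):
        def __init__(self, scope: Construct, id: str, **kwargs) -> None:
            super().__init__(scope, id, **kwargs)
            # Cluster creates its own default VPC when none is passed in.
            cluster = ecs.Cluster(self, "Cluster")
            ecs_patterns.ScheduledFargateTask(
                self, "DailyTask",
                cluster=cluster,
                schedule=appscaling.Schedule.cron(minute="0", hour="3"),
                scheduled_fargate_task_image_options=ecs_patterns.ScheduledFargateTaskImageOptions(
                    image=ecs.ContainerImage.from_registry("my-org/daily-job:latest"),
                    memory_limit_mib=512,
                ),
            )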
The AWS CDK is getting closer to this. Your standard VPC setup is extremely simple now. Like
new Vpc()
simple. The tooling definitely has its quirks, but is steadily improving, and you can drop into CloudFormation from CDK (with TypeScript type checking) when needed.
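In CDK's Python flavour the equivalent is roughly the sketch below (again assuming CDK v2; the stack and construct names are made up). With no arguments, the construct fans out into public and private subnets across the region's AZs, an internet gateway, NAT gateways and route tables:

    from aws_cdk import Stack, aws_ec2 as ec2
    from constructs import Construct

    class NetworkStack(Stack):
        def __init__(self, scope: Construct, id: str, **kwargs) -> None:
            super().__init__(scope, id, **kwargs)
            # Defaults: public + private subnets in up to three AZs,
            # plus the internet gateway, NAT gateways and route tables.
            ec2.Vpc(self, "Vpc")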
Although your headline is correct imho, I think there are lots of things that the orchestrator might need to do above this, especially if you want the tool to be cloud agnostic so that everyone can use it, which just makes things a little more complicated.
You might want to add something to the existing system, like another web server. These tools have to be able to "add an item and wire it up to the load balancer".
You might want to scale up. This might work natively or it might require creating new larger instances and then getting rid of the old ones.
You might want to update the images to newer versions.
You might need more public IPs.
You might be adding something to an existing larger network so you need to reference existing objects.
You might need to "create if not exists"
etc. I think your argument covers the intiial use-case in most places but any system used over time will need the other stuff done to it, hence the "complexity" of the tools. tbf, I don't think Terraform is that complex in itself, I think because it is in config files, it can be more complex to understand and work with.
Still, the argument stands. Also, the points you listed should be expected as standard: everybody will need scaling or image updates sooner or later, right?
I agree. And you just described PaaS like Render.com and Heroku, and DIY PaaS like Convox, and I'm sure there are others. I don't understand why companies, especially young ones, mess with all the low-level infra stuff. It's such a time suck and you end up with a fragile system.
> So why is it so hard to get all of that orchestrated? Why do we have to boil the oceans to get some sane defaults for all of this? Why do we need to micromanage absolutely everything here?
This post truly resonates with me; however, I don't think we appreciate just how many things are necessary to run a web application and do it well. There is an incredible amount of complexity that we attempt to abstract away.
Sometimes I wish there were a tool that could tell me just how many active lines of code are responsible for the processes currently running on any of my servers, and in which languages. Off the top of my head, here's what's necessary to ship an enterprise web app in 2021:
RUNTIMES - No one* writes web applications in assembler code or a low level language like C with no dependencies - there is usually a complex runtime like JVM (for Java), CLR (for .NET), or whatever Python or Ruby implementations are used, which are already absolutely huge.
LIBRARIES - Then there are libraries for doing common tasks in each language, be it serving web requests, serving files, processing JSON data, doing server side rendering, doing RPC or some sort of message queueing etc, in part due to there not being just one web development language, but many. Whether this is a good thing or a bad thing, I'm not sure. Oh, and front end can also be really complex, since there are numerous libraries/frameworks out there for getting stuff rendering in a browser in an interactive way (Angular, Vue, React, jQuery), each with their own toolchains.
PACKAGING - But then there are also all the ways to package software, be it Docker containers, other OCI compatible containers (ones that have nothing to do with the Docker toolchain, like buildah + podman), approaches like using Vagrant, or shipping full size VMs, or just copying over some files on a server and either using Ansible, Chef, Puppet, Salt or manually configuring the environment. Automating this can also be done in any number of ways, be it GitLab CI, GitHub Actions, Jenkins, Drone or something else.
RUNNING - When you get to actually running your apps, what you have to manage is an entire operating system, from the network stack, to resource management, to everything else. And, of course, there are multiple OS distributions that have different tools and approaches to a variety of tasks (for example, OpenRC in Alpine vs systemd in Debian/Ubuntu).
INGRESS - But these OSes also don't live in a vacuum so you end up needing a point of ingress, possible load balancing or rate limiting, so eventually you introduce something like Apache, Nginx, Caddy, Traefik and optionally something like certbot for the former two. Those are absolutely huge dependencies as well, just have a look at how many modules the typical Apache installation has, all to make sure that your site can be viewed securely, do any rate limiting, path rewriting etc.!
DATA - And of course you'll also need to store your data somewhere. You might manage your databases with the aforementioned approaches to automate configuration and even running them, but at the end of the day you are still running something that has decades of research and updates behind them, regardless of whether it's SQLite, MariaDB, MySQL, PostgreSQL, SQL Server, S3, MongoDB, Redis or anything else. All of which have their own ways of interacting with them and different use cases, for example, you might use MariaDB for data storage, S3 for files and Redis for cache.
SUPPORT - And that's still not it! You also probably want some analytics, be it Google Analytics, Matomo, or something else. And monitoring, something like Nagios, Zabbix, or a setup with Prometheus and Grafana. Oh and you better run something for log aggregation, like ELK or Graylog. And don't forget about APM as well, to see what's going on in your app in depth, like Apache Skywalking or anything else.
OTHERS - There can be additional solutions in there as well, such as a service mesh to aid with discoverability of services, circuit breakers to route traffic appropriately, security solutions like Vault to make sure that your credentials aren't leaked, sometimes an auto scaling solution as well etc.
In summary, it's not just because of there being a lot of tools for doing any single thing, but rather that there are far too many concerns to be addressed in the first place. To that end, it's really amazing that you can even run things on a Raspberry Pi in the first place, and that many of the tools can scale from a small VPS to huge servers that would handle millions of requests.
That said, it doesn't have to always be this complex. If you want a maximally simple setup, just use something like PHP with an RDBMS like MariaDB/MySQL and server side rendering. Serve it out of a cheap VPS (I have been using Time4VPS, affiliate link in case you want to check them out: https://www.time4vps.com/?affid=5294, though DigitalOcean, Vultr, Hetzner, Linode and others are perfectly fine too), maybe use some super minimal CI like GitLab CI, Drone, or whatever your platform of choice supports.
That should be enough for most side projects and personal pages. I also opted for a Docker container with Docker Swarm + Portainer, since that's the simplest setup that I can use for a large variety of software and my own projects in different technologies, though that's a personal preference. Of course, not every project needs to scale to serving millions of users, so it's not like I need something advanced like Kubernetes (well, Rancher + K3s can also be good, though many people also enjoy Nomad).
Edit: there are PaaS offerings out there that make things noticeably easier by doing some of the above for you, but that can lead to vendor lock-in, so be careful with those. Regardless, maybe solutions like Heroku or Fly.io are worth checking out as well, though I'd suggest you read this article first: https://www.gnu.org/philosophy/who-does-that-server-really-s...
> There's always a VPC with a bunch of subnets for the different AZs.
It's funny because you're already out of touch with how a lot of people would avoid having to do this in 2021. If your stack were simpler, you might not have such infrastructure-as-code dependencies.
When you say “if your stack was simpler” do you mean “if your problem was trivial”? I’m always interested in simpler ways to do things, but solving distributed system and data governance issues tends to involve putting things in different places.
Deploy everything using a cloud-native architecture and have all services internet facing; read about "zero trust networks" to understand more about securing such things.
Maybe there are "data governance issues" stopping the internet facing thing from happening. But if not, that's a more modern approach then three tier network segmentation.