My coworkers insisted on always returning 200 and having the status code in JSON.
At least at that point it’s clearly not HTTP anymore, and it’s better than pretending to be compliant like your Bob. But something dies inside me whenever I have to work with it.
I'd say returning 200 for all successes is reasonable if the responses are simple.
Returning 200 for an error makes no sense. Having 400s and 500s is the simplest way to get observability over protocol behavior (think logs, error rates, etc.). If you use all 200s, you have to re-implement that observability yourself, so you lose the simplicity you gained by ignoring those statuses.
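To make the observability point concrete, here is a minimal sketch, assuming access-log entries are available as `(status_code, body)` pairs and that the all-200 API signals failure with a top-level `"error"` key. Both the log shape and the function names are made up for illustration; they are not any particular tool's format.

```python
import json
from collections import Counter


def status_class(code: int) -> str:
    """Collapse a status code into its class, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"


def error_rate_from_status(entries):
    """Error rate when the server uses 4xx/5xx: only the status is needed."""
    if not entries:
        return 0.0
    classes = Counter(status_class(code) for code, _body in entries)
    return (classes["4xx"] + classes["5xx"]) / len(entries)


def error_rate_from_body(entries):
    """Error rate when everything is a 200: every body has to be parsed.

    Assumes (hypothetically) that failures carry a top-level "error" key.
    """
    if not entries:
        return 0.0
    errors = 0
    for _code, body in entries:
        try:
            payload = json.loads(body)
        except ValueError:
            continue  # not JSON; this sketch cannot classify it
        if isinstance(payload, dict) and "error" in payload:
            errors += 1
    return errors / len(entries)
```

The first function is roughly what generic tooling does for free. The second is the part you end up owning yourself once every response is a 200.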
It's the same thing with caching stuff. You could implement those outside the protocol, but then you'd be writing your own protocol (trying to be smarter than decades of engineering efforts).
That’s a strong assertion. There are plenty of status codes that indicate something bad happened at the HTTP level but don’t convey any information about the RPC.
I've taken a very operational view of HTTP errors, which is "What do I want things receiving this error to do?" Unfortunately, that's not a clean question, since there's no list you can simply consult to get all the behaviors that all HTTP error codes cause. The most important of these is: if this is being accessed by a browser, what will the error code make it do?
Fortunately, for a lot of my API-type work, I also get to not care. I don't want some smart cache to think it knows how to cache my responses or anything and don't care about the sort of infrastructure that thinks it understands HTTP doing anything with my request.
200 {"error": "..."} is not necessarily invalid from this point of view, either. 200: the request was successfully processed, and the successful result of that request, as far as HTTP is concerned, is an error. There doesn't seem to be a great need to tell HTTP there's an error; HTTP doesn't really care. Telling the browser there's an error has some marginal utility, but if it's an API and there's no browser involved, that doesn't matter much either. The 200 isn't going to fool it into thinking it should put the error into the history or whatever.
I've also learned to avoid getting too fancy with the codes. You will invoke some weird behaviors from systems you didn't even know cared about your connection. 200 {"error": "..."} may seem "wrong", but it is also generally safe. It will do what you expect.
It might be nice to live in a world where there are HTTP error codes that are suitable for everything I need, instead of a big pile of useless codes for abortive standards that never came to be and things nobody uses, and an underspecified set of codes for the things I actually want and use, but there's no point pretending that the standard is something other than it is, and as it stands now, a lot of times the HTTP result code is almost useless.
Varnish, HAProxy, Apache mod_proxy, and nginx can all do similar things. Some of them can do this even if you always return 200 (by having rewrite rules and so on). It is often better to leave this kind of work to a higher layer of abstraction. Some of these codes are only applicable in a layered system (502, for example, is often seen when nginx can't reach a backend application), so they seem useless to developers, but they're not.
For APIs, other stuff uses those codes. Tools like DataDog and NewRelic work better if you use a generic 400 and a generic 500 for client and server errors respectively. You can make them work with 200s and a little configuration, but it's extra work.
If you never needed any of this, it's better not to use it.
I should indeed have clarified that the browser web has a much richer set of headers and response codes in use, and they are truly useful, and anyone serving web pages at scale should indeed learn about them. IIRC it's still only about a quarter to a third of the nominally defined HTTP response codes that are useful, but it's still something.
On the non-browser web, they approach useless. Which I'm not happy about and not celebrating or advocating for. It's just how it is.
400 "you screwed up" vs 500 "I screwed up" is always better than 200 "OK, not really".
You can get more specific, and for REST style APIs the correct specific HTTP status code is usually apparent for both successful and unsuccessful requests, but 2xx/4xx/5xx is simple and should be trivial to determine for anything you are using HTTP for even if it's not REST-like.
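As a rough sketch of the 2xx/4xx/5xx split described above (not tied to any framework), the whole decision can sit in one wrapper around the handler. `ClientError`, `respond`, `double`, and `broken_backend` are hypothetical names invented for this example.

```python
class ClientError(Exception):
    """Hypothetical: the caller sent something the service cannot accept."""


def respond(handler, request):
    """Run a handler and map the outcome to (status_code, json_body)."""
    try:
        return 200, {"result": handler(request)}
    except ClientError as exc:
        # 400: you screwed up; say exactly why.
        return 400, {"error": str(exc)}
    except Exception:
        # 500: I screwed up; do not leak internals to the caller.
        return 500, {"error": "internal error"}


def double(request):
    """Toy handler used only for illustration."""
    value = request.get("value")
    if not isinstance(value, int):
        raise ClientError("'value' must be an integer")
    return value * 2


def broken_backend(request):
    """Toy handler simulating a server-side failure."""
    raise RuntimeError("database unavailable")
```

With these toy handlers, `respond(double, {"value": 2})` gives a 200, a non-integer `value` gives a 400 with the reason in the body, and `respond(broken_backend, {})` gives a 500.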
However, while your mileage may vary, I end up getting the same complaint from the users either way. Even when my 400 contains an exact reason why the input is incorrect.
Granted, on the one hand, this can be fixed on the individual level, but on the other hand, it's the same effect writ small that, when writ large, makes the response codes nearly useless, so this post is maybe half cathartic grousing. I can't push caring about response codes. I can document it, I can yield detailed errors, and I can be as careful as I like, but this is an "it takes two to tango" situation and at scale, on average, the other end doesn't want to tango.
I think that's one of the key differences between REST and other kinds of RPC architectures.
I've used SOAP and JSON-RPC, both of which (at least in many implementations) send RPCs as HTTP POST requests and receive 200 responses with any error messages in the body. They're just tunneling over HTTP. It's not necessarily wrong, although I'm convinced that leveraging the HTTP verbs and error codes with REST is a fundamentally better design for the use cases I've seen.
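For a picture of what tunneling looks like from the client side, here is a minimal sketch assuming a JSON-RPC 2.0 style response, where a failed call still arrives as HTTP 200 and the failure lives in an `error` object with `code` and `message` fields. `RpcError` and `unwrap_jsonrpc` are hypothetical helpers, not part of any library.

```python
import json


class RpcError(Exception):
    """Hypothetical exception carrying the RPC-level error from the body."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap_jsonrpc(http_status: int, body: str):
    """Return the RPC result, or raise if the transport or the call failed."""
    if http_status != 200:
        # Anything else means the tunnel itself broke, not the RPC.
        raise RuntimeError(f"transport failure: HTTP {http_status}")
    payload = json.loads(body)
    if "error" in payload:
        # HTTP said 200, but the call failed at the RPC layer.
        err = payload["error"]
        raise RpcError(err["code"], err["message"])
    return payload["result"]
```

The client has to check two layers: the HTTP status for the transport and the body for the call. With REST-style status codes, the first check already answers most of the second.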