Almost every third-party ML model I look at seems to have different versions and different dependencies, and requires deliberate trial and error when creating container images. It's a mess.
Having interpreters and packages strewn across the machine is a nightmare. The lack of standard tooling has created a lawless, dangerous wild west. There are no maps, no guardrails, and you have to beware of the hidden snakes. It goes against the Zen of Python.
As a counterexample, Rust packs everything in hermetically from the start. Python4 [1] could use this as inspiration. Cargo is what package and version management should be, and other languages should adopt its lessons.
[1] Let's make a clean break from Python3 even if we don't need a new version right now.
The ML community has horrendous engineering practices. Everyone knows this. This isn’t the fault of Python, nor should Python cater to people who build shoddy scaffolding around their black boxes.
I mean, you're not entirely wrong but Python really really doesn't make it easy.
Consider R, which is filled with the same kind of people. There's one package repository, and if your package doesn't build cleanly under the latest version of R, it's removed from the repo.
Don't get me wrong, this has other problems but at least it means that all packages will work with a single language version.
> I mean, you're not entirely wrong but Python really really doesn't make it easy.
That's a vast exaggeration. It is not "really really" hard to spin up a venv and specify your requirements. People just don't do it, and then blame the tools for engineering practices that would be bad in any language.
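For what it's worth, the whole workflow being described is a few commands using only the standard-library venv module (a minimal sketch; the `requests` line is just a placeholder for whatever your project needs):

```shell
# Create an isolated environment for this project
python3 -m venv .venv

# Activate it for the current shell session
. .venv/bin/activate

# Install deps into the venv, e.g.:  pip install requests

# Pin exactly what is installed so others can reproduce it
pip freeze > requirements.txt

# On another machine, the same environment is rebuilt with:
#   python3 -m venv .venv && . .venv/bin/activate
#   pip install -r requirements.txt
```

Nothing here touches the system interpreter or site-packages; the entire environment lives in `.venv` and can be deleted and recreated at will.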
"Really really" not easy would be handling C, C++, etc. dependencies.
Generally that is a straightforward process: compile, read the error message, google "$dist install $dirname of missing dep", run the apt-get / emerge / yum command, and repeat the compile. Sometimes people depend on a rare, unbundled dep, but not that often. Worst case, you need to upgrade the automake toolchain or rebuild Boost or something.
It may take more time than getting Python deps to work, but it's more deterministic and takes less cleverness.
I work in data science in Python (and the parent was about ML), and basically everything in that space has C- and Fortran-level dependencies. This is where Python is really really bad, so no, it is not as simple as you're making out.
I really really wish it was, as then I wouldn't have had to learn Docker.
Python is a much older and generalist language than R, so yes, while it would be great to impose this kind of order on things, it’s not practical for its current extent of use.
That being said, after two decades of using Python professionally, the only real problems I’ve ever encountered are “package doesn’t support this version for {reasons}” and “ML library is doing something undocumented and/or dumb that requires a specific Python version.” The former is normally because the package author is no longer maintaining their package, and the latter is because, again, the ML community is among the absolute worst at creating solid tooling.
I don't disagree that Python's place in the ecosystem ("generalist", i.e. load-bearing distro fossilization in everything from old binary Linux distros, container layers, SIEM/SOAR products, serverless runtimes...) leads to much packaging complexity that R just doesn't have.
However, Python (1991) is only two years older than R (1993).
Rust and Node (via nvm) feel good. The worst I run into is “this version of node isn’t installed” and then I just add it. And I don’t have to worry about where dependencies are being found. Python likes to grab them from all over my OS.
I use direnv and pyenv. When I cd to a repo/directory, the .envrc selects the correct Python and the directory has its own virtual environment into which I install any dependencies. I don't find that Python grabs packages from all over the OS.
pyenv works locally, no matter what the project opts to use. The only thing it needs for a project to be managed is a .python-version file, which you can throw in .gitignore.
It doesn't matter what you do. The vast majority of code I'm using from other people doesn't. Even my personal python methodology differs from yours.
Plus, you now have to teach and evangelize your method versus the dozens of others out there. It's crazy town.
The negative thoughts and feelings I once had for PHP are now directed mostly at Python. PHP fixed a lot of its problems over the last decade. Python has picked up considerable baggage in that time. It needs to take the time to do the same cleanup and standardization.
I was describing a workflow that works for me to someone who didn't seem to have found an effective Python workflow, in hopes that it can work for them too. I work across a variety of languages, and every one of them has some issue I can complain about[1]. I personally don't find Python all that painful to work with (and I've been working with it since 1.5.2), but I understand my experience is not universal.
[1] If it's not the language, it's the dependency manager. If it's not the dependency manager, it's the error handling. If it's not the error handling, it's the build process. If it's not the build process, it's the community. If not the community, the tooling. Etc. I have some languages I like more and some less. Mostly it comes down to taste. I'm not here to apologize for or defend Python. I'm only here to describe how I use it effectively, and to correct what I thought were inaccuracies with respect to removing the GIL.
I use direnv because I work with many languages and repos and I don't want each language's version manager linked into my shell's profile. As well, direnv lets me control things besides the language version. Finally, direnv means I don't have to explicitly run any commands to set things up. I just cd to a directory.
FWIW, I don't think it's nice that rustup fetches and installs new versions without prompting, but I suppose that other users like it or get used to it. Fortunately most Rust projects work on any recently stable version.
> rustup fetches and installs new versions without prompting
I don't think that's true. rustup installs new versions only when you run `rustup update`. What the parent is talking about is pinning a particular rustc version in a rust-toolchain.toml file, which allows rustup to download that version of rustc to build that particular project/crate.
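The pinning mechanism is just a small file at the project root (a sketch; 1.78.0 is an arbitrary example version):

```shell
# rust-toolchain.toml pins the toolchain for this project. The first time
# any rustup-proxied command (cargo, rustc) runs in this directory, rustup
# downloads and installs the pinned version if it isn't present.
cat > rust-toolchain.toml <<'EOF'
[toolchain]
channel = "1.78.0"
EOF
```

This per-project auto-install is exactly the behavior being discussed: convenient for contributors, surprising if you didn't expect a download.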
rustup will automatically download that version when you interact with that project, though, and that's what I mean. It doesn't sit right with me and comes as a surprise, but I guess it's not the biggest issue in the world.
Node does let you declare which Node versions are supported via the "engines" field in your package.json. The definition is there, but there isn't any tool that reads the declaration and switches versions accordingly. I feel it is somewhat half-assed. It could also be caused by the fact that the entity distributing packages (npm) and the one distributing Node binaries (various Linux repositories) aren't the same group of people, so there isn't really anyone who can do anything about it unless we get something like corepack someday. (Probably someone should name it 'corenode'?)
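For reference, the declaration in question looks like this (a minimal sketch; the name and range are examples). By default npm will at most warn on a mismatch (it only hard-errors if `engine-strict` is enabled), and nothing switches the running Node version for you:

```shell
cat > package.json <<'EOF'
{
  "name": "demo",
  "version": "1.0.0",
  "engines": { "node": ">=18" }
}
EOF
```

So the contract exists in metadata, but enforcement and version switching are left to out-of-band tools like nvm.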
Isn't this all typically handled by pip? Even though most models don't necessarily put it in the readme, the user should be using some sort of env manager.