As someone who runs a bunch of private repos on Github, and hires freelancers and vendors to work on them, I like the way Github manages forks of private repos. When I end a relationship, and remove that person from my private repo, I want their fork of my code to go away too.
Why? Because it's cleaner. It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts. I set up a private repo because I want to control access to my code. That's why a private repo is private.
It also makes it harder for people to misrepresent their relationship to me later on. Sure, anyone with read access to one of my repos can pull it down and save a local copy. But that is different from logging into Github and having a fork of my repo. A Github fork looks and works differently from a local copy.
This whole situation seems weird because the idea of a "private" repo is not inherent to how git was intended to work; it's something Github invented to make money. So I'm not surprised that it violates people's expectations sometimes.
That makes no sense to me. You're relying on people not making their forks out of github. If that strategy works at all, it's incidental and I wouldn't rely on it for anything that matters.
> You're relying on people not making their forks out of github.
I'm not. Did you read my 3rd paragraph?
This is not a security strategy, it is a strategy for managing relationships--which is the purpose of using Github in the first place. There are far more secure ways to manage a git repo than Github, if that is the goal.
Not really. That means that you've worked with benevolent actors up until now. It's not that hard to keep a clone of the repo and upload later after the termination of the business rapport, either as a private or a public one.
The people who have access to my private repos may or may not be benevolent, but they are certainly under contract, and that's the relationship that Github is helping me manage.
The contract says that the codebase is mine, and the contractor doesn't have rights to use it for their own purpose. The Github private repo rules just make that real in a practical sense. It means the default way of working comports with what the contract says. It makes it easy for everyone to do the right thing, and harder to do the wrong thing.
Sure, a rogue contractor could re-upload my codebase to a new repo in their account later, but that is a pretty obvious violation of the contract. If I found out, I'd feel pretty confident suing them (if it came to that). There's nothing they can point to that says I authorized them to do that.
In contrast, if the default was to leave an official fork of my code in their Github account forever, then it becomes easier for them to do the wrong thing, and harder for me to say I didn't want them to do it. Some future employee could even do the wrong thing accidentally.
To reiterate, Github private repos help me manage my relationships. They don't enforce the terms of my relationships.
You know how git works, right? It is decentralised! I don't see any added value! When I work on a project I always have the master, develop and feature branch(es) pulled from the repo on my machine. I could just create a new private repo or even a public one and push my local branches.
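For anyone unsure what "push my local branches" amounts to, here is a rough sketch in Python. It only builds and prints the git commands rather than running them. The remote name `copy` and the URL are made-up placeholders, not anything from this thread.

```python
import shlex


def republish_commands(new_remote_url, remote_name="copy"):
    """Build the git commands that push all local branches and tags to a new remote.

    Both arguments are hypothetical; substitute whatever empty repo you created.
    """
    return [
        ["git", "remote", "add", remote_name, new_remote_url],
        ["git", "push", remote_name, "--all"],   # every local branch
        ["git", "push", remote_name, "--tags"],  # every local tag
    ]


# Print the commands as they would be typed in a shell, inside the local clone.
for command in republish_commands("git@example.com:someone/foobar-copy.git"):
    print(shlex.join(command))
```

Note that nothing here creates a "fork" in the GitHub sense. The result is a standalone repo with the same history, which is exactly the distinction being argued about in this thread.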
But unless you are giving your contractors a locked-down internet-disconnected laptop and physically locking them in an empty room, you need to trust that they aren't going to steal your code like that anyway.
Certainly. I (and some others, seemingly) was just saying that what Github does is not that much of a help (which is what seems to be meant at the thread root). The keyword here is contractor, as it should be the contract that protects you from theft &c. Though I'm not very knowledgeable about business, so maybe I'm failing to see how that's useful.
Shifting goalposts here- thread-parent said "I want their fork of my code to go away too," not "I want their fork of my code to no longer say 'forked from' on it".
If removing the relationship tag was the only purpose of this, GitHub could just remove the relationship tag, not delete the whole repository (throwing out the baby with the bathwater).
This doesn't look like shifting goalposts to me — it looks like you misunderstood where the goalposts were to begin with. Read the original comment charitably rather than with an eye for holes and I think you'll see that it wasn't about persistent malicious actors, just about what they see as sane defaults for people's relationship with your codebase. Once you're no longer associated with their private codebase, they think it makes sense for you to no longer have an official copy of it in your Github account.
> Shifting goalposts here- thread-parent said "I want their fork of my code to go away too," not "I want their fork of my code to no longer say 'forked from' on it".
in this context, those mean the same thing. if you copy the code it's not a "fork" in github terms.
This is not what people expect, so that's one obvious reason it's very likely the wrong decision. I would have never known this. If you are relying on a public external product on github, you have to go through several painful steps to make it 'safe': you have to re-upload it, and there's no good reason to make people do that. It's a giant disincentive to fork things on github. To be sure of finding things later, there will probably be another git repo website that will agree to "host" your forked projects safely, so that you can access them.
I understand on the other side, there's value in being able to stop people who forked your code. But it's a very weak stop-ability, because they could just keep the code locally and re-upload.
My understanding from reading that was that even on a "public" project the owner can delete it, change its status, and it disappears from availability.
Fortunately not. As noted in the article[1] linked in the OP:
> Deleting a public repository
> When you delete a public repository, one of the existing public forks is chosen to be the new parent repository. All other repositories are forked off of this new parent and subsequent pull requests go to this new parent.
> Changing a public repository to a private repository
> If a public repository is made private, its public forks are split off into a new network. As with deleting a public repository, one of the existing public forks is chosen to be the new parent repository and all other repositories are forked off of this new parent. Subsequent pull requests go to this new parent.
Imagine you run a website, foobar.com. You have foobar.com github org and foobar private repo (and also some public ones). It's the canonical org/repo. ninja123 has a github fork of it, and the github UI relays that: "forked from foobar/foobar.com". rockstar123 also has a "fork" but it's just code sitting out there at rockstar123/foobar.com. There is no link to the canonical repo. If you fire ninja123 the link is severed. It just allows a way to show active, approved engagement and also to keep track of approved collaborators.
What if rockstar123 gets arrested for ICO fraud? Do you want others to see he has an official upstream link to your repo even after you have severed the relationship?
What if rockstar123 makes his fork public? It's just messy to keep the link.
I do not want to believe Github allows making a privately-forked repository public. Other than that, I agree that the link is better removed, but what I don't understand is how that helps with keeping the code itself private, which is what I thought was meant at the root of the thread.
Op is not claiming it keeps it private. It's just used as a tool to signal and track approved, active collaborators to the official upstream. That's beneficial to control.
I think you're asking about the part after the comma, so here is the part from the comment that started this thread that makes me think so:
> Why? Because it's cleaner. It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts. I set up a private repo because I want to control access to my code. That's why a private repo is private.
You have foobar private repo
bad actor makes foobar public repo
edits the readme to say "public edition of foobar repo".
People say ooh that's the one I want!
The op doesn't want the code to be on github, and his problem was about his code still being on someone else's account as a private fork.
It sounded more like "if they get hacked my code isn't there"
Being legit doesn't seem to have any meaning for him or the author of the article. Also the code is the only thing needed for legitimacy. The iphone bootloader code in some repo is still the legit iphone bootloader (as an example)
> That makes perfect sense to me; the fork ON github is what lends the fork legitimacy versus "some dude found code."
No, he made two points. The first one is that this way you don't have your code in random freelancer's accounts, which I'm saying is wrong: a freelancer can take his code, and upload it back to his account.
So if this mechanism makes sense to you as a way of guaranteeing that your code won't be in other people's accounts, you're mistaken too.
In fact, if you don't believe me, give me access to your private repos and let's see what happens.
You're misunderstanding the intent I think. This isn't a guarantee that the code is kept private. It's a guarantee that the other copies don't have a Github "fork" relationship to the true repo.
Nobody is disputing that access to the private repo allows for a copy to be kept. But it's an "island", not a fork.
> So if this mechanism makes sense to you as a way of guaranteeing that your code won't be in other people's accounts, you're mistaken too.
Come on, no one is arguing that, and I'm sure you know that.
He said it's "cleaner", not more secure. Yes, anybody can re-upload his code to a random github account, but then it's easier to see it's stolen code, in contrast to having a github fork of the private repository.
While yes, you can simply create a new repo on GitHub with the local copy and name it similarly, it will not be linked to the original as a fork on GitHub.
But nothing is stopping them from cloning and pushing up to BitBucket or a private git hosting service. You merely have the appearance of cleanliness with none of the security.
The only way to really protect yourself is with a contract that stipulates the repo and any clones a contractor may have access to must be destroyed upon termination of the contract.
This has nothing to do with security. Would you upload secrets into a private Github repo? I wouldn't. Private repos are not more secure than public repos.
Github is a tool for collaboration. The point of a private repo is to manage who you are collaborating with. Once I don't want to collaborate with someone anymore, there is no reason for them to have a Github fork of my code anymore. As I said, it's cleaner.
I'm fully aware that they can save a local copy (did you read my 3rd paragraph above?), and of the importance of contracts.
Access control is 100% security. If you have a private GitHub repo, you have uploaded secrets. All that code is your secret. You pay GitHub to protect all the copies they know about, but you are fooling yourself if you rely on that after you've given access to that private repo away. You fundamentally can't revoke knowledge, just future access.
No, secrets are pieces of information that you could use to compromise my production environment--like passwords or private keys. Those should never be uploaded to Github, period.
Code doesn’t necessarily have to be confidential. “Secret recipe” algorithms, maybe.
It’s about trade-offs. If you must have “trade secrets” amongst your code, perhaps you can keep that code available only to whom you trust will follow your wishes with it, and have separate repos to share wherein you do not care as much about collaborators “stealing” the code.
As soon as you let someone into your codebase, almost anything can happen, right?
There are cases where it would make sense to sacrifice risks related to code-copying in order to have faster/better development.
Also, what’s stopping anyone working at ANY company from taking the company code and doing whatever they want?
"Nothing"? Obviously that's untrue. There are many things stopping them. Professionalism, for example.
There's "nothing" stopping someone at a job from printing every single one of their corporate emails and putting them into a binder to save forever, even after they've left the position. Nothing except professionalism, a sense of business ethics, the law, and the fact that it's a huge hassle.
This seems like a problem that's not a problem. If you are working in a private repo then there should be some formalized arrangement on who owns the code and who doesn't, and that would be the originator of the private repo. If, for some reason, that relationship is less clear, then as you say other contributors can just create their own clones of the repo elsewhere. If the obvious relationship is one of collaborative ownership, then a public repo is the obvious choice.
That's naive wishful thinking at best. The cons far outweigh the pros on this feature.
They're one button away from cloning the entire repo out of Github and bypassing all these fancy access control mechanisms without you even knowing about it.
This is one feature that causes a lot more pain than actual utility.
It doesn't cause pain for my contractors because a) they know how Github private repos work, and b) they don't want a copy of my code in their Github account once our relationship has ended. Killing forks upon removal makes life easier for everyone.
I don't have any particular issue with the way that Github do this, but you're exaggerating the effect of the github hosted fork. Somebody using a private fork to misrepresent their relationship to you seems like a stretch, as all it shows is that you gave them access to your repo, which you in fact did do. Also, not only "could" they clone a local copy, they almost certainly have done so as that's how 99% of work flows go.
Also, git does inherently allow for private repos, otherwise it would have been licenced differently, and it inherently allows for access control (ie repos don't have to be readable without authentication).
Like I said though, github's stance of disabling a private fork of a private repo when the permission is revoked doesn't seem unreasonable to me. I think the key thought missed by the OP was that they were both private repos. Also, they could deal with it differently when it's non-payment that caused the shutdown rather than a deletion/revoked permission.
The explicit problem here though is not that the friend disabled the repo and all forks of the private repo got disabled, it's that github disabled the friend's private repo and this caused forks to be disabled.
First one understandable, I fork a private repo I can't expect it to stay forked if for some reason person who owns it doesn't want me to.
> But that is different from logging into Github and having a fork of my repo.
How is it really different? When you push it back up to github it is exactly as if you had forked it. The only difference is that it isn't marked as a fork and in this respect not shown when you look at the original repo graph.
It's nice that it works for your use case, but since it's not a real protection against anything and only looks like a safety measure against abandoned accounts that still have a copy of your code, it shouldn't be a feature that's on by default. (Imho)
> The only difference is that it isn't marked as a fork and in this respect not shown when you look at the original repo graph.
That's the whole point. Of course you can't stop someone from pushing a repo anywhere they want. But imagine you run a website, foobar.com. You have foobar.com github org and foobar private repo. It's the canonical repo. ninja123 has a github fork of it, and the github UI relays that: "forked from foobar/foobar.com". rockstar123 also has a "fork" but it's just code sitting out there at rockstar123/foobar.com. There is no link to the canonical repo. If you fire ninja123 the link is severed. It just allows a way to show active, approved engagement and also to keep track of approved collaborators.
What if rockstar123 gets arrested for ICO fraud. Do you want others to see he has an official upstream link to your repo even after you have severed the relationship?
I don't think the existence of a "github fork" shows any active approval. The default behavior for repos is to allow anyone to fork them without requesting approval.
I think a lot of people don't understand your "I want no link on GitHub" argument and try to explain that you can't keep the code.
I'm coming from a different angle. I read the article as "wasn't able to access his fork anymore". If that is true, I think that is bullshit.
Severing the ties to the private repo? Fine, I get the (your?) point. But what I forked should still be there. Previously as "forked from...", now without the link?
While git is a decentralized version control system, I would argue that using Github is not really decentralized. Even when everyone has a full local copy to work on, the norm is for everyone to set the Github repo as the remote origin and push/pull from there, rather than from each other.
This is in contrast to how the Linux kernel development works, for instance.
Decentralized doesn't necessarily mean peer to peer. In the case of git, it simply means users work with a local repository that can easily be synchronized with another repository. A team contributing to the kernel could very well use github to do so, change their mind and switch their remote to bitbucket and continue the work there. If bitbucket exploded, they could set up their own remote, push any of the local copies and use that. If their internet connections exploded they could print patches and mail them to each other. Then when they're back online and done with their change, they can create a patch file and mail that to Torvalds.
But that's all beside the point. The point is that if you don't want your code lying around where you have no control over it and don't trust your contract and the law to be enough of a deterrent, you shouldn't give people copies of it and probably reconsider what kind of contractor you're willing to work with, rather than loading the term "fork" with an additional meaning.
I mean, you could even do git peer-to-peer via ssh. Each person sets up sshd and a bare git repository, the others add it as a remote, and now they can push to each other's "ssh" repositories and then fetch from them when wanted.
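A minimal sketch of that ssh setup, again in Python and again only building the commands instead of running them. The peer name, the ssh target `alice@alice-laptop`, the repo path and the branch name are all invented for illustration; the only real assumption is that the peer runs sshd and has git installed.

```python
def peer_to_peer_commands(peer_name, ssh_target, repo_path="foobar.git", branch="master"):
    """Return (commands for the peer's machine, commands for your machine).

    ssh_target is a hypothetical user@host; repo_path is relative to that
    user's home directory, which is how git's scp-like URLs resolve paths.
    """
    on_peer = [
        ["git", "init", "--bare", repo_path],  # a bare repo others can push into
    ]
    on_you = [
        ["git", "remote", "add", peer_name, f"{ssh_target}:{repo_path}"],
        ["git", "push", peer_name, branch],    # hand your work to the peer
        ["git", "fetch", peer_name],           # pick up whatever is there
    ]
    return on_peer, on_you


on_peer, on_you = peer_to_peer_commands("alice", "alice@alice-laptop")
for command in on_peer + on_you:
    print(" ".join(command))
```

No hosting service appears anywhere in this; each machine is just another remote.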
Sure, or with patches broadcast on radio, scribbled on napkins etc.
I don't think you intended to counter my argument but in case anyone is interested in elaboration: my point about centralized/decentralized is that using github doesn't necessarily centralize your work. It's only central in the sense that any other remote repository is: if you can't access it you can't. If github dies, copies of the repository may still exist on a million other computers and any of those can be used as a basis for further collaboration by a wide variety of means. From a peer to peer perspective, github is just a peer like any other.
Another solution would be to create an org, give access to people you want as collaborators and forbid forking. They can work in branches in the main repository, and you can use protected branches to alleviate the risk of incorrect pushes.
But there is no github.com upstream relationship. Sure they can upload it, but it will be obvious it's a rogue copy since there is no "forked from.." messaging to the canonical repo.
The author states that he didn't bother with support (luckily he had a local copy), so here are my 2 cents.
I've been in the exact same spot under very similar circumstances. What I can add is that the github support was great; we got it resolved within the working day.
The repo was "deleted" to free up the private repos of my colleague, but we didn't know that the fork on my account would go down with it. It was probably a couple of months before we got back to it and noticed the issue. They managed to recover the repo and unblock my fork. I just did a fresh repo and moved everything over.
While it is odd behavior, if it's really bad they will probably manage to get you back on your feet.
Also, I'm not affiliated with github in any way; I just have some repos from student days over there. If it matters, I'm on gitlab and google repos.
In their business-continuity backups, you'd have to assume yes.
However, I would certainly be surprised if a Github support staff member went and fished out the off-site backup and mounted it to recover one missing repo. You'd hope that would require quite a few staff members to orchestrate.
But in general, yeah. Most engineers don't think about deletion very hard when they start designing a system, and it's also convenient (although with GDPR possibly soon illegal, situation dependent) for auditing and research to merely flag things as deleted and not show them to the customer. As a result, true data deletion generally gets justified away as a bit inconvenient and not really desirable anyway, and almost no software company really deletes anything on demand. Best you can hope for is semi-annual data purges to reclaim disk space.
If this bothers you, consider joining us in the world of self-hosting your services. Gitlab is in my opinion considerably better than Github anyway. Certainly it's worth spinning up a docker container and taking a peek around.
I say this every chance I get, but Github can and will hold your code ransom if your premium account lapses while you still have private repos. They do not give you the option to set those repos to public, they require you to pay money to regain access.
Maybe you can have them switch the repos to public if you go through their support, but Github doesn't offer that in their ransom note, and that would be an unacceptable solution, anyway.
Whether or not this is a risk to a given developer (or their company) is irrelevant to me, because they still have a policy that allows them to hold code hostage for ransom, and that should make Github a complete non-starter when deciding on an SCM host.
Generally you are correct: service is paid for -> stop paying -> stop service. However, Github has a free tier for public repositories. Why can one not convert the repository to public then?
I like how Dropbox handles this. If your paid membership is cancelled and you are using more space than the free tier, they don't delete your files, just make them read-only. You can view or download them, but if you want to upload or update something, you have to buy premium or bring your space use within free tier limitations.
People may put all manner of things in private repos that they don't want to be made public, so github shouldn't just expose them to the outside world.
That doesn't mean they couldn't just offer a button to make them public after the fact, I don't think anyone was suggesting to make them public automatically...
Nope, that is exactly what it is. There is no reason not to give read-only or export-only access to private repos if your account is locked. Otherwise just delete the whole thing.
If anyone has ever cloned the project, Github cannot hold your code to ransom because you already have a copy of it and the whole history including commit messages. That is the beauty of git.
Github do have your issues and un-merged pull requests "to ransom", though. Make sure to back those up if they are important.
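As a hedged sketch of what "back those up" could look like: the Python below pages through GitHub's REST endpoint for listing repository issues, which, as far as I know, also returns pull requests. The function names and the injectable `get` parameter are my own invention so the paging logic can be exercised without touching the network; for a private repo you would need to pass a token.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def next_page_url(link_header):
    """Return the URL marked rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header):
        if rel == "next":
            return url
    return None


def http_get(url, token=None):
    """Fetch one page; returns (parsed JSON body, Link header or None)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def fetch_all_issues(owner, repo, get=http_get):
    """Follow pagination and collect every issue, open and closed."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    issues = []
    while url:
        page, link_header = get(url)
        issues.extend(page)
        url = next_page_url(link_header)
    return issues
```

Dumping the returned list with `json.dump` to a dated file would be enough for a crude backup. This only captures the issue and pull request records themselves; comment threads are served separately and would need their own loop.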
Pay your bills. I cannot take the idea that they're holding anything "hostage" when the whole thing is that you're not paying your bill. If you don't pay your car bill, they'll take it back.
You do not know other people's situations. And strictly speaking, demanding money to access your data is ransom.
> it would be better than automatically making private repo's public when you stop paying
False equivalency. I already said that "disabling repo and allowing primary user to download" was acceptable. I said nothing about private->public.
This whole situation only reaffirms my distrust in all cloud services.
EDIT: Seriously, -1 ? The user's data is sacrosanct. You disable most functionality, but you never, I repeat never delete data or make it irretrievable within an envelope of time for recovery. The only exception to that is if the user explicitly requests a permanent deletion - then you do so after appropriate warnings.
You're being downvoted for the ridiculous hyperbole of using the terms "ransom" and "hostage" for the situation, when it's just that you didn't pay your bill, so you don't get access to the service. Any web host would do the same, as would any other business. You don't pay your bill, you don't get the service. Why is that so hard to comprehend?
I think that's because the service is hosting the data, not the data itself. Holding onto the data which you (expect to) own is pretty much akin to a person taking care of a pet for money and when the money is not paid, keeps the cat. It's the obvious solution - the cat-sitter doesn't have any other leverage here - but Github does have other options than holding onto the cat.
Has anyone created a service that you auth in through GitHub and it backs up all your repos to S3 or Dropbox or something? I'd pay a few bucks for something like that.
Edit: I found backhub.co. Unfortunately their pricing is per-repo and my usage model involves lots of tiny repos.
Right. The service is certainly expected to do more. Was mostly addressing the issue of mirroring the data to dropbox. But even then my "solution" only saves whatever you do locally. So if you merge PRs through github, it wouldn't get mirrored.
If you have self-deployed Gogs (which is a smallish OSS GitHub clone) you can easily set it to mirror other repositories periodically. Very useful for backups.
Also simple to self-deploy since it's essentially a single service written in Go.
>Has anyone created a service that you auth in through GitHub and it backs up all your repos to S3 or Dropbox or something? I'd pay a few bucks for something like that.
So you'd pay money for a tool that copies your data, but not simply pay the money to access said data?
I am a beginner so bear with me. I learnt git 2 years back and always end up using the same set of commands and never anything new, which lets me work very smoothly on projects. I read a lot of blogs to understand how git works and how useful git reflog is. I was/am often confused about what is offered solely by git and what by github, as these are two different things. Are forks/PRs/issues native to git? I still don't have an answer.
Just to be clear, we can create a branch on github as well as in git, though I am not forced to do so on github. This may be confusing for a beginner as both provide branching.
Thanks. This answers it. It looks like whenever git is introduced to someone, it is always through github. I think this is why most people assume a different picture of git than what it actually is. I think this demarcation between git and github must be clearly stated.
None of those things are native to git. The command line git tool’s help options open up the git manual. It is pretty comprehensive; so if the topic is missing, it’s probably a github feature. (Also, things like ‘git checkout --help’ open up subsections of the manual)
"GitHub PRs/issues are native to GitHub". If you switch to Gerrit, the code commits themselves follow, but the PRs and issues don't. They're only on GitHub.
I wouldn't point to gerrit as an example of self-hosting git. Gerrit's "patchsets" and "change-id" concepts, as well as the weirdo remote push refs, really turn the git experience into something... very different. It's very far from both how the "normal" git commit and tracking-branch workflow works, and from github/gitlab/bitbucket's pull-request/merge-request style.
Because the gerrit workflow is extremely different from all other common git workflows (which is to say, you build a set of commits in your own branch or fork and ask the maintainer to pull it). Gerrit actively encourages you to rebase and rewrite (and lose) your development history in the pursuit of a "perfect" one-hot non-merge history patchset, which actively loses information about the context of where your changes were applied.
The problem with submitting a rebased branch is that while there are no apparent merge conflicts and your changes appear to have been developed on the tip of master, your changes may actually have been developed on a much older version of master. Builds may appear to compile fine but they might not make sense - when you discard and rebase the context of commits, you lose important information.
GitHub wikis are just git repos themselves, and I clone mine. It's easier to edit them locally, too.
As for issues, I've suggested to GitHub that they make them available via git, and they said simply "I have passed your suggestion on to the team to consider adding the ability to clone Issues". I don't think they realize how nerve-wracking it is to put any data at all in GitHub issues. Even their recommended backup software is out of date and doesn't back up Issues completely.
This is typical. Most businesses don't duplicate their workflow stuff to a parallel system because database backups are sufficient.
GitHub, even with relatively recent issues and downtime, has better infrastructure and skilled personnel than most companies hosting their own instances of these services on some reclaimed box.
Even if they have great infrastructure, this article is evidence that they can still take you down at any given moment if they so much as feel like it.
I haven't checked GitHub specifically, but most of those terms include, "and we can ditch you any time for any reason, maybe with 30 days notice if you're paying"
Not to mention it's also common to include "we can change the terms at any time" - and of course, there's also the fact that this issue in specific has nothing to do with a ToS violation, but with permissions on the parent repository, by relying on them with no backups you're not only subject to their staff's whim, but to the whim of any bugs or "features" in their code.
If you want to take advantage of their infrastructure, that's fair, I understand, but at the very least, run some tooling to back up issues, pull requests, wiki pages, etc on a regular basis.
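For the git-backed parts (the code and the wiki), such tooling can be very small. The sketch below is one way to do it in Python, under a few assumptions: the helper names and the backup directory are hypothetical, the wiki URL follows the usual `REPO.wiki.git` pattern, and the functions only return the git command to run, leaving execution (cron, `subprocess.run`, whatever you like) to the reader.

```python
from pathlib import Path


def repo_and_wiki_urls(owner, repo):
    """Clone URLs for a repository and its wiki (the wiki is a separate git repo)."""
    base = f"https://github.com/{owner}/{repo}"
    return [f"{base}.git", f"{base}.wiki.git"]


def mirror_command(clone_url, backup_root):
    """Return the git command that creates or refreshes a mirror of clone_url."""
    # The mirror lives in backup_root under the last path component of the URL.
    target = Path(backup_root) / clone_url.rsplit("/", 1)[-1]
    if target.exists():
        # Already mirrored once: fetch all refs again and drop deleted ones.
        return ["git", "-C", str(target), "remote", "update", "--prune"]
    # First run: a mirror clone copies every ref, not just branches.
    return ["git", "clone", "--mirror", clone_url, str(target)]


for url in repo_and_wiki_urls("someowner", "somerepo"):
    print(" ".join(mirror_command(url, "/var/backups/git")))
```

The wiki clone will simply fail if the repo has no wiki pages, so a real script would want to tolerate that. Issues and pull request discussion are not covered by this at all, since they are not stored in git.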
Some of the strain here is around two very different meanings of the word "fork", but only a single feature with that name on GitHub:
1) When I want to contribute to a project, make some local changes and offer them back to the mainline, I fork. My fork is a pseudo-throwaway thing; it contains code that is hopefully destined either to make it back to mainline, or likely to be abandoned if mainline moves on in active development.
2) When I want to go my own way with a project, using it as a starting point for a new direction of development that may wander in a direction quite far from the original (it might get renamed, it might continue development if the original becomes abandoned, etc.), I also fork.
Numerically meaning 1 is surely far more common, so it makes sense for the GitHub features and permissions etc., to optimize for that case.
Meaning 2 is closer to the original idea of a fork, from open source vernacular long predating GitHub.
Naming is hard, and I don't blame them for picking a single name for these two technologically similar (but socially different) scenarios.
I don’t think GitHub actually labels #2 as a fork, but traditionally that’s been the meaning.
They just needed a word to show that the repo is not original and is tied to an upstream repo.
For example, when I wanted to “fork” a project[0] that was no longer maintained and not accepting PRs, I actually had to contact GitHub to “unlink” my repo so that my repo could be standalone and utilize all the features on GitHub (can’t remember which features off the top of my head).
I don’t know much of the historical details, but whenever I think of a fork, I think of processes. I think that GitHub’s official definition is just a simplification of what happens as a side effect.
I forked Popcorn Time when it came out, just because I was impressed with the app and I wanted to look over (and possibly modify) the code. I only forked it, though; I didn't clone it locally. A week later, when I had some free time to do so, same issue: my personal fork had disappeared. It was mildly aggravating, but also a soft reminder to question how much trust to put into third parties.
I wouldn't call it a trust issue, but a due diligence one: making sure you do enough to ensure anything you care about is safe.
In this instance making sure that you understand the service you are using (including intricacies like this), or keeping an extra backup of your own (a locally cloned copy of the repo), or for proper paranoia, both of the above.
No matter how much you trust github and those in positions of responsibility for the projects you interact with, if you don't know the exact details of the feature-set you can come unstuck due to operating under false impressions. Github has done nothing wrong, as that is how the service is intended to work and it is documented as such (though it would appear some of the documentation is incorrect or at least inconsistent?), but you still get an inconvenient surprise because of a misunderstanding.
I'd argue this is barely worth the time spent. You'd use at least a hundred services of this sort, with a flux of at least one a week. 2-3 hours reading up on intricacies, setting up backups for your data? It's a crazy amount of time to spend on something you don't care about - you just want it to work.
I'd argue that if you care enough about the content that you'll do more than say "oh well" to yourself if it becomes inaccessible, then researching this sort of thing or (probably easier but maybe more costly in terms of bandwidth use) rigging it up to your existing backup infrastructure is the minimum effort you should make. If you don't, you have no high ground from which to complain or blame others if its going away inconveniences you.
If you don't care about it that much, why do you keep it anyway?
> Luckily I found an old local copy of my project, but this taught me not to rely on Github as only storage for code.
This is the distillation. If something matters to you, you should personally take responsibility for its long-term storage. Apparently for the author, finding a true duplicate was just luck. To minimize the risk, maintain multiple copies on multiple media. Github is only one medium.
The context is especially notable: this is source code and source code is (in a majority of cases) very small. Maintaining personal archives of project source is a comparatively simple and low-cost matter.
How reliant people are on Github is puzzling. If Github somehow failed one day, many businesses and OSS projects would just fail to operate. What surprises me is that software teams in companies use it too, when just sending commits/patches around is quite easy with any VCS.
Kind of like how that one NPM package took down like 14,000 other packages when someone took a DMCA request against an author's OTHER package and he just took them all down in protest--which took down their dependencies and their dependencies and ... "broke the world".
I have a close friend who passed away a few months ago, and wanted to try to find some way to preserve all of his GH repos for his wife and children. I've locally cloned all of them, but the prospect that GH might delete all of his stuff if he fails to log in for a certain amount of time or something is deeply concerning.
I don't suppose GH has some kind of memorial/preservation mode the way Facebook does?
There are a few blog posts of people who got GH support to release the names of inactive accounts, though it's not clear if those accounts had any repos at all.
Thanks for reaching out. The individual case was resolved (found a local copy). But I find the policy worrisome for future projects. If customer support is happy to re-enable access for me, what is the point of the current policy?
That's why I've been migrating my Github/lab projects to IPFS as well. I have a few cheap VPSes where I keep my stuff pinned.
"Cloud" (read: someone else's server) is unreliable at best, or malicious at worst like this case. The people may not be bad actors, but the tech definitely is.
If people can get the fork back after reaching out to support@github.com, why wouldn't GitHub resolve this issue once and for all, so that people don't need to reach out to support@github.com any more?
If Support just does this kind of thing, why not put it in the UI? It's like unlinking a fork: GitHub Support just does it on request, but why is GitHub spending unnecessary manpower when it could just be a button in the settings saying 'Unlink'?
> I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.
I'm sure support would be able to do something for you there. Anyway, it's always good to contact them, even if they can't do anything, before saying things like this online.
I disagree. People need to be aware of this before it becomes an issue for them. Git is distributed and yes, relying on one resource is a risk in itself, however - if a fork isn't a fork it should be made clear.
Yes, but the point is he didn't even ask support, so his claims are unverified. Maybe support will always replace your fork with a copy if you just ask.
I agree the UX is bad though and should be improved.
My point is he shouldn't have to ask support to let him have access to something he took a fork of. Bad UX aside, a fork allows one to think of a permanent branch that is distinct from the original. This is not the case.
Related to that: if a private repo is forked, the target account doesn't need a paying subscription. In that sense, it's the root repository holder that is paying for the new private repo.
I'm not arguing if it's a good or bad thing, it just seems consistent in that regard.
At least on Bitbucket they're pretty explicit about the rules around forks. Creating a new private repository, you get the option to select [No Forks | Allow Forks | Allow only private forks]. I don't see any kind of configuration like that on github.
That is silly either way. Once you have a local copy you can just push that copy to any git hoster and thus make it public. Forking through their web ui is just a convenience feature.
Apparently there's no github web UI to actually copy a repo so it's not listed as a fork, but of course git makes it quite possible, and github even provides directions, using git directly.
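For reference, the git-only route boils down to a bare clone followed by a mirror push. Here is a small sketch that just assembles those two commands; the helper name and the example URLs are invented, and the target repository is assumed to already exist and be empty.

```python
import subprocess


def duplicate_commands(source_url, target_url):
    """Return the git commands that copy a repo with no fork link."""
    name = source_url.rstrip("/").rsplit("/", 1)[-1]
    workdir = name if name.endswith(".git") else name + ".git"
    return [
        # A bare clone carries all branches and tags, with no working tree.
        ["git", "clone", "--bare", source_url, workdir],
        # A mirror push sends every ref to the new, independent repository.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


def duplicate(source_url, target_url):
    """Run the commands in order, stopping at the first failure."""
    for command in duplicate_commands(source_url, target_url):
        subprocess.run(command, check=True)
```

The result is an ordinary repo owned by the target account, so it should not be affected by anything that happens to the original, at the cost of losing the fork link and the pull request plumbing that comes with it.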
I'm imagining a browser plugin that notices when a repo you are about to fork is private, and (or even without that) adds a 'copy' button which gives you text to copy-and-paste into your terminal to make a true copy.
On the one hand, it bugs me when I see a repo that was obviously cloned/forked from another repo and uploaded new without the github (fork) link to the original... but upon seeing this, I may have to reconsider doing that with a few repositories.
When I left a prior job, I forked and took over updates of a relatively popular open-source library under my own GH account... I'd hate to see it all disappear because someone from a former job decides to nuke the upstream repo.
In the alternative case, a company who terminates an engineer can't prevent them from accessing the code if they've forked it. This is a "problem" with git, which GitHub has tried to solve for its paying customers. Whether or not you agree... GitHub has worked like this for years.
If GitHub wants to implement DRM-like copy control, they should implement actual DRM-like copy control, and leave fork the heck alone.
Part of the problem is that instead of being explicit about how things work they're being implicit. Instead of users being told a fork is not a, you know, fork they're lied to only to have the fork vanish later.
Why is there no banner at the top of the repo warning you this is a fake repo? Why is there no ability for the original owner to delegate a full fork to the children? If they implemented this correctly you'd expect to have control over how it works (or turn it off).
I think a fork is just an internal branch on the original repository for the Github servers. From a data / sys admin point of view, it makes sense, since a hard copy of a repository (and all the git objects with it) is just generally useless.
I would guess that it's stored and dealt with using a basic "copy on write" strategy. It's functionally a copy, which is all that usually matters. This may have been an oversight.
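To illustrate why sharing storage between a repo and its forks is cheap, here is a toy sketch of git's content addressing for blobs (in the default SHA-1 object format). The `store` dictionary and both "repos" are invented for the example, and it says nothing about how GitHub actually lays things out on disk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def add_file(store: dict, content: bytes) -> str:
    """Put a blob into a shared object store, keyed by its ID."""
    object_id = git_blob_id(content)
    store.setdefault(object_id, content)  # identical content is stored once
    return object_id


store = {}
upstream_file = add_file(store, b"print('hello')\n")
fork_file = add_file(store, b"print('hello')\n")      # unchanged in the fork
fork_edit = add_file(store, b"print('goodbye')\n")    # changed in the fork

# The unchanged file resolves to the same object; only the edit is new.
print(upstream_file == fork_file, len(store))
```

Because the ID depends only on the content, a fork that has not diverged adds no new objects, and one that has diverged adds only what changed.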
Yeah, I wouldn't be surprised if this was an oversight as well. I can't imagine that GitHub would delete the git blobs from their servers, even if you stop paying (harddrives are cheap!).
The fact you have to do this to make a fork, fork, is problematic. You could also download the fork, and then upload it again as a new root, but in both cases it entirely defeats the point.
A lot of people are coming up with workarounds or blindly defending Github in this thread. I think I'm the only one who sees that this decision was made for commercial reasons ($), and has nothing to do with people "maintaining control of their private repo."
Frankly, if fork doesn't, you know, fork due to some quirk in some random article you aren't going to read or find, that to me is a massive black mark against Github. That's all I care about, Github working like Git, in most major ways.
`I think I'm the only one who sees that this decision was made for commercial reasons ($)...`
Is it fair for me to expand "for commercial reasons" as:
Github is a for-profit business, and their business model is to provide a service to users and that also requires them to be able to turn that service off when customers stop paying.
`... and has nothing to do with people "maintaining control of their private repo."`
I disagree here, I think you're using a notion of "control" that is too narrow. A notion that captures the scope better would be that Github is adding value by enabling project management and collaboration. The ability to do pull requests, manage issues, etc. is all tied to forking.
That's what makes their service worth paying for, and it's also what they have to turn off if you don't want to pay.
`That's all I care about, Github working like Git, in most major ways.`
What you're saying here is reasonable enough; I expect you can find providers that are pretty close to that standard, and it's a fair point that Github's business model has some sharp edges. In this case, it's a consequence of "we turn the lights off if you don't pay," and an alternative has to deal with the same bottom line.
>GitHub has the right to suspend or terminate your access to all or any part of the Website at any time, with or without cause, with or without notice, effective immediately. GitHub reserves the right to refuse service to anyone for any reason at any time.
Companies write these "We can do whatever" clauses into license agreements so that they can point to them and make people go away.
IANAL, but EULA terms like this are typically in breach of consumer law protections against unfair contract terms. If you are a private person (consumer) paying for a service, this kind of term may be null and void.
So it sounds like active links between forks are intended to mirror the valid links between people/organizations that are doing business with one another. If people or organizations end their relationship, then there's a way to cut off any secondary contributors' ability to make pull requests to the codebase and to receive code updates from it.
At first blush, I thought, "GitHub is mean and just locking-in their paying customers forever!!" But now after reading the comments here, I think this is a very pretty way to represent the nuances of a fork relationship.
If the original owners of a private codebase disappear, then it is reasonable and just that all of the forks to their code should as well. I believe it's reasonable and just because GitHub have no way to interpret _why_ the original owner dropped off and must respect their ownership by breaking the relationship.
You might use gitlab to clone your github repos. The web interface for cloning them is super easy (obviously you can do it from the console also), and there is no limit on private repos (their business model seems to be charging for CI time and selling on-premise licenses).
I'm not affiliated with either hosting service and I use both :)
Don't rely on GitHub being your only source location. Git is a DVCS, meaning you don't have to have a single upstream. Make sure you are pushing / mirroring to a third party (GitLab, Bitbucket, personal Git Server, etc)
You could have a cron job automatically sync repos, but right now I have my own local copies and I use GitLab's "Repository mirroring" [1] which has worked really well for me!
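If you'd rather go the cron route, a sync script can be fairly small. This is only a sketch: the function names, the directory layout, and the crontab line are placeholders, and it assumes credentials for both hosts are already configured for non-interactive use.

```python
import os
import subprocess


def sync_commands(source_url, backup_url, mirror_dir, mirror_exists):
    """Return the git commands for one sync run of a local mirror."""
    if mirror_exists:
        # Refresh every ref from the source and drop refs deleted upstream.
        refresh = ["git", "-C", mirror_dir, "remote", "update", "--prune"]
    else:
        # First run: create a bare mirror clone of the source.
        refresh = ["git", "clone", "--mirror", source_url, mirror_dir]
    # Push all refs from the local mirror to the second host.
    publish = ["git", "-C", mirror_dir, "push", "--mirror", backup_url]
    return [refresh, publish]


def sync(source_url, backup_url, mirror_dir):
    """Run one sync, raising if any git command fails."""
    exists = os.path.isdir(mirror_dir)
    for command in sync_commands(source_url, backup_url, mirror_dir, exists):
        subprocess.run(command, check=True)

# Hypothetical crontab entry, running a script that calls sync() nightly:
# 0 3 * * * python3 /path/to/sync_repos.py
```

Keeping the local mirror in the middle means you end up with three copies (source, local, backup) rather than two, which is the point of the exercise.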
This really sucks... I know companies that made forks of public (open source) repos just to have a "backup" in case the person or organisation decides to take down the repo. This should really be fixed, as I don't see ANY benefit in the way github handles this now. People who talk about giving contractors or other third parties access and later revoking it (after they are done), thinking they won't have a copy, are just plain wrong. Git is decentralised and the contractor/dev will always have his code on his machine unless HE decides to delete it.
The terminology is a GitHub-and-friends thing, but the idea is at the core of git. When you clone a repository, you're creating a "fork" of the repo on your own computer. Git is a DVCS (distributed VCS), where you can make many copies/forks of a repository and freely push/pull commits between them. (As opposed to something like SVN, where the repository lives on a single server, and all commits go directly to that server without a separate "push" step.)
The terminology is at least a couple decades older than GitHub has even existed, and was even mentioned somewhere by Linus in the creation of Git.
Fork used to have a very negative connotation, and that was related to the difficulty required to pull one off. When you forked a project, you had to set up your own hosting, web space, issue tracker, version control system (and you probably couldn't clone and start off the upstream's VCS), and so forth. The technical barrier made it so that you only forked a project when you have severe disagreements with upstream; some of the most famous in history include EGCS from GCC, and XEmacs from Emacs.
With DVCSes, including Git, much of that technical difficulty is automatically removed. Every clone can potentially be a fork (as in, a new independent project), and this was one of Linus's intentions. If forking is easy, it keeps upstream on their toes. After being adopted by DVCS hosting sites like GitHub, it turned around to being a positive term even.
I observe that people "fork" repositories (which is basically a cheap clone rather than a fork, bad terminology there) not for having a copy of their own to work on, but more like a glorified "like/+1/star". So even with public repos, it seems quite acceptable that Github tries to deduplicate aggressively.
This is also simply due to their per-private-repo pricing model. You can fork private repos with a free account.
I'd just ask github support about this particular issue and I'm pretty sure they can either make the repo open or at least make the code available to you somehow.
This is no different than Google Docs and many other similar services. If you share a Google Doc with someone it will show up in their list of Docs. But it is just a reference to "their" doc. If they delete the doc there's no doc to reference and your "apparent" copy of the doc (which was really just a reference) disappears.
If you really wanted your own personal copy you need to make a duplicate. That duplicate will be yours not a reference to theirs.
It was strange the first time I ran into it. I deleted an account. I had shared the documents with another account of mine. Then I noticed when I deleted the first account I lost access to those docs. Now I know I hadn't copied the docs I'd only had references to them. Next time I'll know to make copies if I actually want copies and not just access to another account's shared docs.
This seems to be an edge case, affecting only private repositories. I can see how this can make sense in that context. Can we do without the pitchforks? (pun intended)
It still doesn't make sense in that use case. He could have cloned it to his local machine and still had access to the code. Had he done that he could have pushed it back to github as a completely new repository thereby creating a legitimate fork. It's a silly requirement from github.
Did you contact GitHub regarding this and ask them, before bashing them on social media? What was their response? They'll probably gladly remedy the situation.
> Luckily I found an old local copy of my project [...] I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.
The policy itself needs to be changed, the individual case is resolved.
Or the policy was misinterpreted. The piece of text that got quoted is present in every single EULA ever and does not at all indicate if this is actually intentional by GitHub.
Besides, you can request to disconnect your fork from an upstream project. I expect such a request will also resolve this problem.
Git is a distributed version control system and Github is not in a privileged position. Don't treat it like it is. Github is a convenience. Nothing more.
"Forks" are really just branches and branches are really just references. Forking just adds a few more references to the same repository.
And this is why I don't have a github account. One day, all those foolish users will be left hanging, crying for the dust which has replaced their code and CI. They deserve worse; they have enabled this central organization to subvert the heart of OSS. May github go down forever, along with google translate and all those other projects which have crippled our technical progress.
There's a little more nuance to it than that. While git users generally keep a local copy of a repo, collaborating with others is pretty much always done through a central server. Git users could theoretically collaborate without a central server, but it is currently far too manual a process to be considered practical. There were a couple of attempts to implement decentralized collaboration in git under the name gittorrent, but those didn't go anywhere. Failure is understandable in this space: decentralized collaboration is a _hard_ problem and the tools to do it are still in their infancy. IPFS is probably the furthest along at this point and they're still a long way from having something production ready.
Sure, but it's decentralized in the sense that there's nothing special about any of the clones. If developers lose their shared clone of the repository, they can re-create it easily by setting up another shared clone that anyone can access and pushing there from their local repos. Nothing will be lost.
"Central" server is not special in any way, other than being readily accessible by everyone in the team.
Some services break this by piling other stuff on top of git that's not distributed (like issues, ci, etc.), but that's not a problem of git.
But in most people's workflows there is something special about the central server: it is the source of truth about the state of public branches. That is why losing the central server is so disruptive. It means you have to re-establish the source of truth somewhere else and get everyone to transition over to that new source.
With a truly decentralized system the source of truth would be content addressable. For example, it could be obtained via a public key rather than a server's domain name.
I should have completed my question in my earlier post. Specifically, would doing that instead of creating a fork through Github's interface prevent one from losing access to their project when the previous copy is deleted from Github?
For instance, you'd be able to create private forks in an organization without paying for the Github organization, then keep this private fork once the original membership is canceled.
Removing access to the fork in that instance would make sense, but that's not what happened here. The author didn't fork in an organisation, he forked to his own personal Premium account. Which he continues to pay for.
OP states that he still has unlimited private repos (presumably on a paid account) so it wouldn't be cheating the private repo limit in this case. GitHub could simply count a forked private repo against the account's private repo limit if the upstream repo is disabled to avoid the type of cheating I'm thinking of.
I haven't paid for private repos (at least not for a few years) and yet I'm a member of several ... and I can fork them as needed without paying. In my case, I would expect that I would lose access to my fork if the private repo was no longer a paid account.
Right, but I'm saying GitHub could allow you to keep accessing those cloned repos if the upstream repo was disabled as long as you switch to a paid plan. I.e. they could treat it the same way they treat your own private repos when you go from a paid to unpaid plan.