formatted solderpunk essay

techpriest 2024-02-08 22:23:10 -06:00
parent e7ff3a5524
commit adba064366

Low budget P2P content distribution with git
archived: 1-24-24 from: https://portal.mozz.us/gemini/zaibatsu.circumlunar.space/~solderpunk/gemlog/low-budget-p2p-content-distribution-with-git.gmi
In recent months I've spent a lot less time than is typical thinking about anything to do with computers and the internet, but there is one train of thought I've been repeatedly pondering. I had hoped to write up a bunch of less technical stuff first (don't worry, that's still coming - I'm kind of disappointed in myself that I've lapsed into writing a massive computery, internety post so soon after coming back to writing here. Bad Solderpunk! In penance I'm not going to write any more for at least a month - stay tuned for cycling, environmentalism and manga, though), but it seems like this technical idea has become a little topical just recently, so perhaps now is actually a good time to get this idea out there. Let me be very clear from the outset that I'm just idea-sketching out loud here. This isn't a new project, or anything, I'm not giving the system I'm about to describe a name or committing to fleshing out the details or anything like that. That's not to say nothing will ever come of this, I just want to make it clear from the outset that these ideas are half-baked at best and I'm absolutely not committed to jumping head first into wherever this train of thought leads...
Protocols like Gemini and Gopher are an effective salve against many of the miseries inflicted by the modern web, but by no means do they solve *all* the web's problems. All three systems share the same big picture architecture, namely that the default pattern of usage is that content lives in exactly one place, a server which is online 24/7, 365 days a year and accessible from anywhere on Earth, and that to consume this content you request a copy of it at the instant of consumption, render it to the screen and then discard it (perhaps after a relatively brief cache lifetime), leaving no persistent copy, with the understanding that if you want to read something again next week or month or year you'll just request a fresh copy and do all this again. Because all three protocols work this way, all three of them share a long list of common shortcomings, mostly about losing access to stuff you'd like to still have access to. Online content can become inaccessible to *you* in the short term if your internet connection goes down. It can become inaccessible to *anybody* in the short term if the server goes down. It can become inaccessible to large groups of people in the *long term* due to the ease with which authoritarian governments can block access to a single server. It can become inaccessible to *everybody* *forever* if the hosting service disappears (think Geocities), or if the person running a private server dies or is incapacitated and none of their friends or family know which bills to pay to keep the thing up. These problems can be mitigated to some extent via load sharing, content delivery networks, caching proxies, etc. All these solutions involve setting up yet *more* computers which are switched on and connected to the net 24/7, which is expensive both financially and environmentally. On a long enough timeline, the survival rate for all websites drops to zero: find some mailing list archives from the late 90s or early 00s and try visiting all the URLs people shared in them. More than 90% of them won't work. 20 or 25 years is not an awfully long time span for this kind of decay to happen in.
None of these observations are new or exciting, and there is no shortage of projects attempting to address various of these shortcomings in various domains. You've maybe heard of DAT and IPFS and SSB, and those are just the Johnny-come-latelies to this sphere. Freenet has been around for over 20 years, and I don't doubt that it has predecessors of its own. What all of these projects have in common is conceptual complexity. They're distributed, decentralised, peer-to-peer, content-addressed, cryptographically authenticated, and more. This isn't intended as a criticism. These projects have a much higher ratio of essential complexity to "empty complexity" than something like a modern web browser, because they're trying to solve substantially more difficult problems, so some conceptual complexity is unavoidable. But all of the projects above and their associated ideas have met with fairly limited implementation by developers and fairly limited uptake by users, and I think the high barrier to entry represented by a lot of conceptual complexity, even if it is essential, is probably a large part of the reason for this (that and a healthy serving of apathy, no doubt). I'm not trying to say that the search for clever solutions to these problems is futile, not at all. I'm just laying out what seem to me to be the facts.
Completely solving the problems associated with an always-online, purely client-server web is never going to be easy. The wait for something which works well enough and is user friendly enough to facilitate serious uptake is going to be a long (though hopefully worthwhile) one. In the meantime, it's tempting to wonder whether or not there is some kind of "80:20" solution to these problems which gets at least some of us at least some of the way there - enough of the way to be worthwhile - without a huge learning curve. Lately I've been thinking that maybe there is, and that maybe it's actually not even all that hard. In fact it's so incredibly simple that I'm almost embarrassed to say it out loud, out of fear that if it were *that* simple then people would *obviously* already be doing it, so clearly I've missed something big due to not being smart enough. Or maybe some people well off my radar *are* doing this, and that's what I've missed. Anyway, are you ready for this huge idea? Here it is.
Use git.
No, really, just use git. Not the way you're possibly using it already (like I am), as a kind of deployment mechanism, where you write your posts locally, commit them to a repo, then push to a remote copy of that repo only you have access to, triggering a hook which checks out a copy of your work in whatever directory your web/Gopher/Gemini server looks in (although, if you're doing that, switching to using it the way I'm talking about is a piece of cake). I'm talking about using git for small internet content the way people use it for source code, as an actual distribution mechanism for ending up with a local copy of something on your disk that you then use offline (by compiling it, interpreting it, etc). I'm talking about your text-centric online content being nothing more than a public git repository. If somebody wants to read your posts, they clone your repo. Then they've got your posts on their disk, and they can read them from there. If they go offline, it doesn't matter, because your stuff is on their disk. They can read it today, and tomorrow, and next year. If your server goes offline, it doesn't matter, because your stuff is on their disk. If they like your stuff and want to read more of it, then next time both they and you are online, they are one `git pull` away from getting any updates you've made since their original clone. There is no need for Atom, or RSS, or carefully formatted index pages with datestamps integrated into link text. When distributing by git, visiting a site and subscribing to a site are one and the same thing. No extra technological concessions to the notion of "subscribability" are needed. Furthermore, when distributing by git, visiting a site and making a complete offline archive of the site are one and the same thing. There's no need for slow, clumsy, error-prone and admin-irritating loops of repeatedly fetching and parsing files using tools like wget to discover the URL of every single resource in a site. You just grab the whole thing at once in a single network transaction, no parsing required. Git is actually better than Atom/RSS and recursive wget combined! An Atom or RSS feed usually only has the 10 or 20 most recent updates in it, so if you're offline for a long time you'll miss some stuff. Git won't, you'll get every commit made since your last pull. And a recursive wget just leaves you with an offline copy of an entire site as it was at one point in time. There's no way to get *just* the new stuff one month later - sure, with HTTP(S) you can use headers like If-Modified-Since to avoid fetching new copies of stuff that hasn't changed, but you still need to make a request for every single page which *could* have changed. With git you just pull and that's it.
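To make that concrete, here is a minimal sketch of the reader-side workflow using nothing but stock git commands (the URL, paths and filenames are made up for illustration):
```
# First visit = clone (URL and paths are hypothetical)
git clone https://example.org/~alice/gemlog.git ~/reading/alice

# Read offline, with whatever tool you like
less ~/reading/alice/2024-01-15-some-post.gmi

# "Checking for updates" = pull, whenever you next happen to be online
cd ~/reading/alice
git pull
```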
I've barely scratched the surface here. I'm going to keep going, but first let's really quickly think about this from a network privacy point of view. Cloning or pulling a git repo involves making network connections to *one* server, known in advance, and has no side effects. There are no cookies or anything cookie-esque to tie subsequent requests together at a more fine-grained or persistent level than the IP address. This is much better than the web, and exactly on par with Gemini and Gopher. If you want to, you can do git stuff over HTTPS or SSH, and that's normal and standard, so in this respect we're better than Gopher where plaintext is the only option. But if you don't want to use crypto, or your computer can't handle it, or you're using some futuristic internet overlay like Yggdrasil so you get transport security without baking it into every protocol, you can do a plaintext git:// clone. So for some folks this is better than Gemini, where it's TLS or bust. But the git-as-distribution-tool approach gives you something that none of the web or Gopher or Gemini give you: it's one network transaction for the *whole site*, and that's it. A git admin knows that you (or rather, your IP address) has cloned their repo and now has all their posts. But that's it. They don't know which posts you read, and which ones you don't. They don't know which posts you read once and which ones you read every day and which ones you only read in the middle of cold, lonely nights. There is nothing like a "click stream" for them to analyse. Even the boogeyman of "traffic analysis", where the size and latency of opaque encrypted transactions are used by third parties to reconstruct your path through a public site, gains no traction here. Your fine-grained consumption habits are entirely invisible to everybody but you. That's really neat!
One more brief digression: I've described everything so far in network terms (and will get back to that shortly and then do it for the rest of this post). But keep in mind, please, that there is *nothing* network-centric about this idea. We're all very used to doing git clones and pulls over TCP/IP, but you can clone and pull from the filesystem just fine. Try it. Git won't bat an eyelid. That means you can clone and pull from USB sticks and SD cards, which means this whole thing works just fine over sneakernet. You don't have to go "all in" on sneakernet, you can mix and match it with networking in whatever proportion suits you, and transition slowly from using mainly one to mainly the other on an as-needed basis. I think about sneakernet a lot these days, and I think anybody else who's interested in sustainable/perma-/salvage computing ought to as well. I'll write more about this some other time. Let me just say for now that the fact that this git-for-distribution thing works seamlessly via sneakernet is a big plus for me.
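As a sketch of the sneakernet case (paths are made up), cloning and pulling from a mounted USB stick looks exactly like the networked version:
```
# Clone straight off a mounted USB stick - no network involved at all
git clone /media/usb/alice-gemlog ~/reading/alice

# Weeks later, with a fresher copy of the stick plugged in, just pull from it
cd ~/reading/alice
git pull /media/usb/alice-gemlog
```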
Okay, back to the main thrust: by visiting/archiving/subscribing to a site via git we get even more than Atom/RSS and recursive wget combined can offer, with less effort on the part of either producer or consumer. Jake. But so far we're still talking about readers fetching content from a single authoritative source operated by authors, so we still have a lot of the usual centralisation problems. This approach still puts a potentially heavy load on one authoritative server, it still requires lots of long distance data traffic, and if the author's server disappears forever *before* you got a chance to clone the repo, you're out of luck. Getting past these hurdles in a web/Gopher/Gemini context isn't easy. If I use recursive wget to get a complete local copy of some website, then in order to enable somebody else to use a recursive wget to get a complete copy from *me* (because my server is closer, or more reliable, or the original is gone) there's a lot more rigmarole involved. I'd need to set up a webserver and point it at my copy, and there's no guarantee that alone is enough. The site may not work properly without suitable URL rewriting or redirecting rules or similar configuration details in place on the server side. I'd need to reproduce those settings exactly, and the information required to do so is *not* something I'd end up with as a consequence of doing the original recursive wget. So the whole procedure kind of only works once, and can't reliably be chained, with an n-th party getting a fully functioning copy from an (n-1)-th party's copy. Even if redirects/rewrites weren't in the picture and this chaining *was* possible, there'd naturally be a big question of trust, as at any stage along the chain the site could be modified by somebody other than the original author and you'd be none the wiser. But none of these problems are there in the git version! You can clone a clone of a clone no worries, that's normal. Everybody who "visits a site" distributed by git has everything they need to *redistribute* the site. And git has built-in support for signing commits with GPG, which can go a long way toward resolving the trust problem (public keys can be distributed as part of the repository itself, which works out alright as long as you can be confident you make your initial clone from the genuine origin - not foolproof, but much better than nothing). All of this is just bog-standard git functionality, tried and tested, nothing new or exciting, 100% ready to go and documented in countless sources. This stuff is exactly what makes git a *distributed* version control system. The new idea here is really nothing more than using it to distribute writing to readers, instead of source code to developers or users.
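A rough sketch of the verification side, assuming the author actually signs their commits and ships their public key in the repo (the mirror URL and key path are invented):
```
# Clone from a third-party mirror rather than the original author's server
git clone https://mirror.example.net/alice-gemlog.git ~/reading/alice
cd ~/reading/alice

# Import the author's public key distributed inside the repo itself
gpg --import keys/alice.asc

# Check that the latest commit carries a good signature from that key
git verify-commit HEAD
```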
It turns out we've *had* a decentralised, distributed, offline-first system for P2P storage and delivery of text files for 17 years now! It was just created for an application very different from blogging/phlogging/gemlogging. By the time git became an established and familiar technology, the web was in the full blown grip of "web 2.0" fever, and static, non-interactive content that was 90% text was consigned squarely to "the past". This resulted, I think, in a missed connection, which maybe we can finally make. There's nothing fundamentally wrong with interactivity, of course, nor with non-text media, either. But I don't need to tell anybody who is reading this via Gopher or Gemini that there's a whole universe of material which is interesting, or informative, or useful, or amusing, or uplifting, or otherwise valuable even if it's "just text" and even if you read it days or weeks or months or years after it was originally written. That's not a unique property of source code. It's true of our little small internet world, too! Git is just perfect for distributing exactly this kind of writing. You get delay-tolerant subscription for free: Atom and RSS can go to the dustbin of history. Constant internet connectivity is not required, although it doesn't hurt. You can pull from all your repos four times a day every day if you live all the time in an apartment with a permanent high-speed internet connection. If you're trying to spend less time online because you think that's better for you in some way(s), you can connect once in the morning, pull from all your repos and then disconnect and read what you received at your leisure. If you live on a boat and sometimes go without internet access for weeks at a time, that works just fine too. If you are travelling without regular internet access and you meet somebody on the way who follows some of the same repos you do, whichever one of you pulled from upstream less recently can pull from one who did so more recently to get some updates on the road - and then pull later from the official source once back in civilisation, without this switching of sources causing any problems. Stuff can continue to circulate for years after the original source disappears, provided enough people were interested enough in the first place to clone it and make their clones readable. To be honest, this feels to me like it could be an even better small internet platform than Gopher or Gemini, at least for some kinds of content (for others, perhaps not - I'll return to this later).
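A sketch of that on-the-road scenario, assuming a branch called main and made-up paths: the fellow traveller's copy is just another remote, and switching back to the official source later needs no special handling.
```
# Add a fellow traveller's copy (on their USB stick, say) as a temporary remote
git remote add roadside /media/their-stick/alice-gemlog
git pull roadside main

# Back in civilisation, pull from the official source again as usual;
# git reconciles the histories, nothing special required
git pull origin main
```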
Of course, this is nothing like a *real* solution to any of the nasty problems of centralised client-server distribution. You can update your clone of a git repo from some source other than the original, official repo, and have confidence that what you get is genuine thanks to PGP, sure - if you know about that other source in advance. But there's no magic means by which knowing only the URL of the original repo you can automatically find the most up-to-date third party copy or copies which are online now and close by to you in network terms and pull from them instead. That's the kind of hard problem which makes real P2P systems complicated, and git does nothing at all to solve them. But we can 80:20 around this to some extent.
I've been vague up until now about exactly how this works in a hands on, daily use kind of way. I'm not proposing we literally spend our time doing git clones and git pulls manually by hand all the time (although you *could* use this system that way, and that should be seen as a feature, just like being able to access Gopherspace via telnet). We can build tools to streamline things. This is largely the reason, incidentally, for using git in particular and not Mercurial or Fossil or whatever else might be hot these days. Git is ubiquitous and isn't likely to stop being so anytime soon. It's been ported everywhere - you can use git today on Plan 9 or Minix 3 or whatever weird system floats your boat (are there still open source descendants of Solaris out there? If there are, I bet they have git). There are bindings to libgit in all major programming languages, allowing you to automate this stuff. All this work has already been done, and these tools are going to be kept up to date and ported further and documented better by people who don't know and don't care about the small group of dorks using git as a plain text content distribution system. It's exactly the same philosophy behind using TLS for Gemini and not something newer and better. Tiny guerilla computing projects can't afford to ignore the opportunity to have the enemy manufacture our weapons for us. So we build tools based on git, because a lot of us already know how to build them, and once they're built they'll be usable just about everywhere. We can throw together something which has the look and feel of a traditional Atom/RSS-based feed reader, but it's powered by git under the hood, it just looks at timestamped commits to figure out which files were updated when. And there's no reason we can't standardise on every repo designed to be used in this way having (or *optionally* having) a directory in the repo root with a well-known name which contains simple .ini or .json or .yaml or whatever files (no doubt getting everybody to agree on one of these would represent 99% of the work of actually bringing this idea to fruition) that provide a little bit of metadata in an easy-to-parse format. These could provide some of the feed metadata that you'd traditionally find in Atom/RSS, like a repository's title, subtitle, author, contact details and license information. They could provide GPG public keys. And they could be used to advertise the URLs of clones of the repo, its "official mirrors", and maybe where these clones are in the world and at what times of day they are most likely to be online (ditto for the original). The git-aware app could register all those URLs as additional "remotes" for the repo, and it could preferentially try to pull from the nearest one most likely to be up when the user hits "refresh", and if that remote was down, it could fall back to the second best choice, and so on. This involves some manual coordination between authors and willing mirroring parties, and introduces a kind of dichotomy between "official mirrors" and "unofficial mirrors" which you'd need to learn about out of band and tell your client about, but I suspect we can tackle this in the usual grass-roots, small internet way and still end up somewhere better than we are right now. It's far from perfect, but it's also far from awful.
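None of this metadata convention exists yet, so the following is purely a hypothetical sketch: an invented .ini file at an invented well-known path, plus the stock git commands a client could drive with it:
```
# Imaginary metadata file at, say, .gitlog/feed.ini (every name and field is made up):
#
#   [feed]
#   title   = Alice's gemlog
#   author  = Alice Example
#   license = CC BY-SA 4.0
#
#   [mirrors]
#   mirror1 = https://mirror.example.net/alice-gemlog.git
#   mirror2 = git://backup.example.org/alice-gemlog.git

# A git-aware client could register each advertised mirror as an extra remote...
git remote add mirror1 https://mirror.example.net/alice-gemlog.git

# ...fall back through them when the preferred source is down...
git pull mirror1 main || git pull origin main

# ...and build a feed-reader view straight out of the commit history
git log --since="1 week ago" --date=short --pretty=format:"%ad %s" --name-only
```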
And we're *still* really just scratching the surface of what doing this would enable. To make it explicit, we're talking about a system where every participant keeps a full copy of the full history of every site they visit on their hard drive indefinitely. This sounds nuts at first. It also sounds nuts that in this system there is no way to fetch just a single post - if you want to read one post that somebody has told you about, you have to clone the full repo containing said post. That's, in some sense, woefully inefficient! These concerns diminish rapidly if we start thinking small. I've been phlogging on Gopher for over four years now. Anybody who has been following me all that time knows that I am *not* a succinct writer. I am relentlessly verbose. And yet, my phlog directory is 1.7 megabytes. Having to clone that whole lot to read one post doesn't seem so horrible knowing that. When visiting a single blog post on the web today you could easily pull down a lot more than 1.7 MB of external fonts, style sheets, surveillance Javascript, flashy background images and more. Cloning my whole phlog repo to read one post is less efficient than using Gopher to fetch just that one post, but it's still more efficient than the status quo of the web. Let's suppose that I continue to phlog at the same exhausting level of verbosity for fifty whole years in total. That would bring me up to just over 21 MB, which we can round up to 25 MB to make things simpler. Now, suppose you didn't want to just read *my* fifty years of rambling, but you wanted to read the ramblings of *one hundred people* who all wrote excessively for fifty years - arguably more output than any person really has the time to read. This would bring us up to 2.5 GB. That fits several times over on the smallest USB or SD storage device you can buy. Businesses literally give that much storage away for free in the form of promotional key chains. The above calculations could be off by a factor of ten (git itself obviously introduces some degree of storage overhead which I've completely failed to address so far and, in truth, know almost nothing about, but I'm pretty sure it's nothing like a factor of ten) and the storage burden of 25 GB would still be underwhelming, even for a 20 year old machine. We really can live this way. Text is *small*.
Having full local copies of everything ever written by anybody whom you've ever read a single small internet post by is a game changer in and of itself. Stuff like archive.org becomes at least partially obsolete, because you have the full history of each site locally. You can, to some extent, be your own search engine. Obviously you can't search your own disk to find stuff you've never previously fetched, but you can easily find stuff you vaguely recall reading a year ago, and if you've only just recently started following somebody who has been writing for years, you can search their back catalogue. You can ask your computer to find other posts you have on your disk which are "similar to" some particular post, in terms of them both using similar words or phrases which are otherwise rare. All sorts of machine learning, pattern recognition, recommendation engine type stuff could be done, if you wanted, but it's something you could do yourself entirely on your own machine with complete control and transparency and perfect privacy. If one of those metadata files in a well-known location in every repo mentioned earlier was a kind of machine-readable "git-roll" where authors could advertise the URLs of other repos that they are reading, then you could even do a little casual repo spidering (with a configurable maximum amount of disk space and monthly bandwidth dedicated to this - possibly both set to zero if you don't care for it). This all sounds somewhat futuristic, but indexing and searching and identifying fuzzy conceptual connections between a couple of gigabytes worth of text files is not exactly the computational cutting edge. I'm starting to feel like in some ways we have been denying ourselves super powers for years simply by continuing to distribute our content in a fashion which makes it really impractical to grab sites wholesale, even though the bandwidth and disk space required to do this (for simple text files, anyway) has long been easy to come by.
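A sketch of what being your own search engine could look like with perfectly ordinary tools, assuming all your clones live under one directory:
```
# Full-text search across every repo you've ever cloned
grep -rli "sneakernet" ~/reading/

# Search one repo's entire history, not just the files as they currently stand
cd ~/reading/alice
git grep "sneakernet" $(git rev-list --all)
```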
I've been unrelentingly positive about this whole prospect so far. So many benefits to content distribution via git! Aren't there any problems? Well, sure. There are two big ones that I've identified so far. One is technical, the other is, uhh, sociological? Or something? Let's deal with that one first. The basic issue is that stuff on the internet can become unavailable in two different ways. Sometimes stuff disappears involuntarily - due to technical faults, censorship, business failures, financial problems, etc. But sometimes stuff disappears because the author didn't want it up anymore and willingly took it down, which feels like a reasonable thing for authors to be able to do if they like. We might, very roughly, think of these as "bad disappearances" and "good disappearances", respectively. The problem is that it's not possible to solve the bad disappearance problems without making good disappearances impossible. Publishing something via this git system is in principle permanent and irreversible. If just one person clones or pulls from your repo before you take it down, other people can pull/clone from them and there's nothing you can do to stop this beyond asking nicely. It's not just "taking stuff down" that becomes infeasible. If you change your mind about something you wrote ten years ago and want to change it, you can do so - but everybody "subscribed" to your repository will be notified of this fact and will be able to see both the before and after versions. This kind of publishing is, by necessity, radically long-lasting and radically transparent in a way that people aren't used to and many may not be ready for.
Many will say that the internet is *already* like this and you can never guarantee that anything you publish, via any protocol, won't be redistributed forever. This is exactly right. It's the very nature of a global network of general purpose computing devices, and we should never fool ourselves into thinking that any technology can prevent this. Furthermore, this isn't a problem unique to using git for publication, it's going to be a problem in *any* solution to these problems. Does that mean we should just forget about this issue? Maybe not. Just because something is always possible in principle doesn't mean that making it as quick and easy and convenient as possible will be without consequence. An internet which never forgets is handy in a lot of ways and in a lot of fields of endeavour. It's also strongly mismatched with human social psychology and norms. The small internet crowd tends to place a lot of emphasis on "human scale" computing and on personal connections, so I think this is worth flagging and encouraging people to think about it. But I do also think it's possible to overstate how big of a deal this is. Maybe I've already done that. I dunno.
The other big problem, the technical one, is that of linking. That whole hypertext thing. Let's consider a "gitlog", i.e. a blog/phlog/gemlog-style resource which is published exclusively via a public git repository, and is not hosted on any of the traditional server-client request-per-page protocols ("gitlog" is a horrible name for this thing because it will cause massive confusion and search engine collision with the `git log` command, but I'll use it as a placeholder for now). Internal links within one gitlog are straightforward (at least if it's in HTML or gemtext, both of which support relative URLs), but how does the author of a post in this log provide a link to an individual post in another gitlog? An unambiguous pointer to an individual gitlog post necessarily has two parts: the URL of (any clone of) the repository, and a path relative to the repository root indicating the file containing the post in question. I am not aware of any pre-existing URL scheme for unambiguously conveying both these things at once, nor of any pre-existing hypertext format which allows "two part" links. It's not remotely hard to imagine how to cook up either one, perfectly straightforward in fact, but ugh, once we do that this stops being a super minimal "just use this existing thing to distribute your arbitrary existing text, with maybe a tiny bit of optional helper metadata sprinkled in if you want" approach and becomes a whole *thing* with its own unique format which you have to buy into. I really like that a lot of people are basically already 100% geared up to distribute their smol content this way by just making the private repository they already use for deployment publically readable, super quick and easy, no other change required. Anything which stands in the way of that feels like a bad idea. But without a standalone pure-gitlog linking solution, the whole system is limited to bihosting scenarios, where git-based distribution kind of lurks behind the scenes, and plain old gopher:// or gemini:// links are what we actually include in our posts. This is not great, but perhaps something we can live with? Maybe there's a convention where if your Gopher or Gemini content is also available in this way, you configure your Gopher/Gemini server to respond to requests for a certain well-known endpoint with (i) a git repo URL and (ii) a regex or some-such for transforming your gopher:// or gemini:// URLs into paths relative to your repository root? That would work, I think, as a kind of easy machine-readable "gateway" from the Gopher/Gemini view of things to the git view of things. Maybe there are even better ways? I don't mean to suggest we can't somehow make this linking thing at least roughly work, I'm just highlighting that this is the most substantial issue I've thought of without an obvious and easy solution. I still think there's something worth pursuing inside all this. Hell, even if we give up on external linking, that's not the end of the world. The short, whimsical urban fantasy of Joneworlds is a genuine gem of modern Gopher/Geminispace, and the vast majority of it is entirely self-contained and very little would be lost by distributing it without any links at all.
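Just to make the gateway idea concrete, here is one purely hypothetical shape it could take - nothing here is specified anywhere, every URL and field is invented - in which the well-known endpoint hands back a repo URL plus a rewrite rule, and the client applies the rule to turn a gemini:// link into a path inside its local clone:
```
# Hypothetical data returned by the well-known endpoint
REPO="https://example.org/~alice/gemlog.git"
REWRITE='s|^gemini://example.org/~alice/gemlog/||'

# Turn a gemini:// link into a path relative to the repo root
echo "gemini://example.org/~alice/gemlog/2024-01-15-some-post.gmi" | sed -e "$REWRITE"
# -> 2024-01-15-some-post.gmi
```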
I think at last that this is all I have to say about this for now. I mean, there's more I could say, all sorts of little details, but I think this is enough for now, to get the idea out there. I am happy to release this idea into the electronic wild and see what kind of life, if any, it may take up in the minds of the denizens of the small internet. I look forward to hearing people's thoughts.
I've written all this up over the past week or so (took longer than I thought it would!), but the core of the idea has been brewing for a few months, and I've been influenced in various ways by some stuff that I wrote and also some stuff I read over the past year or two. I'm going to try to dump links to all of these influential things below, but this probably won't be exhaustive, sorry.
I got started thinking about using git for content distribution during the formation of Circumlunar Space's newish zine project, Circumlunar Transmissions. The zines are hosted via Gemini and Gopher, but you can also clone a git repo, the idea being that people can use this to easily host Gemini or Gopher mirrors. At some point I started wondering about the possibility of just skipping that last part and distributing it entirely via git. A zine is a *perfect* use case for this. No sane person expects a zine to be editable or deletable after release.
Circumlunar Transmissions zine (republic.circumlunar.space)
I was motivated to write this stuff up *now*, not in another week or a month, by a recent post by ploum which asked "Could we imagine a decentralised and delay-tolerant network simple enough so you could implement it in a day?", and envisaged a system where folders of PGP-signed Markdown documents are copied to the local disk and browsed from there. There are some extra ideas in there about using the system for something like email, too. If you ignore that part and just focus on the Markdown distribution, well, I think git basically already does this.
Ploum's gemlog post "Offmini, My Dream of Making Gemini Offline and Distributed" (rawtext.club)
Not long at all after I read that post and decided I had better start writing, I came across a short article, inspired by the off-grid working habits of the 100 rabbits crew (who produce code and art while living on a 10 meter yacht, occasionally taking breaks to make awe-inspiring and death-defying ocean crossings), which asserted that "saving pages with wget is like low-budget p2p" (inspiring the title of this post), and asked a bunch of provocative questions:
What if the browser was local-first?
What if websites showed up as files and folders on my computer?
What if the browser saved a copy of everything I bookmarked?
What if I had my own personal wayback machine?
What if I had a little local Google that could search the full text of everything I've ever saved?
What if I could copy those website files and remix them? Add links. Mark them up with highlights. Write margin notes.
What if the whole web was built around copying/remixing/sharing?
Using git for distribution straightforwardly opens up all of these possibilities.
Article "Saving copies of everything is like low-budget p2p" (https://subconscious.substack.com)
The 100 rabbits Gemini capsule (gemini.circumlunar.space)
In writing this up, I tapped into some older ideas. The first is one of my own: I made a phlog post about 2 years ago wherein I claimed that "on a computer with even vaguely modern specs, it would probably be possible to use a Gopher client which *automatically and immediately* archived every single document you visited, as you visited it, and maintained a searchable full text index of those archives, without this being unduly taxing on processor time or disk space". I discussed this in the context of the value of being able to easily disappear, which is something we lose with git distribution.
My phlog post: "The individual archivist and ghosts of gophers past" (gopher://zaibatsu.circumlunar.space)
I take the question of when, if ever, radically permanent online writing makes sense as seriously as I do because of an even earlier post made by Alex Schroeder, concerning the Secure Scuttlebutt protocol. Alex says "I don't like systems where I cannot delete things. I don't need non-repudiation since I'm talking to people, not signing contracts. Basically a “unforgeable append-only” system is similar to a legal set of contracts and not at all like conversations in real life". He's well aware that it's impossible to guarantee all participants in a distributed system will comply with a request to delete something, but still thinks it's better to build systems which at least *try* to let us undo. I'm still not 100% sure I agree, but I totally understand and respect the perspective. Systems with a hard "no take backs" property shouldn't be designed or used lightly - especially not in light-hearted social contexts, where they seem an especially bad match.
Alex's post "No take back" (alexschroeder.ch)
Drew Devault wrote about a year ago about better (open-source, non-commercial, pro-privacy) approaches to search engines, in which he floats the idea of search engines not crawling the entire web, but limiting themselves to a list of "tier 1" domains which are "authoritative or high-quality sources for their respective specializations", as well as pages which link to tier 1 domains. My idea of doing just a little bit of exploratory spidering and indexing of the git repos which are advertised in those repos you subscribe to, ending up with the ability to search a small, socially-defined corner of "gitspace", was inspired by this. Search is a useful feature even if you can't search *all* the things.
Drew's post "We can do better than DuckDuckGo" (drewdevault.com)
Finally, some links to other stuff I mentioned above:
DAT, a "new p2p hypermedia protocol" (https://www.datprotocol.com)
IPFS, a "peer-to-peer hypermedia protocol" (https://ipfs.io)
SSB, a secure and decentralised P2P messaging / file-sharing platform (https://scuttlebot.io)
Freenet, ye olde "peer-to-peer platform for censorship-resistant communication and publishing" (https://freenetproject.org)
Joneworlds, small internet fiction so good I've been manually saving local copies just in case... (republic.circumlunar.space)
<!DOCTYPE HTML>
<html lang="en">
<p>Low budget P2P content distribution with git</p>
<p>archived: 1-24-24 from: https://portal.mozz.us/gemini/zaibatsu.circumlunar.space/~solderpunk/gemlog/low-budget-p2p-content-distribution-with-git.gmi</p>
<p>In recent months I've spent a lot less time than is typical thinking about anything to do with computers and the internet, but there is one train of thought I've been repeatedly pondering. I had hoped to write up a bunch of less technical stuff first (don't worry, that's still coming - I'm kind of disappointed in myself that I've lapsed into writing a massive computery, internety post so soon after coming back to writing here. Bad Solderpunk! In penance I'm going not going to write any more for at least a month - stay tuned for cycling, environmentalism and manga, though), but it seems like this technical idea has become just a little topical just recently, so perhaps now is actually a good time to get this idea out there. Let me be very clear from the outset that I'm just idea-sketching out loud here. This isn't a new project, or anything, I'm not giving the system I'm about to describe a name or committing to fleshing out the details or anything like that. That's not to say nothing will ever come of this, I just want to make it clear from the outset that these ideas are half-baked at best and I'm absolutely not committed to jumping head first into wherever this train of thought leads...</p>
<p>Protocols like Gemini and Gopher are an effective salve against many of the miseries inflicted by the modern web, but by no means do they solve *all* the web's problems. All three systems share the same big picture architecture, namely that the default pattern of usage is that content lives in exactly one place, a server which is online 24/7, 365 days a year and accessible from anywhere on Earth, and that to consume this content you request a copy of it at the instant of consumption, render it to the screen and then discard it (perhaps after a relatively brief cache lifetime), leaving no persistent copy, with the understanding that if you want to read something again next week or month or year you'll just request a fresh copy and do all this again. Because all three protocols work this way, all three of them share a long list of common shortcomings, mostly about losing access to stuff you'd like to still have access to. Online content can become inaccessible to *you* in the short term if your internet connection goes down. It can inaccessible to *anybody* in the short term if the server goes down. It can become inaccessible to large groups of people in the *long term* due to the ease with which authoritarian governments can block access to a single server. It can become inaccessible to *everybody* *forever* if the hosting service disappears (think Geocities), or if the person running a private server dies or is incapacitated and none of their friends or family know which bills to pay to keep the thing up. These problems can be mitigated to some extent via load sharing, content delivery networks, caching proxies, etc. All these solutions involve setting up yet *more* computers which are switched on and connected to the net 24/7, which is expensive both financially and environmentally. On a long enough timeline, the survival rate for all websites drops to zero: find some mailing list archives from the late 90s or early 00s and try visiting all the URLs people shared in it. More than 90% of them won't work. 20 or 25 years is not an awfully long time span for this kind of decay to happen in.</p>
<p>None of these observations are new or exciting, and there are no shortage of projects attempting to address various of these shortcomings in various domains. You've maybe heard of DAT and IPFS and SSB, and those are just the Johnny-come-latelies to this sphere. Freenet has been around for over 20 years, and I don't doubt that it has predecessors of its own. What all of these projects have in common is conceptual complexity. They're distributed, decentralised, peer-to-peer, content-addressed, cryptographically authenticated, and more. This isn't intended as a criticism. These projects have a much higher ratio of essential complexity to "empty complexity" than something like a modern web browser, because they're trying to solve substantially more difficult problems, making some conceptual complexity is unavoidable. But all of the projects above and their associated ideas have met with fairly limited implementation by developers and fairly limited uptake by users, and I think the high barrier to entry represented by a lot of conceptual complexity, even if it is essential, is probably a large part of the reason for this (that and a healthy serving of apathy, no doubt). I'm not trying to say that the search for clever solutions to these problems is futile, not at all. I'm just laying out what seem to me to be the facts.</p>
<p>Completely solving the problems associated with an always-online, purely client-server web is never going to be easy. The wait for something which works well enough and is user friendly enough to facilitate serious uptake is going to be a long (though hopefully worthwhile) one. In the meantime, it's tempting to wonder whether or not there is some kind of "80:20" solution to these problems which gets at least some of us at least some of the way there - enough of the way to be worthwhile - without a huge learning curve. Lately I've been thinking that maybe there is, and that maybe it's actually not even all that hard. In fact it's so incredible simple that I'm almost embarrassing to say it out loud, out of fear that if it were *that* simple then people would *obviously* already be doing, so clearly I've missed something big due to not being smart enough. Or maybe some people well of my radar *are* doing this, and that's what I've missed. Anyway, are you ready for this huge idea? Here it is.</p>
<p>Use git.</p>
<p>No, really, just use git. Not the way you're possibly using it already (like I am), as a kind of deployment mechanism, where you write your posts locally, commit them to a repo, then push to a remote copy of that repo only you have access t only you have access to, triggering a hook which checks out a copy of your work in whatever directory your web/Gopher/Gemini server looks in (although, if you're doing that, switching to using it the way I'm talking about is a piece of cake). I'm talking about using git for small internet content the way people use it for source code, as an actual distribution mechanism for ending up with a local copy of something on our disk that you then use offline (by compiling it, interpreting it, etc). I'm talking about your text-centric online content being nothing more than a public git repository. If somebody wants to read your posts, they clone your repo. Then they've got your posts on their disk, and they can read them from there. If they go offline, it doesn't matter, because your stuff is on their disk. They can read it today, and tomorrow, and next year. If your server goes offline, it doesn't matter, because your stuff is on their disk. If they like your stuff and want to read more of it, then next time both they and you are online, they are one `git pull` away from getting any updates you've made since their original clone. There is no need for Atom, or RSS, or carefully formatted index pages with datestamps integrated into link text. When distributing by git, visiting a site and subscribing to a site are one and the same thing. No extra technological concessions to the notion of "subscribability" are needed. Furthermore, when distributing by git, visiting a site and making a complete offline archive of the site are one and the same thing. There's no need for slow, clumsy, error-prone and admin-irritating loops of repeatedly fetching and parsing files using tools like wget to discover the URL of every single resource in a site. You just grab the whole thing at once in a single network transaction, no parsing required. Git is actually better than Atom/RSS and recursive wget combined! An Atom or RSS feed usually only has the 10 or 20 most recent updates in it, so if you're offline for a long time you'll miss some stuff. Git won't, you'll get every commit made since your last pull. And a recursive wget just leaves you with an offline copy of an entire site as it was in one point in time. There's no way get *just* the new stuff one month later - sure, with HTTP(S) you can use headers like If-Modified-Since to avoid fetching new copies of stuff that's changed, but you still need to make a request for every single page which *could* have changed. With git you just pull and that's it.</p>
<p>I've barely scratched the surface here. I'm going to keep going, but first let's really quickly think about this from a network privacy point of view. Cloning or pulling a git repo involves making network connections to *one* server, known in advance, and has no side effects. There are no cookies or anything cookie-esque to tie subsequent requests together at a more fine-grained or persistent level than the IP address. This is much better than the web, and exactly on par with Gemini and Gopher. If you want to, you can do git stuff over HTTPS or SSH, and that's normal and standard, so in this respect we're better than Gopher where plaintext is the only option. But if you don't want to use crypto, or your computer can't handle it, or you're using some futuristic internet overlay like Yggdrasil so you get transport security without baking it into every protocol, you can do a plaintext git:// clone. So for some folks this is better than Gemini, where it's TLS or bust. But the git-as-distribution-tool approach gives you something that none of the web or Gopher or Gemini give you: it's one network transaction for the *whole site*, and that's it. A git admin knows that you (or rather, your IP address) has cloned their repo and now has all their posts. But that's it. They don't know which posts you read, and which ones you don't. They don't know which posts you read once and which ones you read every day and which ones you only read in the middle of cold, lonely nights. There is nothing like a "click stream" for them to analyse. Even the boogeyman of "traffic analysis", where the size and latency of opaque encrypted transactions are used by third parties to reconstruct your path through a public site gain no traction here. Your fine-grained consumption habits are entirely invisible to everybody but you. That's really neat!</p>
<p>One more brief digression: I've described everything so far in network terms (and will get back to that shortly and then do it for the rest of this post). But keep in mind, please, that there is *nothing* network-centric about this idea. We're all very used to doing git clones and pulls over TCP/IP, but you can clone and pull from the filesystem just fine. Try it. Git won't bat an eyelid. That means you can clone and pull from USB sticks and SD cards, which means this whole thing works just fine over sneakernet. You don't have to go "all in" on sneakernet, you can mix and match it with networking in whatever proportion suits you, and transition slowly from using mainly one to mainly the other on an as-needed basis. I think about sneakernet a lot these days, and I think anybody else who's interested in sustainable/perma-/salvage computing ought to as well. I'll write more about this some other time. Let me just say for now that the fact that this git-for-distribution thing works seamlessly via sneakernet is a big plus for me.</p>
<p>Okay, back to the main thrust: by visiting/archiving/subscribing to a site via git we get even more than Atom/RSS and recursive wget combined can offer, with less effort on the part of either producer or consumer. Jake. But so far we've still talking about readers fetching content from a single authoritative source operated by authors, so we still have a lot the usual centralisation problems. This approach still puts a potentially heavy load on one authoritative server, it still requires lots of long distance data traffic, and if the author's server disappears forever *before* you got a chance to clone the repo, you're out of luck. Getting past these hurdles in a web/Gopher/Gemini context isn't easy. If I use recursive wget to get a complete local copy of some website, then in order to enable somebody else to use a recursive wget to get a complete copy from *me* (because my server is closer, or more reliable, or the original is gone) there's a lot more rigmarole involved. I'd need to setup a webserver and point it at my copy, and there's no guarantee that alone is enough. The site may not work properly without suitable URL rewriting or redirecting rules or similar configuration details in place on the server side. I'd need to reproduce those settings exactly, and the information required to do so is *not* something I'd end up with as a consequence of doing the original recursive wget. So the whole procedure kind of only works once, and can't reliably be chained, with an n-th party getting a fully functioning copy from a (n-1)-th party's copy. Even if redirects/rewrites weren't in the picture and this chaining *was* possible, there'd naturally be a big question of trust, as at any stage along the chain the site could be modified by somebody other than the original author and you'd be none the wiser. But none of these problems are there in the git version! You can clone a clone of a clone no worries, that's normal. Everybody who "visits a site" distributed by git has everything they need to *redistribute* the site. And git has built-in support for signing commits with GPG, which can go a long way toward resolving the trust problem (public keys can be distributed as part of the repository itself, which works out alright as long as you can be confident you make your initial clone from the genuine origin - not foolproof, but much better than nothing). All of this is just bog-standard git functionality, tried and tested, nothing new or exciting, 100% ready to go and documented in countless sources. This stuff is exactly what makes git a *distributed* version control system. The new idea here is really nothing more than using it to distribute writing to readers, instead of source code to developers or users.</p>
<p>It turns out we've *had* a decentralised, distributed, offline-first system for P2P storage and delivery of text files for 17 years now! It was just created for an application very different from blogging/phlogging/gemlogging. By the time git became an established and familiar technology, the web was in the full blown grip of "web 2.0" fever, and static, non-interactive content that was 90% text was consigned squarely to "the past". This resulted, I think, in a missed connection, which maybe we can finally make. There's nothing fundamentally wrong with interactivity, of course, nor with non-text media, either. But I don't need to tell anybody who is reading this via Gopher or Gemini that there's a whole universe of material which is interesting, or informative, or useful, or amusing, or uplifting, or otherwise valuable even if it's "just text" and even if you read it days or weeks or months or years after it was originally written. That's not a unique property of source code. It's true of our little small internet world, too! Git is just perfect for distributing exactly this kind of writing. You get delay-tolerant subscription for free: Atom and RSS can go to the dustbin of history. Constant internet connectivity is not required, although it doesn't hurt. You can pull from all your repos four times a day every day if you live all the time in an apartment with a permanent high-speed internet connection. If you're trying to spend less time online because you think that's better for you in some way(s), you can connect once in the morning, pull from all your repos and then disconnect and read what you received at your leisure. If you live on a boat and sometimes go without internet access for weeks at a time, that works just fine too. If you are travelling without regular internet access and you meet somebody on the way who follows some of the same repos you do, whichever one of you pulled from upstream less recently can pull from one who did so more recently to get some updates on the road - and then pull later from the official source once back in civilisation, without this switching of sources causing any problems. Stuff can continue to circulate for years after the original source disappears, provided enough people were interested enough in the first place to clone it and make their clones readable. To be honest, this feels to me like it could be an even better small internet platform than Gopher or Gemini, at least for some kinds of content (for others, perhaps not - I'll return to this later).</p>
<p>Of course, this is nothing like a *real* solution to any of the nasty problems of centralised client-server distribution. You can update your clone of a git repo from some source other than the original, official repo, and have confidence that what you get is genuine thanks to PGP, sure - if you know about that other source in advance. But there's no magic means by which knowing only the URL of the original repo you can automatically find the most up-to-date third party copy or copies which are online now and close by to you in network terms and pull from them instead. That's the kind of hard problem which makes real P2P systems complicated, and git does nothing at all to solve these. But we can 80:20 around this to some extent.</p>
<p>I've been vague up until now about exactly how this works in a hands on, daily use kind of way. I'm not proposing we literally spend our time doing git clones and git pulls manually by hand all the time (although you *could* use this system that way, and that should be seen as a feature, just like being able to access Gopherspace via telnet). We can build tools to streamline things. This is largely the reason, incidentally, for using git in particular and not Mercurial or Fossil or whatever else might be hot these days. Git is ubiquitous and isn't likely to stop being so anytime soon. It's been ported everywhere - you can use git today on Plan 9 or Minix 3 or whatever weird system floats your boat (are there still open source descendants of Solaris out there? If there are, I bet they have git). There are bindings to libgit in all major programming languages, allowing you to automate this stuff. All this work has already been done, and these tools are going to be kept up to date and ported further and documented better by people who don't know and don't care about the small group of dorks using git as a plain text content distribution system. It's exactly the same philosophy behind using TLS for Gemini and not something newer and better. Tiny guerilla computing projects can't afford to ignore the opportunity to have the enemy manufacture our weapons for us. So we build tools based on git, because a lot of us already know how to build them, and once they're built they'll be usable just about everywhere. We can throw together something which has the look and feel of a traditional Atom/RSS-based feed reader, but it's powered by git under the hood, it just looks at timestamped commits to figure out which files were updated when. And there's no reason we can't standardise on every repo designed to be used in this way having (or *optionally* having) a directory in the rep root with a well-known name which contains simple .ini or .json or .yaml or whatever files (no doubt getting everybody to agree on one of these would represent 99% of the work of actually bringing this idea to fruition) that provide a little bit of metadata in an easy-to-parse format. These could provide some of the feed metadata that you'd traditionally find in Atom/RSS, like a repository's title, subtitle, author, contact details and license information. They could provide GPG public keys. And they could be used to advertise the URLs of clones of the repo, its "official mirrors", and maybe where these clones are in the world and at what times of day they are mostly likely to be online (ditto for the original). The git-aware app could register all those URLs as additional "remotes" for the repo, and it could preferentially try to pull from the nearest one most most likely to be up when the user hits "refresh", and if that remote was down, it could fall back to the second best choice, and so on. This involves some manual coordination between authors and willing mirroring parties, and introduces a kind of dichotomy between "official mirrors" and "unofficial mirrors" which you'd need to learn about out of band and tell your client about, but I suspect we can tackle this in the usual grass-roots, small internet way and still end up somewhere better than we are right now. It's far from perfect, but it's also far from awful.</p>
<p>And we're *still* really just scratching the surface of what doing this would enable. To make it explicit, we're talking about a system where every participant keeps a full copy of the full history of every site they visit on their hard drive indefinitely. This sounds nuts at first. It also sounds nuts that in this system there is no way to fetch just a single post - if you want to read one post that somebody has told you about, you have to clone the full repo containing said post. That's, in some sense, woefully inefficient! These concerns diminish rapidly if we start thinking small. I've been phlogging on Gopher for over four years now. Anybody who has been following me all that time knows that I am *not* a succinct writer. I am relentlessly verbose. And yet, my phlog directory is 1.7 megabytes. Having to clone that whole lot to read one post doesn't seem so horrible knowing that. When visiting a single blog post on the web today you could easily pull down a lot more than 1.7 MB of external fonts, style sheets, surveillance Javascript, flashy background images and more. Cloning my whole phlog repo to read one post is less efficient than using Gopher to fetch just that one post, but it's still more efficient than the status quo of the web. Let's suppose that I continue to phlog at the same exhausting level of verbosity for fifty whole years in total. That would bring me up to just over 21 MB, which we can round up to 25 MB to make things simpler. Now, suppose you didn't want to just read *my* fifty years of rambling, but you wanted to read the ramblings of *one hundred people* who all wrote excessively for fifty years - arguably more output than any person really has the time to read. This would bring us up to 2.5 GB. That fits several times over on the smallest USB or SD storage device you can buy. Businesses literally give that much storage away for free in the form of promotional key chains. The above calculations could be off by a factor of ten (git itself obviously introduces some degree of storage overhead which I've completely failed to address so far and, in truth, know almost nothing about, but I'm pretty sure it's nothing like a factor of ten) and the storage burden of 25 GB would still be underwhelming, even for a 20 year old machine. We really can live this way. Text is *small*.</p>
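<p>For what it's worth, the back-of-envelope arithmetic above restates as a few lines of Python - the numbers are the same ones from the paragraph, not new measurements:</p>
<pre>
# Back-of-envelope restatement of the storage estimate above.
mb_per_year = 1.7 / 4                                   # ~0.43 MB/year of verbose phlogging
fifty_years_mb = mb_per_year * 50                       # ~21 MB, rounded up to 25 below
budget_per_author_mb = 25
hundred_authors_gb = budget_per_author_mb * 100 / 1000  # 2.5 GB for 100 authors over 50 years
pessimistic_gb = hundred_authors_gb * 10                # even a 10x overhead is only 25 GB
print(fifty_years_mb, hundred_authors_gb, pessimistic_gb)
</pre>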
<p>Having full local copies of everything ever written by anybody whom you've ever read a single small internet post by is a game changer in and of itself. Stuff like archive.org becomes at least partially obsolete, because you have the full history of each site locally. You can, to some extent, be your own search engine. Obviously you can't search your own disk to find stuff you've never previously fetched, but you can easily find stuff you vaguely recall reading a year ago, and if you've only just recently started following somebody who has been writing for years, you can search their back catalogue. You can ask your computer to find other posts you have on your disk which are "similar to" some particular post, in terms of them both using similar words or phrases which are otherwise rare. All sorts of machine learning, pattern recognition, recommendation engine type stuff could be done, if you wanted, but it's something you could do yourself entirely on your own machine with complete control and transparency and perfect privacy. If one of those metadata files mentioned earlier, sitting at a well-known location in every repo, was a kind of machine-readable "git-roll" where authors could advertise the URLs of other repos that they are reading, then you could even do a little casual repo spidering (with a configurable maximum amount of disk space and monthly bandwidth dedicated to this - possibly both set to zero if you don't care for it). This all sounds somewhat futuristic, but indexing and searching and identifying fuzzy conceptual connections between a couple of gigabytes' worth of text files is not exactly the computational cutting edge. I'm starting to feel like in some ways we have been denying ourselves super powers for years simply by continuing to distribute our content in a fashion which makes it really impractical to grab sites wholesale, even though the bandwidth and disk space required to do this (for simple text files, anyway) has long been easy to come by.</p>
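<p>The "be your own search engine" part really is close to trivial at these scales. As a purely illustrative sketch - the clone directory and the file extensions are assumptions, and a real tool would want a proper index rather than a linear scan - something like this already gives you ranked full-text search over everything you've cloned:</p>
<pre>
from pathlib import Path

# Assumed layout: every cloned repo lives somewhere under ~/smolnet.
CLONE_ROOT = Path.home() / "smolnet"
TEXT_SUFFIXES = {".gmi", ".txt", ".md"}   # whatever formats the sites use

def search(query):
    """Rank every text file under CLONE_ROOT by how often it mentions the query."""
    query = query.lower()
    hits = []
    for path in CLONE_ROOT.rglob("*"):
        if ".git" in path.parts:          # skip git's own object store
            continue
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        count = path.read_text(errors="replace").lower().count(query)
        if count:
            hits.append((count, path))
    return sorted(hits, reverse=True)

for count, path in search("content distribution")[:10]:
    print(count, path)
</pre>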
<p>I've been unrelentingly positive about this whole prospect so far. So many benefits to content distribution via git! Aren't there any problems? Well, sure. There are two big ones that I've identified so far. One is technical, the other is, uhh, sociological? Or something? Let's deal with that one first. The basic issue is that stuff on the internet can become unavailable in two different ways. Sometimes stuff disappears involuntarily - due to technical faults, censorship, business failures, financial problems, etc. But sometimes stuff disappears because the author didn't want it up anymore and willingly took it down, which feels like a reasonable thing for authors to be able to do if they like. We might, very roughly, think of these as "bad disappearances" and "good disappearances", respectively. The problem is that it's not possible to solve the bad disappearance problems without making good disappearances impossible. Publishing something via this git system is in principle permanent and irreversible. If just one person clones or pulls from your repo before you take it down, other people can pull/clone from them and there's nothing you can do to stop this beyond asking nicely. It's not just "taking stuff down" that becomes infeasible. If you change your mind about something you wrote ten years ago and want to alter it, you can do so - but everybody "subscribed" to your repository will be notified of this fact and will be able to see both the before and after versions. This kind of publishing is, by necessity, radically long-lasting and radically transparent in a way that people aren't used to and many may not be ready for.</p>
<p>Many will say that the internet is *already* like this and you can never guarantee that anything you publish, via any protocol, won't be redistributed forever. This is exactly right. It's the very nature of a global network of general purpose computing devices, and we should never fool ourselves into thinking that any technology can prevent this. Furthermore, this isn't a problem unique to using git for publication; it's going to be a problem in *any* solution to these problems. Does that mean we should just forget about this issue? Maybe not. Just because something is always possible in principle doesn't mean that making it as quick and easy and convenient as possible will be without consequence. An internet which never forgets is handy in a lot of ways and in a lot of fields of endeavour. It's also strongly mismatched with human social psychology and norms. The small internet crowd tends to place a lot of emphasis on "human scale" computing and on personal connections, so I think this is worth flagging and encouraging people to think about. But I do also think it's possible to overstate how big of a deal this is. Maybe I've already done that. I dunno.</p>
<p>The other big problem, the technical one, is that of linking. That whole hypertext thing. Let's consider a "gitlog", i.e. a blog/phlog/gemlog-style resource which is published exclusively via a public git repository, and is not hosted on any of the traditional server-client request-per-page protocols ("gitlog" is a horrible name for this thing because it will cause massive confusion and search engine collision with the `git log` command, but I'll use it as a placeholder for now). Internal links within one gitlog are straightforward (at least if it's in HTML or gemtext, both of which support relative URLs), but how does the author of a post in this log provide a link to an individual post in another gitlog? An unambiguous pointer to an individual gitlog post necessarily has two parts: the URL of (any clone of) the repository, and a path relative to the repository root indicating the file containing the post in question. I am not aware of any pre-existing URL scheme for unambiguously conveying both these things at once, nor of any pre-existing hypertext format which allows "two part" links. It's not remotely hard to imagine how to cook up either one, perfectly straightforward in fact, but ugh, once we do that, this stops being a super minimal "just use this existing thing to distribute your arbitrary existing text, with maybe a tiny bit of optional helper metadata sprinkled in if you want" approach and becomes a whole *thing* with its own unique format which you have to buy into. I really like that a lot of people are basically already 100% geared up to distribute their smol content this way by just making the private repository they already use for deployment publicly readable, super quick and easy, no other change required. Anything which stands in the way of that feels like a bad idea. But without a standalone pure-gitlog linking solution, the whole system is limited to bihosting scenarios, where git-based distribution kind of lurks behind the scenes, and plain old gopher:// or gemini:// links are what we actually include in our posts. This is not great, but perhaps something we can live with? Maybe there's a convention where if your Gopher or Gemini content is also available in this way, you configure your Gopher/Gemini server to respond to requests for a certain well-known endpoint with (i) a git repo URL and (ii) a regex or some-such for transforming your gopher:// or gemini:// URLs into paths relative to your repository root? That would work, I think, as a kind of easy machine-readable "gateway" from the Gopher/Gemini view of things to the git view of things. Maybe there are even better ways? I don't mean to suggest we can't somehow make this linking thing at least roughly work, I'm just highlighting that this is the most substantial issue I've thought of without an obvious and easy solution. I still think there's something worth pursuing inside all this. Hell, even if we give up on external linking, that's not the end of the world. The short, whimsical urban fantasy of Joneworlds is a genuine gem of modern Gopher/Geminispace, and the vast majority of it is entirely self-contained and very little would be lost by distributing it without any links at all.</p>
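<p>Just to show that the "gateway" convention at least hangs together mechanically, here is a sketch of the client side of it. The well-known endpoint, its field names, the pattern and the example URLs are all made up for illustration; a real convention would need to pin every one of those down:</p>
<pre>
import re

# Imagined response from some well-known endpoint on the capsule, e.g.
# gemini://example.org/.well-known/gitlog - entirely hypothetical.
GATEWAY = {
    "repo": "https://git.example.org/gemlog.git",
    "pattern": r"^gemini://example\.org/~alice/(.*)$",
    "replacement": r"\1",
}

def to_repo_link(gemini_url, gateway):
    """Turn a gemini:// URL into (clone URL, path relative to the repo root)."""
    path, n = re.subn(gateway["pattern"], gateway["replacement"], gemini_url)
    if n == 0:
        raise ValueError("URL not covered by this gateway")
    return gateway["repo"], path

print(to_repo_link("gemini://example.org/~alice/gemlog/some-post.gmi", GATEWAY))
# prints ('https://git.example.org/gemlog.git', 'gemlog/some-post.gmi')
</pre>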
<p>I think at last that this is all I have to say about this for now. I mean, there's more I could say, all sorts of little details, but I think this is enough for now, to get the idea out there. I am happy to release this idea into the electronic wild and see what kind of life, if any, it may take up in the minds of the denizens of the small internet. I look forward to hearing people's thoughts.</p>
<p>I've written all this up over the past week or so (took longer than I thought it would!), but the core of the idea has been brewing for a few months, and I've been influenced in various ways by some stuff that I wrote and also some stuff I read over the past year or two. I'm going to try to dump links to all of these influential things below, but this probably won't be exhaustive, sorry.</p>
<p>I got started thinking about using git for content distribution during the formation of Circumlunar Space's newish zine project, Circumlunar Transmissions. The zines are hosted via Gemini and Gopher, but you can also clone a git repo, the idea being that people can use this to easily host Gemini or Gopher mirrors. At some point I started wondering about the possibility of just skipping that last part and distributing it entirely via git. A zine is a *perfect* use case for this. No sane person expects a zine to be editable or deletable after release.</p>
<p>Circumlunar Transmissions zine (republic.circumlunar.space)</p>
<p>I was motivated to write this stuff up *now*, not in another week or a month, by a recent post by ploum which asked "Could we imagine a decentralised and delay-tolerant network simple enough so you could implement it in a day?", and envisaged a system where folders of PGP-signed Markdown documents are copied to the local disk and browsed from there. There are some extra ideas in there about using the system for something like email, too. If you ignore that part and just focus on the Markdown distribution, well, I think git basically already does this.</p>
<p>Ploum's gemlog post "Offmini, My Dream of Making Gemini Offline and Distributed" (rawtext.club)</p>
<p>Not long at all after I read that post and decided I had better start writing, I came across a short article, inspired by the off-grid working habits of the 100 rabbits crew (who produce code and art while living on a 10 meter yacht, occasionally taking breaks to make awe-inspiring and death-defying ocean crossings), which asserted that "saving pages with wget is like low-budget p2p" (inspiring the title of this post), and asked a bunch of provocative questions:</p>
<p>What if the browser was local-first?</p>
<p>What if websites showed up as files and folders on my computer?</p>
<p>What if the browser saved a copy of everything I bookmarked?</p>
<p>What if I had my own personal wayback machine?</p>
<p>What if I had a little local Google that could search the full text of everything I've ever saved?</p>
<p>What if I could copy those website files and remix them? Add links. Mark them up with highlights. Write margin notes.</p>
<p>What if the whole web was built around copying/remixing/sharing?</p>
<p>Using git for distribution straightforwardly opens up all of these possibilities.</p>
<p>Article "Saving copies of everything is like low-budget p2p" (https://subconscious.substack.com)</p>
<p>The 100 rabbits Gemini capsule (gemini.circumlunar.space)</p>
<p>In writing this up, I tapped into some older ideas. The first is one of my own: I made a phlog post about 2 years ago wherein I claimed that "on a computer with even vaguely modern specs, it would probably be possible to use a Gopher client which *automatically and immediately* archived every single document you visited, as you visited it, and maintained a searchable full text index of those archives, without this being unduly taxing on processor time or disk space". I discussed this in the context of the value of being able to easily disappear, which is something we lose with git distribution.</p>
<p>My phlog post: "The individual archivist and ghosts of gophers past" (gopher://zaibatsu.circumlunar.space)</p>
<p>I take the question of when, if ever, radically permanent online writing makes sense as seriously as I do because of an even earlier post by Alex Schroeder, concerning the Secure Scuttlebutt protocol. Alex says "I don't like systems where I cannot delete things. I don't need non-repudiation since I'm talking to people, not signing contracts. Basically a “unforgeable append-only” system is similar to a legal set of contracts and not at all like conversations in real life". He's well aware that it's impossible to guarantee all participants in a distributed system will comply with a request to delete something, but still thinks it's better to build systems which at least *try* to let us undo. I'm still not 100% sure I agree, but I totally understand and respect the perspective. Systems with a hard "no take backs" property shouldn't be designed or used lightly - especially not in light-hearted social contexts, where they seem an especially bad match.</p>
<p>Alex's post "No take back" (alexschroeder.ch)</p>
<p>Drew DeVault wrote, about a year ago, on better (open-source, non-commercial, pro-privacy) approaches to search engines, floating the idea of search engines not crawling the entire web, but limiting themselves to a list of "tier 1" domains which are "authoritative or high-quality sources for their respective specializations", as well as pages which link to tier 1 domains. My idea of doing just a little bit of exploratory spidering and indexing of the git repos which are advertised in those repos you subscribe to, ending up with the ability to search a small, socially-defined corner of "gitspace", was inspired by this. Search is a useful feature even if you can't search *all* the things.</p>
<p>Drew's post "We can do better than DuckDuckGo" (drewdevault.com)</p>
<p>Finally, some links to other stuff I mentioned above:</p>
<p>DAT, a "new p2p hypermedia protocol" (https://www.datprotocol.com)</p>
<p>IPFS, a "peer-to-peer hypermedia protocol" (https://ipfs.io)</p>
<p>SSB, a secure and decentralised P2P messaging / file-sharing platform (https://scuttlebot.io)</p>
<p>Freenet, ye olde "peer-to-peer platform for censorship-resistant communication and publishing" (https://freenetproject.org)</p>
<p>Joneworlds, small internet fiction so good I've been manually saving local copies just in case... (republic.circumlunar.space)</p>