The Interplanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In doing so, it aims to close the gap between today's file-sharing networks and a fully decentralized, distributed network.
IPFS serves three major roles: file management, file storage, and file versioning. IPFS technology aims to enable hosting and distributing massive (petabyte-scale) datasets; versioning and linking those datasets; computing on datasets across organizations; high-volume, high-definition, on-demand, real-time media streams; and preventing the accidental disappearance of important files.
Since its introduction in 1991, the Hypertext Transfer Protocol (HTTP) has given the whole world a single global information protocol, standardizing how we distribute and present information to each other. However, over the past decade, as decentralized technologies have emerged, existing file-sharing and information protocols have been unable to fully support these new decentralized platforms. For a globally decentralized ecosystem to thrive, it needs a distributed protocol for sharing files and information.
BitTorrent and BitSwap
If you’ve ever heard of µTorrent or other torrenting services, the concept of distributed storage may be familiar. Rather than a central server hosting a file for download, the file is broken into many segments, which are duplicated and stored on many computers and servers throughout a network. Hosts “seed” these segments, allowing users to download a file from multiple sources at once. Once downloaded, those same users can then go on to seed that file for others.
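The following is a minimal sketch of the segment-and-reassemble idea. It makes several simplifying assumptions. Segments are fixed-size and named by a SHA-256 hex digest. A "manifest" lists segment names in order. Each peer is modeled as a plain dictionary. The function names and the two seeders are hypothetical and only illustrate the concept; they are not how µTorrent or IPFS actually lay out data.

```python
import hashlib

def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size segments; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def block_id(block: bytes) -> str:
    """Name each segment by a hash of its bytes (simplified identifier)."""
    return hashlib.sha256(block).hexdigest()

def download(manifest: list[str], peers: list[dict[str, bytes]]) -> bytes:
    """Fetch each segment from whichever peer has it, then join them in order."""
    parts = []
    for bid in manifest:
        for peer in peers:
            if bid in peer:
                parts.append(peer[bid])
                break
        else:
            raise KeyError(f"no peer has block {bid}")
    return b"".join(parts)

data = b"the quick brown fox jumps over the lazy dog"
blocks = split_into_blocks(data, 8)
manifest = [block_id(b) for b in blocks]
# Two hypothetical seeders, each holding only half of the segments.
peer_a = {block_id(b): b for b in blocks[0::2]}
peer_b = {block_id(b): b for b in blocks[1::2]}
print(download(manifest, [peer_a, peer_b]) == data)
```

Neither seeder holds the whole file, yet the downloader can rebuild it by pulling different segments from each.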
In IPFS, the hosts seeding files are called nodes. The file segments being transferred are called blocks; these are simply chunks of data and are unrelated to the blocks of a blockchain. Data distribution happens by exchanging blocks with peers using BitSwap, a protocol inspired by BitTorrent. Like BitTorrent peers, BitSwap peers are looking to acquire a set of blocks and have another set of blocks to offer in exchange. BitSwap rewards nodes that contribute data to their peers and deprioritizes nodes that only pull resources. Longer-term incentives for storing data are handled by Filecoin, a separate protocol built on IPFS that is outside the scope of this primer.
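One way to picture how BitSwap favors contributors is the example strategy from the original IPFS whitepaper. There, a node keeps a ledger per peer, computes a debt ratio r = bytes_sent / (bytes_recv + 1), and sends to that peer with probability P(send | r) = 1 - 1/(1 + exp(6 - 3r)). The Python below is a minimal sketch of that formula. The `Ledger` class and the `blocks_to_offer` helper are hypothetical, and current IPFS implementations may use different logic.

```python
import math
from dataclasses import dataclass

@dataclass
class Ledger:
    """Hypothetical per-peer record of bytes exchanged."""
    bytes_sent: int = 0   # bytes we have sent to this peer
    bytes_recv: int = 0   # bytes this peer has sent to us

    def debt_ratio(self) -> float:
        # +1 avoids division by zero for a peer that has sent nothing yet.
        return self.bytes_sent / (self.bytes_recv + 1)

def send_probability(r: float) -> float:
    """Whitepaper-style sigmoid: near 1 for balanced peers, falling toward 0 as r grows."""
    return 1 - 1 / (1 + math.exp(6 - 3 * r))

def blocks_to_offer(my_blocks: set[str], peer_wantlist: set[str]) -> set[str]:
    """Blocks we hold that the peer has asked for (hypothetical helper)."""
    return my_blocks & peer_wantlist

honest = Ledger(bytes_sent=1000, bytes_recv=1000)   # r is about 1
leecher = Ledger(bytes_sent=1000, bytes_recv=0)     # r = 1000
print(send_probability(honest.debt_ratio()))   # close to 0.95
print(send_probability(leecher.debt_ratio()))  # effectively 0
```

The sigmoid shape keeps small imbalances cheap, since a new peer with r near 0 is almost always served, while peers that only download are quickly cut off.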
Content Addressing and File Versioning
In standard HTTP, a file is addressed by its location: the server it lives on and the path chosen by whoever owns the file. In IPFS, the address of a file is determined by the file’s contents and not by any single entity. Content addressing works by cryptographically hashing the contents and using that hash as the address. Instead of referring to objects (pictures, articles, videos) by the server they are stored on, IPFS refers to everything by a hash derived from its contents.
Whether it contains a single letter or an entire book, a file run through the hash algorithm gets a unique hash address. In the original IPFS format (CIDv0), this address is 46 characters long and always begins with “Qm”; newer CIDv1 addresses look different. Any duplicate of the same content produces the same hash, which gives IPFS deduplication automatically. However, if the contents of the file change even slightly, a completely different hash is generated. As a result, every version of a file has its own unique hash and can be retrieved independently, as long as at least one node in the network keeps a copy of it.
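The hashing step described above can be sketched in Python. This minimal illustration assumes the CIDv0 layout, in which a SHA-256 digest is wrapped in a multihash (prefix byte 0x12 for sha2-256, then 0x20 for the 32-byte length) and encoded in base58btc. The helper names `b58encode` and `cidv0_style` are hypothetical. Note that `ipfs add` does not hash a file's raw bytes directly. It first wraps them in a UnixFS/DAG-PB structure, so the string produced here will not match the CID that IPFS assigns to the same file. The sketch does show why these addresses share the same length and "Qm" prefix, and why any change to the content yields a different address.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, or l), as used by IPFS's base58btc.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    """Encode bytes as a base58 string, treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is written as a literal '1'.
    pad = len(data) - len(data.lstrip(b"\0"))
    return "1" * pad + out

def cidv0_style(content: bytes) -> str:
    """Multihash (sha2-256) + base58btc of raw bytes; illustrative only."""
    digest = hashlib.sha256(content).digest()
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    return b58encode(multihash)

print(cidv0_style(b"hello"))
print(cidv0_style(b"hello!"))  # one extra character, entirely different address
```

Because every address carries the same two-byte prefix and a 32-byte digest, the encoded string always has 46 characters and starts with "Qm".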
IPFS & HTTP
What’s wrong with HTTP? Simply put, it has too many single points of failure. Servers can be shut down, modified, or blocked, whether because a server crashes without proper backups, a domain changes hands, a company goes out of business, or a government interferes. Any of these can permanently cut off access to the affected domains and the information they hosted.
How is IPFS different? With HTTP, you search for locations. With IPFS, you search for content.
When you ask the IPFS distributed network for a specific hash, it efficiently finds the nodes that have the data, retrieves it, and verifies it is the correct data using the hash. Multiple copies of data are stored on many nodes throughout the network, and are all easily retrievable based on their hash address created from the underlying content.
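The verification step can be sketched as follows. This is a simplified model under stated assumptions. A bare SHA-256 hex digest stands in for a real CID, and the requester loops over a fixed set of hypothetical peers instead of discovering providers through the network as IPFS does. What the sketch does show is that a node can reject data that does not hash to the address it asked for.

```python
import hashlib

def address_of(content: bytes) -> str:
    # Simplification: a bare SHA-256 hex digest stands in for a real IPFS CID.
    return hashlib.sha256(content).hexdigest()

def retrieve(address: str, peers: dict[str, dict[str, bytes]]) -> tuple[str, bytes]:
    """Ask each known peer for `address`; accept only content that hashes to it."""
    for name, store in peers.items():
        content = store.get(address)
        if content is None:
            continue  # this peer doesn't have the data
        if address_of(content) == address:
            return name, content  # verified: the bytes match the address
        # Otherwise the peer sent something else; ignore it and keep looking.
    raise LookupError(f"no peer returned content matching {address}")

page = b"<h1>Aardvark</h1>"
addr = address_of(page)
peers = {
    "mallory": {addr: b"<h1>Forged page</h1>"},  # hypothetical bad actor
    "alice": {addr: page},                        # hypothetical honest node
}
source, content = retrieve(addr, peers)
print(source)  # alice
```

Mallory answers first, but the forged bytes hash to a different address, so the requester discards them and takes Alice's copy.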
The distributed web aims to become the fastest, most available, and largest store of data on Earth, one where no single party can destroy information by shutting it down.
Use Cases with IPFS
IPFS has the potential not only to improve the World Wide Web but also to usher in a new era of decentralized applications built on this distributed network. Several professions stand to see immediate advantages. Archivists, researchers, and blockchain developers will be able to store, organize, distribute, and work with extremely large datasets. Service providers and content creators will be able to deliver vast amounts of information at a fraction of the traditional cost while increasing security. And the developing world will have resilient access to data, even with high latency or intermittent connectivity to the network.
How district0x uses IPFS
IPFS plays a core role for district0x in two ways: it lets us serve user-uploaded files within districts, and it lets us serve our website’s source code.
While these may seem simple, the importance of distributing district0x files and keeping them immutable on the backend cannot be overstated. Through pinning, IPFS lets district0x guarantee that our critical files are stored on our designated servers. Pinning tells a node to keep a particular object in its local storage and exempt it from garbage collection, ensuring that the object survives.
Other nodes throughout the network that retrieve or pin these files also hold copies of them. As long as such copies exist, if any of our servers were to go down, we (and anyone else) could immediately retrieve the district0x core files from the IPFS network. This is critical in bringing district0x one step closer to operating as a truly decentralized platform. Without a distributed file storage network, we would just be another application with an easily exploitable single point of failure.
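The effect of pinning can be pictured with a toy model of a node's local store. In the real IPFS command line, this corresponds roughly to `ipfs pin add` and `ipfs repo gc`. The `NodeStore` class and its methods below are hypothetical. They sketch the behavior, not the actual implementation: garbage collection drops every block that is not pinned, and pinned blocks survive.

```python
import hashlib

class NodeStore:
    """Hypothetical model of a node's local block store with pins."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # address -> content
        self.pins: set[str] = set()         # addresses protected from GC

    def add(self, content: bytes) -> str:
        addr = hashlib.sha256(content).hexdigest()  # simplified address
        self.blocks[addr] = content
        return addr

    def pin(self, addr: str) -> None:
        if addr not in self.blocks:
            raise KeyError(addr)
        self.pins.add(addr)

    def garbage_collect(self) -> list[str]:
        """Delete every unpinned block and return the removed addresses."""
        removed = [a for a in self.blocks if a not in self.pins]
        for a in removed:
            del self.blocks[a]
        return removed

store = NodeStore()
site = store.add(b"website bundle")        # placeholder for critical files
cached = store.add(b"someone else's file")  # fetched once, never pinned
store.pin(site)
removed = store.garbage_collect()
```

After collection, only the pinned website bundle remains on the node. The cached file is gone locally, though other nodes may still hold it.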
Making it possible to download a file from many locations that aren’t managed by one organization…
- Supports a resilient internet. If someone attacks Wikipedia’s web servers or an engineer at Wikipedia makes a big mistake that causes their servers to catch fire, you can still get the same webpages from somewhere else.
- Makes it harder to censor content. Because files on IPFS can come from many places, it’s harder for anyone (whether they’re states, corporations, or someone else) to block things. In 2017, Turkey blocked Wikipedia and Spain blocked access to sites related to the Catalonian independence movement. We hope IPFS can help provide ways to circumvent actions like these when they happen.
- Can speed up the web when you’re far away or disconnected. If you can retrieve a file from someone nearby instead of hundreds or thousands of miles away, you can often get it faster. This is especially valuable if your community is networked locally, but doesn’t have a good connection to the wider internet. (Well-funded organizations with technical expertise do this today by using multiple data centers or CDNs — content distribution networks. IPFS hopes to make this possible for everyone.)
That last point is actually where IPFS gets its name: Inter-Planetary File System. We’re striving to build a system that works across places as disconnected or as far apart as planets. While that’s an idealistic goal, it keeps us working and thinking hard, and most everything we create in pursuit of that goal is also useful here at home.
Links don’t change on IPFS.
What about that link to the aardvark page above? It looked a little unusual:
That jumble of letters after /ipfs/ is called a content identifier, and it’s how IPFS can get content from multiple places.
Traditional URLs and file paths such as…
…identify a file by where it’s located — what computer it’s on and where on that computer’s hard drive it is. That doesn’t work if the file is in many places, though, like your neighbor’s computer and your friend’s across town.
Instead of being location-based, IPFS addresses a file by what’s in it, or by its content. The content identifier above is a cryptographic hash of the content at that address. The hash is unique to the content that it came from, even though it may look short compared to the original content. It also allows you to verify that you got what you asked for — bad actors can’t just hand you content that doesn’t match. (If hashes are new to you, check out the concept guide on hashes for an introduction.)