IPFS Whitepaper Draft 3

Note: this paper is quite old, and the IPFS specification has evolved a lot since then, so some of this information might be out of date. However, it is still interesting to understand what changed and why.

Introduction

  1. No existing file system provides global, low-latency, decentralized distribution.
  2. No current protocol (such as HTTP) uses the file distribution techniques developed over the past 15 years.
  3. Goal: enhance today's web without degrading the user experience.
  4. Explore how a Merkle DAG can be used to build high-throughput file systems.

Background

Distributed Hash Tables

IPFS uses a DHT to locate which peers hold which content. Many DHTs store the data itself. The IPFS DHT instead mainly stores which peers have the content, and the data is then fetched directly from those peers.
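To make the "store the peer, not the data" idea concrete, here is a minimal single-process sketch. It assumes Kademlia-style XOR distance (the DHT family IPFS builds on) and SHA-256 keys. `ToyDHT`, `provide`, `find_providers` and the replication factor `k` are hypothetical names for illustration, not the IPFS or libp2p API. A real DHT also finds the closest nodes with iterative network lookups rather than seeing every node at once.

```python
import hashlib


def key_of(data: bytes) -> int:
    """Map bytes to a 256-bit integer key (SHA-256 is an assumption here)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two keys: their bitwise XOR."""
    return a ^ b


class ToyDHT:
    """Hypothetical in-memory DHT that stores provider records, not content.

    Each node keeps a table: content key -> set of peer IDs that can serve it.
    """

    def __init__(self, node_ids, k=3):
        self.node_ids = list(node_ids)
        self.k = k  # number of closest nodes that hold each record
        self.tables = {node: {} for node in self.node_ids}

    def closest_nodes(self, key: int) -> list:
        # A real DHT discovers these iteratively; here we can see all nodes.
        return sorted(self.node_ids, key=lambda n: xor_distance(n, key))[: self.k]

    def provide(self, content_key: int, peer_id: int) -> None:
        """Announce that peer_id can serve the content identified by content_key."""
        for node in self.closest_nodes(content_key):
            self.tables[node].setdefault(content_key, set()).add(peer_id)

    def find_providers(self, content_key: int) -> set:
        """Ask the closest nodes which peers have the content."""
        providers = set()
        for node in self.closest_nodes(content_key):
            providers |= self.tables[node].get(content_key, set())
        return providers
```

For example, after `dht.provide(key_of(b"hello ipfs"), alice)`, a call to `dht.find_providers(key_of(b"hello ipfs"))` returns a set containing `alice`; the requester then fetches the bytes from Alice, not from the DHT nodes.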

Block Exchange

IPFS uses BitSwap, a BitTorrent-inspired data exchange protocol.
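The whitepaper's BitSwap section proposes one possible strategy for deciding whether to serve a peer. It computes a debt ratio r = bytes_sent / (bytes_recv + 1) from a per-peer ledger and feeds it into a sigmoid, P(send | r) = 1 - 1 / (1 + exp(6 - 3r)). The sketch below encodes that heuristic. `Ledger`, `send_probability` and `should_send` are hypothetical names, and BitSwap as deployed today has changed considerably since this draft.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Ledger:
    """Per-peer accounting a node keeps: bytes it sent to and received from that peer."""
    bytes_sent: int = 0
    bytes_recv: int = 0

    def debt_ratio(self) -> float:
        # r = bytes_sent / (bytes_recv + 1); the +1 avoids division by zero
        return self.bytes_sent / (self.bytes_recv + 1)


def send_probability(r: float) -> float:
    """Probability of serving a peer with debt ratio r (the paper's example sigmoid)."""
    return 1 - 1 / (1 + math.exp(6 - 3 * r))


def should_send(ledger: Ledger, rng: random.Random) -> bool:
    """Randomly decide whether to serve this peer's next request."""
    return rng.random() < send_probability(ledger.debt_ratio())
```

The probability stays close to 1 while a peer reciprocates, is exactly 0.5 at r = 2, and then falls off quickly. This discourages freeloaders without cutting a peer off after a single imbalance.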

Objects

  • Objects are content-addressed by their cryptographic hash, just like in Git.
  • Links to other objects are embedded in the objects themselves, forming a Merkle DAG (this data model has since been generalized as IPLD).
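The bullets above describe objects that are addressed by their hash and embed links to other objects' hashes. A minimal sketch of that Merkle DAG structure follows. The JSON encoding, SHA-256, and the `Link` / `MerkleObject` / `link_to` names are assumptions for illustration; IPFS uses its own binary encodings and multihash identifiers.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str
    hash: str  # hex digest (address) of the linked object
    size: int  # serialized size of the linked object (a simplification)


@dataclass
class MerkleObject:
    data: bytes
    links: list = field(default_factory=list)

    def serialize(self) -> bytes:
        # Deterministic encoding so that equal objects always hash equally.
        doc = {
            "data": self.data.hex(),
            "links": [{"name": ln.name, "hash": ln.hash, "size": ln.size}
                      for ln in self.links],
        }
        return json.dumps(doc, sort_keys=True).encode()

    def address(self) -> str:
        # Content address: the hash of the object's own bytes.
        return hashlib.sha256(self.serialize()).hexdigest()


def link_to(name: str, obj: MerkleObject) -> Link:
    """Build a link that embeds the target object's content address."""
    return Link(name, obj.address(), len(obj.serialize()))
```

Because a parent's bytes contain its children's hashes, changing any leaf changes every address on the path up to the root. This is what lets a single root hash verify an entire DAG, and why identical subtrees are automatically deduplicated.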

Self-Certified File Systems

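File systems in which the address itself certifies the server: the path embeds a hash of the server's public key, so a client can check that it reached the right server without relying on a separate authority.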
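As described in the whitepaper, SFS paths have the form /sfs/<Location>:<HostID>, where HostID = hash(public_key || Location). The sketch below builds such a path and checks a presented key against it. SHA-256 and the function names are assumptions, the sketch ignores any sub-path after the HostID, and a real client must also make the server prove it holds the matching private key, which is omitted here.

```python
import hashlib
import hmac


def host_id(public_key: bytes, location: str) -> str:
    # HostID = hash(public_key || location); SHA-256 is an assumption here
    return hashlib.sha256(public_key + location.encode()).hexdigest()


def sfs_path(public_key: bytes, location: str) -> str:
    """Build a self-certifying path of the form /sfs/<location>:<HostID>."""
    return f"/sfs/{location}:{host_id(public_key, location)}"


def verify_server(path: str, presented_key: bytes) -> bool:
    """Check that the key a server presents matches the HostID in the path."""
    prefix = "/sfs/"
    if not path.startswith(prefix):
        return False
    # rpartition splits on the last ':' so a location like host:port still works
    location, sep, expected = path[len(prefix):].rpartition(":")
    if not sep:
        return False
    return hmac.compare_digest(host_id(presented_key, location).encode(),
                               expected.encode())
```

An impostor at the same location cannot produce a key that hashes to the same HostID. The path alone is therefore enough to authenticate the server, which is the property IPNS later reuses by naming things after a hash of a public key.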

Design

  • No nodes are privileged
  • Objects are stored in local storage
  • Objects can represent files or other data structures

Stack:

  • Identities
    • each node has a public key
    • peer id = hash(publicKey)
    • public and private keys are stored encrypted with a passphrase
    • generation based on S/Kademlia's static crypto puzzle, which makes creating identities costly
  • Network
    • uses libp2p, so it can run over any transport protocol
    • adds reliability when the underlying transport is unreliable (the paper mentions uTP and SCTP)
  • Routing
  • Exchange
    • blocks can be shared between objects
    • uses BitSwap:
      • each peer has a wantlist and a havelist
      • strategies aim to maximize the node's trade performance
      • prevent exploitation by freeloaders
  • Objects
    • Uses IPLD
    • Deduplication of blocks
    • Content-addressed
    • Can use different storage backends
    • Pinning: ensures an object is kept in local storage and not removed
    • Anyone can publish objects by announcing them in the DHT
  • Files
    • blob: addressable unit of data, represents a file.
    • list: represents a file composed of other objects; contains a sequence of blobs or lists.
    • tree: represents a directory, maps names to hashes.
    • commit: snapshot in the version history of any object. (is this still up to date?)
    • large files are split into lists of blobs
  • Naming
  • (OPINION) “Using IPFS” section: it’s a bit sad to see that most of the points are still not feasible today. However, the protocol has evolved a lot in the past months and years.
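The Identities bullets say that peer id = hash(publicKey) and that generation follows S/Kademlia. The paper's version repeats key generation until hash(NodeId) has at least `difficulty` leading zero bits. Below is a sketch of that loop under stated assumptions: `gen_public_key` is a hypothetical placeholder that returns random bytes instead of a real key pair, and SHA-256 stands in for the hash function.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def generate_identity(difficulty: int, gen_public_key=lambda: os.urandom(32)):
    """Retry key generation until hash(NodeId) has `difficulty` leading zero bits.

    gen_public_key is a placeholder: random bytes stand in for the public half
    of a real key pair, which this sketch does not generate.
    Returns (public_key, node_id, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        public_key = gen_public_key()
        node_id = hashlib.sha256(public_key).digest()  # peer id = hash(publicKey)
        if leading_zero_bits(hashlib.sha256(node_id).digest()) >= difficulty:
            return public_key, node_id, attempts
```

With `difficulty=8` the loop needs about 256 attempts on average, and each extra bit doubles the cost. The point is to make minting many identities (a Sybil attack) expensive, while other nodes can check the puzzle with just two hashes.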
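The Files bullets describe splitting a file into a list object whose entries point to blobs. A minimal sketch of fixed-size chunking follows. The `split_into_list` and `reassemble` names, the dict layout, SHA-256, and the 256 KiB default chunk size are assumptions for illustration. The sketch also builds a single flat list, whereas the paper allows lists to contain other lists for very large files.

```python
import hashlib


def split_into_list(data: bytes, chunk_size: int = 256 * 1024):
    """Split data into fixed-size blobs and build a 'list' object referencing them.

    Returns (list_object, blob_store), where blob_store maps hash -> chunk bytes.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    blobs = {}
    links = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        blobs[h] = chunk  # identical chunks collapse to one stored blob
        links.append(h)   # the list keeps the order, duplicates included
    return {"type": "list", "links": links}, blobs


def reassemble(list_obj, blobs) -> bytes:
    """Rebuild the original bytes by concatenating the linked blobs in order."""
    return b"".join(blobs[h] for h in list_obj["links"])
```

Repeated chunks are stored once but referenced several times. This is where the block-level deduplication mentioned under Objects comes from, and it lets peers fetch different blobs of the same file in parallel.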