This article describes how the blockchain can be used to store large amounts of data, with cases, examples and infographics.
According to PR Newswire, the cloud storage market will grow from $30 billion in 2017 to $90 billion by 2022, an annual growth rate of about 24%. This means the sector will be one of the fastest growing in the global economy, and the blockchain will be crucial to its success. Here is why.
Traditional data storage (CRUD)
The easiest way to save information on the Internet is to use cloud services such as Google Drive, or hosted databases such as MySQL and MongoDB. In this case, the user gets access to the company's storage either for free or in exchange for a paid subscription. The company retains control over the database, grants customers access rights to the storage, and is responsible for the safety and security of the data.
In practice, it looks like this.
1. You use a desktop or web application to upload data to the company's servers.
2. The company stores the data in its data center.
3. Whenever you want to access the uploaded information, your device sends a request to the data center, which then provides access to it.
This is the standard model that dominates the market. It has two advantages:
● the user has four key operations available: Create, Read, Update and Delete, together known as CRUD;
● the user can upload and download data quickly.
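The four CRUD operations can be illustrated with a minimal sketch. The `SimpleStore` class below is a hypothetical in-memory stand-in for a provider's storage, not the API of any real service:

```python
class SimpleStore:
    """Toy in-memory key-value store exposing the four CRUD operations."""

    def __init__(self):
        self._records = {}

    def create(self, key, value):
        # Create: refuse to overwrite an existing record.
        if key in self._records:
            raise KeyError(f"{key!r} already exists")
        self._records[key] = value

    def read(self, key):
        # Read: raises KeyError if the record does not exist.
        return self._records[key]

    def update(self, key, value):
        # Update: only existing records can be changed.
        if key not in self._records:
            raise KeyError(f"{key!r} does not exist")
        self._records[key] = value

    def delete(self, key):
        # Delete: the record is gone for good.
        del self._records[key]


store = SimpleStore()
store.create("report.txt", b"draft")
store.update("report.txt", b"final")
print(store.read("report.txt"))  # b'final'
store.delete("report.txt")
```

Update and Delete are exactly the two operations that an immutable ledger cannot offer, which matters later in the article.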
In every other respect, such storage is a rather dubious way of keeping information: as practice shows, it is often hacked, it is unreliable, and the information in it is used (for marketing, for example) without the user's knowledge or consent.
That is why the theft of personal data has become the norm in the 21st century. For example, in November 2018 it became known that the personal data of up to 500 million customers of the Marriott hotel chain had been stolen. And no one is even surprised any more when private photos of celebrities are stolen from iCloud.
Besides, traditional storage methods lead directly to market monopolization, since the greater the storage capacity, the cheaper the service. And monopolization, as is well known, leads to lower service quality and slower scientific and technical progress.
Data stores on the blockchain
Blockchain technology is based on a sequence of blocks, each of which carries a certain amount of information. This amount is limited by the blockchain's protocol; for Bitcoin, for example, the limit is 1 MB. The limit also caps the size of a file that can be written to the Bitcoin blockchain in one piece. 1 MB is enough to store transaction information, but if you want to save an image or a video file, you need to look for another solution.
In theory, the Ethereum blockchain could be such a solution, since it has no fixed limit on block size in bytes. But here another problem arises: the excessively high cost of storing information. Uploading data to the Ethereum blockchain is not free. It consumes gas, which costs real money, and the larger the file, the more gas is spent.
In 2017, it was estimated that uploading 1 KB of information to the blockchain cost users 2 US dollars. At that rate, writing a 600 KB text file to the blockchain would cost $1,200, and a 5 GB movie more than $10 million. The price is probably much lower now, as the market has sunk, but it is still prohibitive.
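The arithmetic behind these figures can be checked with a short sketch. It assumes the flat rate of $2 per KB quoted above and binary units (1 GB = 1024 × 1024 KB); the function name is made up for illustration:

```python
COST_PER_KB_USD = 2          # the 2017 estimate quoted in the text
KB_PER_GB = 1024 * 1024      # assumption: binary units


def storage_cost_usd(size_kb):
    """Cost of writing size_kb kilobytes at a flat per-kilobyte rate."""
    return size_kb * COST_PER_KB_USD


text_file_cost = storage_cost_usd(600)           # 600 KB text file
movie_cost = storage_cost_usd(5 * KB_PER_GB)     # 5 GB movie

print(text_file_cost)  # 1200
print(movie_cost)      # 10485760
```

Real gas costs do not scale this simply, so the result is only an order-of-magnitude illustration.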
Options of using blockchain for data storage
Storing everything in the blockchain
The limit on the maximum block size can be worked around in several ways. The simplest one involves:
1. Breaking the file into segments (fragments) that are smaller than the block size. This way, even the largest file can be written to a blockchain with a small block size.
2. Encrypting the data in the fragments so that only their owner can understand what is written in them. This makes it possible to store information on a public blockchain and still be sure of its confidentiality.
3. Distributing the fragments over the blockchain network. Thanks to this, the file is preserved unchanged as long as at least one user is synchronized with the blockchain.
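Step 1 can be sketched in a few lines. The helper names below are hypothetical, and step 2 (encryption) is deliberately left out, because Python's standard library has no symmetric cipher; in a real system each fragment would be encrypted before being written:

```python
def split_into_fragments(data, max_fragment_size):
    """Split data into (index, chunk) pairs no larger than max_fragment_size."""
    if max_fragment_size <= 0:
        raise ValueError("max_fragment_size must be positive")
    return [
        (offset // max_fragment_size, data[offset:offset + max_fragment_size])
        for offset in range(0, len(data), max_fragment_size)
    ]


def reassemble(fragments):
    """Rebuild the original data; fragments may arrive in any order."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return b"".join(chunk for _, chunk in ordered)


data = bytes(range(256)) * 10                      # 2560 bytes of sample data
fragments = split_into_fragments(data, max_fragment_size=1000)
print(len(fragments))                              # 3
print(reassemble(reversed(fragments)) == data)     # True
```

The index stored with each chunk is what allows the file to be rebuilt after the fragments have been scattered across many blocks.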
This approach is borrowed from torrent trackers, but it is not suitable for storing data on a blockchain, even if transaction fees are removed. There are several reasons for this. First, information is written to the blockchain through transactions, and these require confirmation. A large file may require several thousand transactions and therefore several hours, or even days.
Second, information on the blockchain is immutable, so you cannot delete or modify data you no longer need. All files that have reached the network, along with all their versions, will remain on the blockchain forever, and in theory someone will be able to view them sooner or later. For example, if a blockchain loses popularity, a single user may end up controlling it alone and changing the rules of the system as they please.
Third, immutability leads to another problem: runaway growth of the blockchain. If information cannot be deleted, it only accumulates, which eventually makes the blockchain too large for the average user. For example, the Bitcoin blockchain currently takes up 220 GB and Ethereum 600 GB. This is already too much for smartphones, tablets and many laptops.
To sum up, storing information directly on the blockchain is not the best idea when it comes to big data. This option is suitable only when the amount of information is within a few kilobytes, for example financial transactions, personal data or document workflows.
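A rough best-case estimate shows where the hours and days come from. The sketch assumes that the file fills whole blocks with nothing else in them, a 1 MB block as for Bitcoin, and an average block interval of about 10 minutes; the function name and default values are illustrative assumptions:

```python
import math


def min_upload_time_hours(file_size_mb, block_size_mb=1, block_interval_min=10):
    """Lower bound on upload time if the file occupies entire blocks."""
    blocks_needed = math.ceil(file_size_mb / block_size_mb)
    return blocks_needed * block_interval_min / 60


print(min_upload_time_hours(100))        # about 16.7 hours for a 100 MB file
print(min_upload_time_hours(5 * 1024))   # about 853 hours (roughly 35 days) for 5 GB
```

In practice the fragments would compete with ordinary transactions for block space, so the real figure could only be higher.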
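Why recorded data cannot be quietly changed can be shown with a simplified model in which every block stores the hash of the previous one. This is only a sketch: the block layout and function names are assumptions, and consensus, signatures and mining are ignored entirely:

```python
import hashlib

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, payload):
    """Hash of a block depends on its payload and on the previous block."""
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def build_chain(payloads):
    chain, prev = [], GENESIS_HASH
    for payload in payloads:
        current = block_hash(prev, payload)
        chain.append({"prev_hash": prev, "payload": payload, "hash": current})
        prev = current
    return chain


def is_valid(chain):
    """Recompute every hash and check that the links are intact."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block_hash(prev, block["payload"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([b"tx-1", b"tx-2", b"tx-3"])
print(is_valid(chain))            # True
chain[1]["payload"] = b"edited"   # try to change data that is already recorded
print(is_valid(chain))            # False
```

Changing or removing one payload invalidates that block's hash, so every honest node that re-checks the chain will reject the edit.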
Peer-to-peer File Systems
These are storage methods such as the InterPlanetary File System (IPFS). This technology borrows from the BitTorrent protocol: files are broken up into shards and stored in multiple copies on the computers of network members.
This approach has several advantages:
● a file is downloaded by users only if someone is interested in it;
● popular files are downloaded and distributed very quickly;
● data is addressed by its content, so the contents of a file cannot be falsified unnoticed;
● it is a peer-to-peer solution.
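The point about content-dependent addresses can be sketched as follows. This is a simplification: the `ContentStore` class is hypothetical and uses a plain SHA-256 hex digest as the address, whereas IPFS uses its own content identifier format:

```python
import hashlib


class ContentStore:
    """Toy store in which the address of a blob is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address):
        data = self._blobs[address]
        # Anyone can re-hash the bytes and detect a falsified copy.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello, world")
print(store.get(address))               # b'hello, world'

store._blobs[address] = b"forged data"  # simulate a node serving altered bytes
try:
    store.get(address)
except ValueError as error:
    print(error)                        # content does not match its address
```

Because the address is derived from the bytes themselves, the same mechanism also explains why such systems serve only static data: changed content gets a different address.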
Among the disadvantages, a file is available on the network only while a user who holds it is online, and such a system serves only static data. In addition, you can access a file only if you know its name or path.
The blockchain in this scheme is used as an intermediary that binds participants together and is responsible for the authentication and integrity of files. In addition, it can be used to monetize the process: seeders receive money for distributing files, and peers pay to download them.
Decentralized Cloud Storages
This is essentially ordinary cloud storage like Dropbox, except that the data is placed not on company servers but on the devices of users who rent them out. There are many such startups, for example Swarm, Storj and Sia.
With such solutions you do not need to be constantly online to share information with other network members. It is enough to upload the file to the cloud storage once. Such storage is stable, fast and has huge capacity.
However, it is only suitable for static data and does not support searching by content. In addition, it is not free, because people rent equipment from each other.
If you need to store large amounts of structured information and query it by content, you should look at NoSQL databases. Traditional SQL databases are a poor fit, because their data is hard to distribute across nodes due to the constraints of Brewer's theorem (the CAP theorem). To make a database truly distributed, you have to sacrifice either availability or consistency.
This is exactly what NoSQL databases do: they give up strict consistency between nodes, replacing it with “eventual consistency” (nodes become consistent after a certain time), in exchange for availability. Many projects are based on this approach, for example RethinkDB, Apache Cassandra and MongoDB. They have excellent properties: fault tolerance, high speed, simple horizontal scalability and support for a rich query language.
Truth be told, they have one major drawback: all nodes must trust each other. This is important because if a malicious node appears among them, it can destroy the entire database on its own.
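Both eventual consistency and the trust problem can be seen in a small sketch. It uses last-write-wins reconciliation by timestamp, which is one simple strategy chosen here for illustration and not necessarily what any of the databases named above does; the `Replica` class is hypothetical:

```python
class Replica:
    """Toy replica that resolves conflicts by keeping the newest timestamp."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Pull every record from another replica; newer timestamps win.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("stock", 10, timestamp=1)
b.write("stock", 7, timestamp=2)
print(a.read("stock"), b.read("stock"))   # 10 7  (temporarily inconsistent)

a.sync_from(b)
b.sync_from(a)
print(a.read("stock"), b.read("stock"))   # 7 7  (consistent after syncing)

# A trusted but malicious node can win every conflict with a huge timestamp.
rogue = Replica()
rogue.write("stock", None, timestamp=10**12)
a.sync_from(rogue)
b.sync_from(a)
print(a.read("stock"), b.read("stock"))   # None None
```

Nothing in the protocol lets honest replicas tell a legitimate late write from a hostile one, which is why such clusters only work among parties that trust each other.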
Cases: blockchains to store information
BigchainDB
Cloud storage with huge capacity and very fast transactions. It is built on a RethinkDB cluster and uses NoSQL mechanisms for storing blocks, which gives it high fault tolerance and capacity.
All members of a BigchainDB network are connected to a single cluster and have full rights to write, change and delete information, so BigchainDB is not suitable for public solutions. But it can be used for private corporate tasks:
● business and jurisprudence;
● accounting statements;
● asset tracking;
● financial data.
Storj and Sia
These companies operate as marketplaces. They promise cheaper, faster and safer storage. However, this does not necessarily mean that their services are cheaper than those of Google, Amazon or Dropbox: they profit not only from rental fees, but also from commissions on the transactions involved in uploading and retrieving data.
In essence, Storj and Sia mediate between those who rent out hard drive space (the landlords) and those who rent it. The blockchain is used as a register of transactions, for financial settlements and for authenticating files in the storage. User data itself is stored outside the blockchain and may be deleted or become unavailable at any time if the landlords decide to delete files or force majeure occurs on their side.
Nevertheless, demand for such storage keeps growing as the market grows and people enjoy using a new technology, even if they do not understand much about it.
TiesDB
Another database where people rent out their hard drives. Its distinguishing feature is the insurance deposit that landlords (those who rent out the space) must place in a smart contract in order to gain access to monetization. Without such a deposit, only the client side of the platform is available.
The insurance deposit stays in the smart contract until the landlord decides to leave, having fulfilled all the obligations for which he was paid. If the landlord deletes the stored files or simply disappears, the money is withdrawn from the smart contract and distributed within the system. With this mechanism, the TiesDB team weeds out unserious and unreliable landlords.
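The deposit logic described above can be modelled in a few lines. This is a toy simulation in Python, not TiesDB's actual smart contract; the class, its methods and the idea of a single "system pool" are all assumptions made for illustration:

```python
class StorageDepositContract:
    """Toy model of an insurance deposit held while a participant stores data."""

    def __init__(self):
        self.deposits = {}     # participant -> locked amount
        self.system_pool = 0   # forfeited funds redistributed within the system

    def join(self, participant, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.deposits[participant] = self.deposits.get(participant, 0) + amount

    def can_earn(self, participant):
        # Without a deposit only the client side is available.
        return self.deposits.get(participant, 0) > 0

    def leave(self, participant):
        """Participant exits after meeting all obligations: deposit is returned."""
        return self.deposits.pop(participant, 0)

    def slash(self, participant):
        """Participant deleted files or vanished: deposit is forfeited."""
        self.system_pool += self.deposits.pop(participant, 0)


contract = StorageDepositContract()
contract.join("alice", 100)
contract.join("bob", 100)

print(contract.leave("alice"))    # 100, honest exit returns the deposit
contract.slash("bob")             # bob disappeared with the files
print(contract.system_pool)       # 100
print(contract.can_earn("bob"))   # False
```

How a real contract would detect that files were deleted is the hard part and is not modelled here.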
Strictly speaking, TiesDB is not a blockchain: it is a decentralized cloud storage that uses smart contracts to motivate and punish participants, and to store information about rates, settlements between participants and insurance deposits.
In addition, TiesDB uses a blockchain to reconcile conflicting information about changes to the database and the related financial transactions. For example, if product information is stored in TiesDB, changes in the quantity of goods must also be recorded in the database. Otherwise, a buyer may pay for goods that are no longer in stock.
Filecoin
A platform based on the same principles as Storj and Sia. It differs in two details:
● The platform intends to favor nodes of medium capacity in order to avoid the threat of centralization from large players and instability from small ones.
● The system will try to find storage nodes as close as possible to the users who rent them. This should increase upload and download speeds and reduce the likelihood of errors during data transfer.
Using these innovations, as well as a unique consensus mechanism that encourages growth of the disk space available online, Filecoin intends to overtake Google and Amazon in storage capacity in the next few years.
MaidSafe
The main idea of this solution is to create a fully encrypted P2P network that serves as a database for the anonymous exchange of information through encrypted layers: like Tor, but for cloud storage. This is to be made possible by three MaidSafe elements:
● Self-encryption: data that encrypts itself. When a file is uploaded to the MaidSafe network, it is broken into many small fragments that are self-encrypted and distributed throughout the network. In this form, the file is unreadable to everyone except its owner.
● Distributed data caching. Data in the SAFE Network will be stored worldwide rather than on the servers of a single company or group of companies. This should make the platform autonomous and increase the level of information security.
● Data availability. The network constantly creates and maintains duplicates of all the files it stores. This redundancy should protect the data from loss when individual nodes shut down.
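The data availability idea (keeping several copies and restoring them when a node drops out) can be sketched as below. This is not MaidSafe's actual algorithm: the round-robin placement, the function names and the replica count of three are assumptions chosen to keep the example deterministic:

```python
def place(fragments, nodes, copies):
    """Assign each fragment to `copies` distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    return {
        fragment: {nodes[(i + k) % len(nodes)] for k in range(copies)}
        for i, fragment in enumerate(fragments)
    }


def repair(placement, online_nodes, copies):
    """Drop offline holders and re-replicate onto spare online nodes."""
    repaired = {}
    for fragment, holders in placement.items():
        alive = [node for node in online_nodes if node in holders]
        if not alive:
            repaired[fragment] = set()   # every copy is gone: data is lost
            continue
        spare = [node for node in online_nodes if node not in holders]
        needed = max(0, copies - len(alive))
        repaired[fragment] = set(alive + spare[:needed])
    return repaired


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place(["f0", "f1"], nodes, copies=3)

online = [node for node in nodes if node != "n2"]   # node n2 shuts down
placement = repair(placement, online, copies=3)
print(all(len(holders) == 3 for holders in placement.values()))   # True
```

A fragment survives as long as at least one of its holders stays online between repairs, which is the protection against individual node shutdowns described above.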
The final word
When using a blockchain for data storage, it is important to remember that current technologies do not allow large amounts of information to be stored within the chain of blocks itself. That is why the blockchain is used in this industry as an intermediary and a ledger that monitors compliance with the terms of the deal in which one person provides storage to another.
This means that neither the blockchain, nor smart contracts, nor cryptography protects the information held in decentralized storage. There, the information has the same protection as in traditional storage.