The core issue as I see it is this: To what extent should stellar accommodate on-chain data storage? Right now we’re very conservative, but I can certainly see a situation where we add larger memo types. I myself would love to see native multihash support.
On-chain data, whether it be in memos to annotate transactions or in data values associated with an account, has a cost associated with it: larger memos in the transaction will slow transaction application and will expand the size of the txhistory table and the size of the history archives. Likewise, larger account data values put pressure on the accountdata table.
Here’s the key takeaway IMO: scalability questions such as these will always be about balancing tensions, and I don’t think simply looking at what other systems are doing will help us answer the question. They are different systems with different goals and constraints. Simply saying “Factom does 10k” isn’t a valid argument to me.
Perhaps we can push the discussion along by laying out the three potential stances our design can take with respect to on-chain, user-customizable data fields:
1. Minimize size to increase throughput while still enabling links to off-chain data
2. Maximize size to increase the utility of on-chain data
3. Balance size and throughput to enable both good throughput and on-chain utility (beyond simply storing off-chain links)
Our present stance is number 1, and I personally believe it is correct. I think that data values and fields should only expand to better support links to off-chain data, such as larger hashes. Larger transactions will beget larger transaction fees, and since we’re aiming to increase financial access, I think we should make sure to keep fees as low as possible.
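To make the "larger hashes" point concrete, here is a rough sketch of why a self-describing hash wants slightly more room than a bare digest. It assumes the current 32-byte hash memo and the multihash convention of prefixing a digest with a one-byte function code and a one-byte length; the helper names (`sha256_multihash`, `fits_in_memo`) are made up for illustration and are not part of any Stellar SDK.

```python
import hashlib

MEMO_HASH_SIZE = 32   # bytes available in a hash memo today
SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def sha256_multihash(data: bytes) -> bytes:
    """Hypothetical helper: sha2-256 digest with a 2-byte multihash prefix."""
    digest = hashlib.sha256(data).digest()
    # Prefix is <function code><digest length>, one byte each for these values.
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def fits_in_memo(value: bytes, limit: int = MEMO_HASH_SIZE) -> bool:
    """Hypothetical helper: does the value fit in a field of `limit` bytes?"""
    return len(value) <= limit


doc = b'{"invoice": 42}'            # stand-in for some off-chain document
raw = hashlib.sha256(doc).digest()  # bare digest, 32 bytes
mh = sha256_multihash(doc)          # prefixed digest, 34 bytes

print(len(raw), fits_in_memo(raw))
print(len(mh), fits_in_memo(mh))
```

The bare digest fits exactly, while the multihash form overshoots by its two prefix bytes. That is the kind of small, bounded expansion I mean, as opposed to making room for arbitrary payloads.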
Maximizing size for utility (Stance 2) is a slippery slope, as mentioned. First it’s 10k for some JSON, then it’s a couple megs for a PDF, then people want to deliver video files via the blockchain. IMO it will lead to constant discussions about expanding the size as people come up with new and clever ways to abuse the ledger as a general DB. As hackers, we’re always trying to stretch limits and avoid writing any code that we don’t have to. Stance 2 is simply too easy to abuse, IMO.
Stance 3 is the toughest to navigate: What is the right size to balance the concerns involved? What metrics should we use to decide what is too large or what is too small? How many operations per second are enough? Do we expand our sizes in concert with increases in hardware and network performance?
Personally, I think we should enable easy off-chain data retrieval via Horizon or some other ecosystem service outside of stellar-core. This will allow us to keep Stance 1, optimizing for scalability and throughput on-chain while allowing ease of integration with off-chain data. It’ll also let us avoid revisiting this discussion every N months in perpetuity. Choosing to support larger hash values is a much simpler discussion than deciding how much JSON is too much JSON.
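For what it’s worth, the off-chain side of this can be very small. The sketch below is only an illustration of the idea, not a proposal for a real API: `OffChainStore` is a hypothetical in-memory stand-in for whatever service would actually hold the data, and the only thing that would go on-chain is the 32-byte digest it returns.

```python
import hashlib


class OffChainStore:
    """Hypothetical content-addressed store standing in for an off-chain service."""

    def __init__(self) -> None:
        self._blobs: dict[bytes, bytes] = {}

    def put(self, data: bytes) -> bytes:
        """Store the data and return its sha2-256 digest (the on-chain link)."""
        digest = hashlib.sha256(data).digest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: bytes) -> bytes:
        """Fetch data by digest and verify it still matches that digest."""
        data = self._blobs[digest]  # raises KeyError if the link is unknown
        if hashlib.sha256(data).digest() != digest:
            raise ValueError("off-chain data does not match on-chain hash")
        return data


store = OffChainStore()
payload = b'{"description": "as much JSON as you like"}'
memo_hash = store.put(payload)   # only these 32 bytes need to go in the memo
fetched = store.get(memo_hash)   # anyone holding the memo can fetch and verify
```

The payload can grow without limit and the ledger never notices: the on-chain footprint stays fixed at the size of the hash, and the verification step means the service holding the data doesn’t have to be trusted.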
All this is just IMO, of course.