Mar
21
Data (r)evolution
Filed Under New models, Social web, Web/Tech, Events
Hank Williams’ presentation at BarCampNYC3 last weekend was the most memorable, as well as the most useful for what I am currently thinking about. Here is a transcript reconstructed from the notes I took during his talk. Any confusion or misrepresentations are mine.
In my work I see the same problem solved again and again. I am focused on a number of different things, but this issue is interesting: how do we store data, and how do we represent it? Relational databases don’t look the way our data logic works, if you look at web 2.0 applications and how people are using them.
Relational databases are not good for knowledge, because information cannot easily evolve over time to have a structure that didn’t exist when we conceived our application. Take an accounting system: you can design such an application up front and it’s going to stay that way. With web 2.0 you don’t know when you start what your business model will be, and that makes such an approach very limiting. Over the last few decades we have squeezed every ounce of performance out of relational databases. The main problem is that they are just very inflexible and brittle!
Why do relational databases have this problem? When you create one, a record is just a collection of information. Take a contact record (first name, last name): you can have thousands of them. But if I want to connect two records, for example an invoice (client name, products, etc.) and another record that keeps track of whether it was paid or not, I am going to have to keep a field (the name of the company) and two tables (a record and a check) defined within that record from the start. So if I want to do something later, authorisation for example, that is another record, and if I were to relate the invoice to the authorisation I would have to point to the check and relate it to that.
Basically, every time I want to create a relationship between two objects I have to modify the record. That is bad: it means that no object in a system can be stable, and every time there is a new relationship I have to add something to the object in question. When you create your system, you have to decide how you want the system to be. When you have 100,000 records and want to change something or evolve, this is a problem.
My favourite comparison for this kind of approach: a woman is having a child, and a doctor walks in and says: “Before you give birth, can you give me the name of every friend your child will ever have in its life? And by the way, we have got five minutes.”
That’s what a relational database really requires. Once you have it set up, you can’t just start doing something even if it makes sense. That’s the problem.
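To make the invoice example concrete, here is a minimal sketch of the pattern described above, where a relationship is stored as a field on the record itself. It uses Python's built-in sqlite3 module; the table and column names (invoice, authorisation, authorisation_id, approved_by) and the sample row are my own assumptions for illustration, not something shown in the talk.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# The invoice table as it was designed on day one (hypothetical columns).
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, client TEXT, product TEXT)"
)
conn.execute("INSERT INTO invoice (client, product) VALUES ('Acme', 'Widgets')")


def columns(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


before = columns("invoice")

# Later requirement: relate each invoice to an authorisation record.
# A new table is needed for the new kind of object...
conn.execute(
    "CREATE TABLE authorisation (id INTEGER PRIMARY KEY, approved_by TEXT)"
)
# ...and, with the relationship kept on the record, the existing invoice
# table has to be altered as well.
conn.execute(
    "ALTER TABLE invoice ADD COLUMN authorisation_id INTEGER "
    "REFERENCES authorisation(id)"
)

after = columns("invoice")
print(before)
print(after)
```

In this sketch the invoice table gains a column the moment a new relationship is wanted, which is the instability the talk complains about: the existing object had to change to accommodate something conceived later.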
The solution is the social graph, a concept that Facebook made famous. You can have multiple objects (these could be invoices, or contacts on Facebook, they could be anything) and the graph is the connections between them. The great thing is that you don’t need to worry about what an object is in order to create a connection, and you don’t need to modify anything to create one. Every object stands on its own.
This is a fairly radical concept in data management, as we can not only connect these objects but also say what the relationship is. Relationships can be directional (husband, wife; friend to). If you can imagine all your data with relationships where you can connect the objects on the fly, it radically changes the nature of a web 2.0 application.
You can also have another kind of object, not just a contact: a restaurant, e.g. Nobu, where the relationship is ‘favourite’. You create a new thing called restaurant and immediately we can connect John to Nobu with a ‘favourite’ relationship. To do this in a relational DB you’d have to create a new table and modify John so he has a favourite restaurant field, and add three more slots to John. It would be complicated, and that’s not workable on the web, as these are exactly the kinds of relationships that we want to represent.
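The John and Nobu example can be sketched as a small in-memory graph in which objects and connections live separately. This is only an illustration under my own assumptions: the Node and Graph classes, the relationship labels and the second contact (Jane) are hypothetical and do not come from the talk or from any real product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An object that stands on its own: it has no fields for relationships."""
    kind: str
    name: str


@dataclass
class Graph:
    """Connections are stored outside the objects, as (source, relation, target)."""
    edges: set = field(default_factory=set)

    def connect(self, source, relation, target):
        # Directional and labelled: the order of source and target matters.
        self.edges.add((source, relation, target))

    def related(self, source, relation):
        # Names of everything that `source` points to via `relation`.
        return sorted(
            t.name for s, r, t in self.edges if s == source and r == relation
        )


john = Node("contact", "John")
jane = Node("contact", "Jane")  # hypothetical second contact

graph = Graph()
graph.connect(john, "husband_of", jane)
graph.connect(jane, "wife_of", john)

# A new kind of object appears later; nothing about John has to change.
nobu = Node("restaurant", "Nobu")
graph.connect(john, "favourite", nobu)

print(graph.related(john, "favourite"))
```

Here adding the restaurant type and the ‘favourite’ relationship required no change to the Node definition or to the existing John object, only one more entry in the set of connections.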
We always think about how to connect stuff in Web 2.0; this is the way this stuff works, something that’s connected by default. If I want to have an app mapping my restaurants I’d create a new information silo.
How do you connect data of disparate types, say pictures in Flickr to a record in Facebook? The idea is to manage these relationships across the whole web. The concept of the semantic web is great, but I think that as it is right now it is a FAIL: it’s too complicated and not the way people do things.
Kloudshare: the idea is to be able to store data in the cloud, do relationship searches, query the graph and access it from your web applications, without having to set up a MySQL server.
There are other issues with the social graph: it doesn’t map very well yet to the tools out there, and so far I don’t have the sense that there are off-the-shelf tools for scaling this stuff. But the cool thing about the graph is that it allows for different types of user interface. You can actually create a bunch of user interfaces. Take a business card: you can represent it, and the relationships between various business cards, so you can see the graph. I can explore each item on the graph, look at Nobu and see all the people who thought Nobu was great, etc.
The idea is that, from the UI perspective, it is a very simple unified web where you can look at the relationships of any object, and an opportunity to think about data and knowledge and how things are related. It is a profound thing that you can stand at any piece of information that you have and, from that point, see the whole universe…
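The kind of exploration described here, standing on Nobu and seeing everyone who liked it, can be sketched as a lookup over labelled connections. Again this is only a rough illustration: the names, the relationship labels and the helper functions who and neighbourhood are my own assumptions, not anything demonstrated in the talk.

```python
from collections import defaultdict

# Hypothetical connections, each stored as (source, relation, target).
triples = [
    ("John", "favourite", "Nobu"),
    ("Mary", "favourite", "Nobu"),
    ("John", "friend_of", "Mary"),
]

# Index every connection from both ends so any item can be a starting point.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for source, relation, target in triples:
    outgoing[source].append((relation, target))
    incoming[target].append((relation, source))


def who(relation, target):
    # Everyone pointing at `target` with the given relation.
    return sorted(s for r, s in incoming[target] if r == relation)


def neighbourhood(item):
    # All connections touching `item`: outgoing ones first, then incoming.
    out = [(item, r, t) for r, t in outgoing[item]]
    inc = [(s, r, item) for r, s in incoming[item]]
    return out + inc


print(who("favourite", "Nobu"))
print(neighbourhood("Mary"))
```

A user interface could call something like neighbourhood on whichever item is selected and draw the result, then repeat from any neighbour, which is roughly the "explore each item on the graph" idea.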
Would love to know more about Kloudshare, sadly nothing came up when I googled it.
Comments
One Response to “Data (r)evolution”

Adriana, all the comments you made are 100%. The RDBMS is one of the reasons for the enormous unnecessary complexity existing in IT. The stronghold of Oracle, MS and IBM on the DBMS market is vast. Every time anybody sells new software, the first question asked is whether it runs on an RDBMS from Oracle, MS or IBM. The RDBMS is optimised for a rapid flow of transactions, like in a bank, where the semantics/knowledge that pertains to the information is fixed; we have now moved to a world where there is rapid movement of knowledge. This is a huge problem for transactional systems (read from Oracle DB, write to Oracle DB), because any change in knowledge/semantics requires reprogramming. Dr. Paul Horn from IBM has predicted that the biggest challenge in IT will be complexity: 200 million people working in IT (unnecessarily doing the same things over and over). The RDBMS is a significant component of that problem. So, I am glad dissent is starting to spread about RDBMS systems.
We at thoughtexpress have been working for the last 10 years on an entirely different semantic/syntactic model to understand where complexity comes from, and we have a new system, something that the semantic web tries to achieve but will never manage, as it has incorrect forms of expression. Anyway, that is enough from me.
Pawel