As more user generated content floods the web, I’ve been thinking about how to draw more meaning from that content, and about the idea that Meaning = Data + Structure. A number of readers commented on my previous post about user generated structure. They point out that one of the challenges of relying on this approach is finding the right incentives to get users to do the work. I’m inclined to agree. I think user generated structure will be part of the solution, but it probably won’t be the whole solution – it won’t be complete enough.

If people won’t do the work, perhaps you can get computers to do it. Is there a way to teach a computer to algorithmically “read” natural language documents (i.e. web pages), understand them, and apply metadata and structure to those documents? Trying to do this on a web-wide basis rapidly gets circular – since this ability is exactly what we need for a computer to comprehend meaning, if it existed then we wouldn’t need the structure in the first place. The structure is our hack to get there!

All is not lost though. In the grand tradition of mathematics and computer science, when faced with a difficult problem, you can usually solve an easier problem, declare victory and go home. In this case, the easier problem is to try to infer structure from unstructured data confined to a single domain. This substantially constrains the complexity of the problem. Alex Iskold has been advocating this approach to the semantic web.

Books, people, recipes, movies are all examples of nouns. The things that we do on the web around these nouns, such as looking up similar books, finding more people who work for the same company, getting more recipes from the same chef and looking up pictures of movie stars, are similar to verbs in everyday language. These are contextual actions that are based on the understanding of the noun.

What if semantic applications hard-wired understanding and recognition of the nouns and then also hard-wired the verbs that make sense? We are actually well on our way to doing just that. Vertical search engines like Spock, Retrevo, ZoomInfo, the page annotating technology from ClearForest, Dapper, and the Map+ extension for Firefox are just a few examples of top-down semantic web services.

Take people search as an example. By only worrying about information about people on the internet, people search engines can look for specific attributes of people (e.g. age, gender, location, occupation, schools, etc) and parse semi-structured web pages about people (e.g. social network profiles, people directories, company “about us” pages, press releases, news articles etc) to create structured information about those people. Perhaps more importantly though, a people search engine does NOT have to look for attributes that do not apply to people (e.g. capital city, manufacturer, terroir, ingredients, melting point, prime factors etc). By ignoring these attributes and concentrating on only a small set, the computational problem is made substantially simpler.
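The core of such a vertical extractor can be sketched in a few lines. The attribute names and patterns below are purely illustrative assumptions, not any real engine’s schema – the point is only that a fixed, person-only attribute set keeps the problem tractable.

```python
import re

# Illustrative attribute patterns for a people-only vertical
# (hypothetical -- a real engine uses far richer extractors).
PERSON_PATTERNS = {
    "age": re.compile(r"\b(?:age|aged)\s+(\d{1,3})\b", re.I),
    "location": re.compile(r"\b(?:lives in|based in)\s+([A-Z][\w\s]+?)(?:[.,]|$)", re.I),
    "occupation": re.compile(r"\b(?:works as an?|occupation:)\s+([\w\s]+?)(?:[.,]|$)", re.I),
}

def extract_person(text: str) -> dict:
    """Map semi-structured text about a person to structured attributes.

    Because the extractor only knows person attributes, it simply
    never looks for melting points, terroir, or prime factors.
    """
    record = {}
    for attr, pattern in PERSON_PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[attr] = match.group(1).strip()
    return record

profile = "Jane Doe, aged 34, lives in San Francisco. She works as an engineer."
print(extract_person(profile))
# {'age': '34', 'location': 'San Francisco', 'occupation': 'engineer'}
```

The same page run through a movie extractor would yield nothing, which is exactly the constraint that makes the vertical approach cheap.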

As an example, look at the fairly detailed data (not all of it correct!) available about me on Spock, Rapleaf, Wink and Zoominfo. Zoominfo in particular has done a great job on this, pulling data from 150 different web references to compile an impressively complete summary:

ZoomInfo page on Jeremy Liew

Companies with a lot of user generated content can benefit from inferring structure from the unstructured data supplied by their users. In fact, since they don’t need to build a crawler to index the web, they have a much simpler technical problem to solve than do vertical search engines. They only need to focus on the problems of inferring structure.

Many social media sites focus on a single topic (e.g. Flixster [a Lightspeed portfolio company] on movies, TV.com on TV, iLike on music, Yelp on local businesses, etc) and they can either build or borrow an ontology into which they can map their UGC.

Take the example of movies. A lot of structured data for movies already exists (e.g. actors, directors, plot summaries etc), but even more can be inferred. By knowing something about movies, you could infer (from textual analysis of reviews) additional elements of structured data such as set location (London, San Francisco, Rwanda) or characteristics (quirky, independent, sad).
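A crude version of that inference step can be sketched with keyword lexicons. The cue words below are made-up assumptions for illustration; a real system would learn these associations from data rather than hard-code them.

```python
# Hypothetical lexicons mapping review vocabulary to inferred movie
# attributes (illustrative only -- a real system would learn these).
CHARACTERISTIC_CUES = {
    "quirky": {"quirky", "offbeat", "eccentric"},
    "sad": {"heartbreaking", "tearjerker", "melancholy"},
}
SETTING_CUES = {"london", "san francisco", "rwanda"}

def infer_movie_attributes(reviews: list[str]) -> dict:
    """Infer extra structured data for a movie from free-text reviews."""
    # Normalize review text into a bag of lowercased words.
    words = {w.strip(".,!?").lower() for review in reviews for w in review.split()}
    text = " ".join(reviews).lower()
    return {
        "characteristics": sorted(
            c for c, cues in CHARACTERISTIC_CUES.items() if cues & words
        ),
        "settings": sorted(s for s in SETTING_CUES if s in text),
    }

reviews = ["An offbeat, heartbreaking story set in San Francisco."]
print(infer_movie_attributes(reviews))
# {'characteristics': ['quirky', 'sad'], 'settings': ['san francisco']}
```

Again, knowing in advance that the document is about a movie is what makes a shallow technique like this useful at all.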

In addition to search, inferred structure can also be used for discovery. Monitor110 and ClearForest are two companies that are adding structure to news data (specifically, business news data) to unearth competitive business intelligence and investment ideas. By knowing some of the relationships between companies (supplier, competitor etc) and their products, and by analyzing news and blogs, Monitor110 and ClearForest can highlight events that may have a potential impact on a particular company or stock.

The common criticism leveled against this approach is that it is insufficient to handle the complexity of an interrelated world. Arnold Schwarzenegger for example is a person, a politician, an actor, a producer, an athlete and a sports award winner as the excerpt from Freebase below shows:

Schwarzenegger Freebase screenshot

Confining an ontology to a single domain, such as movies in the example above, would mean that you are unable to answer questions such as “What Oscar nominated films have starred Governors of California?”.
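Concretely, answering that question requires a join across two local ontologies. With toy, illustrative records (nothing below comes from a real knowledge base), the query is trivial once the data is structured:

```python
# Toy records from two separate single-domain ontologies
# (all data here is illustrative, not a real knowledge base).
movies = [
    {"title": "Some Film", "stars": ["Arnold Schwarzenegger"], "oscar_nominated": True},
    {"title": "Other Film", "stars": ["Jane Doe"], "oscar_nominated": True},
]
governors_of_california = {"Arnold Schwarzenegger", "Ronald Reagan"}

# "What Oscar nominated films have starred Governors of California?"
# requires joining ACROSS the movie and politics ontologies --
# something neither single-domain system can answer alone.
answer = [
    m["title"]
    for m in movies
    if m["oscar_nominated"] and set(m["stars"]) & governors_of_california
]
print(answer)
# ['Some Film']
```

The hard part is not the join itself but getting the two domains to agree that "Arnold Schwarzenegger" is the same entity – which is where cross-linked local ontologies come in.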

Whether this is a problem depends on whether you believe search is orienteering or teleporting:

Teleporting means trying to get to the desired item in a single jump. In this study it almost always involves a keyword search. Orienteering means taking many small steps–and making local, situated decisions–to reach the desired item.

Teleporting requires a universal ontology. With orienteering, local ontologies with some loose level of cross linking are enough. I suspect that we’re in an orienteering driven search world for the foreseeable future, and that local solutions for specific domains will provide sufficient benefit to flourish. Adaptive Blue and Radar’s Twine are two early examples of products that take this approach. Radar’s CEO, Nova Spivack, talked to Venturebeat recently in some depth on this topic.

Once again, would love to hear more from readers.

  • http://www.adaptiveblue.com Alex Iskold

    Hi Jeremy,

    I agree that narrow solutions are going to win, at least in the long term, and I think that this is a good thing. If we start with the goal of annotating the entire body of the web with semantics, we are stuck with a Herculean task.

    Taking a step back and thinking about what is actually useful leads us into the verticals: books, movies, stocks, recipes, restaurants, people, companies, countries – the basic stuff.

    So, as you are pointing out, we can solve these simpler problems first, deliver end user value, learn a ton from our mistakes, and then iterate to a broader solution.

  • Pingback: Meaning = Data + Structure « Lightspeed Venture Partners Blog

  • http://www.blist.com/blog/ mathew johnson

    Hi Jeremy,

    You characterize the chicken and egg problem of building apps with enough day 1 me-value to encourage people to start adding the meta information that will ultimately benefit the network of users.

    One way we are looking at things at blist is amplifying the small and precious amount of UGC structure that you are able to generate and syndicating it out to similar data elsewhere.

    As is abundantly clear to everyone, the velocity of new structure creation is much lower than the velocity of new content creation, and so it’s vital to build the biggest multiplier possible to catch structure up to data.

    Mathew Johnson

  • http://me.dium.com Jud Valeski

    What a fun series of posts. Thanks for putting the energy into these! UGS is a fascinating topic and I was compelled to describe how my company, Me.dium, is approaching the problem with a blend of the human mind and machines.

    In short, asking users to do anything, as you point out, is a major uphill battle. Machines aren’t smart enough to do what our brains can do, so me.dium is passively leveraging the human mind, our behaviors, and content to get people the information they seek.

    I go into more detail in my blog post at http://me.dium.com/node/1026

  • http://www.altosresearch.com/blog mike simonsen

    Love the orienteering metaphor. I’d extend it to the ontology itself. The really useful applications are driven by people orienteering their way through the structure as well. It turns out in practice that human-powered domain expertise is required to get the thing off the ground.

    If you retitled the post as “Value = data + structure” (where meaning implies value), and the data portion of the equation is the given (UG or otherwise), then structure is the value add. That’s the secret sauce. And it doesn’t come from automated semantic recognition. It comes from application design insight and sparks of great marketing.

    Further, in this vertical world, domain expertise is critical for monetization (value begets revenue). Because the money answers aren’t self-evident in the data. Automating semantic analysis to /discover/ meaning is like automating mining before you know there’s value in diamonds. You can find X, Y, and Z minerals, their relationships to one another, their mass, and composition. But you can’t infer that my fiancee doesn’t want to put a chunk of quartz on her finger.

    It turns out to be self reinforcing too. You get customers in the vertical, they ask really smart questions that you never thought about and you go create the structure to discover the answers in the data.

    The framework looks something like this:
    expertise -> ontology -> data -> meaning -> repeat.

    To invert that process is really bad science ;-) You don’t create the hypothesis to fit the data, you hypothesize first.

    But we all start out with the grand visions of cross domain applications. Then you wake up one day and your elegant open platform has transformed into this rigid, structured beast that knows only movies, or people, or in our case real estate. Then you lean back and catch a few more z’s secure in the fact that same platform is making you money.

    Or maybe that’s entirely your point here…

  • Pingback: Meaning = data + Structure: More thoughts on user generated structure « Lightspeed Venture Partners Blog

  • Pingback: VentureBeat » Predictions for the consumer internet in 2008

  • Pingback: 2008 Consumer Internet Predictions « Lightspeed Venture Partners Blog

  • Pingback: 網絡集錦 « Alan Poon’s Blog

  • Pingback: Semantic web in travel « Lightspeed Venture Partners Blog

  • Pingback: Innovablog > Le Web Sémantique : Où sont les outils de création de contenu riche ?

  • Pingback: Kango Blog » Blog Archive » The Blogosphere Speaks Out About Kango-A Wrap Up

  • http://www.tablefy.com elda

    Hi Jeremy
    I stumbled upon this website and thought that I would like to let you know that there is a startup that does this, introducing structure in the data that users generate.
    Check out http://www.tablefy.com – the founder meticulously micro-blogs his progress in promoting it, perhaps you can help him?
    https://twitter.com/tablefy
    here is a visual tour of the site: http://www.tablefy.com/pages/tour

  • http://www.tablefy.com elda

    Argh, I accidentally pressed post.
    I was going to say, the data that users generate on tablefy.com is inherently structured in a tabular way. Some call it data sheets, some a comparison table, but whatever it is, the structure is there.

    Enjoy
