Tuesday, April 7, 2009
The Unreasonable Persuasiveness of Prose
Halevy, Norvig, and Pereira's article "The Unreasonable Effectiveness of Data" [PDF] examines whether we need the Semantic Web. The crux of their argument is that in "very large data sources, the data holds a lot of detail."
Sadly, the argument falls apart in the last half of the last sentence on the first page:
"without needing any manually annotated data."Well, that's rich. Google has sucked in petabytes of manually annotated data, and discovered that with all that manually annotated data, they don't need any manually annotated data to derive semantics.
Absurdity notwithstanding, there is a subtly important point lurking in there. Stephen Wolfram made it, with uncharacteristic simplicity, in an interview with Rudy Rucker:
"The problem with the Semantic Web is that the people who post the content are expected to apply the tags"Wolfram and the Google guys make an interesting point. If the Semantic Web were dependent upon a majority of folks using the same formalisms and identically serializing them, it would be a long time before we reaped much benefit. Possibly decades, or more.
Luckily, we don't need to wait that long. To begin with, there's already a great deal of semantic markup out in the wild, as I pointed out in the beginning of this post. Sure, it's not OWL or RDF or whatever hot shit the kids are using these days, but it's semantic nonetheless (as is evidenced by the fact that the folks at Google and Wolfram Science are able to extract semantics from it). What's to keep them, or someone like them, from serializing that information in some Semantic Web form, obviating the need for people to go back and do it themselves, and obviating the need to use some inane UI?
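To make that concrete, here's a minimal sketch of what that serialization step might look like. It's Python, the facts in it are hypothetical stand-ins for whatever an extractor actually mines out of markup in the wild, and the example.org URIs are placeholders; the point is only that once the semantics have been extracted, emitting them in a Semantic Web form like N-Triples is the easy part.

    # Hypothetical extractor output: (subject, predicate, object) strings.
    # Real facts would come from the semantic markup already in the wild.
    extracted_facts = [
        ("http://example.org/CapeTown",
         "http://example.org/isLocatedIn",
         "http://example.org/SouthAfrica"),
        ("http://example.org/SouthAfrica",
         "http://example.org/population",
         "49320000"),
    ]

    def to_ntriple(subject, predicate, obj):
        """Render one fact as an N-Triples line.

        URIs get angle brackets; anything else becomes a quoted literal.
        A real serializer would also escape characters per the spec.
        """
        if obj.startswith("http://") or obj.startswith("https://"):
            rendered = "<%s>" % obj
        else:
            rendered = '"%s"' % obj
        return "<%s> <%s> %s ." % (subject, predicate, rendered)

    for fact in extracted_facts:
        print(to_ntriple(*fact))

Nobody has to go back and tag anything by hand; the annotation work was already done once, implicitly, and a sketch like this just re-emits it in a form machines can trade around.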
But really, that's only a side point along the way to the main point.
The main point is that the Semantic Web provides a number of interesting starting points for serializing assertions. We probably ought not to think of it as an end point, but as a side effect: a way of capturing a train of thought. And as we figure out better ways to serialize our own trains of thought, one possible side effect is computers that mimic us a little more closely.
Assuming, of course, that's something we actually want. Personally, I like them how they are.