Google Query Language

I call it the Google Query Language, GQL – advanced methods to treat the Web as a giant database and Google as its SQL.

Here are some examples. Consider the following query:

“name is * * I (am | m) (a | an) *”

It doesn’t look like a basic Google search, but it is one, and it can be entered into Google or submitted as a parameter via the Google Web API. What does it do? It looks for patterns like the following:

“My name is John Doe and I’m an artist”

The sequence is fixed but allows alternatives. An asterisk stands for any word (“Name is *” would match “Name is John” and “Name is Mary”). Round brackets enclosing words separated by “|” restrict that position to one of the listed words. For example, the query above allows “I am an”, “I’m a”, and so on.
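
To make the bracket notation concrete, here is a minimal sketch in Python that expands every “(a | b)” group into the plain phrases it permits. The function name `expand_alternatives` is made up for this illustration, and the sketch only handles simple, non-nested groups. The asterisks are left untouched.

```python
import itertools
import re


def expand_alternatives(query: str) -> list[str]:
    """Expand each "(a | b)" group into the concrete phrases it allows."""
    # The capturing group makes re.split keep the bracketed groups in the result.
    parts = re.split(r"(\([^()]*\))", query)
    choices = []
    for part in parts:
        if part.startswith("(") and part.endswith(")"):
            # A bracketed group: one choice per alternative.
            choices.append([alt.strip() for alt in part[1:-1].split("|")])
        else:
            # Literal text (including asterisks): exactly one choice.
            choices.append([part])
    return ["".join(combo) for combo in itertools.product(*choices)]


for phrase in expand_alternatives("name is * * I (am | m) (a | an) *"):
    print(phrase)
```

The query expands to four phrases, from “name is * * I am a *” to “name is * * I m an *”; the “I m” variants are the ones meant to catch “I’m”.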

The trick with queries like these is to stay below 10 words, and even fewer if you want to leave room for additional user input. However, the asterisk wildcard does not count against the limit and should therefore be used in between words to keep the count as low as possible.
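
The counting rule can be checked mechanically. The following is a rough sketch, not a description of how Google itself counts: `count_query_words` and `WORD_LIMIT` are names invented here, and the tokenizer simply treats anything that is not whitespace, a bracket, a pipe or a quote mark as a word.

```python
import re

WORD_LIMIT = 10  # the limit as stated in the text; an assumption, not a verified constant


def count_query_words(query: str) -> int:
    """Count the tokens that use up the limit: everything except asterisks."""
    # Brackets, pipes and quote marks are separators, not words.
    tokens = re.findall(r'[^\s()|"“”]+', query)
    return sum(1 for token in tokens if token != "*")


query = "“name is * * I (am | m) (a | an) *”"
used = count_query_words(query)
print(used, WORD_LIMIT - used)  # prints: 7 3
```

Note that every alternative inside a bracket counts separately, which is why “(am | m)” costs two words.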

So let’s count the words in the query above; what we have is…

“name [1] is [2] * * I [3] (am [4] | m [5]) (a [6] | an [7]) *”

seven words that count against the Google query limit. So we have room for 3 more words.
We can try to add a variable word or phrase, e.g. “hard-working male” or “artist”, and then grab whatever fills the asterisks in the Google snippet. So if we query:

“name is * * I (am | m) (a | an) artist”

… we could grab “John Doe” (if that appears in the Google result). This way we could write a search engine that lists all artists online. Note that this is a simplified example. GQL is a soft science, and results are not as clearly defined as in a well laid-out database. However, it’s also infinitely more pragmatic and might work more reliably and much faster than any Semantic Web/RDF/XML approach.
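
Grabbing the asterisk contents from a snippet could look roughly like this. It is a sketch under stated assumptions: the snippets are hand-written stand-ins for search results (no search API is called), `gql_to_regex` and `extract_wildcards` are hypothetical helper names, and each asterisk is read strictly as exactly one word, which may be stricter than a real search engine.

```python
import re


def gql_to_regex(query: str) -> re.Pattern:
    """Translate a GQL phrase into a regex; each "*" becomes one captured word."""
    pieces = []
    for token in re.findall(r"\([^()]*\)|\S+", query):
        if token == "*":
            pieces.append(r"(\w+)")
        elif token.startswith("("):
            alts = [re.escape(alt.strip()) for alt in token[1:-1].split("|")]
            pieces.append("(?:" + "|".join(alts) + ")")
        else:
            pieces.append(re.escape(token))
    # Words may be separated by spaces or punctuation, e.g. the apostrophe in "I'm".
    return re.compile(r"\b" + r"\W+".join(pieces) + r"\b", re.IGNORECASE)


def extract_wildcards(query: str, snippets: list[str]) -> list[str]:
    """Return the words that filled the asterisks, one string per match."""
    pattern = gql_to_regex(query)
    found = []
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            found.append(" ".join(match.groups()))
    return found


# Hypothetical snippets, written by hand for this example.
snippets = [
    "Hi, my name is John Doe, I’m an artist from Ohio.",
    "Hello, the name is Ann Lee. I am an artist too.",
    "My name is Bob and I am a plumber.",
]
print(extract_wildcards("name is * * I (am | m) (a | an) artist", snippets))
```

Under this one-word-per-asterisk reading, a connecting word such as “and” between the name and “I” would need an asterisk of its own.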

Here is another example. How do we grab things, like a list of films, via GQL? We can use the following:

  • “my favorite COMIC is *”
  • “my favorite FILM is *”
  • “my favorite SONG is *”
  • “my favorite POPSTAR is *”
  • “my favorite SINGER is * *”
  • “my favorite BOOK is *”
  • “my favorite ACTOR is * *”
  • “my favorite ACTRESS is * *”
  • “my favorite CAR is *”

(The upper-case word can be written in lower case; the capitals just emphasize the variable.) You get the picture: what we pull from the Google result snippet is whatever fills the asterisk. For “FILM”, that makes for a list of films. We could even go as far as to analyze all results, fill our database, and thus get the most beloved movies, books, actors, comics … of all time.
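
The “analyze all results and fill our database” step can be sketched as a simple tally. As before, this is only an illustration: the snippets are invented, `favorite_query` and `tally_favorites` are hypothetical names, and the number of asterisks decides how many words are captured.

```python
import re
from collections import Counter


def favorite_query(topic: str, words: int = 1) -> str:
    """Build the quoted phrase query for one topic."""
    stars = " ".join(["*"] * words)
    return '"my favorite {} is {}"'.format(topic.lower(), stars)


def tally_favorites(topic: str, snippets: list[str], words: int = 1) -> Counter:
    """Count how often each answer fills the asterisk(s) for this topic."""
    answer = r"(\w+(?:\s+\w+){%d})" % (words - 1)
    pattern = re.compile(
        r"\bmy\s+favorite\s+" + re.escape(topic) + r"\s+is\s+" + answer,
        re.IGNORECASE,
    )
    counts = Counter()
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            counts[match.group(1).lower()] += 1
    return counts


# Hypothetical snippets, written by hand for this example.
SNIPPETS = [
    "My favorite film is Casablanca, hands down.",
    "I guess my favorite film is Vertigo.",
    "my favorite film is casablanca and always will be",
    "My favorite actor is Cary Grant.",
]
print(favorite_query("FILM"))
print(tally_favorites("film", SNIPPETS).most_common(1))
print(tally_favorites("actor", SNIPPETS, words=2).most_common(1))
```

A fixed number of asterisks is the weak spot of this sketch: with one asterisk, a title that starts with “The” would be tallied as just “the”.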

Another example: we want to find couples, that is, things that by their nature belong together or are somehow connected. One GQL variant of this would be:

“i like * and *” -“and i”
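
A rough sketch of how the couples could be collected from result snippets follows. The snippets are invented and `extract_couples` is a hypothetical name. One simplification to be aware of: the exclusion is applied to each snippet here, whereas a search engine would apply it to the whole page.

```python
import re
from collections import Counter

PAIR = re.compile(r"\bi\s+like\s+(\w+)\s+and\s+(\w+)", re.IGNORECASE)
EXCLUDED = re.compile(r"\band\s+i\b", re.IGNORECASE)


def extract_couples(snippets: list[str]) -> Counter:
    """Count unordered word pairs that fill the two asterisks."""
    couples = Counter()
    for snippet in snippets:
        # Mimic the -“and i” exclusion: skip snippets containing that phrase.
        if EXCLUDED.search(snippet):
            continue
        for first, second in PAIR.findall(snippet):
            # Sort so that "salt and pepper" and "pepper and salt" count together.
            couples[tuple(sorted((first.lower(), second.lower())))] += 1
    return couples


# Hypothetical snippets, written by hand for this example.
snippets = [
    "I like salt and pepper on everything.",
    "i like pepper and salt",
    "I like coffee and I drink it daily.",
    "I like cats and dogs.",
]
print(extract_couples(snippets).most_common())
```

The exclusion is what keeps sentences like “I like coffee and I drink it daily” from producing the bogus couple “coffee” and “I”.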

If you find other successful GQL samples, let me know! One way to add some GQL spice is to use the synonym operator (a tilde, “~”) in front of words.

Google’s index is still growing. For GQL to perform better, it should. The more “natural-language babbling”* we have, the better GQL works for us.

*I call it babbling because the approach tolerates nonsensical or wrong statements, as long as the overall number of statements is large enough to let the right ones shine through. For one person saying “Citizen Kane stinks” we have 10,000 saying “Citizen Kane is great”. For one person saying “1 + 1 = 3”, we have 10,000 saying “1 + 1 = 2” (if the “Web database” is big enough, that is).
Also, “babbling” because the more mundane the statement seems, the more precious it is as an information atom – e.g. who would ever write in his blog “a bicycle has two wheels”? And yet it is exactly those sentences that will be written when a lot is written, by a lot of people, every day. Those statements help a machine construct more complex knowledge, and even discover its own theories and make statements beyond what is known online.

FindForward.com shows some of the above, but often both the amount of data and Google’s 10-word limit make it hard. We can look for well-known things, but there are not enough people writing for every imaginable topic to be covered. I believe that if the Web grows by a factor of about 1000 from its current size, and if Google can index it all, we will see a lot of interesting tools to discover “world consciousness” – or, to put it more simply, search engines will be able to answer search requests. In several years, a search engine will then no longer list result pages – it will talk to the searcher:

“You are interested in this movie? Well, so are other people, like John, who has a personal fan site on it, which a lot of people link to. The official homepage is here, and contains trailers. A nice, though somewhat long-winded review can be found at moviereviews.com. The movie’s director is … and you can reach him here. Or if you just want to buy the DVD, the cheapest place you can get it is for $12 at …”

Source: blog.outer-court



