SPARQL/SELECT

From Wikibooks, open books for an open world
Jump to navigation Jump to search

The SELECT clause consists of 2 or 3 parts.

SELECT  ... query result variables ...
WHERE {
        ... query pattern ...
}
        ... optional query modifiers ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT ...

The first part is the query result variables. These will be shown when the query is executed. The second part is the WHERE clause with the query pattern. This defines the data selection and generates the variables, The last part are the optional modifiers.

Example

SELECT ?child ?childLabel
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Query result variables[edit | edit source]

SELECT ?child ?childLabel ... In the above example ?child and ?childLabel are the variables. The variables are separated by spaces. A variable can be displayed with another name (Alias) by using AS, like (?child AS ?Child_of_Bach). Notice that the alias again should be a variable, and the combination should start and end with brackets (?a AS ?b).
PS. For aliases of Labels the label should be defined explicitly in the SERVICE. Alternatively the variable could be named as requested there as well.

SELECT (?child AS ?Child_of_Bach) (?childLabel AS ?Name)
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child rdfs:label ?childLabel.
                         }
}

Try it!

Query pattern[edit | edit source]

The query pattern specifies the data selection and generates the variables,

# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.

In this case the triple ?child wdt:p22 wd:Q1339 specifies that the variable ?child must have the parent/father Bach.

Any of the triple parts Subject, Predicate and Object may be variables. This makes this selection very versatile.

Additional triples can be added, for instance to show gender, birth date and date of death. Each sentence should end with a period. The new variable should be added at the top (query result variables) to display them.

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Optional query modifiers[edit | edit source]

See the chapter modifiers for a full description.

The above query can be sorted by the birth date by adding ORDER BY ?birth_date after the last curly bracket of the WHERE { } clause

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?birth_date

Try it!

Removing duplicates[edit | edit source]

You might have noticed that the above query resulted in 21 records, while Bach had only 20 children. This is caused by 2 entries of Johann Christoph Friedrich Bach, because there are 2 different birth dates, 21 and 23 of June 1732.

Generally it is advised to use DISTINCT in the result variables like

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

DISTINCT removes duplicate entries, but in this case that does not help as the entries are different. To be helpful ALL parameters should have been equal.

What helps is to group by child and combine the values of ?birth_date. Also ?date_of_death and ?genderLabel are grouped to be consistent

SELECT ?child ?childLabel 
       (GROUP_CONCAT(DISTINCT ?genderLabel;   SEPARATOR=", ") AS ?genderLabels)
       (GROUP_CONCAT(DISTINCT ?birth_date;    SEPARATOR=", ") AS ?birth_dates)
       (GROUP_CONCAT(DISTINCT ?date_of_death; SEPARATOR=", ") AS ?dates_of_death)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child  rdfs:label ?childLabel.
                           ?gender rdfs:label ?genderLabel.
                         }
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_dates

Try it!

The optional query modifier used is GROUP BY, and the variables are combined by using (GROUP_CONCAT(DISTINCT ?var1; SEPARATOR=", ") AS ?var2). The ORDER BY has been modified by using the combined variable ?birth_dates, instead of ?birth_date. All labels should be defined explicitly in the SERVICE.

See the chapter modifiers for a full description.

Another way to remove duplicates it so show only one of the possible values, by using one of the Aggregate functions MIN, MAX, SUM or AVG.

SELECT ?child ?childLabel 
       (MIN(?genderLabel)   AS ?genderlabel1) 
       (MIN(?birth_date)    AS ?birth_date1) 
       (MAX(?date_of_death) AS ?date_of_death1)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child  rdfs:label ?childLabel.
                           ?gender rdfs:label ?genderLabel.
                         }
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_date1

Try it!

The optional query modifier used is also GROUP BY, and the variables are combined by using (MIN(?var) AS ?var1). The ORDER BY has been modified by using the first birth date ?birth_date1, instead of ?birth_date. All labels should also be defined explicitly in the SERVICE.

Adding missing entries[edit | edit source]

Let's try one of the above queries for Obama (Q76):

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

This results in NO results although Obama has 2 children. The reason is that to match this query, a potential result (a child) must match all the triples we listed: it must have a gender, and a birth data, and a date of death. If one or more of those properties don't exist, it won’t match. And that’s not what we want in this case: we primarily want a list of all the children – if additional data is available, we’d like to include it, but we don’t want that to limit our list of results.

The solution is to tell WDQS that those triples are OPTIONAL:

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender. }
  OPTIONAL{ ?child wdt:P569 ?birth_date. }
  OPTIONAL{ ?child wdt:P570 ?date_of_death. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

Now both children are shown, and it is clear that the date of death is not filled in yet, as these children are still alive.

See the chapter OPTIONAL for a full description.

Fool proof: Add missing and remove duplicate entries[edit | edit source]

A fool proof list of children and their details of any given father would be

SELECT ?child ?childLabel 
       (GROUP_CONCAT(DISTINCT ?genderLabel;   SEPARATOR=", ") AS ?genderLabels)
       (GROUP_CONCAT(DISTINCT ?birth_date;    SEPARATOR=", ") AS ?birth_dates)
       (GROUP_CONCAT(DISTINCT ?date_of_death; SEPARATOR=", ") AS ?dates_of_death)
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender. }
  OPTIONAL{ ?child wdt:P569 ?birth_date. }
  OPTIONAL{ ?child wdt:P570 ?date_of_death. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child  rdfs:label ?childLabel.
                           ?gender rdfs:label ?genderLabel.
                         }
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_dates

Try it!

This combines the OPTIONAL clause in case a property is not present and the GROUP_CONCAT clause in case of multiple properties.