gabriel-straub edited this page Sep 13, 2017 · 19 revisions

Except for uri, all of the parameters below can be specified either at the root (to query everything) or within a specific partition or collection (to restrict the scope of the query).

| Parameter | Status | Proposed by? | Description |
|---|---|---|---|
| uri | Live | | Perform an identifier lookup, results in a 30x redirect to the item if found |
| q | Live | | Locate items containing the specified text |
| limit | Live | | Limit the resultset to n items |
| offset | Live | | Return results starting at item #n |
| class | Live | | Restrict results to those having the specified class URI |
| media | Live | | Restrict results to those Creative Works and Concepts which have associated media of the specified class (any, collection, dataset, video, image, interactive, software, audio, text, or class URI) |
| type | Live | | Restrict results to those Creative Works and Concepts which have associated media delivered as the specified MIME type (e.g., text/html, audio/mp4) |
| for | Live | | Include media whose restricted-audience URI matches the given URI |
| score | Live | | Set the minimum prominence score that items must have to appear in results |
| mode | Live | | Set to autocomplete in order to perform stem matching |
| lang | Live | | When performing text-based queries, specify the language of the search terms (e.g., cy-GB) |
| about | Proposed | MM | Restrict results to those items which have one or more of the specified concept URIs as a topic |
| duration-min, duration-max | Dev | Covatic | Restrict results to works with media whose duration matches the specified range (either bound is optional), in seconds |
| date | Proposed | | Restrict results to (a) events occurring on the specified date; and (b) works with media which has a publication/broadcast on the specified date |
| similar | Proposed | GS | Restrict results to those items within a certain (optionally-specifiable) distance of the n-dimensional coordinates of the specified item(s) |
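
As a rough illustration of how the live parameters combine, the sketch below builds one query at the root and one scoped to a collection. The base URL, the collection path `some-collection/` and the helper `query_url` are hypothetical and not taken from this page; only the parameter names come from the table above.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL: the real service root is not given on this page.
BASE = "https://graph.example.org/"


def query_url(params, scope=""):
    """Build a query URL at the root (scope="") or under a partition/collection path."""
    path = urljoin(BASE, scope)
    # Leave out any parameter that was not supplied.
    supplied = {name: value for name, value in params.items() if value is not None}
    return path + "?" + urlencode(supplied)


# Query everything: videos matching "penguins", third page of 10 results.
root_query = query_url({"q": "penguins", "media": "video", "limit": 10, "offset": 20})

# Same kind of text search, scoped to a (hypothetical) collection, with Welsh search terms.
scoped_query = query_url({"q": "penguins", "lang": "cy-GB"}, scope="some-collection/")

print(root_query)
print(scoped_query)
```

The uri parameter is deliberately not used in the scoped query, since it is the one parameter that cannot be scoped.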

Future work for the Datalab graph:

Strengthen the graph (make it more usable)

  • Put on stable platform
  • Proper ETLs to move the data
  • Get data from the authoritative sources (rather than some of the shortcuts we have used to date)
  • Make relationships (between content) queryable
  • Make it easy to mass extract for analytics and machine learning (e.g. all content names and descriptions)
  • Better search against the content
  • Different access requirements for different data

Widen the graph (add more content)

  • Long form articles (news and sports)
  • Interactive
  • Bitesize
  • Taster
  • Recipes
  • Weather

Deepen the graph (add more data for the content)

  • Channel

Ought to be present in the data, indexing/query TBC

  • Screening times

Present but not meaningfully indexed (broadcast events are first-order entities)

  • Existing tags (as already in the system)

Straightforward

  • ML based descriptors (with confidence)

Named graphs with their own attributes → index confidence factors

  • Key people (director, actors, etc.)

As with existing tags

List of example requests we want to be able to run against the graph:

  • Give me … pieces of content of a specific type with a specific length that cover these topics …

limit, type, duration-min & duration-max, about

  • Give me … pieces of content of a specific type with a specific length that are similar to these pieces of content…

limit, type, duration-min & duration-max, similar (see note regarding similar above)

  • Tell me how much content we have on …
  • Tell me how much content we have on … of length … that was created before …
  • Tell me all the names and descriptors of news articles that were created since …
  • Tell me the average length of content on … and how that compares based on which year it was created in
  • Tell me how many minutes of total content we have on …
  • Give me all the descriptions used for content that was created in the last … months
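
The first two example requests map directly onto parameters, so they can be sketched as query strings. This is a minimal sketch under stated assumptions: the helper `content_request` and the example.org URIs are hypothetical, sending several URIs by repeating a parameter is a guess (this page does not say how multiple values are passed), and about, similar and duration-min/duration-max are not yet live.

```python
from urllib.parse import urlencode


def content_request(limit, mime_type, about=(), similar=(),
                    duration_min=None, duration_max=None):
    """Query string for 'give me N pieces of content of a type and length ...'."""
    pairs = [("limit", limit), ("type", mime_type)]
    # Either duration bound is optional; values are in seconds.
    if duration_min is not None:
        pairs.append(("duration-min", duration_min))
    if duration_max is not None:
        pairs.append(("duration-max", duration_max))
    # Assumption: multiple URIs are sent by repeating the parameter.
    pairs += [("about", uri) for uri in about]
    pairs += [("similar", uri) for uri in similar]
    return urlencode(pairs)


# "... of a specific type with a specific length that cover these topics"
by_topic = content_request(
    5, "audio/mp4",
    about=["http://example.org/concepts/penguins"],  # hypothetical concept URI
    duration_min=60, duration_max=300,
)

# "... that are similar to these pieces of content", with no upper duration bound
by_similarity = content_request(
    5, "audio/mp4",
    similar=["http://example.org/works/a", "http://example.org/works/b"],  # hypothetical
    duration_min=60,
)

print(by_topic)
print(by_similarity)
```

The remaining requests (counts, averages, total minutes) are aggregations, which none of the parameters listed on this page express directly.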