Advice for the UNESCO OER Mapping Project
Email written as advice to the UNESCO OER mapping initiative.
This is beginning to read and sound very much like the debates around learning object metadata of the 1990s. I know that approaches such as LRMI represent an improvement in that elements are aligned with schema.org data types. That said, knowledge of the history would be useful, and I recommend that people making suggestions look at IMS and IEEE LOM as well as LRMI.
It would also be helpful (though probably not practical) to review the discussion surrounding these specifications. For example, below, we read a request for "technical requirements for using the material." This is better addressed by describing the resource format and specifications (e.g., its MIME type) rather than specifying application software, because software changes rapidly. Consider the requirements in IMS-LOM documents specifying that a resource is 'best viewed in Internet Explorer 3.0'.
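As a hypothetical illustration, a record describing the format rather than the software might look like this (the field names loosely follow Dublin Core and are not drawn from any one specification):

```python
# Illustrative sketch only; field names loosely follow Dublin Core.
# A registered MIME type outlives any particular application.
record = {
    "title": "Introduction to Statistics",   # placeholder resource
    "format": "application/pdf",             # durable: a MIME type
    # "requirement": "Adobe Acrobat 4.0"     # fragile: software changes
}
```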
Of course, this discussion is centered on OER *repositories* and not only the resources themselves. Consequently, mappings will need to describe repository properties. Consulting OAI or DSpace specifications would be helpful here. Minimally, we would want API specifications for resource creation, reading, updating and deletion, as well as classification systems and resource metadata specifications.
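As a rough sketch of what consulting OAI buys us: records can be harvested from any OAI-PMH-compliant repository (DSpace among them) with a few lines of code. The endpoint URL below is a placeholder:

```python
# Sketch: harvest titles and identifiers from an OAI-PMH repository.
# The base URL is a placeholder; any OAI-PMH endpoint works the same way.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

url = "https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc"
tree = ET.parse(urlopen(url))

for record in tree.iter(OAI + "record"):
    title = record.find(".//" + DC + "title")
    identifier = record.find(".//" + DC + "identifier")
    print(title.text if title is not None else "(untitled)",
          identifier.text if identifier is not None else "")
```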
All of this is difficult to build from the ground up. It is a discussion that has occupied the field for almost two decades. I am thinking at this point that the OER initiative should draw on the experiences of OER repositories and repository indices that already exist. The most useful beginning for such a project would most probably be a summary of the properties of existing repository indices, including the range of resources indexed, the metadata fields used, and more.
For those specifically interested in resource metadata, rather than repository profiles, may I recommend my article 'Resource Profiles' (http://www.downes.ca/post/41750); I'm sorry to recommend my own work, but it keeps this post a lot shorter. It suggests approaches for the following sorts of metadata (a small sketch follows the list):
- first-party metadata, which is metadata specifically about the resource itself, e.g., technical data, rights metadata, bibliographic data
- second-party metadata (sometimes called 'paradata'), related to the use of the resource, such as ratings, accesses, etc.
- third-party metadata, such as classifications and educational metadata (including things like curriculum, keywords, etc.)
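As a hypothetical sketch, a resource profile combining the three kinds of metadata might be structured like this (all field names and values are illustrative, not drawn from any published specification):

```python
# Illustrative resource profile; field names are hypothetical.
profile = {
    # first-party: about the resource itself
    "first_party": {
        "title": "Photosynthesis Explained",
        "format": "text/html",
        "license": "CC BY 4.0",
    },
    # second-party ('paradata'): about the use of the resource
    "second_party": {
        "accesses": 1042,
        "average_rating": 4.3,
    },
    # third-party: classifications and educational descriptions
    "third_party": {
        "subject": ["Biology"],
        "educational_level": "secondary",
        "keywords": ["photosynthesis", "chlorophyll"],
    },
}
```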
Additionally, readers should take account of the desirability of linked data. For example, the use of strings to represent authors and publishers creates the possibility of ambiguity, error and duplication. Contemporary resource repositories, such as Google Scholar or academia.edu, maintain separate registries of authors, which are linked to resources (JSTOR doesn't, but should, as a search for au:"Stephen Downes" already returns results from a bunch of strangers). It would be worth contemplating linking authors and OERs to additional resources, such as publishers and institutions (many of these are already described by schema.org). Another argument in favour of linked data is that any string data will need to have several properties, including character encoding and language. So it's best to use strings sparingly.
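To illustrate, a schema.org description (here sketched as a JSON-LD-style Python dictionary) can point to an author record by URI rather than embedding a name string; the URIs below are placeholders:

```python
# Hypothetical JSON-LD-style record using schema.org types.
# Identifying the author by URI avoids confusing two people who share
# a name, and lets name changes happen in one place.
resource = {
    "@context": "https://schema.org",
    "@type": "LearningResource",
    "name": "Resource Profiles",
    "author": {"@id": "https://example.org/authors/stephen-downes"},
    "publisher": {"@id": "https://example.org/orgs/nrc"},
}
```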
All of the considerations above must also be weighed against what people will actually do in the way of creating and using resource metadata. I recall a study by Norm Friesen, for example, examining the use of IEEE-LOM to index learning objects. Though the specification enables detailed educational descriptions, most people used only ten percent of the fields. Much of the metadata available will be minimal. Any mapping will need to contemplate listings using the most basic data: title, link (i.e., URI) and description. Any system should attempt to automatically generate metadata (my own website automatically generates image metadata) and make good use of tags.
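As a sketch of what automatic generation could look like, the most basic record (title, link, description) can often be scraped from the page itself; this uses only the Python standard library, and the URL is just an example:

```python
# Sketch: derive a minimal metadata record (title, link, description)
# from an HTML page's <title> and description <meta> tags.
from html.parser import HTMLParser
from urllib.request import urlopen

class MinimalMetadata(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

url = "http://www.downes.ca/post/41750"
parser = MinimalMetadata()
parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))
print({"title": parser.title.strip(), "link": url,
       "description": parser.description})
```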
Also (November 30): With respect to the summary and the map initiative itself, I would like to make one key recommendation: that it be a ‘submit-once’ system.
Typically, OER data owners would employ a form or some interface to deposit content into the database that will eventually be used to produce the map. The result of this approach is that OER data owners must submit separately for each mapping initiative. Eventually they tire of this, and the result is incomplete data.
So I would ask that any such map also *export* its data in a machine-readable format (plain XML will do, as would JSON, an RSS or Atom extension, or pretty much any structured representation), along with licensing that allows it to be harvested and reused (pick whatever license you want). This would allow an OER data owner to submit *once* and have the data available for any number of maps.
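A minimal sketch of such an export, emitting records as JSON (the record shown and the file name are illustrative; any of the formats above would serve equally well):

```python
# Sketch: publish the map's records in a machine-readable form that
# other maps can harvest. Records and output path are illustrative.
import json

records = [
    {"title": "Example OER Repository",
     "link": "https://repository.example.org/",
     "description": "An open collection of learning resources"},
]

with open("oer-map-export.json", "w", encoding="utf-8") as f:
    json.dump({"license": "CC0", "records": records}, f, indent=2)
```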
I would also recommend:
- a mechanism that allows the OER data owner to update or edit records already submitted, so they can stay current
- an export mechanism, or a stand-alone record-creator, so an OER data owner can create the structured representation and store it on his or her own website
- a mechanism whereby databases of OER repository information can publish and harvest each other's data, essentially enabling them to sync records so that every database contains all OER information, no matter which database a record was originally added to (a rough sketch follows this list)
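The harvest-and-sync step might look roughly like this, assuming each map publishes a JSON export like the sketch above and that records carry a stable link to deduplicate on (the peer URLs are placeholders):

```python
# Sketch: merge records harvested from peer maps into a local store,
# keyed by link, so nothing is duplicated no matter where a record
# was first added.
import json
from urllib.request import urlopen

peers = ["https://map-one.example.org/oer-map-export.json",
         "https://map-two.example.org/oer-map-export.json"]

local = {}  # link -> record
for peer in peers:
    data = json.load(urlopen(peer))
    for record in data["records"]:
        local.setdefault(record["link"], record)  # keep first copy seen

print(f"{len(local)} unique records harvested from {len(peers)} peers")
```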
In redundancy is reliability. In synchronization is strength. In distribution is durability. In structured representation is stability.