Research¶

Folders and Labels¶

Folders are exclusive.
Labels are inclusive.
#Hashtags are labels.
Folders form a tree which may be flat.
Labels can form a tree but are otherwise flat.
Folder path: a/b/c
Nested label: a.b.c
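
The distinction above can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Document` class and the `label_ancestors` and `has_label` helpers are hypothetical, not part of any tool mentioned here. A document holds exactly one folder path but any number of labels, and a nested label such as `a.b.c` implies its parents `a` and `a.b`.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    title: str
    folder: str                                    # exclusive: one path, e.g. "a/b/c"
    labels: Set[str] = field(default_factory=set)  # inclusive: any number of labels


def label_ancestors(label: str) -> List[str]:
    """Expand a nested label "a.b.c" into ["a", "a.b", "a.b.c"]."""
    parts = label.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def has_label(doc: Document, query: str) -> bool:
    """True if the document carries `query` or a label nested under it."""
    return any(query in label_ancestors(label) for label in doc.labels)


doc = Document("Notes", folder="a/b/c", labels={"research.citations", "todo"})
print(has_label(doc, "research"))  # matches via the nested label
```

Matching on whole label segments (rather than on string prefixes) keeps a query for `cit` from matching `research.citations`.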

Citation Metadata¶

Bibliographic citations can take many forms.

Citations are most useful in a structured form (with a schema).

DCMI
OAI-PMH
Schema.org CreativeWork

Citations in the bibliography, references, or resources section of a textual document must be parsed in order to derive a citation graph.

Many impact statistics are derived from graph metrics based on citation frequency (and, by implication, things like centrality).
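
As a rough sketch of the simplest such metric, citation frequency is the in-degree of each node in the citation graph. The edge list below is made-up sample data, and `citation_counts` is a hypothetical helper name; real pipelines would first have to parse these pairs out of reference sections.

```python
from collections import Counter

# Hypothetical parsed citations: (citing document, cited document).
citations = [
    ("paper_b", "paper_a"),
    ("paper_c", "paper_a"),
    ("paper_c", "paper_b"),
    ("paper_d", "paper_a"),
]


def citation_counts(edges):
    """Count how often each document is cited (its in-degree in the graph)."""
    return Counter(cited for _citing, cited in edges)


counts = citation_counts(citations)
print(counts.most_common())  # [('paper_a', 3), ('paper_b', 1)]
```

Centrality measures build on the same graph but also weigh who is doing the citing, which a plain count does not capture.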


Search engines¶

Knowledge Engineering > Search Engine Indexing
Query syntax
Case sensitivity
Unicode symbols (Zero, Zerö, Zerø, Ƶero)
Stemming & Spelling Correction
- “walking” -> walk -> walk, walking, walkers, walked
Fuzzy matching
- ElasticSearch
  - “Typoes and Mispelings” > “Fuzziness” https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzziness.html
    - String distance (edit distance, e.g. Levenshtein; Hamming distance only counts substitutions)
    - Substitution, Insertion, Deletion (see also: Operational Transformation)
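
A minimal sketch of an edit distance that counts substitutions, insertions, and deletions (Levenshtein distance). This is a plain Python illustration, not Elasticsearch's implementation; the function name `levenshtein` is chosen here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("colour", "color"))    # 1
print(levenshtein("kitten", "sitting"))  # 3
```

A fuzzy matcher can then accept any indexed term within some small distance of the query term.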
Regional language variants
- https://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#-our.2C_-or
  - “Colour”, “Couleur”, and “Color”
- https://en.wikipedia.org/wiki/Romanization
  - “寿司”, “壽司”, and “Sushi”
String prefixes
- Does “Apple” also return e.g. “Grapple”, or just e.g. “apples”, “appleton”, “apple pie”?
Stop words
- a, and*, the, or*, not*
Logical Term grouping
- “Quoting”, (Parentheses), Logical terms (Logic)
- “This one” AND “That one”
- “This one” AND (“that one”)
- this one AND that one
- -this one AND that one
- -(“this one”) AND “that one”
- (NOT “this one”) AND (“that one”)
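
One way to see why grouping matters is to treat each quoted phrase as the set of documents that match it, so AND becomes set intersection and NOT becomes set difference. The postings and document ids below are invented for illustration.

```python
# Hypothetical postings: phrase -> ids of documents containing that phrase.
postings = {
    "this one": {1, 2, 3},
    "that one": {2, 3, 4},
}
all_docs = {1, 2, 3, 4, 5}

# "this one" AND "that one"
both = postings["this one"] & postings["that one"]

# (NOT "this one") AND ("that one")
only_that = (all_docs - postings["this one"]) & postings["that one"]

# NOT ("this one" AND "that one"): a different grouping gives a different result
not_both = all_docs - both

print(sorted(both), sorted(only_that), sorted(not_both))  # [2, 3] [4] [1, 4, 5]
```

Whether an unquoted, unparenthesized query such as `-this one AND that one` means the second or the third form depends on the engine's precedence rules, which is why explicit quoting and parentheses help.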
Search algorithms:
- Search Engine Indexing
- Data Structures
- natural language
- Full table scan (match every row every time) [very slow]
- Document-Term graph / tree
  - “index” non-stop words and phrases as graph edges
  - “entity recognition” / “entity extraction” / “phrase extraction”
    - OpenNLP (Java), NLTK (Python), Watson
    - “Mark Twain grew up not in Hannibal, Missouri but in St Louis, Missouri.”
      - grew up
      - Mark Twain (Mark, Twain, Mark Twain)
      - Hannibal
      - Hannibal, Missouri
      - St Louis
      - St Louis, Missouri
  - Manual Index
    - “biased”
    - https://wrdrd.github.io/docs/genindex
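
Several of the points above (case sensitivity, Unicode symbols, stop words, and indexing non-stop words instead of scanning every row) can be combined in a small inverted index. This is a minimal sketch, not how any particular engine works; the function names, stop-word set, and sample documents are assumptions made for the example. Note that stripping combining accents handles “Zerö” but not “Zerø” or “Ƶero”, whose letters have no Unicode decomposition and would need an explicit mapping.

```python
import unicodedata
from collections import defaultdict

STOP_WORDS = {"a", "and", "the", "or", "not"}


def normalize(token: str) -> str:
    """Case-fold and strip combining accents, so "Zerö" becomes "zero"."""
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(text: str) -> list:
    """Split on whitespace, trim punctuation, normalize, drop stop words."""
    words = (normalize(w.strip(".,;:!?\"'()")) for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_index(documents: dict) -> dict:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every non-stop-word query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


documents = {
    1: "Mark Twain grew up in Hannibal, Missouri.",
    2: "The Zerö point",
    3: "Hannibal and the Alps",
}
index = build_index(documents)
print(sorted(search(index, "hannibal")))  # [1, 3]
print(sorted(search(index, "zero")))      # [2]
```

A lookup touches only the postings for the query terms, whereas a full table scan would re-read every document on every query.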

Research Tools¶

Mendeley¶

Wikipedia: https://en.wikipedia.org/wiki/Mendeley

Zotero is similar to Mendeley.

Zotero¶

Wikipedia: https://en.wikipedia.org/wiki/Zotero


CKAN¶

Src: https://github.com/ckan

Src: git https://github.com/ckan/ckan

CKAN (Comprehensive Knowledge Archive Network) is an open source web application, written in Python, for cataloging data.

There are a number of extensions for CKAN: http://extensions.ckan.org/
- ckanext-extractor can automatically extract text and metadata from datasets (including PDF). http://extensions.ckan.org/extension/extractor/
  
  see also:
  - LinkedReproducibility > StudyGraph: Document Nodes and Link Edges
  - OAI-PMH, Fedora Commons
- ckanext-datajson can generate data.gov JSON for datasets: http://extensions.ckan.org/extension/datajson/

DSpace¶

Wikipedia: https://en.wikipedia.org/wiki/DSpace

Homepage: http://www.dspace.org/

DSpace is an open source web application, written in Java, for creative works and their XML metadata.

DSpace supports OAI-PMH.
DSpace and Fedora Commons are now both part of DuraSpace.

Fedora Commons¶

Wikipedia: https://en.wikipedia.org/wiki/Fedora_Commons
Homepage: http://fedorarepository.org/
Docs: http://fedorarepository.org/features
Docs: https://wiki.duraspace.org/
Docs: https://wiki.duraspace.org/display/FEDORA4x/Fedora+4.x+Documentation

Fedora Commons (Fedora Repository, Fedora) is an open source web application, written in Java, for creative works and their XML metadata.

Fedora supports OAI-PMH.
Fedora can index metadata with other search engines (e.g. Solr, Elasticsearch).
There are additional frontend web applications for Fedora:
- Hydra
- Islandora
Fedora Commons is the database for a number of well-known institutional repositories (e.g. book and digital asset library catalogs).

Note

Fedora Commons (“Fedora”, “Fedora Repository”) is distinct from the Fedora Linux operating system.

Fedora Commons is a Java web application, packaged as a WAR, which runs in a servlet container on many operating systems.

Hydra¶

Homepage: https://projecthydra.org/
Src: https://github.com/projecthydra
Docs: https://wiki.duraspace.org/display/hydra/The+Hydra+Project

Hydra is an open source web application frontend, written in Ruby, for Fedora Commons.

Solr
Blacklight

Blacklight¶

Homepage: http://projectblacklight.org/
Src: git https://github.com/projectblacklight/blacklight
Docs: https://github.com/projectblacklight/blacklight/wiki

Blacklight is an open source web application written in Ruby for providing a search interface to Solr.

Hydra indexes Fedora Commons metadata with Solr, and the indexed metadata can be displayed with Blacklight.

Islandora¶

Homepage: http://islandora.ca/about
Src: https://github.com/Islandora
Docs: http://islandora.ca/documentation

Islandora is an open source web application frontend, written in PHP, for Fedora Commons.

Solr
Drupal (PHP)
Islandora indexes Fedora Commons metadata with Solr, and the indexed metadata can be displayed with the Islandora Drupal application.

OAI-PMH¶

Wikipedia: https://en.wikipedia.org/wiki/Protocol_for_Metadata_Harvesting

OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is an XML-over-HTTP standard for sharing metadata about creative works using Dublin Core (DCMI dcterms) and other schemas.

Fedora Commons supports OAI-PMH.
DSpace supports OAI-PMH.
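
To make the "XML over HTTP" part concrete, here is a minimal sketch of an OAI-PMH harvest step. The `verb=ListRecords` and `metadataPrefix=oai_dc` request parameters come from the protocol; the endpoint URL is a hypothetical placeholder, the helper names are invented, and the response is a hand-written, heavily trimmed sample (a real response also carries a response date, request element, and record headers). The `oai_dc` format uses the simple Dublin Core element namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hand-written, trimmed sample of what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles(xml_text: str) -> list:
    """Extract every Dublin Core title from an OAI-PMH response body."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_TITLE)]


print(list_records_url(BASE_URL))
print(titles(SAMPLE_RESPONSE))  # ['An Example Title']
```

The sketch does not perform any network request; fetching the URL and following resumption tokens for large result sets are left out.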
