LinkedReproducibility¶
#LinkedReproducibility
Knowledge Engineering > Linked Data
Units > RDF and Units [QUDT,]
ELI5: Our data is probably already aware of a cure.
Cure: Win/Win solution
Note
This page (linkedreproducibility.rst
) is licensed with
CC0 1.0
(Public Domain).
Please do implement these ideas and specifications.
StudyGraph: Document Nodes and Link Edges¶
We should use annotations with typed, reified edges to link between various studies with comparable and incomparable analyses. (e.g. OpenAnnotation OA RDF OWL with more data than threaded comments).
PDFs are the de-facto standard for scientific publishing
Many journals also / instead request HTML
PLoS, for example
Then, a “study” is a document
title – schema.org/name
authors – lead first (usu.)
schema.org/TODO – author, creator, contributor
abstract
tags/labels/keywords are edges to a tag/label/keyword node
hierarchical
MESH
PyPI Trove Classifiers
folksonomy
tags
tags often require deduplication / part of speech normalization / de-pluralization / etc
Concept URIs
wikipedia, dbpedia, wikidata, etc
links to Linked Data
What we lack are structured edges/relations between the actual studies
ex:confirms, ex:seemsToConfirm -> ex:confirmatoryEdge
[strength of association [“magnitude”]]
ex:reproduces, ex:seemsToReproduce -> ex:reproducibilityEdge
ex:refutes
ex:disproves
TODO: see the list i brainstormed -
frameworks for edges:
OpenCog AtomSpace (strength of association)
Develop best practices guidelines and and/or an RDF schema and vocabulary (”
repro:
) for linking between studies, their supporting data, and their collection methods with URIs.developing vocabularies:
[ ] Schema.org extension vocabularies
linked reproduciblity edges:
similarTo
concursWith
discordantWith
intendedToReproduce
reproduces
linked reproducibility classes and properties:
[x] schema.org/MedicalStudy, MedicalObservationalStudy, MedicalTrial
See: https://westurner.github.io/opengov/us/#personal-health-agenda
TODO: - pandas 3402 -
StructuredPremises: Premises as structured data¶
And then URIs for controls / study design
see schema.org/MedicalTrialDesign
[ ] these could/should be extended to all of science
logical premises (sequence of propositions)
i/o sequences
nbformat (IPython / Jupyter notebook format)
insufficient because we need stable premise permalinks (across versioned publishing URIs)
#premise-1
#premise-abc398f
conclusions (derivations)
this is a computation graph
it should have links (edges) to the datasets
“ENH: Linked Datasets (RDF)” https://github.com/pydata/pandas/issues/3402
figures should have links (edges) to the datasets
permalinks to premises
#TenSimpleRules for Reproducibile Computational Research | http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003285 | https://wrdrd.github.io/docs/consulting/data-science#tensimplerules-for-reproducible-computational-research
further questions
“downstream” studies / implementations
retraction management
decisions / policy predicated on said conclusions
LinkedMetaAnalyses¶
You evaluated 10, I evaluated (the same / a different) 10 studies
PRISMA
evaluation of controls
“the URI says they did a Triple Blind Study, but it doesn’t sound like they had groups named just e.g. X, Y, and Z”
disqualified / questionable / etc
schema.org/MedicalTrial -> schema.org/ScientificTrial
C = Class (RDFS)
P = Property (RDFS)
-
[ ] C: MetaAnalysis
[ ] C: CriteriaBase type
[ ] C: Criterion
[ ] C: ScientificStudy
[x] C: MedicalStudy
[ ] C: MedicalObservationalStudy <- ScientificObservationalStudy
[ ] C: MedicalTrial <- ScientificTrial
[x] C: Dataset
When do we show?
Deadline
Only if you also produce your own meta-analyses
Only if we’re doing Open Access (as required by stipulations of federal funding)
RDF Example¶
Linked Data + Reproducibility => Linked Reproducibility
Reproducibility ---\___ Linked Reproducibility
Linked Data ---/
:LinkedData rdf:type skos:Concept ;
rdfs:label "Linked Data"@en ;
schema:name "Linked Data"@en ;
owl:sameAs <https://en.wikipedia.org/wiki/Linked_data> ;
owl:sameAs <http://dbpedia.org/page/Linked_data> ;
owl:sameAs <http://ja.dbpedia.org/resource/Linked_data>
owl:sameAs <http://es.dbpedia.org/resource/Datos_enlazados> ;
owl:sameAs <http://fr.dbpedia.org/resource/Web_des_donn%C3%A9es> ;
owl:sameAs <http://nl.dbpedia.org/resource/Linked_data>
owl:sameAs <http://ko.dbpedia.org/resource/링크드_데이터> ;
owl:sameAs <http://wikidata.org/entity/Q515701> ;
.
:Reproducibility a skos:Concept ;
rdfs:label "Reproducibility"@en ;
schema:name "Reproducibility"@en ;
owl:sameAs <https://en.wikipedia.org/wiki/Reproducibility> ;
owl:sameAs <http://dbpedia.org/page/Reproducibility> ;
.
:LinkedReproducibility a skos:Concept ;
rdfs:label "Linked Reproducibility"@en ;
schema:name "Linked Reproducibility"@en ;
skos:related [ :LinkedData, :Reproducibility ] ;
.
CSV, CSVW, and metadata rows¶
A classic data table with 1 metadata header row (column label):
column label |
sample |
date |
width |
height |
---|---|---|---|---|
1 |
2016-06-19T06:28:49-0500 |
20.0 |
30.0 |
|
2 |
2016-06-19T06:29:22-0500 |
40.0 |
50.0 |
|
3 |
2016-06-19T06:29:48-0500 |
60.0 |
70.0 |
A data table with 7 metadata header rows (column label, property URI path, DataType, unit, accuracy, precision, significant figures):
column label |
sample |
date |
width |
height |
---|---|---|---|---|
property URI path |
||||
Unit |
unit:Meter |
unit:Meter |
||
accuracy |
||||
precision |
||||
significant figures |
*.1 |
*.1 |
||
1 |
2016-06-19T06:28:49-0500 |
20.0 |
30.0 |
|
2 |
2016-06-19T06:29:22-0500 |
40.0 |
50.0 |
|
3 |
2016-06-19T06:29:48-0500 |
60.0 |
70.0 |
References¶
TODO: @westurner
reddit
twitter
hackernews