.. index:: Knowledge Engineering .. _knowledge engineering: Knowledge Engineering ======================== | Wikipedia: https://en.wikipedia.org/wiki/Knowledge_engineering | Wikipedia: https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning | WikipediaCategory: https://en.wikipedia.org/wiki/Category:Knowledge | WikipediaCategory: https://en.wikipedia.org/wiki/Category:Graph_theory | WikipediaCategory: https://en.wikipedia.org/wiki/Category:Ontology | WikipediaCategory: ``_ * https://en.wikipedia.org/wiki/Knowledge#Communicating_knowledge * https://en.wikipedia.org/wiki/Schema .. contents: .. index:: Symbols .. _symbols: Symbols --------- | Wikipedia: https://en.wikipedia.org/wiki/Symbol | WikipediaCategory: https://en.wikipedia.org/wiki/Category:Symbols * ``__ * https://en.wikipedia.org/wiki/List_of_logic_symbols * :ref:`Art & Design` * [...] * :ref:`URI` are symbols: * :ref:`urn` (:term:`term: URN `) * :ref:`url` (:term:`term: URL `) * Linguistics https://en.wikipedia.org/wiki/Linguistics * https://en.wikipedia.org/wiki/Morpheme + https://en.wikipedia.org/wiki/Phoneme + https://en.wikipedia.org/wiki/Grapheme * https://en.wikipedia.org/wiki/Word * https://en.wikipedia.org/wiki/Phrase * https://en.wikipedia.org/wiki/Clause * ``__ * https://en.wikipedia.org/wiki/Paragraph * https://en.wikipedia.org/wiki/Document * :ref:`Mathematical Notation` * :ref:`LaTeX` * :ref:`MathML` * :ref:`ASCIIMathML` * :ref:`MathJax` .. index:: Character encoding .. index:: Character set .. index:: Character map .. index:: Codeset .. _character encoding: .. _character set: Character encoding ++++++++++++++++++++ | Wikipedia: https://en.wikipedia.org/wiki/Character_encoding | WikipedaCategory: https://en.wikipedia.org/wiki/Category:Character_encoding * https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings .. index:: Control Characters .. _control character: .. _control characters: Control Characters ```````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Control_character * :ref:`ASCII` Control Characters https://en.wikipedia.org/wiki/Control_character#In_ASCII * :ref:`Unicode` Control Characters https://en.wikipedia.org/wiki/Unicode_control_characters .. warning:: Control characters are often significant. Common security errors involving control characters: * https://cwe.mitre.org/data/definitions/74.html CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection') * https://cwe.mitre.org/data/definitions/93.html CWE-93: Improper Neutralization of CRLF Sequences ('CRLF Injection') .. code:: python x = "line1_start" x2 = "thing\r\n\0line1_end" x = x + x2 x = x + "line2...line2_end\n" records = x.splitlines() # ! error * https://cwe.mitre.org/data/definitions/140.html CWE-140: Improper Neutralization of Delimiters + https://cwe.mitre.org/data/definitions/141.html CWE-141: Improper Neutralization of Parameter/Argument Delimiters + https://cwe.mitre.org/data/definitions/142.html CWE-142: Improper Neutralization of Value Delimiters + https://cwe.mitre.org/data/definitions/143.html CWE-143: Improper Neutralization of Record Delimiters + https://cwe.mitre.org/data/definitions/144.html CWE-144: Improper Neutralization of Line Delimiters + https://cwe.mitre.org/data/definitions/145.html CWE-145: Improper Neutralization of Section Delimiters .. index:: Escape Sequences .. 
_escape sequences: Escape Sequences ~~~~~~~~~~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Escape_sequence * https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences * https://en.wikipedia.org/wiki/CDATA * https://en.wikipedia.org/wiki/CDATA#Nesting * :ref:`XML`, :ref:`HTML` & escape sequences:: & < > /> " # HTML & Templates

text

    # attr='here"s one'

* :ref:`Python` escape sequences:

  .. code:: python

     s = "Here's one"
     s = 'Here\'s one'
     s = '''Here's one'''
     s = 'Here\N{APOSTROPHE}s one'
     s = 'Here'"'s"' one'

* :ref:`Bash` escape sequences:

  .. code:: bash

     s1="$Here's one"
     s1="${Here}'s one"
     s2='${Here}\'s one'      # ! error
     s2='${Here}'"'s"' one'
     s3=""$Here"'s one"
     s3=""${Here}"'s one"

.. index:: ASCII
.. _ascii:

ASCII
```````
| Wikipedia: https://en.wikipedia.org/wiki/ASCII

ASCII (*American Standard Code for Information Exchange*) defines 128 characters.

* https://en.wikipedia.org/wiki/Teleprinter#Teleprinter_operation
* https://en.wikipedia.org/wiki/Telegraph_code

- Python:

  .. code:: python

     from __future__ import print_function
     for i in range(0, 128):
         print("{0:<3d} {1!r} {1:s}.".format(i, chr(i)))

.. index:: Unicode
.. _unicode:

Unicode
`````````
| Wikipedia: https://en.wikipedia.org/wiki/Unicode
| Wikipedia: https://en.wikipedia.org/wiki/Unicode_symbols

* https://en.wikipedia.org/wiki/Unicode_symbols#Symbol_block_list
* Entering Unicode Symbols:

  - https://en.wikipedia.org/wiki/Unicode_input#Hexadecimal_code_input

    + ∴ -- Therefore -- ``u+2234``

  - :ref:`X11`: ``ctrl-shift-u 2234``
  - :ref:`Vim`: ``ctrl-v u2234``
  - :ref:`Python`:

    * Python 3 Unicode HOWTO: https://docs.python.org/3/howto/unicode.html
    * Python 2 Unicode HOWTO: https://docs.python.org/2/howto/unicode.html

    .. code:: python

       c1 = u'∴'                  # Python 2, 3.3+
       c2 = '∴'                   # Python 3.0+
       c3 = '\N{THEREFORE}'       # howto/unicode#the-string-type glyph name
       u1 = unichr(0x2234)        # Python 2 only
       u2 = chr(0x2234)           # Python 3.0+
       from builtins import chr   # Python 2 & 3 (future / builtins package)
       u3 = chr(0x2234)           # Python 2 & 3
       u4 = chr(8756)             # int(hex(8756)[2:], 16) == 8756 (0x2234)
       chars = [c1, c2, u1, u2, u3, u4]
       from operator import eq
       assert all((eq(x, chars[0]) for x in chars))

* Python and :ref:`UTF-8`:

  * Python 2 Codecs docs: https://docs.python.org/2/library/codecs.html
  * https://pymotw.com/2/codecs/

* e.g. :ref:`JSON` with :ref:`UTF-8`:

  .. code:: python

     # Read an assumed UTF-8 encoded JSON file with Python 2+, 3+
     import codecs
     with codecs.open('filename.json', encoding='utf8') as file_:
         text = file_.read()

Unicode encodings:

* UTF-1
* UTF-5
* UTF-6
* :ref:`UTF-8`
* UTF-9, UTF-18
* UTF-16
* UTF-32

.. index:: UTF-8
.. _utf-8:
.. _utf8:

UTF-8
~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/UTF-8

UTF-8 is a :ref:`Unicode` :ref:`Character encoding` which can represent all
Unicode symbols with 8-bit code units.

* https://en.wikipedia.org/wiki/UTF-8#Examples
* As of 2015, UTF-8 is the most common character encoding on the web.
* :ref:`HTML` ``charset`` meta attribute: ``<meta charset="utf-8">``
* :ref:`XML` header: ``<?xml version="1.0" encoding="UTF-8"?>``
* :ref:`HTTP` Header: ``content-type: text/html; charset=UTF-8``
* Why use UTF-8?
  https://www.w3.org/International/questions/qa-choosing-encodings#useunicode

.. index:: Logic, Reasoning, and Inference
.. _logic reasoning and inference:

Logic, Reasoning, and Inference
---------------------------------
https://en.wikipedia.org/wiki/Epistemology

* https://en.wikipedia.org/wiki/Truth
* https://en.wikipedia.org/wiki/Belief
* https://en.wikipedia.org/wiki/Theory_of_justification

.. contents::
   :local:

.. index:: Logic
.. _logic:

Logic
+++++++
| Wikipedia: https://en.wikipedia.org/wiki/Logic
| WikipediaCategory: https://en.wikipedia.org/wiki/Category:Logic

* https://en.wikipedia.org/wiki/List_of_logic_symbols
* https://en.wikipedia.org/wiki/Category:Latin_logical_phrases

See:

* :ref:`Inference`

.. index:: Set Theory
.. _set-theory:

Set Theory
````````````
| Wikipedia: https://en.wikipedia.org/wiki/Set_theory

.. index:: Boolean Algebra
.. _boolean-algebra:

Boolean Algebra
`````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Boolean_algebra

.. index:: Many-valued Logic
.. _many-valued-logic:

Many-valued Logic
````````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Many-valued_logic

.. index:: Three-valued Logic
.. _three-valued-logic:

Three-valued Logic
~~~~~~~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Three-valued_logic

::

   { True, False, Unknown }
   { T, F, NULL }   # SQL
   { T, F, None }   # Python
   { T, F, nil }    # Ruby
   { 1, 0, -1 }

.. index:: Fuzzy Logic
.. _fuzzy-logic:

Fuzzy Logic
~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Fuzzy_logic

.. index:: Probabilistic Logic
.. _probabilistic-logic:

Probabilistic Logic
~~~~~~~~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Probabilistic_logic

.. index:: Propositional Calculus
.. index:: Propositional Logic
.. _propositional logic:
.. _propositional calculus:

Propositional Calculus
```````````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Propositional_calculus
| WikipediaCategory: https://en.wikipedia.org/wiki/Category:Propositional_calculus
| WikipediaCategory: https://en.wikipedia.org/wiki/Category:Theorems_in_propositional_logic

* Premise ``P``
* Conclusion ``Q``

.. index:: Modus ponens
.. _modus ponens:

Modus ponens
~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Modus_ponens

+ ``P -> Q`` -- Premise 1 ``P1`` ``P_1`` ("P sub 1")
+ ``P`` -- Premise 2 ``P2`` ``P_2`` ("P sub 2")
+ ``∴ Q`` -- Conclusion ``Q`` ``Q_0`` ("Q sub 0")

.. index:: Predicate Logic
.. _predicate logic:

Predicate Logic
`````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Predicate_logic

* Universe of discourse
* Predicate
* ∃ -- There exists -- Existential quantifier
* ∀ -- For all -- Universal quantifier

.. index:: Existential quantification
.. _existential quantification:

Existential quantification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Existential_quantification

* ∃ -- "There exists" is the **Existential quantifier** symbol.
* An existential quantifier is true ("holds true") if there is one (or more)
  example in which the condition holds true.
* An existential quantifier is *satisfied* by **one** (or more) examples.

.. index:: Universal quantification
.. _universal quantification:

Universal quantification
~~~~~~~~~~~~~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Universal_quantification

* ∀ -- "For all" is the **Universal quantifier** symbol.
* A universal quantification is **disproven by one counterexample** where the
  condition does not hold true.

.. index:: Hoare Logic
.. _hoare logic:

Hoare Logic
`````````````
| Wikipedia: https://en.wikipedia.org/wiki/Hoare_logic

* precondition ``P``
* command ``C``
* postcondition ``Q``

See:

* :ref:`Propositional Calculus`, :ref:`Predicate Logic`
* :ref:`Given-When-Then`

.. index:: First-order Logic
.. index:: FOL
.. _FOL:

First-order Logic
```````````````````
| Wikipedia: https://en.wikipedia.org/wiki/First-order_logic

First-order logic (*FOL*) adds quantified variables, functions, and predicates
to propositional logic:

* Terms

  + Variables

    + ``x``, ``y``, ``z``
    + ``x``, ``x_0`` ("x subscript 0", "x sub 0")

  + Functions

    + ``f(x)`` -- function symbol (arity 1)
    + ``a`` -- constant symbol (arity 0) ( ``a()`` )

+ Formulas ("formulae")

  + Equality

    * ``=`` -- equality

  + Logical Connectives ("unary", "binary", sequence/tuple/list)
  + ``¬`` -- ``~``, ``!`` -- negation (unary)
  + ...
+ ``∧`` -- ``^``, ``&&``, ``and`` -- conjunction + ``∨`` -- ``v``, ``||``, ``or`` -- disjunction + ``→`` -- ``->``, ``⊃`` -- implication + ``↔`` -- ``<->``, ``≡`` -- biconditional + ... + ``XOR`` + ``NAND`` + Grouping Operators + Parentheses ``( )`` + Brackets ``< >`` + Relations + ``P(x)`` -- predicate symbol (n_args=1, arity 1, valence 1) + ``R(x)`` -- relation symbol (n_args=1, arity 1, valence 1) + ``Q(x,y)`` -- binary predicate/relation symbol (n_args=2, ...) + Quantifier Symbols "universe relation" * :ref:`∃ ` * :ref:`∀ ` + ... https://en.wikipedia.org/wiki/First-order_logic .. index:: Description Logic .. index:: DL .. _description logic: .. _dl: Description Logic ``````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Description_logic Description Logic (*DL*; DLP (Description Logic Programming)) * https://en.wikipedia.org/wiki/Description_logic#Notation * https://en.wikipedia.org/wiki/Description_logic#Relationship_with_other_logics :: Knowledge Base = TBox + ABox * https://en.wikipedia.org/wiki/TBox (Schema: Class/Property Ontology) * https://en.wikipedia.org/wiki/ABox (Facts / Instances) See: * :ref:`OWL`, :ref:`entailment` * :ref:`Semantic web` * :ref:`N3` for ``=>`` implies .. index:: Reasoning .. _reasoning: Reasoning ++++++++++ https://en.wikipedia.org/wiki/Deductive_reasoning https://en.wikipedia.org/wiki/Category:Reasoning https://en.wikipedia.org/wiki/Semantic_reasoner See: :ref:`DL` .. index:: Inference .. _inference: Inference ``````````` | Inference: https://en.wikipedia.org/wiki/Inference * https://en.wikipedia.org/wiki/Rule_of_inference (Logic) * https://en.wikipedia.org/wiki/List_of_rules_of_inference * https://en.wikipedia.org/wiki/Category:Statistical_inference (Logic + Math) .. index:: Entailment .. _entailment: Entailment ~~~~~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Entailment * https://www.w3.org/TR/owl2-profiles/#Introduction See: :ref:`data science` .. index:: Data Engineering .. _data engineering: Data Engineering ----------------- Data Engineering is about the 5 Ws (who, what, when, where, why) and **how** data are stored. | Who: schema:author ``@westurner`` ; | What: schema:name "WRD R&D Documentation"@en ; | When: schema:codeRepository <``__> ; | Where: schema:codeRepository <``__> ; | Why: schema:description "Documentation purposes"@en ; | How: schema:programmingLanguage :ReStructuredText ; | How: schema:runtimePlatform [ :Python, :CPython, :Sphinx ] ; .. contents:: :local: .. index:: File Structures .. _file structures: File Structures +++++++++++++++++ https://en.wikipedia.org/wiki/File_format ``_ ``_ https://en.wikipedia.org/wiki/Index#Computer_science * :ref:`tar` and :ref:`zip` are file structures that have a *manifest* and a *payload* * :ref:`filesystems` often have redundant manifests (and/or deduplication according to a hash table manifest with an interface like a :ref:`dht`) * :ref:`web standards` and :ref:`semantic web standards` which define file structures (and stream protocols): * :ref:`XML` * :ref:`RDF` (:ref:`RDF/XML`, :ref:`Turtle`, :ref:`n3`, :ref:`rdfa`, :ref:`json-ld`) * :ref:`JSON` (:ref:`json-ld`) * :ref:`HTTP` .. index:: Git File Structures .. _git file structures: Git File Structures `````````````````````` :ref:`Git` specifies a number of file structures: :ref:`Git Objects `, :ref:`Git References `, and :ref:`Git Packfiles `. 
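A minimal :ref:`Python` sketch of the content addressing behind Git objects:
a blob's object id is the SHA-1 hash of a short header plus the file's
contents (the example content below is arbitrary, and the helper function name
is hypothetical):

.. code:: python

   import hashlib

   def git_blob_id(content: bytes) -> str:
       """Object id for a blob: SHA-1 over b"blob <size>\\0" + content.

       This is the same id that ``git hash-object --stdin`` prints for the
       same bytes.
       """
       header = b"blob %d\x00" % len(content)     # bytes %-formatting: Python 3.5+
       return hashlib.sha1(header + content).hexdigest()

   # compare with: echo hello | git hash-object --stdin
   print(git_blob_id(b"hello\n"))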
Git implements something like **on-disk** *shared snapshot objects* with commits, branching, merging, and multi-protocol push/pull semantics: https://en.wikipedia.org/wiki/Shared_snapshot_objects .. index:: Git Object .. _git object: Git Object ~~~~~~~~~~~~~ | Docs: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects .. index:: Git Reference .. _git reference: Git Reference ~~~~~~~~~~~~~~~~ | Docs: https://git-scm.com/book/en/v2/Git-Internals-Git-References .. index:: Git Packfile .. _git packfile: Git Packfile ~~~~~~~~~~~~~~~~ | Docs: https://git-scm.com/book/en/v2/Git-Internals-Packfiles "Git is a content-addressable :ref:`filesystem `" .. index:: Bup .. _bup: ========== bup ========== | Homepage: https://bup.github.io/ | Source: git https://github.com/bup/bup | Docs: https://github.com/bup/bup/blob/master/README.md | Docs: https://bup.github.io/man.html | Docs: https://github.com/bup/bup/blob/master/DESIGN Bup (*backup*) is a backup system based on :ref:`git packfiles ` and rolling checksums. [:ref:`Bup` is a very] efficient backup system based on the :ref:`git packfile` format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images). .. index:: Torrent file structure .. _torrent file structure: Torrent file structure ``````````````````````` A :term:`bittorrent torrent file` is an encoded manifest of tracker, :ref:`DHT`, and :term:`web seed ` :term:`URIs `; and segment checksum hashes. * Like :ref:`MPEG-DASH` and :ref:`HTTP Live Streaming`, :ref:`bittorrent` downloads file segments over :ref:`http`. See: :ref:`bittorrent`, :ref:`named data networking`, :ref:`web distribution` .. index:: File Locking .. _file locking: File Locking ++++++++++++++ | Wikipedia: https://en.wikipedia.org/wiki/File_locking File locking is one strategy for synchronization with concurrency and parallelism. * An auxilliary ``.lock`` file is still susceptible to *race conditions* * :ref:`C` file locking functions: ``fcntl``, ``lockf``, ``flock`` * :ref:`Python` file locking functions: ``fcntl.fcntl``, ``fcntl.lockf``, ``fcntl.flock``: https://docs.python.org/2/library/fcntl.html * To lock a file for all processes with :ref:`Linux` requires a *mandatory file locking* mount option (`mount -o mand``) and per-file setgid and noexec bits (``chmod g+s,g-s``). * To lock a file (or a range / record of a file) for all processes with :ref:`Windows` requires no additional work beyond ``win32con.LOCKFILE_EXCLUSIVE_LOCK``, ``win32file.LockFileEx``, and ``win32file.UnlockFileEx``. * CWE-667: Improper Locking: https://cwe.mitre.org/data/definitions/667.html#Relationships + https://en.wikipedia.org/wiki/File_locking#Problems + https://en.wikipedia.org/wiki/Race_condition + CWE-833: Deadlock https://cwe.mitre.org/data/definitions/833.html https://en.wikipedia.org/wiki/Deadlock .. index:: Data Structures .. _data structures: Data Structures ++++++++++++++++ | Wikipedia: https://en.wikipedia.org/wiki/Data_structure | WikipediaCategory: https://en.wikipedia.org/wiki/Category:Data_structures | Docs: https://en.wikipedia.org/wiki/List_of_data_structures * https://rosettacode.org/wiki/Category:Programming_Tasks * https://rosettacode.org/wiki/Greatest_common_divisor * https://rosettacode.org/wiki/Go_Fish .. index:: Arrays .. _arrays: Arrays ```````` | Wikipedia: https://en.wikipedia.org/wiki/Array_data_structure | Docs: https://en.wikipedia.org/wiki/List_of_data_structures#Arrays An array is a data structure for unidimensional data. 
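For example, :ref:`Python`'s standard-library ``array`` module provides typed,
unidimensional arrays; a minimal sketch (the values below are arbitrary):

.. code:: python

   from array import array

   # a typed, unidimensional array of signed integers ('i')
   a = array('i', [1, 2, 3])

   # appending beyond the current allocation triggers a resize (and a copy)
   a.append(4)

   assert a.tolist() == [1, 2, 3, 4]
   print(a.itemsize)   # bytes per item (platform-dependent)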
* Arrays must be resized when data grows beyond the initial shape of the array.
* Sparse arrays are sparsely allocated.
* A multidimensional array is said to be a :ref:`matrix `.

.. index:: Matrix
.. index:: Matrices
.. _matrix:

Matrices
``````````
| Wikipedia: ``_

A matrix is a data structure for multidimensional data; a multidimensional
:ref:`array `.

.. index:: Lists
.. _lists:

Lists
```````
| Wikipedia: https://en.wikipedia.org/wiki/Linked_list
| Docs: https://en.wikipedia.org/wiki/List_of_data_structures#Lists

A list is a data structure with nodes that link to a next and/or previous node.

.. index:: Graphs
.. _graphs:

Graphs
````````
| Wikipedia: ``__
| Wikipedia: ``__
| Wikipedia: ``__
| Docs: https://en.wikipedia.org/wiki/Conceptual_graph
| WikipediaCategory: ``__
| WikipediaCategory: ``__
| WikipediaCategory: ``__

A graph is a :term:`system` of nodes connected by edges; an abstract data type
for which there are a number of suitable data structures.

* A node has edges.
* An edge connects nodes.
* Edges of **directed graphs** flow in only one direction, and so require two
  edges with separate attributes (e.g. 'magnitude', 'scale').

  | Wikipedia: https://en.wikipedia.org/wiki/Directed_graph

* Edges of an **undirected graph** connect nodes in both directions (with the
  same attributes).

  | Wikipedia: ``__

* Graphs and :ref:`trees` are **traversed** (or *walked*) according to a given
  algorithm (e.g. :ref:`DFS`, :ref:`BFS`).
* Graph nodes can be listed in many different *orders* (or with a given
  *ordering*):

  * Preorder
  * Inorder
  * Postorder
  * Level-order

* There are many :ref:`data structure ` representations for :ref:`graphs`.
* There are many data serialization/marshalling formats for graphs:

  * Graph edge lists can be stored as adjacency :ref:`matrices `.
  * :ref:`NetworkX` supports a number of graph storage formats.
  * :ref:`RDF` is a :ref:`standard semantic web ` :ref:`linked data` format
    for :ref:`graphs`.
  * :ref:`JSON-LD` is a :ref:`standard semantic web ` :ref:`linked data`
    format for :ref:`graphs`.

* There are many :ref:`Graph Databases` and :ref:`triplestores` for storing
  graphs.
* A Cartesian product has an interesting graph representation.
  (See :ref:`compression algorithms`)

.. index:: NetworkX
.. _networkx:

NetworkX
~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/NetworkX
| Homepage: https://networkx.github.io/
| Source: git https://github.com/networkx/networkx
| Docs: https://networkx.readthedocs.io/en/latest/
| Docs: https://networkx.readthedocs.io/en/latest/tutorial/
| Docs: https://networkx.readthedocs.io/en/latest/reference/classes.html
| Docs: https://networkx.readthedocs.io/en/latest/reference/algorithms.html

NetworkX is an :ref:`open source` graph algorithms library written in
:ref:`Python`.

.. index:: DFS
.. index:: Depth-first search
.. _dfs:

DFS
~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Depth-first_search

DFS (*Depth-first search*) is a :ref:`graph ` traversal algorithm::

   # Given a tree:
   1
     1.1
     1.2
   2
     2.1
     2.2

   # DFS: 1, 1.1, 1.2, 2, 2.1, 2.2

See also: :ref:`BSP`, Firefly Algorithm

.. index:: BFS
.. index:: Breadth-first search
.. _bfs:

BFS
~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Breadth-first_search

BFS (*Breadth-first search*) is a :ref:`graph ` traversal algorithm::

   # Given a tree:
   1
     1.1
     1.2
   2
     2.1
     2.2

   # BFS: 1, 2, 1.1, 1.2, 2.1, 2.2

* [ ] BFS and :ref:`BSP`
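A minimal plain-:ref:`Python` sketch of both traversal orders over the example
tree above (the adjacency-list ``dict`` and the added ``root`` node are
assumptions for illustration; no external libraries are required):

.. code:: python

   from collections import deque

   # adjacency list for the example tree: 1 -> (1.1, 1.2), 2 -> (2.1, 2.2)
   tree = {'root': ['1', '2'],
           '1': ['1.1', '1.2'],
           '2': ['2.1', '2.2']}

   def dfs(graph, start):
       """Depth-first: follow each branch to its leaves before backtracking."""
       stack, order = [start], []
       while stack:
           node = stack.pop()
           order.append(node)
           stack.extend(reversed(graph.get(node, [])))
       return order

   def bfs(graph, start):
       """Breadth-first: visit all nodes at one depth before the next depth."""
       queue, order = deque([start]), []
       while queue:
           node = queue.popleft()
           order.append(node)
           queue.extend(graph.get(node, []))
       return order

   assert dfs(tree, 'root') == ['root', '1', '1.1', '1.2', '2', '2.1', '2.2']
   assert bfs(tree, 'root') == ['root', '1', '2', '1.1', '1.2', '2.1', '2.2']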
.. index:: Topological Sorting
.. _topological sorting:

Topological Sorting
~~~~~~~~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Topological_sorting

A DAG (*directed acyclic* :ref:`graph `) has a topological sorting, or is
topologically sorted.

* The unix ``tsort`` utility does a topological sort of a space- and
  newline-delimited list of edges (node pairs):

  .. code:: bash

     $ tsort --help
     Usage: tsort [OPTION] [FILE]
     Write totally ordered list consistent with the partial ordering in FILE.
     With no FILE, or when FILE is -, read standard input.

           --help     display this help and exit
           --version  output version information and exit

     GNU coreutils online help:
     For complete documentation, run: info coreutils 'tsort invocation'

     $ echo -e '1 2\n2 3\n3 4\n2 a' | tsort
     1
     2
     a
     3
     4

* Installing a set of packages with dependencies is a topological sorting
  problem; plus e.g. version and platform constraints (as solvable with a SAT
  constraint satisfaction solver (see :ref:`conda` (pypi:pycosat)))
* A topological sorting can identify the "root" of a **directed acyclic
  graph**.
* *Information gain* can be useful for less discrete problems.

.. index:: Trees
.. _trees:

Trees
```````
| Wikipedia: https://en.wikipedia.org/wiki/Tree_data_structure
| Docs: https://rosettacode.org/wiki/Tree_traversal

A tree is an acyclic directed :ref:`graph ` in which each node has at most
one parent.

* A tree is said to have branches and leaves; or just nodes.

There are many types of and applications for trees:

* https://en.wikipedia.org/wiki/List_of_data_structures#Trees
* https://en.wikipedia.org/wiki/B-tree
* https://en.wikipedia.org/wiki/Trie
* https://en.wikipedia.org/wiki/Abstract_syntax_tree
* https://en.wikipedia.org/wiki/Parse_tree
* https://en.wikipedia.org/wiki/Decision_tree
* https://en.wikipedia.org/wiki/Minmax
* https://en.wikipedia.org/wiki/Database_index
* Search: Indexing, Lookup

.. index:: Compression Algorithms
.. _compression algorithms:

Compression Algorithms
+++++++++++++++++++++++++

.. index:: bzip2
.. _bzip2:

bzip2
```````
| Wikipedia: https://en.wikipedia.org/wiki/Bzip2
| File Extension: ``.bz2``
| Homepage: http://bzip.org/

bzip2 is an :ref:`open source` lossless compression algorithm based upon the
Burrows-Wheeler transform.

* bzip2 is usually slower than :ref:`gzip` or :ref:`zip`, but more
  space-efficient.

.. index:: gzip
.. _gzip:

gzip
``````
| Wikipedia: https://en.wikipedia.org/wiki/Gzip
| Homepage: https://www.gnu.org/software/gzip/
| File Extension: ``.gz``
| Source: https://ftp.gnu.org/gnu/gzip/
| Docs: https://www.gnu.org/software/gzip/manual/
| Docs: https://www.gnu.org/software/gzip/manual/gzip.html

gzip is a compression algorithm based on ``DEFLATE`` and ``LZ77``.

* gzip is similar to :ref:`Zip`, in that both are based upon ``DEFLATE``.

.. index:: tar
.. _tar:

tar
````
| Wikipedia: ``__
| File Extension: ``.tar``

:ref:`tar` is a file archiving format which concatenates a set of files into
one file; each file's path and attributes are stored in a header record that
precedes the file's data.

* TAR = ( table of contents + data stream )
* ``.tar.gz`` is :ref:`tar` + :ref:`gzip`
* ``.tar.bz2`` is :ref:`tar` + :ref:`bzip2`

TAR and :ref:`gzip` or :ref:`bzip2` can be streamed over SSH::

   # https://unix.stackexchange.com/a/95994
   tar czf - . | ssh remote "( cd ~/ ; cat > file.tar.gz )"
   tar cjf - . | ssh remote "( cd ~/ ; cat > file.tar.bz2 )"

See also: :ref:`zip` (:ref:`windows`)

.. index:: ZIP
.. _zip:

zip
````
| Wikipedia: ``__

zip is a lossless file archive compression format.
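As a quick illustration, :ref:`Python`'s standard-library ``zipfile`` module
can create and read zip archives; a minimal sketch (the archive and member
names are arbitrary examples):

.. code:: python

   import zipfile

   # create an archive with one DEFLATE-compressed member
   with zipfile.ZipFile('example.zip', 'w',
                        compression=zipfile.ZIP_DEFLATED) as zf:
       zf.writestr('hello.txt', 'hello\n')

   # list and read members back; each member's CRC is checked on read
   with zipfile.ZipFile('example.zip') as zf:
       assert zf.namelist() == ['hello.txt']
       assert zf.read('hello.txt') == b'hello\n'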
.. index:: Checksums
.. index:: Hash Functions
.. _hash function:

Hash Functions
++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Hash_function
| Wikipedia: https://en.wikipedia.org/wiki/Cryptographic_hash_function

Hash functions (or *checksums*) are one-way functions designed to map blocks
or whole files to short, practically unique identifiers in order to verify
data :ref:`integrity`.

* A *hash* is the output of a hash function.
* In :ref:`Python`, ``dict`` keys must be *hashable* (must have a ``__hash__``
  method).
* In :ref:`Java`, :ref:`Scala`, and many other languages ``dicts`` are called
  ``HashMaps``.
* :ref:`MD5` is a checksum algorithm.
* :ref:`SHA` is a group of checksum algorithms.

.. index:: CRC
.. index:: Cyclic Redundancy Check
.. _crc:

CRC
````
| Wikipedia: https://en.wikipedia.org/wiki/Cyclic_redundancy_check

A CRC (*Cyclic Redundancy Check*) is a hash function for error detection based
upon an extra *check value*.

* :ref:`Hard drives` and :ref:`SSDs ` implement CRCs.
* :ref:`Ethernet` implements CRCs.

.. index:: MD5
.. _md5:

MD5
`````
| Wikipedia: https://en.wikipedia.org/wiki/MD5

MD5 is a 128-bit hash function which is now broken, and deprecated in favor of
:ref:`SHA-2 ` or better.

.. code:: bash

   md5       # BSD / OSX
   md5sum    # GNU coreutils

.. index:: SHA
.. _sha:

SHA
````
| Wikipedia: https://en.wikipedia.org/wiki/Secure_Hash_Algorithm

* SHA-0 -- 160 bit (retracted 1993)
* SHA-1 -- 160 bit (deprecated 2010)
* SHA-2 -- SHA-256, SHA-512
* SHA-3 (2012)

.. code:: bash

   shasum
   shasum -a 1
   shasum -a 224
   shasum -a 256
   shasum -a 384
   shasum -a 512
   shasum -a 512224
   shasum -a 512256

.. index:: Filesystems
.. index:: File Systems
.. _filesystems:

Filesystems
++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/File_system

Filesystems (*file systems*) determine how files are represented on a
persistent physical medium.

* On-disk filesystems determine where, and how redundantly, data is stored.
* On-disk filesystems: :ref:`ext`, :ref:`btrfs`, :ref:`FAT`, :ref:`NTFS`,
  :ref:`HFS+`
* :ref:`network filesystems` link disk storage pools with other resources
  (e.g. :ref:`NFS`, :ref:`Ceph`, :ref:`GlusterFS`)

.. index:: RAID
.. _raid:

RAID
``````
| Wikipedia: https://en.wikipedia.org/wiki/RAID

RAID (*redundant array of independent disks*) is a set of configurations for
:ref:`hard drives` and :ref:`SSDs ` to *stripe* and/or *mirror* with
*parity*. ::

   RAID 0 -- striping,       -,         no parity                ... throughput
   RAID 1 -- no striping,    mirroring, no parity                ...
   RAID 2 -- bit striping,   -,         dedicated (Hamming) parity ... legacy
   RAID 3 -- byte striping,  -,         dedicated parity         ... uncommon
   RAID 4 -- block striping, -,         dedicated parity
   RAID 5 -- block striping, -,         distributed parity       ... min. 3; n-1 rebuild
   RAID 6 -- block striping, -,         2x distributed parity

RAID Implementations:

* RAID may be implemented by a physical controller with multiple drive
  connectors.
* RAID may be implemented as a BIOS setting.
* RAID may be implemented with software e.g. :ref:`lvm`, :ref:`btrfs`.

  * https://en.wikipedia.org/wiki/RAID#Software-based
  * https://en.wikipedia.org/wiki/RAID#Firmware-_and_driver-based
    ("*fake RAID*")

* Data scrubbing is a technique for checking for inconsistencies between
  redundant copies of data; data scrubbing is routinely part of RAID (with
  *mirrors* and/or *parity* bits).

  https://en.wikipedia.org/wiki/Data_scrubbing

.. index:: MBR
.. _mbr:

MBR
`````
| Wikipedia: https://en.wikipedia.org/wiki/Master_boot_record

MBR (*Master Boot Record*) is a boot record format and a file partition
scheme.

* DOS and :ref:`Windows` use MBR partition tables.
* Many/most UNIX variants support MBR partition tables. * :ref:`Linux` supports MBR partition tables. * Most PCs since 1983 boot from MBR partition tables. * When a PC boots, it reads the MBR on the first configured drive in order to determine where to find the bootloader. .. index:: GPT .. _gpt: GPT ````` | Wikipedia: https://en.wikipedia.org/wiki/GUID_Partition_Table GPT (*GUID Partition Table*) is a boot record format and a file partition scheme wherein partitions are assigned GUIDs (*Globally Unique Identifiers*). * :ref:`OSX` uses GPT partition tables. * :ref:`Linux` supports GPT partition tables. * https://en.wikipedia.org/wiki/GUID_Partition_Table#UNIX_and_Unix-like_operating_systems .. index:: LVM .. index:: Logical Volume Manager .. _lvm: LVM `````````````````````` | Wikipedia: ``__ | Homepage: https://www.sourceware.org/lvm2/ | Source: ftp://sources.redhat.com/pub/lvm2/ | Docs: https://www.sourceware.org/dm/ | Docs: https://www.tldp.org/HOWTO/LVM-HOWTO/index.html | Docs: https://www.tldp.org/HOWTO/LVM-HOWTO/anatomy.html LVM (*Logical Volume Manager*) is an :ref:`open source` software disk abstraction layer with snapshotting, copy-on-write, online resize and allocation and a number of additional features. * In LVM, there are *Volume Groups* (VG), *Physical Volumes* (PV), and *Logical Volumes* (LV). * LVM can do striping and high-availability sofware :ref:`RAID`. * LVM and ``device-mapper`` are now part of the :ref:`Linux` kernel tree (the LVM :ref:`linux` kernel modules are built and included with most distributions' default kernel build). * LVM Logical Volumes can be resized online (without e.g. rebooting to busybox or a LiveCD); but many :ref:`filesystems` support only onlize grow (and not online shrink). * There is feature overlap between :ref:`lvm` and :ref:`btrfs` (pooling, snapshotting, copy-on-write). .. index:: btrfs .. _btrfs: btrfs ``````` | Wikipedia: https://en.wikipedia.org/wiki/Btrfs | Homepage: https://btrfs.wiki.kernel.org/index.php/Main_Page | Source: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories | Source: git git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git | Docs: https://btrfs.wiki.kernel.org/index.php/Getting_started#Basic_Filesystem_Commands | Docs: https://btrfs.wiki.kernel.org/index.php/Problem_FAQ | Docs: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-btrfs.html | Docs: https://wiki.archlinux.org/index.php/Btrfs | Docs: https://help.ubuntu.com/community/btrfs btrfs (:ref:`B-tree ` *filesystem*) is an :ref:`open source` pooling, snapshotting, checksumming, deduplicating, union mounting copy-on-write on-disk :ref:`Linux` filesystem. .. index:: ext2 .. index:: ext3 .. index:: ext4 .. _ext: ext ```` | Wikipedia: https://en.wikipedia.org/wiki/Ext2 | Wikipedia: https://en.wikipedia.org/wiki/Ext3 | Wikipedia: https://en.wikipedia.org/wiki/Ext4 ext2, ext3, and ext4 are the ext (*extended filesystem*) :ref:`open source` on-disk filesystems. * ext filesystems are the default filesystems of many :ref:`Linux` distributions. * :ref:`windows` machines can access ext2, ext3, and ext4 filesystems with ext2explore and ext2fsd. * :ref:`OSX` machines can access ext2, ext3, and ext4 filesystems with OSXFuse and FUSE-EXT2. .. index:: FAT .. index:: FAT12 .. index:: FAT16 .. index:: FAT32 .. _fat: FAT ````` | Wikipedia: https://en.wikipedia.org/wiki/File_Allocation_Table FAT is a group of on-disk filesystem standards. * FAT is used on cross-platform USB drives. 
* FAT is found on older :ref:`Windows` and DOS machines.
* FAT12, FAT16, and FAT32 are all FAT filesystem standards.
* FAT32 has a maximum file size of 4 GB and a maximum volume size of 2 TB.
* :ref:`Windows` machines can read and write FAT partitions.
* :ref:`OSX` machines can read and write FAT partitions.
* :ref:`Linux` machines can read and write FAT partitions.

.. index:: ISO9660
.. _iso9660:

ISO9660
`````````
| Wikipedia: https://en.wikipedia.org/wiki/ISO_9660
| FileExt: ``.iso``

ISO9660 is an :ref:`ISO` standard for :ref:`disc drive ` images which also
specifies a standard for booting from a filesystem image.

* Many :ref:`Operating System ` distributions are distributed as
  :ref:`ISO9660` ``.iso`` files.
* ISO9660 and :ref:`Linux`:

  + An ISO9660 ISO can be *loop mounted*::

       mount -o loop,ro -t iso9660 ./path/to/file.iso /mnt/cdrom

  + An ISO9660 CD can be *mounted*::

       mount -o ro -t iso9660 /dev/cdrom /mnt/cdrom

* Most CD/DVD burning utilities support ISO9660 ``.iso`` files.
* ISO9660 is useful in that it specifies how to encode the boot sector
  (*El Torito*) and partition layout.
* Nowadays, ISO9660 ``.iso`` files are often converted to raw drive images and
  written to bootable :ref:`USB` Mass Storage devices (e.g. to write an
  install / recovery disc for :ref:`Debian`, :ref:`Ubuntu`, :ref:`Fedora`,
  :ref:`Windows`)

.. index:: HFS+
.. _hfs+:

HFS+
`````````
| Wikipedia: https://en.wikipedia.org/wiki/HFS_Plus

HFS+ (*Hierarchical File System Plus*), or *Mac OS Extended*, is the
filesystem for Mac OS 8.1+ and :ref:`OSX`.

* HFS+ is required for :ref:`OSX` and Time Machine.

  https://www.cnet.com/how-to/the-best-ways-to-format-an-external-drive-for-windows-and-mac/

* :ref:`Windows` machines can access HFS+ partitions with: HFSExplorer (free,
  :ref:`Java`), Paragon HFS+ for Windows, or MacDrive

  https://www.makeuseof.com/tag/4-ways-read-mac-formatted-drive-windows/

* :ref:`Linux` machines can access HFS+ partitions with ``hfsprogs``
  (``apt-get install hfsprogs``, ``yum install hfsprogs``).

.. index:: NTFS
.. _ntfs:

NTFS
```````
| Wikipedia: https://en.wikipedia.org/wiki/NTFS

NTFS is a proprietary journaling filesystem.

* :ref:`Windows` machines since Windows NT 3.1 and Windows XP default to NTFS
  filesystems.
* Non-Windows machines can access NTFS partitions through NTFS-3G:
  https://en.wikipedia.org/wiki/NTFS-3G

.. index:: FUSE
.. _fuse:

FUSE
`````
| Wikipedia: https://en.wikipedia.org/wiki/Filesystem_in_Userspace
| Homepage: https://github.com/libfuse/libfuse
| Source: https://github.com/libfuse/libfuse
| Docs: https://libfuse.github.io/doxygen/

FUSE (*Filesystem in Userspace*) is an API for implementing filesystems in
userspace.

* FUSE support has been included in the :ref:`Linux` kernel since 2.6.14.
* FUSE is available for most :ref:`POSIX` platforms.

Interesting FUSE implementations:

* PyFilesystem is a :ref:`Python` :term:`language api` interface which
  supports `FUSE`: https://docs.pyfilesystem.org/en/latest/
* There are FUSE bindings for :ref:`Hadoop` :ref:`HDFS`.
* :ref:`Ceph` can be mounted with/over/through `FUSE`.
* :ref:`GlusterFS` can be mounted with/over/through `FUSE`.
* :ref:`NTFS`-3G mounts volumes with `FUSE`.
* virtualbox-fuse supports mounting of :ref:`virtualbox` VDI images with FUSE.
* :ref:`SSHFS`, GitFS, GmailFS, GdriveFS, WikipediaFS and :ref:`Gnome` GVFS
  are all FUSE filesystems.

.. index:: SSHFS
..
_sshfs: SSHFS ~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/SSHFS | Source: https://github.com/libfuse/sshfs | Docs: https://wiki.archlinux.org/index.php/Sshfs | Docs: https://help.ubuntu.com/community/SSHFS | Docs: https://github.com/osxfuse/osxfuse/wiki/SSHFS SSHFS is a :ref:`FUSE` filesystem for mounting remote directories over SSH. .. index:: Network Filesystems .. index:: Network File Systems .. _network filesystems: Network Filesystems +++++++++++++++++++++ | Wikipedia: ``__ .. index:: Ceph .. _ceph: Ceph ````` | Wikipedia: ``__ | Homepage: https://ceph.io/ | Source: git https://github.com/ceph/ceph | Docs: https://docs.ceph.com/docs/master/ | Docs: https://docs.ceph.com/docs/master/rados/ | Docs: https://docs.ceph.com/docs/master/radosgw/ | Docs: https://docs.ceph.com/docs/master/radosgw/s3/ | Docs: https://docs.ceph.com/docs/master/radosgw/swift/ | Docs: https://docs.ceph.com/docs/master/radosgw/keystone/ | Docs: https://docs.ceph.com/docs/master/rbd/rbd-openstack/ Ceph is an :ref:`open source` network filesystem (a :ref:`distributed database ` for files with attributes like owner, group, permissions) written in :ref:`C++` and :ref:`Perl` which runs over top of one or more on-disk filesystems. * Ceph Block Device (*rbd*) -- striping, caching, snapshots, copy-on-write, :ref:`kvm`, :ref:`libvirt`, :ref:`OpenStack` Cinder block storage * Ceph Filesystem (*cephfs*) -- :ref:`POSIX` :ref:`filesystem ` with :ref:`FUSE`, :ref:`NFS`, :ref:`CIFS`, and :ref:`HDFS` APIs * Ceph Object Gateway (*radosgw*) -- :term:`RESTful API`, :ref:`AWS` S3 API, :ref:`OpenStack` Swift API, :ref:`OpenStack` Keystone authentication .. index:: CIFS .. _cifs: CIFS `````` CIFS (*Common Internet File System*) is a centralized network filesystem protocol. * Samba ``smbd`` is one implementation of a :ref:`CIFS` network file server. .. index:: DDFS .. _ddfs: DDFS `````` | DDFS (*Disco Distributed File System*) is a distributed network filesystem written in :ref:`Python` and :ref:`C`. * DDFS is like a :ref:`python` implementation of :ref:`HDFS` (which is written in :ref:`Java`). .. index:: GlusterFS .. _glusterfs: GlusterFS ``````````` | Wikipedia: https://en.wikipedia.org/wiki/GlusterFS | Homepage: https://www.gluster.org/ | Source: https://github.com/gluster/glusterfs | Docs: https://gluster.readthedocs.io/en/latest/ | Docs: https://gluster.readthedocs.io/en/latest/Quick-Start-Guide/Quickstart/ | Docs: https://gluster.readthedocs.io/en/latest/Install-Guide/Setup_virt/ | Docs: https://gluster.readthedocs.io/en/latest/Install-Guide/Setup_Bare_metal/ | Docs: https://gluster.readthedocs.io/en/latest/Install-Guide/Setup_aws/ | Docs: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/GlusterFS%20Cinder/ | Tcp ports: 111, 24007, 24008, 24009, 24010, 24011, 38465:38469 GlusterFS is an :ref:`open source` network filesystem (a :ref:`distributed database ` for files with attributes like owner, group, permissions) which runs over top of one or more on-disk filesystems. * GlusterFS can serve volumes for :ref:`OpenStack` Cinder block storage .. index:: Hadoop distributed filesystem .. index:: HDFS .. _hdfs: HDFS `````````` | Wikipedia: https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS HDFS (*Hadoop Distributed File System*) is an :ref:`open source` distributed network filesystem. * HDFS runs code next to data; rather than streaming data through code across the network. * HDFS is especially suitable for :ref:`MapReduce`-style distributed computation. 
* Apache `Hadoop` works with files stored over HDFS, FTP, :ref:`S3`, WASB (Azure) * There are HDFS :term:`language apis ` for many languages: :ref:`Java`, :ref:`Scala`, :ref:`Go`, :ref:`Python`, :ref:`Ruby`, :ref:`Perl`, :ref:`Haskell`, :ref:`C++` * :ref:`Mesos` can manage distributed HDFS grids. * :ref:`ElasticSearch` * It's possible to configure a `Jenkins` :ref:`continuous integration` cluster as :ref:`Hadoop` cluster. * Many databases support storage over HDFS (:ref:`HBase`, :ref:`Cassandra`, :ref:`Accumulo`, :ref:`Spark`) * :ref:`Ceph` can now serve files over :ref:`HDFS`. * HDFS can be mounted as a :ref:`FUSE` filesystem (e.g. with :ref:`Linux`). * HDFS can be accessed from the commandline with the Hadoop *FS shell*: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html * HDFS can be browsed with hdfs-du: https://github.com/twitter/hdfs-du .. index:: NFS .. _nfs: NFS `````` | Wikipedia: https://en.wikipedia.org/wiki/NFS NFS (*Network File System* #TODO) is an :ref:`open source` centralized network filesystem. .. index:: S3 .. _s3: S3 `````````````` * :ref:`AWS` S3 * :ref:`OpenStack` Swift * :ref:`Ceph` * :ref:`GlusterFS` .. index:: Swift .. _swift: Swift ``````` * :ref:`OpenStack` Swift * :ref:`Ceph` * :ref:`GlusterFS` .. index:: SMB .. _smb: SMB ``````` | Wikipedia: https://en.wikipedia.org/wiki/Server_Message_Block SMB (*Server Message Block*) is a centralized network filesystem. * SMB has been superseded by :ref:`CIFS`. .. index:: WebDAV .. _webdav: WebDAV ```````` | Wikipedia: https://en.wikipedia.org/wiki/WebDAV | Standard: https://tools.ietf.org/html/rfc2518 | Standard: https://tools.ietf.org/html/rfc4918 WebDAV (*Web Distributed Authoring and Versioning*) is a network filesystem protocol built with :ref:`HTTP`. * WebDAV specifies a number of unique :ref:`HTTP` methods: * ``PROPFIND`` (``ls``, ``stat``, ``getfacl``), * ``PROPPATCH`` (``touch``, ``setfacl``) * ``MKCOL`` (``mkdir``) * ``COPY`` (``cp``) * ``MOVE`` (``mv``) * ``LOCK`` (:ref:`file locking`) * ``UNLOCK`` () .. index:: Databases .. _databases: Databases +++++++++++ | Wikipedia: https://en.wikipedia.org/wiki/Database * https://en.wikipedia.org/wiki/Database_schema * https://en.wikipedia.org/wiki/Create,_read,_update_and_delete * https://en.wikipedia.org/wiki/CRUD * https://en.wikipedia.org/wiki/ACID * https://en.wikipedia.org/wiki/Query_plan * https://en.wikipedia.org/wiki/Database_index * :ref:`search engine indexing` * https://en.wikipedia.org/wiki/Category:Database_software_comparisons * https://db-engines.com/en/ranking .. index:: ORM .. index:: Object Relational Mapping .. _orm: Object Relational Mapping ``````````````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Object-relational_mapping * https://en.wikipedia.org/wiki/Data_mapper_pattern * https://en.wikipedia.org/wiki/Active_record_pattern https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch * https://en.wikipedia.org/wiki/List_of_object-relational_mapping_software .. index:: Relation Algebra .. _relation algebra: Relation Algebra ````````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Relation_algebra * https://en.wikipedia.org/wiki/Relation_algebra#Expressing_properties_of_binary_relations_in_RA See: :ref:`relational algebra` .. index:: Relational Algebra .. 
_relational algebra: Relational Algebra ``````````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Relational_algebra * ``_ * https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators * https://en.wikipedia.org/wiki/Relational_algebra#Common_extensions See: :ref:`relation algebra`, :ref:`relational databases` .. index:: Relational Databases .. index:: SQL Databases .. _relational databases: Relational Databases ````````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Relational_database https://en.wikipedia.org/wiki/Relational_model :ref:`relational algebra` https://en.wikipedia.org/wiki/Database_normalization * https://en.wikipedia.org/wiki/Referential_integrity * https://en.wikipedia.org/wiki/Functional_dependency * https://en.wikipedia.org/wiki/Dangling_pointer * https://en.wikipedia.org/wiki/Natural_key * https://en.wikipedia.org/wiki/Surrogate_key * https://en.wikipedia.org/wiki/Foreign_key * https://en.wikipedia.org/wiki/Denormalization https://en.wikipedia.org/wiki/Relational_database_management_system * https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems * :ref:`mysql` * :ref:`postgresql` * :ref:`sqlite` * :ref:`Virtuoso` * https://db-engines.com/en/ranking/relational+dbms What doesn't SQL do? * :ref:`RDF`, :ref:`OWL` * https://en.wikipedia.org/wiki/OLAP .. index:: SQL .. _sql: SQL ~~~~ | Wikipedia: https://en.wikipedia.org/wiki/SQL * ``_ * ``_ * https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL SQL security: * https://en.wikipedia.org/wiki/SQL_injection * https://cwe.mitre.org/top25/#CWE-89 (#1 Most Prevalent Dangerous Security Error (2011)) See: :ref:`Object Relational Modeling ` .. index:: MySQL .. _mysql: MySQL ~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/MySQL | Homepage: https://www.mysql.com/ | Download: https://dev.mysql.com/downloads/mysql/ | Source: git https://github.com/mysql/mysql-server | Doc: https://dev.mysql.com/doc/ MySQL Community Edition is an :ref:`open source` relational database. .. index:: PostgreSQL .. _postgresql: PostgreSQL ~~~~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/PostgreSQL | Homepage: https://www.postgresql.org/ | Download: https://www.postgresql.org/download/ | Source: git https://git.postgresql.org/git/postgresql.git | Docs: https://www.postgresql.org/docs/ | Docs: https://www.postgresql.org/docs/12/index.html | Docs: https://www.postgresql.org/docs/12/sql.html PostgreSQL is an :ref:`open source` relational database. * PostgreSQL has native support for storing and querying :ref:`JSON`. * PostgreSQL has support for geographical queries (*PostGIS*). .. index:: SQLite .. _sqlite: SQLite ~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/SQLite | Homepage: https://www.sqlite.org/ | Download: https://www.sqlite.org/download.html | Source: | Docs: https://www.sqlite.org/docs.html | Docs: https://www.sqlite.org/different.html | Docs: https://www.sqlite.org/threadsafe.html | Docs: https://www.sqlite.org/uri.html | FileExt: ``.sqlite`` SQLite is a serverless :ref:`open source` relational database which stores all data in one file. * SQLite is included in the :ref:`Python` standard library. .. index:: Virtuoso Universal Server .. index:: Virtuoso .. 
_virtuoso: Virtuoso ~~~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Virtuoso_Universal_Server | Homepage: https://virtuoso.openlinksw.com | Source: git https://github.com/openlink/virtuoso-opensource | Docs: http://docs.openlinksw.com/virtuoso/ | Docs: http://docs.openlinksw.com/virtuoso/sqlreference.html | Docs: http://docs.openlinksw.com/virtuoso/rdfandsparql.html | Docs: http://docs.openlinksw.com/virtuoso/rdfsparql.html | Docs: http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html | Docs: http://docs.openlinksw.com/virtuoso/rdfgraphsecurity.html | Docs: http://docs.openlinksw.com/virtuoso/virtuosospongerusage/ Virtuoso :ref:`open source` edition is a multi-paradigm :ref:`relational database ` / :ref:`XML` document database / :ref:`RDF triplestore `. * Relational Tables Data Management (Columnar or Column-Store :ref:`SQL` RDBMS) * Relational Property Graphs Data Management (:ref:`SPARQL` :ref:`RDF` based Quad Store) * Content Management (:ref:`HTML`, TEXT, :ref:`TURTLE`, :ref:`RDF/XML`, :ref:`JSON`, :ref:`JSON-LD`, :ref:`XML`) * Web and other Document File Services (Web Document or File Server) * :ref:`Five-Star Linked Open Data ` Deployment (:ref:`RDF`-based :ref:`Linked Data` Server) * Web Application Server (SOAP or :term:`RESTful ` interaction modes). * Virtuoso supports ODBC, JDBC, and DB-API relational database access. * Virtuoso powers :ref:`DBpedia`. .. index:: NoSQL .. index:: NoSQL Databases .. _nosql: NoSQL Databases ````````````````` | Wikipedia: https://en.wikipedia.org/wiki/NoSQL ``_ ``_ * ``_ * ``_ * https://en.wikipedia.org/wiki/Apache: .. index:: Graph Databases .. _graph databases: Graph Databases `````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Graph_database https://en.wikipedia.org/wiki/Graph_database#Graph_database_projects * https://en.wikipedia.org/wiki/AllegroGraph [:ref:`RDF`] * :ref:`Blazegraph` [:ref:`RDF`, :ref:`OWL`] * :ref:`neo4j` * :ref:`Accumulo` + https://en.wikipedia.org/wiki/Sqrrl * :ref:`Virtuoso` [:ref:`RDF`, :ref:`OWL`] * https://db-engines.com/en/ranking/graph+dbms Graph Queries * https://en.wikipedia.org/wiki/Graph_database#APIs_and_Graph_Query.2FProgramming_Languages * :ref:`SPARQL` * :ref:`Gremlin` * :ref:`Blueprints` * :ref:`Spark` GraphX .. index:: Blazegraph .. _blazegraph: Blazegraph ~~~~~~~~~~~~ | Homepage: https://www.blazegraph.com/ | Download: https://www.blazegraph.com/download | Src: https://github.com/blazegraph/database | Docs: https://www.blazegraph.com/learn | Docs: https://www.blazegraph.com/inference | Docs: https://www.blazegraph.com/blueprints | Docs: https://www.blazegraph.com/sesame | Docs: https://www.blazegraph.com/develop | Docs: https://www.blazegraph.com/docs/api/ | Docs: https://wiki.blazegraph.com/wiki/index.php/Main_Page Blazegraph is an :ref:`open source` :ref:`graph database ` written in :ref:`Java` with support for :ref:`Gremlin`, :ref:`Blueprints`, :ref:`RDF`, :ref:`RDFS` and :ref:`OWL` inferencing, :ref:`SPARQL`. * Blazegraph was formerly known as *Bigdata*. * Blazegraph 1.5.2 supports :ref:`Solr` (e.g. TF-IDF) indexing. * Blazegraph will power the :ref:`Wikidata` Query Service (RDF, SPARQL): https://lists.wikimedia.org/pipermail/wikidata-tech/2015-March/000740.html * MapGraph is a set of :ref:`GPU`-accelerations for graph processing. .. index:: Blueprints .. 
_blueprints: Blueprints ~~~~~~~~~~~ | Wikipedia: | Homepage: | Src: git https://github.com/tinkerpop/blueprints | Docs: https://github.com/tinkerpop/blueprints/wiki Blueprints is an :ref:`open source` :ref:`graph database ` API (and reference graph data model). Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to the JDBC, but for graph databases. As such, it provides a common set of interfaces to allow developers to plug-and-play their graph database backend. Moreover, software written atop Blueprints works over all Blueprints-enabled graph databases. Within the TinkerPop software stack, Blueprints serves as the foundational technology for: * Pipes: A lazy, data flow framework * :ref:`Gremlin`: A graph traversal language * Frames: An object-to-graph mapper * Furnace: A graph algorithms package * Rexster: A graph server * There are many blueprints API implementations (e.g. Rexster, :ref:`neo4j`, :ref:`Blazegraph`, :ref:`Accumulo`) .. index:: Gremlin .. _gremlin: Gremlin ~~~~~~~~ | Wikipedia: ``__ | Src: git https://github.com/tinkerpop/gremlin | Docs: https://github.com/tinkerpop/gremlin/wiki Gremlin is an :ref:`open source` domain-specific language for traversing property graphs. * Gremlin works with databases that implement the :ref:`blueprints` graph database API. .. index:: Neo4j .. _neo4j: Neo4j ~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Neo4j | Homepage: https://neo4j.com/ | Download: http://neo4j.com/download/ | Src: git https://github.com/neo4j/neo4j | Docs: http://neo4j.com/developer/get-started/ | Docs: http://neo4j.com/docs/ | Docs: http://neo4j.com/docs/2.2.3/ | Docs: http://neo4j.com/developer/cypher/ | Docs: http://neo4j.com/docs/stable/cypher-refcard/ | Docs: https://en.wikipedia.org/wiki/Cypher_Query_Language | Docs: http://neo4j.com/open-source-project/ Neo4j is an :ref:`Open Source` HA graph database written in :ref:`Java`. * Neo4j implements the :ref:`Paxos` distributed algorithm for HA (*high availability*). * Neo4j can integrate with :ref:`Spark` and :ref:`ElasticSearch`. * Neo4j is widely deployed in production environments. * There is a :ref:`blueprints` API implementation for Neo4j: https://github.com/tinkerpop/blueprints/wiki/Neo4j-Implementation .. index:: RDF Triplestores .. index:: RDF Databases .. index:: Triplestores .. _triplestores: RDF Triplestores ````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Triplestore https://en.wikipedia.org/wiki/List_of_subject-predicate-object_databases * :ref:`Blazegraph` * ``__ * ``__ * :ref:`Virtuoso` * https://db-engines.com/en/ranking/rdf+store Graph Pattern Query Results * :ref:`SPARQL` * https://en.wikipedia.org/wiki/Redland_RDF_Application_Framework * http://librdf.org/notes/contexts.html * ``__ * SAIL (Storage and Inferencing Layer) API * https://en.wikipedia.org/wiki/CubicWeb * :ref:`RDFLib` ``rdfs:seeAlso`` * :ref:`Linked Data` * :ref:`Semantic Web` * :ref:`semantic Web Standards` * :ref:`Semantic Web Tools` .. index:: Distributed Databases .. _distributed databases: Distributed Databases ```````````````````````` | Wikipedia: https://en.wikipedia.org/wiki/Distributed_database | Wikipedia: https://en.wikipedia.org/wiki/Distributed_data_store See: :ref:`distributed algorithms` .. index:: Accumulo .. 
_accumulo: Accumulo ~~~~~~~~~~ | Wikipedia: | Homepage: https://accumulo.apache.org/ | Download: https://accumulo.apache.org/downloads/ | Source: git https://github.com/apache/accumulo | Docs: https://accumulo.apache.org/1.7/accumulo_user_manual.html | Docs: https://accumulo.apache.org/1.7/accumulo_user_manual.html#_accumulo_design | Twitter: https://twitter.com/apacheaccumulo Apache Accumulo is an :ref:`open source` distributed database key/value store written in :ref:`Java` based on :ref:`BigTable` which adds realtime queries, streaming iterators, row-level ACLs and a number of additional features. * Accumulo supports :ref:`MapReduce`-style computation. * Accumulo supports streaming iterator computation. * Accumulo supports :ref:`HDFS`. * Accumulo implements a programmatic :ref:`Java` query API. .. index:: BigTable .. _bigtable: BigTable ~~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/BigTable | Docs: https://research.google.com/archive/bigtable.html Google BigTable is a open reference design for a distributed key/value column store and a proprietary production database system. * BigTable functionality overlaps with that of the newer Pregel and Spanner distributed databases. * Cloud BigTable is a :ref:`PaaS` / :ref:`SaaS` service with :ref:`Java` integration through an adaptation of :ref:`HBase` API. .. index:: Beam .. index:: Apache Beam .. _apache-beam: .. _beam: Apache Beam ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | Homepage: https://beam.apache.org/ | Src: git://git.apache.org/beam.git | Src: https://github.com/apache/beam | Docs: https://beam.apache.org/documentation/ Apache Beam is an open source batch and streaming parallel data processing framework with support for Apache Apex, Apache Flink, `Apache Spark`_, and Google Cloud Dataflow. .. index:: Cassandra .. _cassandra: Cassandra ~~~~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Apache_Cassandra | Homepage: https://cassandra.apache.org/ | Download: https://cassandra.apache.org/download/ | Source: git https://github.com/apache/cassandra | Docs: https://wiki.apache.org/cassandra/FrontPage | Docs: https://wiki.apache.org/cassandra/GettingStarted | Docs: https://docs.datastax.com/en/cassandra-oss/3.x/ | Docs: https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archIntro.html Apache Cassandra is an :ref:`open source` distributed key/value super column store written in :ref:`Java`. * Cassandra is similar to :ref:`AWS` Dynamo and :ref:`BigTable`. * Cassandra supports :ref:`MapReduce`-style computation. * Cassandra supports :ref:`HDFS`. * Facebook is one primary supporter of :ref:`Cassandra` development. .. index:: Hadoop .. _hadoop: Hadoop ~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Apache_Hadoop | Homepage: https://hadoop.apache.org/ | Download: https://hadoop.apache.org/releases.html | Source: git git://git.apache.org/hadoop.git | Source: git https://github.com/apache/hadoop | Docs: https://hadoop.apache.org/docs/current/ | Docs: https://hadoop.apache.org/docs/stable/ Apache Hadoop is a collection of :ref:`open source` distributed computing components; particularly for :ref:`MapReduce`-style computation over Hadoop :ref:`HDFS` distributed filesystem. .. index:: HBase .. 
_hbase: HBase ~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Apache_HBase | Homepage: https://hbase.apache.org/ | Download: https://www.apache.org/dyn/closer.cgi/hbase/ | Source: git git://git.apache.org/hbase.git | Source: git https://github.com/apache/hbase | Docs: https://hbase.apache.org/book.html | Docs: https://hbase.apache.org/book.html#conceptual.view Apache HBase is an :ref:`open source` distributed key/value super column store based on :ref:`BigTable` written in :ref:`Java` that does :ref:`MapReduce`-style computation over Hadoop :ref:`HDFS`. * HBase has a :ref:`Java` API, a :term:`RESTful API`, an `avro` API, and a :ref:`Thrift` API .. index:: Apache Hive .. index:: Hive .. _hive: Hive ~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Apache_Hive | Homepage: https://hive.apache.org/ | Download: https://hive.apache.org/downloads.html | Docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual | Docs: https://hive.apache.org/javadocs/r1.2.1/api/index.html | Docs: https://cwiki.apache.org/confluence/display/Hive/Home Apache Hive is an :ref:`open source` data warehousing platform written in :ref:`java`. * Hive can read data from :ref:`HDFS` and :ref:`S3`. * :ref:`Hive` supports :ref:`Avro`, Parqet. * HiveQL is a :ref:`SQL`-like language. .. index:: Apache Parquet .. index:: Parquet .. _parquet: Parquet ~~~~~~~~ | Homepage: https://parquet.apache.org/ | Download: https://parquet.apache.org/downloads/ | Source: git git://git.apache.org/incubator-parquet-mr.git | Source: git https://github.com/apache/parquet-mr | Standard: https://github.com/apache/parquet-format | Docs: https://parquet.apache.org/documentation/latest/ Apache Parqet is an :ref:`open source` columnar storage format for :ref:`distributed databases` Apache Parquet is a columnar storage format available to any project in the :ref:`Hadoop` ecosystem, regardless of the choice of data processing framework, data model or programming language. * The *Parquet format* and *Parquet metadata* are encoded with :ref:`Thrift`: * See also: :ref:`CSV`, :ref:`CSVW` .. index:: Presto .. _presto: Presto ~~~~~~~~ | Homepage: https://prestodb.io/ | Source: git https://github.com/facebook/presto | Docs: https://prestodb.io/docs/current/ Presto is an :ref:`open source` distributed query engine designed to query multiple datastores at once. * Presto has connectors for :ref:`Cassandra`, :ref:`Hive`, JMX, Kafka, :ref:`MySQL`, and :ref:`PostgreSQL`. * Presto does not yet support :ref:`SPARQL`. * Presto does not yet support :ref:`SPARQL` federated query. .. index:: Apache Spark .. index:: Spark .. _spark: Spark ~~~~~~~~ | Wikipedia: https://en.wikipedia.org/wiki/Apache_Spark | Homepage: https://spark.apache.org/ | Download: https://spark.apache.org/downloads.html | Source: git git://git.apache.org/spark.git | Source: git https://github.com/apache/spark | Docs: https://spark.apache.org/documentation.html | Docs: https://spark.apache.org/docs/latest/ | Docs: https://spark.apache.org/docs/latest/cluster-overview.html | Docs: https://spark.apache.org/docs/latest/quick-start.html Apache Spark is an :ref:`open source` distributed computation platform. * Spark is in-memory; and 100x faster than :ref:`MapReduce`. * Spark can work with data in/over/through :ref:`HDFS`, :ref:`Cassandra`, :ref:`OpenStack` :ref:`Swift`, :ref:`AWS` :ref:`S3`, and the local filesystem. * Spark can be provisioned by YARN or :ref:`Mesos`. * Spark has :ref:`Java`, :ref:`Scala`, :ref:`Python`, and `R` :term:`language APIs `. 
.. index:: GraphX
.. _graphx:

GraphX
~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Apache_Spark#GraphX
| Homepage: https://spark.apache.org/graphx/
| Docs: https://spark.apache.org/docs/latest/graphx-programming-guide.html

GraphX is an :ref:`open source` graph query framework built with
:ref:`Spark`.

.. index:: Distributed Algorithms
.. _distributed algorithms:

Distributed Algorithms
++++++++++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Distributed_algorithm
| WikipediaCategory: https://en.wikipedia.org/wiki/Category:Distributed_algorithms

:ref:`Distributed Databases` and distributed :ref:`information systems`
implement :ref:`Distributed Algorithms` designed to solve for
:ref:`confidentiality`, :ref:`integrity`, and :ref:`availability`.

As separate records / statements to be ``yield``-ed or emitted:

* :ref:`Distributed Databases` implement :ref:`Distributed Algorithms`.
* Distributed :ref:`information systems` implement
  :ref:`Distributed Algorithms`.

See Also:

* https://en.wikipedia.org/wiki/Parallel_computing
* https://en.wikipedia.org/wiki/Supercomputer#Distributed_supercomputing

.. index:: Distributed Computing Problems
.. _distributed computing problems:

Distributed Computing Problems
````````````````````````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Distributed_computing
| WikipediaCategory: https://en.wikipedia.org/wiki/Category:Distributed_computing_problems

* ``_
* https://en.wikipedia.org/wiki/Leader_election
* https://en.wikipedia.org/wiki/Distributed_concurrency_control
* https://en.wikipedia.org/wiki/Distributed_lock_manager

.. index:: Non-blocking algorithm
.. _non-blocking algorithm:

Non-blocking algorithm
```````````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Non-blocking_algorithm

* ``__
* See: :ref:`file locking`

.. index:: Distributed Hash Table
.. index:: DHT
.. _dht:

DHT
`````
| Wikipedia: https://en.wikipedia.org/wiki/Distributed_hash_table

A DHT (*Distributed Hash Table*) is a distributed key/value store that
stores values under a consistent hash of a key (such as a file checksum),
which can then be looked up with e.g. an exact string match.

* At an API level, a DHT is a key/value store.
* :term:`DNS` is basically a DHT.
* :ref:`distributed databases` all implement some form of a structure
  similar to a DHT (a replicated *keystore*); often for things like Bloom
  filters (for fast search)
* :ref:`Cassandra`, :ref:`Ceph`, :ref:`GlusterFS`
* :ref:`browsers` that maintain a local cache could implement a DHT
  (e.g. with :ref:`websockets` or :ref:`webrtc`)
* :ref:`webtorrent` (:ref:`Javascript`, :ref:`Node.js`, :ref:`WebRTC`)
* :ref:`BitTorrent` :term:`magnet URIs ` (:term:`URNs `) contain a *key*,
  which is a *checksum* of a manifest, which can be retrieved from a
  :ref:`DHT`::

      # .
      # key_uri = "IJBDPDSBT4QZLBIJ6NX7LITSZHZQ7F5I"
      dht = DHT(); value = dht.get(key_uri)

* :ref:`named data networking` is also essentially a cached :ref:`DHT`.

.. index:: MapReduce
.. _mapreduce:

MapReduce
````````````
| Wikipedia: https://en.wikipedia.org/wiki/MapReduce

MapReduce is a :ref:`distributed algorithm ` for distributed computation.

* :ref:`BigTable`, :ref:`Hadoop`, :ref:`HDFS`, `Disco`, :ref:`DDFS` all
  support :ref:`mapreduce`-style computation.
* See also: bashreduce

.. index:: Paxos
.. _paxos:

Paxos
```````
| Wikipedia: https://en.wikipedia.org/wiki/Paxos_(computer_science)
| Docs: ``__

* ``__
* :ref:`BigTable`, Spanner, Megastore
* :ref:`Ceph`
* :ref:`neo4j`
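The Paxos entry above only lists systems that use the algorithm; as a
hedged illustration of its two phases, here is a toy, in-process
single-decree round in :ref:`Python` (class and value names are
illustrative; real implementations add networking, failure handling, and
multi-decree replication):

.. code:: python

    # Toy, in-process simulation of a single-decree Paxos round:
    # Phase 1 (Prepare/Promise), then Phase 2 (Accept/Accepted),
    # each requiring a majority quorum of acceptors.

    class Acceptor:
        def __init__(self):
            self.promised = -1              # highest proposal number promised
            self.accepted = (None, None)    # (proposal number, value)

        def prepare(self, n):
            """Phase 1b: promise to ignore proposals numbered below n."""
            if n > self.promised:
                self.promised = n
                return ("promise", self.accepted)
            return ("nack", None)

        def accept(self, n, value):
            """Phase 2b: accept unless a higher-numbered promise was made."""
            if n >= self.promised:
                self.promised = n
                self.accepted = (n, value)
                return True
            return False


    def propose(acceptors, n, value):
        """Run one round as a proposer; return the chosen value or None."""
        quorum = len(acceptors) // 2 + 1
        promises = [a.prepare(n) for a in acceptors]
        grants = [acc for kind, acc in promises if kind == "promise"]
        if len(grants) < quorum:
            return None
        # Adopt the value of the highest-numbered previously accepted
        # proposal, if any; otherwise propose our own value.
        prior = max((acc for acc in grants if acc[0] is not None),
                    default=(None, None))
        chosen = prior[1] if prior[1] is not None else value
        accepts = [a.accept(n, chosen) for a in acceptors]
        return chosen if sum(accepts) >= quorum else None


    acceptors = [Acceptor() for _ in range(3)]
    print(propose(acceptors, n=1, value="leader=node-1"))  # 'leader=node-1'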
.. index:: Raft
.. _raft:

Raft
``````
| Wikipedia: https://en.wikipedia.org/wiki/Raft_(algorithm)
| Homepage: https://raft.github.io/

* ``__
* Leader / Candidate / Follower
* Heartbeat (Leader -> Followers [-> Candidates])
* :ref:`etcd` (:ref:`CoreOS`, :ref:`Kubernetes`, :ref:`configuration management`)
* :ref:`skydns`

.. index:: BSP
.. index:: Bulk Synchronous Parallel
.. _bsp:

Bulk Synchronous Parallel
````````````````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Bulk Synchronous Parallel (*BSP*) is a :ref:`distributed algorithm ` for
distributed computation.

* Google Pregel, Apache Giraph, and Apache :ref:`Spark` are built for a
  :ref:`bsp` model.
* :ref:`mapreduce` can be expressed very concisely in terms of BSP.

.. index:: Distributed Computing Protocols
.. _distributed computing protocols:

Distributed Computing Protocols
+++++++++++++++++++++++++++++++++

.. contents::
   :local:

* https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
* :ref:`Programming Languages`' implementations:

  - https://en.wikipedia.org/wiki/Java_Remote_Method_Invocation
  - https://twisted.readthedocs.io/en/latest/core/howto/pb-usage.html

* :ref:`ws-`
* :ref:`REST` (:term:`RESTful HTTP API `)
* :ref:`Protocol Buffers`
* :ref:`Thrift`
* :ref:`Avro`
* :ref:`msgpack`
* :ref:`WebSocket `
* :ref:`WebRTC`
* :ref:`JSON-WSP`
* :ref:`LDP` (:ref:`Turtle` or :ref:`JSON-LD` :ref:`RDF` over :ref:`HTTP`)
* :ref:`REST`
* :ref:`WAMP`
* https://en.wikipedia.org/wiki/List_of_web_service_protocols

.. index:: CORBA
.. _corba:

CORBA
``````
| Wikipedia: https://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture

CORBA (*Common Object Request Broker Architecture*) is a
:ref:`distributed computing protocol ` now defined by :ref:`OMG` with
implementations in many languages.

* CORBA is a distributed object-oriented protocol for platform-neutral
  distributed computing.
* CORBA objects are marshalled and serialized according to an IDL
  (*Interface Definition Language*) with a limited set of datatypes
  (see also :ref:`XSD`, :ref:`Distributed Computing Protocols`:
  :ref:`Protocol Buffers`, :ref:`Thrift`, :ref:`Avro`, :ref:`msgpack`,
  :ref:`JSON-LD`).
* CORBA ORBs (*Object Request Brokers*) route requests for objects
  (see also :ref:`ESB`).
* CORBA objects are either in local address space (see also ``file://`` /
  ``/dev/mem``) or remote address space (see also dereferencable
  :ref:`HTTP`, :ref:`HTTPS` :term:`URLs ` ).
* CORBA objects can be looked up by reference (by :term:`URL`, or
  *NameService* (see also :term:`DNS`)).
* "CORBA Objects are passed by reference, while data (integers, doubles,
  structs, enums, etc.) are passed by value"
  -- https://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture#Features

.. index:: Message Passing
.. _message passing:

Message Passing
`````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Message_passing
| https://en.wikipedia.org/wiki/Messaging_pattern
| https://en.wikipedia.org/wiki/Message_passing_in_computer_clusters
| https://en.wikipedia.org/wiki/Active_message

* https://en.wikipedia.org/wiki/Message_passing#Synchronous_versus_asynchronous_message_passing
* https://en.wikipedia.org/wiki/Dataflow_programming
* https://en.wikipedia.org/wiki/Flow-based_programming
* https://en.wikipedia.org/wiki/Spreadsheet
* https://en.wikipedia.org/wiki/Reactive_programming
* https://en.wikipedia.org/wiki/Actor_model_implementation
* https://en.wikipedia.org/wiki/Factor_graph#Message_passing_on_factor_graphs
* :ref:`BSP`
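As a minimal, single-process illustration of asynchronous message passing,
here is a sketch using only the :ref:`Python` standard library (``queue``
and ``threading``); MPI jobs and ESBs distribute the same
producer/consumer pattern across processes and hosts:

.. code:: python

    # Minimal asynchronous message-passing sketch: a producer thread posts
    # messages to a queue ("mailbox") and a consumer thread processes
    # them; ``None`` is used as a stop sentinel.
    import queue
    import threading

    mailbox = queue.Queue()


    def producer():
        for i in range(3):
            mailbox.put({"type": "greet", "body": f"message {i}"})
        mailbox.put(None)   # sentinel: no more messages


    def consumer():
        while True:
            message = mailbox.get()
            if message is None:
                break
            print("received:", message["body"])


    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()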
.. index:: ESB
.. index:: Enterprise Service Bus
.. _esb:

ESB
````
| Wikipedia: https://en.wikipedia.org/wiki/Enterprise_service_bus

An ESB (*Enterprise Service Bus*) is a centralized component of a
distributed system which relays (or *brokers*) messages with, or as,
a message queue (*MQ*).

* ESB is generally the name for a message queue / task worker pattern in
  :ref:`SOA` systems (particularly :ref:`Java`).
* ESBs host service endpoints for message producers and consumers.
* ESBs can also maintain state, or logging.
* ESB services can often be described with e.g. :ref:`WSDL` and/or
  :ref:`JSON-WSP`.
* https://en.wikipedia.org/wiki/Category:Message-oriented_middleware

.. index:: MPI
.. _mpi:

MPI
````
| Wikipedia: https://en.wikipedia.org/wiki/Message_Passing_Interface

MPI (*Message Passing Interface*) is a standardized message-passing
protocol for distributed and parallel computing, with implementations in
many languages.

* Many supercomputing applications are built with MPI.
* MPI's binary message passing is typically much faster than serializing
  and exchanging :ref:`JSON`.
* :ref:`IPython` ``ipyparallel`` supports MPI:
  https://ipyparallel.readthedocs.io/en/latest/

.. index:: XML-RPC
.. _xml-rpc:

XML-RPC
``````````
| Wikipedia: https://en.wikipedia.org/wiki/XML-RPC

XML-RPC (:ref:`XML` Remote Procedure Call) encodes method names,
parameters, and return values in :ref:`XML` for making remote function
calls, usually over :ref:`HTTP`.

* Python ``xmlrpclib``:
  https://docs.python.org/2/library/xmlrpclib.html
  https://docs.python.org/3/library/xmlrpc.client.html
  https://docs.python.org/3/library/xmlrpc.server.html

See also:

* :ref:`JSON-RPC`
* ~:ref:`C` structs: :ref:`Protocol Buffers`, :ref:`Thrift`, :ref:`Avro`
* :ref:`SOA` Web Services: :ref:`ws-`, :ref:`WSDL`
* :ref:`ROA` Web Services: :ref:`REST`

.. index:: JSON-RPC
.. _json-rpc:

JSON-RPC
``````````
| Wikipedia: https://en.wikipedia.org/wiki/JSON-RPC
| Specification: https://www.jsonrpc.org/specification

.. index:: Avro
.. index:: Apache Avro
.. _avro:

Avro
``````
| Wikipedia: https://en.wikipedia.org/wiki/Apache_Avro
| Homepage: https://avro.apache.org/
| Standard: https://avro.apache.org/docs/current/spec.html
| Standard: https://avro.apache.org/docs/current/trevni/spec.html
| Download: https://avro.apache.org/releases.html#Download
| Docs: https://avro.apache.org/docs/current/
| Docs: https://avro.apache.org/docs/current/gettingstartedjava.html
| Docs: https://avro.apache.org/docs/current/api/java/
| Docs: https://avro.apache.org/docs/current/gettingstartedpython.html
| Docs: https://avro.apache.org/docs/current/api/c/
| Docs: https://avro.apache.org/docs/current/api/cpp/html/
| Docs: https://avro.apache.org/docs/current/api/csharp/

Apache Avro is a data serialization and RPC framework for distributed
computing, with implementations in many languages.

* Avro *schemas* are defined in :ref:`JSON`.
* Avro is similar to :ref:`Protocol Buffers` and :ref:`Thrift`, but does
  not require code generation.
* Avro stores *schemas* within the data.

See also:

* :ref:`JSON-LD` maps to :ref:`RDF`
* :ref:`5stardata`

.. index:: Protocol Buffers
.. _protocol buffers:

Protocol Buffers
``````````````````
| Homepage: https://developers.google.com/protocol-buffers/
| Src: https://github.com/google/protobuf
| Docs: https://developers.google.com/protocol-buffers/docs/overview

Protocol Buffers (*PB*) is a standard for structured data interchange.

* Protocol Buffers' compact binary encoding is typically smaller and
  faster to parse than :ref:`JSON`.

See also:

* :ref:`Thrift`
* :ref:`Avro`
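Because Avro embeds its :ref:`JSON`-defined schema in the data (unlike
:ref:`Protocol Buffers` and :ref:`Thrift`, which typically rely on
generated code), a round trip can be sketched with the third-party
``fastavro`` package (assumed installed; the schema and records are
illustrative):

.. code:: python

    # Hedged Avro round-trip sketch using the third-party ``fastavro``
    # package; the schema is JSON-defined and is embedded in the output.
    from io import BytesIO
    from fastavro import parse_schema, reader, writer

    schema = {
        "name": "SensorReading",
        "type": "record",
        "fields": [
            {"name": "sensor", "type": "string"},
            {"name": "value", "type": "double"},
        ],
    }
    parsed = parse_schema(schema)
    records = [{"sensor": "t1", "value": 21.5},
               {"sensor": "t2", "value": 19.0}]

    buf = BytesIO()
    writer(buf, parsed, records)   # schema is stored with the data
    buf.seek(0)
    for record in reader(buf):     # reader recovers the schema from the data
        print(record)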
.. index:: Thrift
.. _thrift:

Thrift
````````
| Wikipedia: https://en.wikipedia.org/wiki/Apache_Thrift
| Homepage: https://thrift.apache.org
| Src: https://github.com/apache/thrift
| Docs: https://thrift.apache.org/docs/
| Docs: https://thrift.apache.org/docs/idl

Thrift is a standard for structured data interchange in the style of
:ref:`Protocol Buffers`.

* Thrift's binary protocols are typically more compact and faster to
  parse than :ref:`JSON`.

See also:

* :ref:`Protocol Buffers`
* :ref:`Avro`

.. index:: SOA
.. _soa:

SOA
``````
| Wikipedia: https://en.wikipedia.org/wiki/Service-oriented_architecture

SOA (*Service Oriented Architecture*) is a collection of
:ref:`web standards` (e.g. :ref:`ws-`) and architectural patterns for
distributed computing.

.. index:: WS- Web Services
.. _ws-:

WS-*
~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/List_of_web_service_specifications

There are many web service specifications; their names often start with
``WS-``.

* https://en.wikipedia.org/wiki/List_of_web_service_specifications
* Many/most WS-* standards specify :ref:`XML`.
* Some WS-* standards also specify :ref:`JSON`.

.. index:: WSDL
.. _wsdl:

WSDL
~~~~~~~~~~~~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Web_Services_Description_Language

WSDL (*Web Services Description Language*) is a :ref:`web standard ` for
describing web services and the schema of their inputs and outputs.

.. index:: JSON-WSP
.. _json-wsp:

JSON-WSP
``````````
| Wikipedia: https://en.wikipedia.org/wiki/JSON-WSP

JSON-WSP (:ref:`JSON` Web-Service Protocol) is a :ref:`web standard `
protocol for describing services and their request and response objects.

* JSON-WSP is similar in function to :ref:`WSDL` and :ref:`CORBA` IDL.

See also: :ref:`Linked Data Platform (LDP) `

.. index:: ROA
.. index:: Resource-Oriented Architecture
.. _roa:

ROA
`````
| Wikipedia: https://en.wikipedia.org/wiki/Resource-oriented_architecture

.. index:: REST
.. index:: Representational State Transfer
.. _rest api:
.. _restful api:
.. _restful:
.. _rest:

REST
~~~~~~
| Wikipedia: https://en.wikipedia.org/wiki/Representational_state_transfer
| Awesome: https://github.com/marmelab/awesome-rest

REST (*Representational State Transfer*) is a pattern for interacting
with web resources using regular :ref:`HTTP` methods like ``GET``,
``POST``, ``PUT``, and ``DELETE``.

* A REST :term:`API` is known as a RESTful API.
* A REST implementation maps Create, Read, Update, Delete (CRUD) methods
  for URI-named collections of **resources** onto HTTP verbs like ``GET``,
  ``POST``, ``PUT``, ``PATCH``, and ``DELETE``; see the client sketch
  below.
* Sometimes, a REST implementation accepts a :ref:`URL` parameter like
  ``?method=PUT``, e.g. for :ref:`Javascript` clients in browsers which
  only support ``GET`` and ``POST``.
* There are many software libraries for implementing REST API Servers:

  * Java, JS: Restlet:

    | Wikipedia: https://en.wikipedia.org/wiki/Restlet
    | Src: https://github.com/restlet

  * Ruby: Grape:

    | Src: https://github.com/ruby-grape/grape

  * Python: Django REST Framework:

    | Src: https://github.com/encode/django-rest-framework

* There are many software libraries for implementing REST API Clients:

  * Python REST API client libraries:

    * requests:

      | Src: https://github.com/psf/requests
      | Docs: https://requests.readthedocs.io/en/master/

      + httpie is a CLI utility written on top of requests:

        | Src: https://github.com/jkbrzt/httpie

    * WebTest:

      | Src: https://github.com/Pylons/webtest
      | Docs: https://webtest.readthedocs.io/en/latest/

      * https://pypi.python.org/pypi/webtest-plus/ (requests-auth)
      * https://github.com/django-webtest/django-webtest

  * | Docs: https://westurner.github.io/wiki/awesome-python-testing#web-applications
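A minimal RESTful CRUD sketch using the ``requests`` client library
listed above (the base URL and the ``notes`` resource are hypothetical):

.. code:: python

    # Minimal RESTful CRUD sketch with ``requests``; the API base URL and
    # the ``notes`` resource schema are hypothetical.
    import requests

    BASE = "https://api.example.org/v1"

    # Create (POST), Read (GET), Update (PATCH), Delete (DELETE)
    created = requests.post(f"{BASE}/notes", json={"title": "hello"})
    note_id = created.json()["id"]

    note = requests.get(f"{BASE}/notes/{note_id}").json()
    requests.patch(f"{BASE}/notes/{note_id}", json={"title": "hello, world"})
    requests.delete(f"{BASE}/notes/{note_id}")

    # Each call returns a Response; status codes signal the outcome
    print(created.status_code)   # e.g. 201 Created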
.. index:: WAMP
.. _wamp:

WAMP
`````
| Wikipedia: https://en.wikipedia.org/wiki/Web_Application_Messaging_Protocol
| Homepage: https://wamp-proto.org
| Specification: https://tools.ietf.org/html/draft-oberstet-hybi-tavendo-wamp
| Src: https://github.com/wamp-proto/wamp-proto
| Docs: https://wamp-proto.org/why/
| Docs: https://wamp-proto.org/faq/
| Docs: https://wamp-proto.org/implementations/

WAMP (*Web Application Messaging Protocol*) defines Publish/Subscribe
(PubSub) and Remote Procedure Call (RPC) over :ref:`WebSockets`,
:ref:`JSON`, and :term:`URIs `. Using WAMP, a browser-based UI, embedded
devices, and backend services can all talk to each other in real time:

* WAMP Router = Broker (PubSub topic broker) + Dealer (RPC)
* WAMP can run over other transports and serializations
  (e.g. :ref:`msgpack`) than the preferred :ref:`WebSockets` with
  :ref:`JSON`.
* :ref:`JSON-LD`
* Implementations:

  * https://wamp-proto.org/implementations/

* https://tools.ietf.org/html/draft-oberstet-hybi-tavendo-wamp#section-6.5
  WAMP Message Codes and Direction

.. index:: Data Grid
.. _data-grid:

Data Grid
++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Data_grid

.. index:: Search Engine Indexing
.. _search engine indexing:

Search Engine Indexing
+++++++++++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Search_engine_indexing

* https://en.wikipedia.org/wiki/Web_search_engine
* :ref:`information retrieval`
* :ref:`semantic web` :ref:`graph ` of :ref:`linked data`: :ref:`RDFa`,
  :ref:`JSON-LD`, :ref:`Schema.org`.

.. index:: ElasticSearch
.. _elasticsearch:

ElasticSearch
```````````````
| Wikipedia: https://en.wikipedia.org/wiki/Elasticsearch
| Homepage: https://www.elastic.co/products/elasticsearch
| Download: https://www.elastic.co/downloads/elasticsearch
| Source: git https://github.com/elastic/elasticsearch
| Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
| Docs: https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
| DockerHub: https://registry.hub.docker.com/u/library/elasticsearch/

ElasticSearch is an :ref:`open source` realtime search server written in
:ref:`Java`, built on Apache :ref:`Lucene`, with a :term:`RESTful API` for
indexing :ref:`JSON` documents.

* ElasticSearch supports geographical (bounded) queries.
* ElasticSearch can build better indexes for faster search response times
  when *ElasticSearch Mappings* are specified.
* ElasticSearch mappings can be (manually) transformed to :ref:`JSON-LD`
  ``@context`` mappings: https://github.com/westurner/elasticsearchjsonld
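A hedged sketch of indexing and querying a :ref:`JSON` document through
ElasticSearch's :term:`RESTful API` with ``requests`` (assumes a local
node on port 9200; the ``articles`` index and its fields are
illustrative):

.. code:: python

    # Index a JSON document and run a full-text query over ElasticSearch's
    # RESTful API (local node assumed; index/field names illustrative).
    import requests

    ES = "http://localhost:9200"
    doc = {"title": "Knowledge Engineering", "tags": ["rdf", "json-ld"]}

    # Index (create/update) document 1 in the "articles" index
    requests.put(f"{ES}/articles/_doc/1", json=doc)

    # Full-text search over the "title" field
    query = {"query": {"match": {"title": "knowledge"}}}
    hits = requests.post(f"{ES}/articles/_search", json=query).json()
    print(hits["hits"]["total"])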
.. index:: Haystack
.. _haystack:

Haystack
``````````
| Homepage: http://haystacksearch.org/
| Source: git https://github.com/django-haystack/django-haystack
| PyPI: https://pypi.python.org/pypi/django-haystack
| Docs: https://django-haystack.readthedocs.io/en/latest/

Haystack is an :ref:`open source` :ref:`Python` Django search API with
support for a number of search backends (e.g. :ref:`solr`,
:ref:`elasticsearch`, :ref:`Whoosh`, :ref:`Xapian`).

.. index:: Apache Lucene
.. index:: Lucene
.. _lucene:

Lucene
````````
| Wikipedia: https://en.wikipedia.org/wiki/Lucene
| Homepage: https://lucene.apache.org/
| Download: https://lucene.apache.org/core/downloads.html
| Source: https://github.com/apache/lucene-solr
| Docs: https://lucene.apache.org/core/
| Docs: https://lucene.apache.org/core/5_2_0/

Apache Lucene is an :ref:`open source` search indexing library written in
:ref:`java`.

* :ref:`ElasticSearch`, :ref:`Nutch`, and :ref:`Solr` are implemented on
  top of Lucene.

.. index:: Apache Nutch
.. index:: Nutch
.. _nutch:

Nutch
```````
| Wikipedia: https://en.wikipedia.org/wiki/Nutch
| Homepage: https://nutch.apache.org/
| Download: https://nutch.apache.org/downloads.html
| Source: git git://git.apache.org/nutch.git
| Source: git https://github.com/apache/nutch
| Docs: https://nutch.apache.org/apidocs/apidocs-2.3/index.html
| Docs: https://wiki.apache.org/nutch/
| Docs: https://wiki.apache.org/nutch/#Tutorials

Apache Nutch is an :ref:`open source` distributed web crawler and search
engine written in :ref:`Java` and implemented on top of :ref:`Lucene`.

* Nutch has a pluggable storage and indexing API with support for e.g.
  :ref:`Solr`, :ref:`ElasticSearch`.

.. index:: Solr
.. index:: Apache Solr
.. _solr:

Solr
```````
| Wikipedia: https://en.wikipedia.org/wiki/Apache_Solr
| Homepage: https://lucene.apache.org/solr/
| Download: https://lucene.apache.org/solr/mirrors-solr-latest-redir.html
| Docs: https://lucene.apache.org/solr/resources.html
| Docs: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
| Docs: https://wiki.apache.org/solr/

Apache Solr is an :ref:`open source` web search platform written in
:ref:`Java` and implemented on top of :ref:`Lucene`.

.. index:: Whoosh
.. _whoosh:

Whoosh
````````
| Homepage:
| PyPI: https://pypi.python.org/pypi/Whoosh
| Docs: https://pythonhosted.org/Whoosh/

Whoosh is an :ref:`open source` search indexing library written in
:ref:`Python`.

.. index:: Xapian
.. _xapian:

Xapian
````````
| Wikipedia: https://en.wikipedia.org/wiki/Xapian
| Homepage: https://xapian.org/
| Docs: https://xapian.org/docs/
| Docs: https://xapian.org/docs/apidoc/html/inherits.html

Xapian is an :ref:`open source` search library written in :ref:`C++` with
bindings for many languages.

.. index:: Information Retrieval
.. _information retrieval:

Information Retrieval
```````````````````````
| Wikipedia: https://en.wikipedia.org/wiki/Information_retrieval
| Docs: https://nlp.stanford.edu/IR-book/information-retrieval.html

* Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze,
  *Introduction to Information Retrieval*, Cambridge University Press,
  2008. https://nlp.stanford.edu/IR-book/

.. index:: Time Standards
.. _time standards:

Time Standards
-----------------

.. index:: International Atomic Time
.. _iat:

International Atomic Time (TAI)
++++++++++++++++++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/International_Atomic_Time

International Atomic Time (*TAI*, from the French *temps atomique
international*) is an international standard for extremely precise
timekeeping; it is the basis for :ref:`UTC` Earth time and for
`Terrestrial Time` (Earth and Space).
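A small sketch of the TAI / :ref:`UTC` relationship using the third-party
``astropy`` package (also referenced below for astronomical year
numbering); UTC lags TAI by the accumulated leap seconds:

.. code:: python

    # Convert a UTC timestamp to the TAI scale with astropy (assumed
    # installed); the 37 s offset reflects leap seconds accumulated
    # through 2017.
    from astropy.time import Time

    t = Time("2020-01-01T00:00:00", scale="utc")
    print(t.utc.isot)   # 2020-01-01T00:00:00.000
    print(t.tai.isot)   # 2020-01-01T00:00:37.000  (TAI - UTC = 37 s)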
.. index:: Long Now Dates
.. _long now dates:

Long Now Dates
++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Long_Now_Foundation
| Docs: https://en.wikipedia.org/wiki/Year_10,000_problem

::

    2015   # ISO8601 date
    02015  # 5-digit Y10K (Long Now) date

.. index:: Decimal Time
.. _decimal time:

Decimal Time
++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Decimal_time

* https://en.wikipedia.org/wiki/Decimal_time#Conversions
* https://en.wikipedia.org/wiki/Decimal_time#Fractional_days
* https://en.wikipedia.org/wiki/Leap_year (~365.25 days/yr)
* https://en.wikipedia.org/wiki/Leap_second (rotation time ~= atomic time)

.. index:: Unix Time
.. _unix time:

Unix Time
+++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Unix_time

Unix time is defined as the number of seconds that have elapsed since
00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970
(``1970-01-01T00:00:00Z``), not counting leap seconds:

.. code::

    0            # Unix time
    1970-01-01T00:00:00Z  # ISO8601 timestamp

    1435255816   # Unix time
    2015-06-25T18:10:16Z  # ISO8601 timestamp

.. note:: Unix time does not count leap seconds.
   https://en.wikipedia.org/wiki/Unix_time#Leap_seconds

See also: `Swatch Internet Time` (`Beat Time`)

.. index:: Year Zero
.. index:: 0 (Year)
.. _year zero:

Year Zero
++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Year_zero

* The Gregorian calendar (and the `Julian Calendar` before it) does not
  include a `year zero`: 1 BCE (:ref:`Common Era `) is followed directly
  by 1 CE.
* :ref:`Astronomical year numbering` includes a `year zero`.
* :ref:`Before Present ` dates do not specify a `year zero`, because they
  are relative to the current (or *published*) date.

.. index:: Astronomical year numbering
.. _astronomical year numbering:

Astronomical year numbering
++++++++++++++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Astronomical_year_numbering

* Astronomical year numbering includes a year zero: 1 BCE is year 0,
  2 BCE is year -1, and in general *n* BCE is year -(n - 1).

Tools with support for :ref:`astronomical year numbering`:

* AstroPy is a :ref:`Python` library that supports astronomical year
  numbering: https://astropy.readthedocs.io/en/latest/time/

.. index:: Before Present
.. index:: BP
.. _bp:

Before Present (BP)
++++++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Before_Present

Before Present (*BP*) dates are relative to the current date (or *date of
publication*); e.g. "2.6 million years ago".

.. index:: Common Era
.. _ce:

Common Era (CE)
+++++++++++++++++
| Wikipedia: https://en.wikipedia.org/wiki/Common_Era
| Docs: https://en.wikipedia.org/wiki/Pax_Romana
| Docs: :ref:`Year Zero`

* BCE (*Before Common Era*) == BC

  * https://en.wiktionary.org/wiki/BCE
  * https://en.wiktionary.org/wiki/BC

* CE (*Common Era*) == **AD** (*Anno Domini*)

  * https://en.wiktionary.org/wiki/CE
  * https://en.wiktionary.org/wiki/AD

Common Era and :ref:`year zero` (:ref:`astronomical year numbering`)::

    2015 CE  ==  2015  (astronomical)
       1 CE  ==     1  (astronomical)
       1 BCE ==     0  (astronomical)
       2 BCE ==    -1  (astronomical)
    5000 BCE == -4999  (astronomical)

.. note:: Naively negating a BCE year is off by one, because there is no
   year zero in BCE/CE notation; to do arithmetic across the era boundary,
   first convert Julian/Gregorian dates to
   :ref:`astronomical year numbering`, where *n* BCE is year -(n - 1).

Common Era and :ref:`Python` datetime calculations:
.. code:: python

    # Paleolithic Era (~2.6 million years ago -> ~12,000 years ago)
    # "2.6 million years ago", counted from 2015 CE, is astronomical year
    #   2015 - 2600000 = -2597985  (i.e. approximately 2597986 BCE)

    ### Python datetime w/ scientific notation string formatter
    >>> import datetime
    >>> year = datetime.datetime.now().year
    >>> '{:.6e}'.format(2.6e6 - year)   # output assumes year == 2015
    '2.597985e+06'

    ### Python datetime supports years >= 1 CE (datetime.MINYEAR == 1)
    >>> datetime.date(1, 1, 1)
    datetime.date(1, 1, 1)
    >>> datetime.datetime(1, 1, 1)
    datetime.datetime(1, 1, 1, 0, 0)

    ### Python pypi:arrow also supports years >= 1 CE
    >>> !pip install arrow
    >>> import arrow
    >>> arrow.get(1, 1, 1)
    <Arrow [0001-01-01T00:00:00+00:00]>

    ### astropy.time.Time supports dates before 1 CE (and a year zero)
    ### https://astropy.readthedocs.io/en/latest/time/
    >>> !conda install astropy
    >>> import astropy.time
    >>> # Julian Date (format='jd') counts days (not years) from the Julian epoch
    >>> astropy.time.Time(-2.6e6, format='jd', scale='utc')
    <Time object: scale='utc' format='jd' value=-2600000.0>
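Following the off-by-one caveat above, a small helper sketch (function
names are illustrative) for converting between BCE years and astronomical
year numbering:

.. code:: python

    # Illustrative helpers for the BCE/CE <-> astronomical year numbering
    # conversion discussed above: n BCE == astronomical year -(n - 1);
    # CE years are unchanged.

    def bce_to_astronomical(year_bce):
        return -(year_bce - 1)

    def astronomical_to_bce(year_astro):
        return -year_astro + 1

    assert bce_to_astronomical(1) == 0          # 1 BCE is year 0
    assert bce_to_astronomical(2) == -1         # 2 BCE is year -1
    assert bce_to_astronomical(2597986) == -2597985
    assert astronomical_to_bce(0) == 1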