Different Conceptions of Learning
Function Approximation vs. Self-Organization
A paper by Pei Wang. [Paper: cis.temple.edu] [Presentation Slides: csi.temple.edu]
The paper distinguishes two conceptions of learning (there may be others):
- Algorithmic Learning
  - Learning follows an algorithm.
  - It takes training data as input and outputs a model.
  - The model is an algorithm that carries out the domain task.
- Inferential Learning
  - The system's knowledge base is represented as a set of beliefs.
  - Learning is updating the topological structure as well as the weights of those beliefs.
  - Updates are done through inference rules.
  - The system can be queried for answers.
Inferential learning appeared in the early days of ML but has since lost favor to deep learning. However, AGI research still faces many challenges, and the author argues that, compared to deep neural networks, inferential learning may provide a better learning paradigm for AGI [Page 9 / Conclusions].
Differences:

| Aspect | Algorithmic Learning (NN) | Inferential Learning (NARS) |
|---|---|---|
| Representation | Vectors | Sentences of a formal language |
| | Distributed | In-between local and distributed (Page 7) |
| Network | Layered network | Graph network |
| | Fixed topology | Dynamic topology |
| Task | Input/output mapping | Any question can be asked |
| Learning | Training phase | Lifelong learning |
| Learning algorithm | Backprop, gradient descent | Inference rules of a logic |
Algorithmic learning is "using an algorithm to learn an algorithm". Inferential learning, by contrast, involves a dynamic interaction between multiple rules, which is more general but less predictable than the single-algorithm input-output mapping paradigm of algorithmic learning.
1. NARS - Non-Axiomatic Reasoning System
NARS is an example of a system that does inferential learning. It is based on the following definition of intelligence:
“Intelligence” is the ability for a system to adapt given insufficient knowledge and resources. That is, the system must depend on finite resources to make real-time response while being open to unanticipated problems and events.
Consequently, the system’s solutions are usually not absolutely optimal, but the best the system can find at the time, and the system could always do better if it had more knowledge and resources. [Different Conceptions of Learning.pdf: Page 3]
NARS consists of a knowledge base, a collection of inference rules, and a control mechanism that applies the rules and updates and queries the knowledge base.
- Its knowledge base is represented as a graph of
  - Nodes, representing terms
  - Links, representing statements about those terms (with weight = truth value)
  - along with priority values on the nodes and links that affect how terms and statements are chosen for inference.
- As input arrives,
  - new nodes and links are formed,
  - the weights of old links are updated,
  - and the priorities of statements are updated.
- Each statement in NARS has a truth value of belief assigned to it, which is a pair of two numbers: frequency and confidence.
  - Frequency = ratio of positive evidence to total evidence
  - Confidence = ratio of current evidence to current evidence plus a constant amount of future evidence that may still arrive
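The frequency/confidence pair can be sketched in a few lines of Python. The formulas f = w⁺/w and c = w/(w + k), where w⁺ is positive evidence, w is total evidence, and k is a constant "evidential horizon", follow the standard NARS definitions; the choice k = 1 here is just an assumption for the example.

```python
K = 1  # evidential horizon constant (assumed value for illustration)

def truth_value(positive: float, total: float, k: float = K):
    """Return (frequency, confidence) from evidence counts."""
    frequency = positive / total if total > 0 else 0.5
    confidence = total / (total + k)
    return frequency, confidence

# Three positive observations out of four total:
f, c = truth_value(positive=3, total=4)
print(f, c)  # → 0.75 0.8
```

Note how confidence approaches (but never reaches) 1 as total evidence grows, which is why a statement in NARS can always be revised by future evidence.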
Thus NARS does not just do purely deductive inference but also other types of logical inference. It can also compose terms (using operations similar to set operations: union, intersection, difference) to create new terms, and do inference on statements about them [Page 5].
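As a toy sketch of term composition, one can model a term's extension as a Python set. NARS's compound terms are symbolic rather than literal extensional sets, so this only mimics the union/intersection/difference operations the paper mentions; the term names and members are invented.

```python
# Extensions of two terms, as illustrative sets (invented data):
bird = {"tweety", "polly", "ostrich"}
pet = {"tweety", "polly", "rex"}

pet_bird = bird & pet     # intersection: instances of both terms
bird_or_pet = bird | pet  # union: instances of either term
wild_bird = bird - pet    # difference: birds that are not pets

print(sorted(pet_bird))  # → ['polly', 'tweety']
```

Statements about a composed term (e.g. "pet-birds sing") can then be formed and evaluated against evidence like any other statement.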
An example of how generalization works in NARS: say the system gets an observation "Tweety flies" (Tweety is a cartoon character, a bird). The system can then make the following generalizations:
- "Canaries fly", using the information that "Tweety is a canary".
- "Birds fly", using the information that "Tweety is a bird".
- "Animals fly", using the information that "Tweety is an animal".
The last is an over-generalization, which will lose priority due to a low frequency of positive evidence (or through negative evidence); the first is an under-generalization, which will lose priority through low confidence (from less total evidence). The second, "Birds fly", is a proper generalization and will get high priority due to its higher frequency of positive evidence compared to the other two.
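This ranking can be sketched with invented evidence counts (the numbers below are illustrative, not from the paper), reusing the standard frequency/confidence formulas with an assumed horizon k = 1:

```python
K = 1  # assumed evidential horizon

def truth(positive, total, k=K):
    """(frequency, confidence) from evidence counts."""
    return positive / total, total / (total + k)

candidates = {
    "Canaries fly": truth(2, 2),    # little evidence: high frequency, low confidence
    "Birds fly":    truth(9, 10),   # lots of mostly positive evidence
    "Animals fly":  truth(10, 50),  # lots of mostly negative evidence: low frequency
}
for stmt, (f, c) in candidates.items():
    print(f"{stmt}: frequency={f:.2f}, confidence={c:.2f}")
```

"Birds fly" ends up with both high frequency and high confidence, while the under- and over-generalizations each score poorly on one of the two dimensions.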
Other properties of NARS are:
- Statements have a subject-copula-predicate format (a copula¹ is a connecting word). Statements can take different forms, denoting inheritance, equivalence, or implication.
  E.g. some statements have the form \(S \rightarrow P \langle t \rangle\), which means `S` is a specialization of `P` (i.e. `P` generalizes `S`). Here \(t\) is the truth value of the belief, and \(\rightarrow\) is the inheritance copula¹.
- Statements are themselves terms too, so there can be higher-order statements and inferences.
- Since resources are not infinite, any real-time system that is open to new information needs to forget. [Page 4]
  - Absolute forgetting: some concepts are deleted entirely.
  - Relative forgetting: as some concepts are used infrequently, their priority values keep decreasing, so they are chosen less often in the inference process.
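Relative forgetting can be sketched as a priority-decay loop: every item's priority decays each cycle, and accessing an item boosts it, so unused items gradually sink and are selected less often. The multiplicative decay factor and the access boost below are assumptions for illustration, not NARS's actual budget functions.

```python
DECAY = 0.9  # assumed per-cycle decay factor

class Item:
    """A concept or statement with a priority value in [0, 1]."""
    def __init__(self, name, priority=1.0):
        self.name = name
        self.priority = priority

    def decay(self):
        self.priority *= DECAY  # relative forgetting: unused items sink

    def access(self):
        self.priority = min(1.0, self.priority + 0.2)  # boost on use

items = [Item("Birds fly"), Item("Animals fly")]
for cycle in range(10):
    for it in items:
        it.decay()
    items[0].access()  # "Birds fly" keeps being used

items.sort(key=lambda it: it.priority, reverse=True)
print([it.name for it in items])  # → ['Birds fly', 'Animals fly']
```

Absolute forgetting would correspond to dropping items whose priority falls below some threshold, reclaiming their resources entirely.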
Footnotes:
1. Copula means a connecting word, in particular a form of the verb be connecting a subject and complement.