
OCTAM v.2 - the artificial mind

Contents

Basics
Functional overview
Settings - and what they mean
Where to go from here
Tests and Error messages
Contact and bug report
Copyright and credits

 

Functional overview

This section outlines the program's operations from a functional perspective: what kinds of actions OCTAM performs. How to interact with those actions is explained in Settings - and what they mean.

General
Video
Audio
Keyboard
Footnote - The current state of AI


General

OCTAM represents a simulation of the mind's dynamics. See the footnote regarding the current state of AI.

Its AI Engine follows the same layout as in the prototype OtoomCM, albeit on a much larger scale and with different in- and outputs (please note that the comments there, including those on the linked page Further developments, refer to that prototype). The input region (red) transmits the numbers to those inner matrix (green) nodes each input node is connected to. Each inner matrix node represents the parent node of a tree, and a cycle consists of accessing each tree in turn and following the branches until the designated depth is reached (a depth-first traversal); then the next node with its tree is dealt with. Once the inner matrix nodes have been processed, each output region node (blue) receives the values from those inner matrix nodes it is connected to. The in- and output regions are divided into the number of input types (currently video, audio, and keyboard), designated as regions, and the number of nodes within each region can be set by the user.

The main matrix nodes stand for the brain's neurons: each node is itself another matrix (called the element matrix here) which simulates a neuron's internal states. The connections between nodes follow the layout of neuronal connections, and the adjustment algorithm acting between the nodes represents the dynamics found in the synapses, where the variance of electro-chemical processes allows information to pass between the neurons and, here, between the nodes.
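
To make the layout concrete, here is a minimal structural sketch in C++. It is not OCTAM's actual code: the container layout, the adjustment rule, and the traversal depth are all assumptions made for illustration only.

```cpp
// A minimal structural sketch of the cycle described above, not OCTAM's
// actual code: the container layout, the adjustment rule, and the depth
// value are assumptions made for illustration only.
#include <cstddef>
#include <vector>

struct Node {
    std::vector<double> element;        // the node's own element matrix (internal state)
    std::vector<std::size_t> children;  // tree branches into the inner matrix
    double value = 0.0;
};

struct Engine {
    std::vector<Node> inner;                           // inner matrix (green)
    std::vector<std::vector<std::size_t>> inputLinks;  // input region (red), one list per input node
    std::vector<std::vector<std::size_t>> outputLinks; // output region (blue), one list per output node
    std::size_t maxDepth = 4;                          // designated traversal depth (assumed)

    // Hypothetical adjustment rule: nudge a node's state towards a signal.
    static void adjust(Node& n, double signal) {
        n.value += 0.1 * (signal - n.value);           // placeholder dynamics
    }

    // Depth-first traversal of one tree, down to maxDepth.
    void traverse(std::size_t idx, std::size_t depth) {
        if (depth >= maxDepth) return;
        for (std::size_t c : inner[idx].children) {
            adjust(inner[c], inner[idx].value);
            traverse(c, depth + 1);
        }
    }

    // One cycle: distribute input, walk every tree in turn, gather output.
    std::vector<double> cycle(const std::vector<double>& input) {
        for (std::size_t i = 0; i < input.size(); ++i)
            for (std::size_t t : inputLinks[i])
                adjust(inner[t], input[i]);
        for (std::size_t i = 0; i < inner.size(); ++i)
            traverse(i, 0);
        std::vector<double> out;
        for (const auto& links : outputLinks) {
            double sum = 0.0;
            for (std::size_t s : links) sum += inner[s].value;
            out.push_back(links.empty() ? 0.0 : sum / links.size());
        }
        return out;
    }
};
```

The point here is the order of operations (input distribution, depth-first traversal, output collection), not the particular arithmetic.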

Therefore, any interaction between sensory organs and the brain in humans, and the web cam, microphone and so on in OCTAM, is ultimately interpreted through the internal states of the neurons in the former and of the nodes in the latter. The result (in other words, what we as humans perceive through our senses and what OCTAM understands through its input) gives rise to the system of mind, natural or artificial.

The sound coming from the loudspeakers and the visual display on the screen are essentially an interpretation of the relevant node states inside the AI Engine. For example, the colours and shapes you see on the screen when the video signal is directed through the AI Engine are an interpretation using colour values and motion parameters. Strictly speaking, in themselves they are meaningless, although the manner of interpretation is applied consistently. This is no different from, say, the action of the muscles in the mouth and vocal cords which are triggered by the internal states of the brain's neurons. They have meaning for us because the sounds we produce (ie, the words) have become familiar through the feedback loops making up our experience. See ShapeWorld for some details on the geometry of those shapes and their accompanying audio.

The electro-chemical processes in the brain as well as the algorithm used in OCTAM are essentially the only drivers, regardless of input in either. There is no preformed grammar, no database, and no lookup table. Input to the brain is processed in terms of the formalisms evolved during evolution (based on the inherent properties of neurons, dendrites, etc, and ultimately on what the molecular compounds and the underpinning chemicals are capable of). In the program it is the adjustment algorithm that leads to the momentary state of the nodes.

What behaviour such a system presents to us in the end is a function of input across the system's timeline. Input gives rise to internal states which are representative of that input; these in turn are projected outwards as resultant behaviour, which forms the input for any observer and so shapes their internal states yet again. This is the feedback loop that has been in place ever since humans and animals roamed the earth.

As even a cursory observation will make obvious, the number of neurons as well as their connection density determine the eventual behaviour of the organism. A bee will not be as responsive to a variety of input as a mouse, a dog is more responsive than a mouse, and in humans there can be a productive response to a formula or symphony. Yet even bees show a remarkable sophistication when it comes to processing their environment in terms of what they need, eg, the identification of flowers, or how to navigate there and back. (It is interesting how, even when it comes to humans, intelligence is often under- as well as overestimated.) See List of animals by number of neurons for a comparison across the species. OCTAM can have over 80,000 main nodes (depending on the memory available at the time of loading). Note however that in the organic version a brain also needs to deal with the contingencies of the body, something that is absent in OCTAM.

Using OCTAM creates its own feedback loop between it and the user. What happens in front of the web cam and what sound reaches the microphone is what OCTAM processes to form its own internal states. Unlike a human, where 'unlearning' can be difficult if not impossible, with OCTAM sessions can be terminated, either discarded or saved to a file, and different types of sessions can be enacted so as to establish certain 'personalities' or 'accents' in relation to specific sessions (as much as there can be a 'personality' in a ~80k brain).

Therefore, how well OCTAM can process input depends on the size of its main matrix and the relationship between it and the size of the node matrices and the number of connections between nodes. Generally speaking, the larger the overall configuration, the deeper the processing that takes place; keeping in mind that the system operates all the time, whether there is input or not.

Although the program can be started using the parameters established during OCTAM's initialisation process (based on the memory available at the time of loading, and so not exactly the same every time), the settings can be changed: either downwards overall, or using fewer main matrix nodes but many more nodes in each node matrix (ie, the element matrices), or decreasing both while having as many connections per node as possible. See Settings - and what they mean for more details.
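
As a rough illustration of that trade-off, the following sketch compares the memory footprint of three hypothetical configurations. The per-node byte costs and the memory budget are invented for the example; they are not OCTAM's actual figures.

```cpp
// A back-of-envelope comparison of the trade-off just described. The per-node
// byte costs and the memory budget are invented for the example; they are not
// OCTAM's actual figures.
#include <cstdio>

int main() {
    const double budgetMB = 512.0;  // assumed available memory
    struct Config { int mainNodes, elementSize, connections; };
    const Config configs[] = {
        {80000, 16, 8},   // many main nodes, small element matrices
        {20000, 128, 8},  // fewer main nodes, much larger element matrices
        {10000, 64, 64},  // fewer of both, many connections per node
    };
    for (const Config& c : configs) {
        // assumed: 8 bytes per element matrix entry, 4 bytes per connection
        double mb = c.mainNodes * (c.elementSize * 8.0 + c.connections * 4.0)
                    / (1024.0 * 1024.0);
        std::printf("%6d nodes x %4d elements x %3d links -> ~%6.1f MB of %.0f MB\n",
                    c.mainNodes, c.elementSize, c.connections, mb, budgetMB);
    }
}
```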

How OCTAM behaves is then a matter of that combination of parameters as well as of what follows as input. Users can play around with different scenarios as long as they remain aware that with each such change OCTAM's mind has changed fundamentally. Note that loading a previous session from file also means setting OCTAM's parameters to their previous values. Once such a file is loaded the session continues where it left off before.

Above all, the program represents a complex, dynamic (that is, nonlinear) system. It is fundamentally different from linear systems such as a bicycle or a car engine. Nature, or reality if you will, is essentially nonlinear; even in a bicycle the underlying nonlinear nature becomes apparent as soon as its wheels are spun at very high speeds. See The mechanics of chaos: a primer for the human mind for a brief explanation of what nonlinearity means. CauseF is an interactive program demonstrating nonlinear behaviour in a number of contexts (it is not AI).
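
For readers who want a self-contained taste of what nonlinearity means, the classic logistic map serves as a minimal example; it is unrelated to OCTAM's own algorithm, but it shows how two starting values differing by one part in a million soon diverge completely.

```cpp
// A self-contained taste of nonlinearity, unrelated to OCTAM's own algorithm:
// the logistic map x' = r * x * (1 - x). Two starting values differing by one
// part in a million diverge completely within a few dozen iterations.
#include <cstdio>

int main() {
    const double r = 3.9;  // a parameter value in the chaotic regime
    double a = 0.400000;
    double b = 0.400001;
    for (int i = 0; i <= 40; ++i) {
        if (i % 10 == 0)
            std::printf("step %2d: a=%.6f b=%.6f diff=%+.6f\n", i, a, b, b - a);
        a = r * a * (1.0 - a);
        b = r * b * (1.0 - b);
    }
}
```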

To learn more about the internal processes see the OtoomCM page, OWorm, as well as the FAQs and the links from there.

What follows is more detail about the inputs and how they are used by OCTAM.

 

Video

Visual input is provided by a web cam. Frame by frame the signal is compacted to match the selected number of video input nodes, by averaging the bytes contained in each frame. From the video input nodes the data are sent to the connected nodes, where they are processed further and create their own specific states in relation to each other. These are the mutual affinity relationships defining the overall state of the matrix.
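
A minimal sketch of that compaction step might look as follows; the bucketing scheme is an assumption, since the exact averaging OCTAM uses is not spelled out here.

```cpp
// A minimal sketch of the compaction step: a frame's bytes are averaged into
// as many values as there are video input nodes. The bucketing scheme is an
// assumption; the exact averaging OCTAM uses may differ.
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<double> compactFrame(const std::vector<std::uint8_t>& frame,
                                 std::size_t inputNodes) {
    std::vector<double> out(inputNodes, 0.0);
    if (frame.empty() || inputNodes == 0) return out;
    const std::size_t bucket = (frame.size() + inputNodes - 1) / inputNodes;
    for (std::size_t n = 0; n < inputNodes; ++n) {
        const std::size_t begin = n * bucket;
        const std::size_t end = std::min(frame.size(), begin + bucket);
        double sum = 0.0;
        for (std::size_t i = begin; i < end; ++i) sum += frame[i];
        out[n] = (end > begin) ? sum / (end - begin) : 0.0;
    }
    return out;
}
```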

The results of the internal states are sent to the respective output nodes for video (and audio), and these values set the various parameters defining the behaviour of the shapes (as well as the sound directed to the speakers).
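
By way of illustration, such an interpretation step could map a handful of output-node values onto colour and motion parameters like this; the mapping below is entirely hypothetical, chosen only to show the kind of translation involved.

```cpp
// An entirely hypothetical interpretation step, showing the kind of translation
// involved: a handful of output-node values mapped onto colour and motion
// parameters for one shape.
#include <cmath>

struct ShapeParams { float red, green, blue, speed, heading; };

ShapeParams interpret(double v0, double v1, double v2, double v3, double v4) {
    // squash an unbounded node value into [0, 1]
    auto unit = [](double v) { return static_cast<float>(0.5 + 0.5 * std::tanh(v)); };
    return {
        unit(v0), unit(v1), unit(v2),  // colour channels from three output nodes
        unit(v3) * 10.0f,              // motion speed, arbitrary scale
        static_cast<float>(std::fmod(std::fabs(v4), 6.2831853))  // heading angle in radians
    };
}
```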

The visuals (the motions of the shapes displayed in their separate window) can be compared to the body language of an organism, and how nuanced such manifestations are depends on the number of neurons and their connections, what could be called the sophistication of the brain. Body language is first and foremost an expression of the organism (from insects to humans), and a commensurate interpretation can only be done at the level of cognition innate to that organism. On the other hand, the greater the sophistication, the easier it is for us humans to somehow 'make sense' out of what we see before us even if the perceived context is not shared by the observer - but it can still be tricky.

Hence what OCTAM and its ~80k brain comes up with can only ever represent the level of 'understanding' it is capable of.

 

Audio

Just as the video data are processed within OCTAM's nodes, so are the data from the web cam's microphone directed to the relevant recipients in the matrix. At this point the data are just that - integers which are used for the calculations in the adjustment algorithm (see General).

As they create the nodes' internal states, the resultant output needs to be formatted in order to enable sound within the digital framework of a computer. The volume of data, the duration of play, as well as the frequency and sample rate variables are all necessary to create some sound (something has to come out of the speakers, and it can't be just white noise either).

Especially the sample rate (how many regular 'snapshots' per second are taken of the entire set of data) plays an integral part in the result. See Of frogs and things for some audio samples where the sample rate is changed although the actual sound data remain the same.
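
The effect is easy to reproduce with a little arithmetic: the same block of samples played at different rates changes both its duration and its pitch. The sample count and the rates below are assumptions for the sake of the example.

```cpp
// The same block of samples played at different rates changes both duration
// and pitch while the data themselves stay identical. The sample count and
// the rates below are assumptions for the sake of the example.
#include <cstdio>

int main() {
    const int numSamples = 88200;               // one fixed block of audio data
    const int rates[] = {22050, 44100, 88200};  // playback sample rates in Hz
    for (int rate : rates)
        std::printf("%5d Hz -> %.2f s, pitch x%.2f relative to 44100 Hz\n",
                    rate, static_cast<double>(numSamples) / rate, rate / 44100.0);
}
```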

In an organism the sound, the pressure waves transmitted through a medium (air, water, rock, ...), reaches the receptors and the nerves transmit that information to the brain. There, its configuration determines what exactly is heard by that organism. Over the evolutionary timeline what is heard is sufficient to enable the organism to respond in terms of what it needs and how it can act in turn. Therefore insects are receptive to a certain frequency range, and the same goes for mammals and of course humans. The counterpart of the sample rate can be seen as the innate capacity of the brain to distinguish between the elements making up the sound. Frequency is one factor, but how lower as well as higher frequencies can be processed depends on the brain's overall capability. Under ordinary circumstances all goes well, but the article Auditory illusion features a number of examples demonstrating the illusory effects sounds can have on us once the circumstances lie outside of what has been made familiar. Note that the illusion comes from a combination of certain frequencies, but it is the sample rate which allows such frequencies to be present in the first place. Change the sample rate and the combination has changed as well.

What comes out of the speakers from OCTAM is played at the standard sample rate of 44100 Hz through the sound card (notwithstanding the speaker properties on the computer), but the source usually contains different sample rates because their values are defined by the output from the matrix. The situation can be compared to us listening to those 'frogs' referred to in Of frogs and things - what we hear is not necessarily what the 'frogs' would hear. The same goes for real animals.
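
How data defined at some matrix-derived rate might be brought to the sound card's 44100 Hz can be sketched with a simple linear-interpolation resampler. OCTAM's actual conversion is not detailed here, so treat this as an assumed stand-in.

```cpp
// A simple linear-interpolation resampler, sketching how data defined at some
// matrix-derived rate could be brought to the sound card's 44100 Hz. OCTAM's
// actual conversion is not detailed here; treat this as an assumed stand-in.
#include <cstddef>
#include <vector>

std::vector<float> resample(const std::vector<float>& src,
                            double srcRate, double dstRate = 44100.0) {
    if (src.size() < 2 || srcRate <= 0.0) return src;
    const std::size_t dstLen =
        static_cast<std::size_t>(src.size() * dstRate / srcRate);
    std::vector<float> dst(dstLen);
    for (std::size_t i = 0; i < dstLen; ++i) {
        const double pos = i * srcRate / dstRate;  // position in source samples
        const std::size_t j = static_cast<std::size_t>(pos);
        if (j + 1 >= src.size()) { dst[i] = src.back(); continue; }
        const double frac = pos - static_cast<double>(j);
        dst[i] = static_cast<float>(src[j] * (1.0 - frac) + src[j + 1] * frac);
    }
    return dst;
}
```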

Therefore the sound OCTAM produces is first and foremost its own sound in a very real sense, and what the audio means to us is a matter of human-based subjective interpretation.

 

Keyboard

Input from the keyboard (the ASCII values representative of each key) is sent to the keyboard region of the matrix. It is a legacy from the previous versions, where such input was of higher significance.
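
In sketch form, feeding typed keys to the keyboard region could look like this; the region size, the wrapping, and the normalisation are assumptions for illustration.

```cpp
// A minimal sketch of feeding typed keys to the keyboard region: each key's
// ASCII value becomes one input number. Region size, wrapping, and the
// normalisation are assumptions for illustration.
#include <cstddef>
#include <string>
#include <vector>

std::vector<double> keysToRegion(const std::string& typed, std::size_t regionNodes) {
    std::vector<double> region(regionNodes, 0.0);
    for (std::size_t i = 0; i < typed.size() && regionNodes > 0; ++i)
        region[i % regionNodes] =
            static_cast<unsigned char>(typed[i]) / 255.0;  // normalised ASCII value
    return region;
}
```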

By default the keyboard region with its nodes is kept to a minimum, but can be enlarged via the Settings window under the AI Engine tab.

 

Footnote - The current state of AI

At the time of writing (July 2024 and onwards), there are now a number of AI applications available to generate verbal and/or visual content (one example of the former being OpenAI's ChatGPT, a Large Language Model); hence the label 'generative AI' given to this version of artificial intelligence. They essentially follow a top-down approach, which means their purposefully created algorithms use extensive databases, realigning and resequencing their content in order to assemble an output. Since that selection process is a function of the initial prompts (which in turn become part of the data pool - an ongoing feedback process), targeted input in terms of ideological leanings can be induced, as Dr Tim McIntosh has pointed out in his article The mind behind the machine (see Copyright and credits). Such reorientation is somewhat similar to algorithms that steer social media users towards content they themselves have defined during their visits to those sites. While the code is necessary to apply some kind of discriminatory filter to an available data pool comprising up to hundreds of millions of entries, it can be misused when the purpose is to induce trending by a third party (many examples are found in Maria Ressa's How To Stand Up to a Dictator; see Copyright and credits).

OCTAM is different in that it uses a bottom-up approach, that is, output is generated by affinity relationships that emerge from within the AI Engine matrix (one may call that mode 'emergent AI'); so far it also functions on a much smaller scale. Note however that ultimately those relationships also depend on the input, which is subject to a selection process. Therefore humans are as much influenced by tendentious information as machines are.

But most importantly, Large Language Models and such are ultimately configured according to what we humans expect from them, in line with our perceptions and our interpretations. A truly artificial mind on the other hand has its own internal states, its own inner space if you will; it is alien in the literal sense of the word.

Some pertinent comments regarding social media and such are in order. Concerns are being raised by governments about the negative effects posts featuring extremist or violent content can have on end users who may be influenced by them. For example, the Australian Government and relevant groups want the Online Safety Act modified in order to mitigate the presence of disrupting material while at the same time safeguarding free speech (C Armstrong, Make it safe or pay price, The Courier Mail, Brisbane, 4 Feb 25). The concern is understandable, but the question is how effective any measures taken by media companies can be.

Since generative AI uses algorithms which mine existing data sets, anything found there becomes a candidate for later use in some text or for the purpose of grouping when following the threads in social media posts. By definition that includes acceptable as well as unacceptable content. 'Unacceptable content' can be defined by humans as we explore the wider context within which particular words occur. The degree of unacceptability becomes apparent as we - humans - recognise the boundary beyond which the content starts to turn problematic. On this side of the boundary the problem is still unrecognisable (if not non-existent), and beyond it we become aware of it. But what if we widen the context further?

Consider a paragraph containing the instructions for a terrorist attack. Suppose that paragraph is used as an example of how terrorists intend to operate. Clearly, the overall intent behind the text is a positive, although the example itself is a negative. Are there algorithms that can identify the wider picture? And in any case, the entire set of words is now part of the sample pool from which any subsequent response is generated. In the end the result may well be counterproductive. Of course, one solution would be to have a human peruse all this material in order to find out what to do with it - clearly not feasible given the sheer volume of data. The situation is not dissimilar to letting a group of children into a garden and telling them, "Don't touch that box over there!" One can be sure that now that the box has been mentioned at least some of them would zero in on it, whereas otherwise virtually none of them would think about the box at all. A related example would be commentators who write at length that such and such isn't worth writing about - and thereby doing just that.

What this highlights is the potential incompatibility between different functional scopes; in this case the functionalities underpinning generative AI and those featuring in human cognition. AI-related problems can only be addressed within the scope of AI, and problems in human thinking need to be addressed by focusing on humans. Where the two overlap, one initiative is hopefully more efficacious than the other, but there is no guarantee that one does not undermine the other.



© Martin Wurzinger - see Terms of Use