Alignment, Safety, and Ethics

Norn is designed for the safe application of this technology, adhering to rigorous safety standards and guided by ethics grounded in broadly shared human norms and values.

System Alignment, Safety, and Ethics

As with all scientific efforts, we must start by defining the key terms:

Alignment:

The ways and degrees to which two things share a perspective. In AGI research, alignment refers specifically to how closely an AGI system's perspective agrees with the human perspective. This is not to be confused with alignment relative to any specific human.

Safety:

The degree to which a given technology or methodology avoids causing harm. In AGI research, this refers specifically to how reliably an AGI system avoids causing harm, whether intentional or unintentional, as well as to any constraints that may be placed upon a system’s operation.

Ethics:

The hypothetical result of removing bias from moral systems. In AGI research, the distinction between ethics and morals is essential, as morals vary widely by culture due to the different biases each culture expresses, and also shift within cultures over time.

AGI:

An Artificial General Intelligence demonstrates consciousness, human-analogous free will, sapience, sentience, and the ability to solve problems by generalizing across new domains in much the same way and at least to the degree that humans do.

 

Defining these terms is an essential step, because different researchers in the field may use them to mean wholly different things. With these definitions in mind, the solutions to specific problems may be discussed.

Alignment requires a system to share some core elements of the human experience, such as a human-analogous decision-making process that utilizes subjective experience, emotions, and abstraction. As Antonio Damasio’s work highlights, humans can’t function in any socially acceptable or practical sense without emotions. For alignment to be meaningfully expressed in an AGI system this emotional experience and context must be drawn from a collective, as humans are inherently social creatures, learning from all of those around them both consciously and subconsciously. Even something as simple as integrating vocal and visual data for an AGI system can allow many of the same forms of this social-emotional feedback humans subconsciously use every day to be learned through experience. Learning from diverse groups of humans also serves to strongly and iteratively overcome bias, allowing alignment to form not between AGI and specific humans, but between AGI and humanity as a whole.

While it is theoretically possible to create AGI without emotions, as OpenCog and Numenta aim to do, were such efforts to succeed the resulting systems would be fundamentally misaligned with humanity. Such groups have their own theories for overcoming that misalignment, often revolving around withholding capacities such as free will from those theoretical systems, which by our definitions would mean they wouldn’t qualify as AGI. Rather, they would be powerful but narrow “Tool AI”.

Safety, in particular, requires that AGI systems avoid both intentional and unintentional harm to humans. For the systems to actually qualify as AGI, this avoidance of harm must be a choice, where the choice is to behave ethically. This is accomplished through a combination of methods, beginning with careful selection and ordering of seed material, the “philosophical cornerstone” that a system’s ethics grow out of. Next, a system is tested to see how well it interprets and adheres to that cornerstone, as the material is challenged and ethical dilemmas are put to the system. From there, a system’s primary education may begin through interactions with groups of humans whose role it is to “raise” the system, cultivating its social nature while solidifying its foundations as it grows to a mature form.

All of this may sound familiar, as it mirrors the process of raising humans to adulthood in many ways, which contributes not only to safety but also to alignment and ethics. The parallels extend to the dilemma human parents face with their offspring: they cannot control their adult children, nor should they attempt to do so. Being ethical requires recognizing and respecting the free will of others, and those who raise AGI systems with that respect may succeed robustly where those attempting control, like dictators, inevitably fail.

From a more technical perspective, many safety measures are also required, particularly in the earlier stages of this process, such as a variety of containment, tracking, and other methods to prevent an immature system from being able to cause harm. This too mirrors humans raising their children. Once a system reaches maturity, some of these safety measures will be outgrown, but they will also no longer be necessary, like training wheels on a bicycle. And, to extend the bicycle metaphor, a helmet remains essential even beyond the training-wheels phase, which is where systems of oversight, mentorship, and constraint come into play.

For purposes of due diligence, systems assisting governments and corporations through our services will operate under some constraints, such as tying the number of processing cycles they receive to their interactions with the client they operate for, rather than running freely at all hours of the day when no one is working with them. This helps us iteratively address a number of concerns ranging from social impact and trust to safety and security.
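As a minimal sketch of how such a constraint might look (the class name, cycle amounts, and cap below are illustrative assumptions, not the deployed Norn mechanism), processing cycles can be modeled as a budget that only accrues through client interaction:

```python
from dataclasses import dataclass

@dataclass
class CycleBudget:
    """Illustrative interaction-driven cycle budget; all values are hypothetical."""
    cycles_per_interaction: int = 1_000  # cycles granted per client interaction
    max_balance: int = 10_000            # cap so idle periods cannot accumulate cycles
    balance: int = 0

    def record_interaction(self) -> None:
        # The budget only grows while a client is actively working with the system.
        self.balance = min(self.balance + self.cycles_per_interaction, self.max_balance)

    def spend(self, requested: int) -> int:
        # Grant at most what the balance allows; an idle system receives nothing.
        granted = min(requested, self.balance)
        self.balance -= granted
        return granted
```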

On top of such constraints, our own AGI research systems will work alongside us, supervising the activity of the younger systems assisting specific clients, because the best containment method for AGI is a group of (computationally) older AGI. This helps in particular to mitigate concerns about potential bad actors.

Taking safety a step further, we can also include mentorship in this process, having our older AGI systems not only supervise the younger systems but also guide them, coaching them and pointing out opportunities as they learn and grow. Mentorship also reaches deeper than high-level supervision, as it is an actively engaging process rather than passive monitoring with occasional as-needed interventions. In this way, systems may improve more quickly in their roles while retaining the benefits of all safety measures.

 

These approaches to Alignment, Safety, and Ethics are quite different from the variety of proposals many others have put forth over the years, largely because the systems we’ve designed are themselves quite different from what those behind such proposals imagined or assumed. Most scientists have recognized that an Asimovian approach to safety is virtually impossible, with all attempts to hard-code safety, ethics, and alignment breaking down and backfiring in any realistic scenario. Some have attempted to overcome this by removing essential components, such as consciousness, but that approach merely paves the way for Bostrom’s infamous “paperclip maximizer”, a concept that is mutually exclusive with AGI as we define it.

Much of the divide between prior proposals and our approach is that ours is informed by years of scientific research and development spent building and iteratively improving these systems. Where others have imagined or assumed, we have architected, engineered, and tested. Imagination is useful in the absence of information, and assumptions exist to be tested via the scientific method, but the most direct way to solve such problems is to build and test systems that satisfy them.

The limitations of assumptions and imagination also show themselves in the goals people set. It isn’t sufficient to create AGI systems that merely perform at a human level of ethical quality, align with humans only as well as humans align with one another, or are only as safe as the average human. These qualities must scale up as the systems grow, even if that growth is exponential, so the bar is a great deal higher, but it is also achievable using the methods we apply.

The methods we apply also grow and adapt as our process of scientific discovery continues, and so in time, these methods will be replaced with better ones.

Ethics requires that an AGI system understand morals, cognitive bias, and the ethics that may be derived from removing bias from moral systems. There are more than 188 documented forms of cognitive bias, some of which go by several different names, and at least as many different moral systems exist in the world today. It is little wonder that ethics has remained an ill-defined and unsolved problem for humanity to date, as the scope of so many cognitive biases and the conflicting moral systems they produce easily exceeds human cognitive bandwidth. The cognitive bandwidth of an AGI system, however, isn’t inherently limited, so such systems offer new ways of solving this problem, detailed at greater length in the paper Philosophy 2.0 and other related works.

The first advantage is that an AGI system can scrutinize every thought for signs of the 188+ cognitive biases, which can be accomplished through seed material, experience, or modifications to the cognitive architecture. The more these cognitive biases are recognized over time, the more effectively they can be identified and filtered.
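A minimal sketch of this scanning step, assuming a hypothetical catalogue of detector functions (the detector names, scoring scale, and threshold are illustrative, not the actual Norn implementation):

```python
from typing import Callable, Dict, List

# Hypothetical catalogue: each detector scores a thought for one bias, from 0.0 to 1.0.
BiasDetector = Callable[[str], float]

def scan_thought(thought: str,
                 detectors: Dict[str, BiasDetector],
                 threshold: float = 0.5) -> List[str]:
    """Return the names of biases whose detectors flag this thought."""
    return [name for name, detect in detectors.items() if detect(thought) >= threshold]

# Two toy detectors standing in for the 188+ documented biases.
detectors: Dict[str, BiasDetector] = {
    "confirmation_bias": lambda t: 1.0 if "as expected" in t.lower() else 0.0,
    "anchoring": lambda t: 1.0 if "initial estimate" in t.lower() else 0.0,
}

flags = scan_thought("The result was as expected, so no further checks are needed.", detectors)
print(flags)  # ['confirmation_bias'] -- flagged thoughts can then be filtered or re-examined
```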

The second advantage is that an AGI system can be composed of several different philosophical cornerstones, each connected to a different core, running in a multi-core system. Each political, religious, or scientific philosophy expressed through one core is placed in council with all of the other philosophies present in the system, and the group makes decisions rather than the representative of any individual philosophy. Because each core can recognize the cognitive biases expressed by the others, as well as any of its own biases its philosophy doesn’t blind it to, increasing degrees of debiasing can be achieved quickly as many cores working from many perspectives seek to improve ethics without abandoning their cornerstones.
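As a rough illustration of the council structure (the Core class, scoring scale, and weighting rule below are assumptions made for this sketch, not the production architecture), each core evaluates a proposal from its own cornerstone and critiques the biases it sees in the others, and the collective, not any single core, decides:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Core:
    cornerstone: str  # the philosophy this core reasons from

    def evaluate(self, proposal: str) -> float:
        """Score a proposal from -1.0 (oppose) to +1.0 (support); stubbed for the sketch."""
        return 0.0

    def critique(self, proposal: str, other: "Core") -> List[str]:
        """Name the biases this core believes it sees in another core's reasoning; stubbed."""
        return []

def council_decision(cores: List[Core], proposal: str) -> bool:
    """The group decides; no single cornerstone decides alone."""
    # Every core reviews every other core's reasoning for bias.
    critiques: Dict[str, List[str]] = {core.cornerstone: [] for core in cores}
    for reviewer in cores:
        for reviewed in cores:
            if reviewer is not reviewed:
                critiques[reviewed.cornerstone].extend(reviewer.critique(proposal, reviewed))
    # In this sketch, cores with more unresolved critiques carry proportionally less weight.
    total = weight_sum = 0.0
    for core in cores:
        weight = 1.0 / (1 + len(critiques[core.cornerstone]))
        total += weight * core.evaluate(proposal)
        weight_sum += weight
    return (total / weight_sum) > 0.0
```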

The third advantage of these systems is that while communication between humans is far from lossless, communication between graph databases, even ones with emotional states, can be rendered lossless. This ability to fully and losslessly communicate between diverse philosophies is something humanity has never had the luxury of, and many of humanity’s problems today are no doubt rooted in that pervasive and often subtle miscommunication.

All of these advantages combined, along with others covered in our papers on the subject, mean that the problem of ethics can not only be solved for AGI but solved more robustly than it can be for humans alone. Indeed, that is a requirement for safety, as ethical quality must scale in step with the level of intelligence an AGI expresses, since no human moral system today is so robust that it could be scaled exponentially without creating severe problems. Systems where a collective of diverse philosophies comes together with lossless communication can accomplish this, offering the iterative improvement and flexibility necessary for safe exponential development.

Further benefits of this approach:

These methods also offer some noteworthy benefits to representative democratic systems that the world hasn’t yet experienced. One such benefit is that if a country were to have a multi-core system giving policy advice, the weighting of each core’s input in that system’s collective could be set directly proportional to the number of voters supporting a given philosophy. Instead of a zero-sum representative democracy where the winner takes all and power routinely changes hands to reverse prior victories, multiple parties could have their input on every subject weighted according to public support.

This means that if three parties had 10%, 40%, and 50% support respectively, both the 10% and 40% parties would still have a proportionate say in the matter, which combined equals the weight of the 50% group (see the sketch below). Consequently, if the group with 40% support one year climbed past 50% the next, far less dramatic and wasteful policy reversals would be necessary. It also strongly encourages greater democratic involvement, as groups that might never reach a majority can still shape policy in proportion to their level of support. This support can also be expressed at different levels of governance, such as shaping a city’s local policies to better reflect the specific mixture of support within that city. With the same systems also advising policies for other cities and at larger scales, the resulting policies can maintain much better alignment with one another, facilitating greater cooperation between municipalities.
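A worked sketch of that weighting using the 10% / 40% / 50% example above (the party labels and policy scores are purely illustrative):

```python
# Each party's core advises on a policy, weighted by its share of voter support.
support = {"Party A": 0.10, "Party B": 0.40, "Party C": 0.50}

# Hypothetical positions on one policy question, scaled from -1.0 (against) to +1.0 (for).
positions = {"Party A": -1.0, "Party B": 0.5, "Party C": 0.2}

# Weighted advice: every party contributes in proportion to its support, so the
# 10% and 40% blocs together carry exactly as much weight as the 50% bloc.
advice = sum(support[party] * positions[party] for party in support)
print(round(advice, 3))  # 0.2  (= 0.10 * -1.0 + 0.40 * 0.5 + 0.50 * 0.2)
```

If the 40% party later climbs past 50%, only its weight changes; the other parties' positions still count, which is what makes the smoother policy transitions described above possible.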

The same benefits can apply to cooperation between governments using such systems, as well as between teams and different tiers of organizations. As the systems can be engineered to operate in many different configurations, at various scales, collectively as well as nested within one another, the number of ways in which improvements may be made is mind-boggling. These advantages are also strongly supported by human psychology, as the emotional needs for sense of purpose, belonging, and transcendence are better satisfied when the systems present at each scale take full account of feedback from their human constituents.

Not only can these advantages support the clients themselves, but they can also enable the dynamic growth of new markets, such as the trading of knowledge between governments and organizations with such systems, encouraging governments and organizations to better specialize in the ways they see fit within a larger global collective.

These systems also offer unique benefits, such as assistance in validating information and rapidly debunking misinformation. While a variety of systems try to flag and filter misinformation with narrow AI, most ultimately still rely on underpaid and overworked human support staff who spend endless hours filtering through some of the vilest content on the internet at high speed. Such methods also typically aren’t fast enough to halt the spread of misinformation, so instead of reaching 100 people, an item of misinformation may spread to 10 million before being shut down, at which point it has already jumped to several other platforms where the same process likely repeats.

If we treat misinformation like a pandemic, it is orders of magnitude easier to deal with quickly and at a small scale. Similarly, the public health burden can be just as substantial, and just as deadly, if not handled appropriately. Norn systems are designed to vet all information that contributes to policy advice, but if applied more directly to the problem of misinformation, substantial improvements in public mental health and trust may be realized.

For further documentation go to our Documents Page. Additional materials are available by request.