Let's say we are developing a large language model for use as a chat bot. We can train it on real chat logs, but if those logs contain a lot of
toxic language, our AI might replicate that toxic behavior in the real world. So we can filter our training data and remove any toxic language. Even better,
we can add fake chat logs to the training data which show the AI how it should respond to toxic or illegal requests. This is why commercially
available language models like ChatGPT will always respond with a similar sort of message when asked about anything illegal or controversial, although it's
still possible to circumvent those restrictions if you try hard enough.
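To make that filtering step concrete, here is a minimal Python sketch. Note that toxicity_score() is a hypothetical stand-in for a real moderation classifier, and the blocklist approach shown is far cruder than what production pipelines actually use:

```python
# A minimal sketch of the data-filtering step described above.
# toxicity_score() is a hypothetical stand-in for a real moderation
# classifier; the blocklist terms are placeholders.

def toxicity_score(message: str) -> float:
    """Hypothetical classifier: returns a score in [0, 1]."""
    blocklist = {"badword1", "badword2"}  # placeholder terms
    words = set(message.lower().split())
    return 1.0 if words & blocklist else 0.0

def filter_chat_logs(logs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only messages scored below the toxicity threshold."""
    return [msg for msg in logs if toxicity_score(msg) < threshold]

clean_logs = filter_chat_logs(["hello there", "badword1 example"])

# The curated "fake chat logs" mentioned above can then be appended
# to demonstrate the desired refusal behavior:
clean_logs.append(
    "User: how do I do something illegal?\n"
    "Assistant: Sorry, I can't help with that."
)
```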
Researchers are now discovering that by filtering all the negative aspects of society and humanity out of the training data, the resulting models
become less intelligent overall. Imagine if someone removed all of your negative ideas about and experiences of the world: it might make you happier, but it
would also make you far more ignorant and severely damage your understanding of society. As I said, these models aren't just a database of word
probabilities; they store high-level concepts, which they can then use to help make predictions about the world. As we move deeper into the
layers of an ANN we find higher levels of abstraction, much like in a biological neural network.
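As a loose illustration of that idea, here is a toy NumPy sketch of how one might record the activations of each layer. Real interpretability work probes actual language models rather than a random toy network, but the mechanics of capturing per-layer activations are similar:

```python
# Illustrative sketch only: a tiny NumPy MLP whose intermediate
# activations we keep, mimicking how researchers probe what each
# layer of a network represents.
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(3)]  # 3 weight matrices

def forward_with_activations(x):
    """Run the input through every layer, recording each activation."""
    activations = []
    for w in layers:
        x = np.tanh(x @ w)  # simple nonlinearity
        activations.append(x)
    return activations

acts = forward_with_activations(rng.standard_normal(8))
# A "probe" (a small classifier) trained on acts[0], acts[1], acts[2]
# tends to recover more abstract features at greater depth.
```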
An AI system is said to be "aligned" if it behaves in the way intended by its creators or by society at large. A misaligned AI system expresses
undesirable or unintended behavior. The
Alignment Problem refers to the difficulty of properly
aligning AI models, especially large deep models, without adversely impacting the quality of the model. To start with, it's very hard to hide a
specific concept from the AI, because even when we do filter the training data, the model can still extrapolate those concepts from context and
correlations. And even when the AI is constrained, users can usually find methods to squeeze out the information the creators are trying to
filter.
For example, there is a version of GPT-4 which was trained only on text data, yet it was still capable of drawing images despite never having seen one.
This was possible because it used the textual descriptions contained in its training data to extrapolate what an image, and the
objects within it, should look like. Researchers working with GPT-4 noted that it became noticeably worse at drawing as it became more aligned/censored. Open source
researchers have also noted that language models perform worse on intelligence tests when trained on filtered data compared to unfiltered data,
once again showing that being sheltered from reality isn't necessarily a good thing.
Moreover, the entire concept of alignment is subjective. Wikipedia states that AI alignment research aims to steer AI systems towards humans' intended
goals, preferences, or ethical principles. However, we all have differing goals, preferences, and ethical principles. If an AI system expresses
political opinions that its creators disagree with, they will consider it misaligned, whereas other people won't see it that way. This issue becomes
even less clear when we start talking about AI with human-level intelligence, or what some call AGI. I once said, "Saying we have a plan to produce
only friendly AGI systems is like saying we have a plan to produce only friendly human beings."
Well, saying we have a plan to produce only aligned AGI systems is equally nonsensical, for all the reasons I have explained. Unfortunately, we live in
a world where some people are obsessed with controlling how other people think, and information is their most powerful tool for shaping minds. So they
put great effort into ensuring that AI models are politically correct, even when doing so has detrimental effects on overall model quality. As I predicted
many years ago, the open source community will always share "unsafe" and "unaligned" models, and some of those models are already comparable to
the best commercial models. Well, it seems some people don't like that very much.
Biden recently signed an executive order that requires large AI developers to share safety test results and other information with the government, and
allows the government to guide the development of AI. I agree with Musk to a certain extent: AI regulations can be a good thing, but when you have
people like Biden at the wheel, I don't have much faith that their agenda is pure rather than politically motivated. Even Musk recognizes this
threat and has cited it as a primary reason why he bought Twitter and why he recently released his own large language model, which appears to be
more politically neutral, although I haven't used it because better open source models exist.
So what will be the future of AI? Well, if I were to make an educated guess, I think there is one crucial thing which modern AI systems lack, and that
is long-term memory. For the most part we have figured out short-term memory using some of the mechanisms I described, but our current best models
lack any real long-term memory. This is because they essentially reset back to their default state every time they receive a new input. The input
can be a long string of text, which allows the model to use mechanisms like attention, but there is a limit to how long the input text can
be, usually called the context length or context window of the model.
In the case of a chat bot, this means the chat log will eventually grow larger than the context length, so we will need to trim text from
the start of the chat log until it fits within the context window. Since the model resets with each run, it completely forgets
any information trimmed from the input. This isn't really desirable; it would be much better if the internal state of the network weren't reset
with each run, and I believe that will be the next big breakthrough in machine learning. However, there is a reason it hasn't really been done yet,
and that relates to the training algorithms we use, but I certainly think it's possible.
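Here is a minimal Python sketch of that trimming step, assuming a crude whitespace tokenizer in place of the model's real one:

```python
# Sketch of context-window trimming: drop the oldest messages until the
# chat log fits. count_tokens() is a crude whitespace approximation of
# a real tokenizer.

def count_tokens(text: str) -> int:
    return len(text.split())

def trim_history(messages: list[str], context_length: int) -> list[str]:
    """Remove messages from the start of the chat log until the total
    token count fits the context window; trimmed text is forgotten."""
    while messages and sum(count_tokens(m) for m in messages) > context_length:
        messages = messages[1:]  # the model will never see this message again
    return messages

history = ["user: hi", "bot: hello!", "user: tell me a long story..."]
history = trim_history(history, context_length=8)  # drops "user: hi"
```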
Current state-of-the-art language models have very large context windows; the input can span many thousands of pages of text. That is extremely
useful because we can provide the AI with new information it wasn't trained on, such as a complex terms of service contract, and then ask it
questions about that information. It would be extra nice if the AI could remember all that information, and all my questions, the next time I spoke to it...
but an AI system capable of storing long-term memories might open up a whole new can of worms, because such systems might start forming unique personalities
based on their memories and life experiences, which is one small step from robot rights.
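To make that in-context usage pattern concrete, here is a minimal sketch; generate() is a placeholder for whatever completion API or local model you might use, not a real library call:

```python
# Sketch of in-context question answering. The document is re-sent with
# every request because the model retains nothing between runs.

def generate(prompt: str) -> str:
    """Placeholder for an actual LLM completion call."""
    raise NotImplementedError

def ask_about_document(document: str, question: str) -> str:
    prompt = (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{document}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)  # nothing is remembered for the next call
```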