The capabilities of ChatGPT and how they came about

ChatGPT is probably the most talked-about AI topic since AlphaGo. In short, it is a bot that converses in natural language. You can ask it any question (it may answer incorrectly, of course, but you can guide it and correct its mistakes), and it will respond in fluent, well-formed natural language. Beyond that, it can answer questions about code, mathematics, and more; you can chat with it comfortably on almost any topic.

If we were not told in advance that this is an artificial intelligence model, ChatGPT would genuinely feel like a real person with the capacity for logical thought and verbal communication. For the first time, it seems, an AI can communicate with people normally. It sometimes makes mistakes, but at least there are no barriers of language or logic in the conversation: it can 'understand' what you are saying and respond according to human patterns of thought and linguistic norms. This strikingly intelligent experience is what allowed it to break out of the industry circle and make an impact on the general public.

I want to stress this sense of experience again, because in the past the industry, constrained by technical limitations, may have neglected it in order to complete specific tasks in specific scenarios. The arrival of ChatGPT signals that AI has moved beyond its old 'useful, but rather stupid' form.

To understand how ChatGPT produces this strikingly intelligent feeling, we inevitably have to start from the 'stupid' AI of the past. To be precise, ChatGPT is still built on natural language processing (NLP) technology, but it has broken the field's original paradigm.

To see this, let us first look at the current mainstream practice. Human communication relies on language, and many people even believe that human thought itself is based on language. Understanding and using natural language has therefore always been an important topic in AI. But language is extremely complex, so to make computers understand and use it, the problem is usually divided into many sub-problems, commonly called 'tasks' in the field. A few examples:

The sentiment analysis task aims to understand the emotional tendency conveyed by a text;

The syntactic analysis (parsing) task aims to analyze the linguistic structure of a text;

The entity recognition task aims to locate entity spans in a text, such as addresses or personal names;

The entity linking task aims to extract the relationships between entities from a text.

There are many such tasks, each analyzing and processing natural language from one particular angle. This has many advantages. With these splits, for example, you can examine a natural language processing system's ability along different dimensions, or design a system or model dedicated to a single sub-problem. From a technical standpoint, splitting a complex task (understanding and using natural language) into many simple tasks (the various NLP tasks) is indeed a typical way to attack a hard problem, and it remains the mainstream practice. With the benefit of hindsight after ChatGPT, however, this decomposition may not be the most effective way to make computers understand and use natural language.
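To make the old paradigm concrete, here is a minimal sketch of how such a hand-assembled pipeline is typically wired together. Every component below (`sentiment_model`, `ner_model`, `relation_model`) is a hypothetical stand-in for a separately trained single-task model, not any real library or system:

```python
# Sketch of the classic task-split NLP pipeline: each single-capability
# module runs separately, and the results are stitched together by hand.
from dataclasses import dataclass

@dataclass
class Analysis:
    sentiment: str          # output of a sentiment-analysis model
    entities: list[str]     # output of an entity-recognition model
    relations: list[tuple]  # output of a relation/linking model

def sentiment_model(text: str) -> str:
    # stand-in for a model trained only for sentiment analysis
    return "positive" if "great" in text else "neutral"

def ner_model(text: str) -> list[str]:
    # stand-in for a model trained only for entity recognition
    return [w for w in text.split() if w.istitle()]

def relation_model(entities: list[str]) -> list[tuple]:
    # stand-in for a model trained only to relate extracted entities
    return [(a, "related_to", b) for a, b in zip(entities, entities[1:])]

def pipeline(text: str) -> Analysis:
    # The "assembly by artificial design" step: run each module and
    # piece the single-task outputs together afterwards.
    ents = ner_model(text)
    return Analysis(sentiment_model(text), ents, relation_model(ents))

print(pipeline("Alice thinks Paris is great"))
```

Note that no component here understands the sentence as a whole; the overall behavior comes entirely from how the pieces are glued together, which is exactly the design the following paragraphs critique.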

Excellent performance on a single task does not mean a system has mastered natural language. People's 'sense of intelligence' about an AI rests on its overall ability to use natural language, and this is exactly what ChatGPT demonstrates. Although OpenAI has not opened an API for ChatGPT, so outsiders cannot evaluate its performance on each individual NLP task, past public testing of its predecessors such as GPT-3 and InstructGPT shows that, for certain specific tasks, a small model fine-tuned on task-specific data can indeed achieve better results (for a detailed analysis, see "Understanding the Emergence Ability of Language Models"). Yet these small models that excel at a single task never caused a comparable out-of-circle effect, fundamentally because each of them has only one ability. An outstanding individual ability does not amount to understanding and using natural language, so such a model cannot serve a real application scenario on its own. For this reason, real applications assemble many single-capability modules through hand-crafted designs, which is one reason the AI systems of the past felt unintelligent.

Viewed from how humans understand and use natural language, this phenomenon is easy to explain. Ordinary people do not mentally divide language into many different tasks, analyze each one, and then aggregate the results; that is not how humans use language. When a person hears a sentence, they do not separately analyze its syntactic structure, its entities and their relations, and its emotional tendency, and then piece together the sentence's meaning. Understanding language is a holistic process. Going further, a person's overall understanding of a sentence is then expressed directly as a natural language reply. This is nothing like the old AI systems, which split the problem into single tasks, output sentiment labels, entity spans, or other single-task results one by one, and then used those pieces to assemble a response.

What the GPT series, with ChatGPT as its representative, does is genuinely close to the human way of understanding and using language: it directly receives natural language and directly replies in natural language, while keeping the reply fluent and logical. This is how people communicate with each other, which is why it gives us a 'very intelligent' experience. Many people may have thought all along that something like ChatGPT would be ideal; the task splitting of the past existed only because of technical constraints.

Of course, the fact that we now have ChatGPT means OpenAI never abandoned the route of generative pre-training. In fact, this persistence began to pay off the following year, in 2019, when OpenAI released GPT-2, a model with a 48-layer Transformer structure. In the accompanying paper ("Language Models are Unsupervised Multitask Learners"), they found that after generative training on unsupervised data, GPT-2 exhibited zero-shot multitask ability, and remarkably, these multitask capabilities had never been explicitly or deliberately added to the training data. To give a concrete example, one ability GPT-2 displayed was translation. This was surprising because translation models normally require a large parallel corpus (data paired across two different languages) for supervised training. GPT-2 used no such data; it only performed generative training on a large body of text, and then it could 'suddenly' translate. This discovery was more or less subversive. It revealed three important phenomena:

To make a model complete an NLP task, you may not need annotated data matching that task. For example, GPT-2 used no labeled translation data during training, yet it can translate;

To make a model complete an NLP task, you may not need a training objective matching that task. For example, GPT-2's training involved no translation task or translation loss function, only the language-modeling task;

A model trained only on the language-modeling task (i.e., a generative task) can still acquire multitask ability. For example, GPT-2 showed the ability to translate, answer questions, and perform reading comprehension.
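The zero-shot behavior above boils down to casting a task as plain text and letting a pure next-token language model continue it. Here is a minimal sketch of that idea; the prompt format is the kind reported for GPT-2's translation probing, but `toy_generate` is a hypothetical stand-in (a lookup table) for real language-model decoding, not GPT-2's actual interface:

```python
# Sketch: casting translation as next-token continuation.
# No task-specific loss, labels, or parallel training data appear here;
# the task is specified entirely by the text of the prompt.

def build_prompt(demo_src: str, demo_tgt: str, query: str) -> str:
    # One in-context demonstration, then the query awaiting continuation.
    return f"{demo_src} = {demo_tgt}\n{query} ="

def toy_generate(prompt: str, memory: dict[str, str]) -> str:
    # Stand-in for greedy LM decoding: continue the prompt with whatever
    # the "training corpus" (here, a lookup table) makes most likely.
    query = prompt.rsplit("\n", 1)[-1].removesuffix(" =")
    return prompt + " " + memory.get(query, "?")

memory = {"the cat": "le chat", "the dog": "le chien"}
prompt = build_prompt("the cat", "le chat", "the dog")
print(toy_generate(prompt, memory))
# the cat = le chat
# the dog = le chien
```

A real generative model does the same thing at scale: it simply predicts the most likely next tokens, and for a strong model trained on enough text, that continuation happens to be the translation.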

From today's perspective, the capabilities GPT-2 displayed back then were still fairly rudimentary, and their quality was far from that of other models fine-tuned on supervised data; but this did not stop OpenAI from placing high hopes on the potential they revealed. In the last sentence of the paper's abstract, they stated their expectation for the future of the GPT series: "These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations."