Pros & Cons of Neural Network Architecture

Neural networks have become something like the new electricity in recent years - a revolutionary technology that has reached almost every area of human activity. This is not surprising, as solutions based on neural networks can perform an extremely wide range of tasks - from helping to treat the most complex diseases to recommending a TV series for the evening. However, this technology is not perfect: it has pros and cons that you should know and take into account if you decide to create your own product based on neural networks.

What are neural networks

Neural networks (NNs), or artificial neural networks (ANNs), is a generalized name for mathematical models and programs built on the principles by which biological neural networks - the networks of neurons in the human brain - are organized and function. The main feature that made artificial neural networks so popular is their ability to learn from past experience and act on it, rather than rely only on algorithms written in advance.

Neural networks are often perceived as something new and revolutionary. However, work in this field began in the first half of the last century, when Warren McCulloch and Walter Pitts created the first mathematical model of how a neuron operates in 1943. In their paper "A Logical Calculus of the Ideas Immanent in Nervous Activity", the scientists described a simple mathematical model in the form of a function that works like a biological neuron: it receives input data, processes it and returns a result.

McCulloch-Pitts neuron model
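The "receive input, process it, return a result" idea can be sketched as a threshold unit. This is a simplified illustration rather than the exact formulation from the 1943 paper: the function names, the unit weights and the thresholds are assumptions chosen for the example.

```python
def mcculloch_pitts_neuron(inputs, weights, threshold):
    """Return 1 (fire) when the weighted sum of binary inputs reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Assumed settings: with unit weights, threshold 2 behaves like AND
# and threshold 1 behaves like OR.
def logical_and(a, b):
    return mcculloch_pitts_neuron([a, b], [1, 1], threshold=2)


def logical_or(a, b):
    return mcculloch_pitts_neuron([a, b], [1, 1], threshold=1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
```

The same unit computes different logical functions depending only on its threshold, which is why such neurons can serve as building blocks for larger networks.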

In 1957, Frank Rosenblatt, building on the work of Warren McCulloch and Walter Pitts as well as Donald Hebb (who proposed an early learning rule for neurons), invented the perceptron, a perceiving and recognizing automaton. Although this automaton was conceived as a machine rather than a program, it is considered to be the first artificial neural network. Moreover, the term "perceptron" itself was later used as a synonym for the simplest artificial neural network.

The next major breakthrough in the field of ANNs came almost 50 years later: in 2006, Geoffrey Hinton described algorithms for training multi-layer (deep) ANNs built from restricted Boltzmann machines (RBMs). At the same time, Hinton also formulated the basic concept of training neural networks: to get a ready, fast solution to a specific problem, an NN must be trained on a set of real examples (for instance, pictures of different cats in different poses and on different backgrounds).

Neural network architecture (left) and deep learning neural network (right). The circles represent artificial neurons, the lines represent the network of connections between them

Thanks to deep learning (and big data), ANNs have become truly trainable. For example, in 2012 a deep neural network won the ImageNet image recognition competition by a wide margin, and by 2015 neural networks had surpassed the reported human error rate on that benchmark. Also in 2015, AlphaGo became the first program in the world to beat a professional Go player without a handicap.

How the neural network works

An artificial neural network consists of three or more layers: an input layer, an output layer and one or more hidden layers. Nowadays deep learning is used to create NNs by default, so there are usually several hidden layers. Each layer consists of computational units ("neurons") that receive data from the previous layer, perform simple computations on it and pass the result to the next layer.

The input layer takes the input data in numerical form, for example brightness, contrast, color, lines or other image characteristics if it is a photo. The hidden layers are responsible for finding hidden patterns and features through simple calculations. The output layer summarizes all the calculations and gives the answer in the form of a conclusion, an action and/or a prediction. If the answer is correct, or at least above a certain "correctness" threshold, the neural network strengthens the connections between the "neurons" that produced it. If the answer is wrong, it weakens them.
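The layer-by-layer flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a trained model: the weights, biases, input values and the choice of a sigmoid activation are all assumptions made up for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each neuron takes a weighted sum of the previous layer's outputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass the data through every layer in turn, from input to output."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Made-up parameters: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                             # output layer
]

output = network_forward([0.9, 0.1, 0.4], layers)
print(output)  # a single value between 0 and 1
```

Training consists of adjusting the numbers in `layers` so that the output moves closer to the correct answer; the sketch only shows how an answer is computed once those numbers are fixed.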

It works in the following way. Let's say you need a program that recognizes whether there is a dog or a wolf in a photo. Your neural network then proceeds as follows: it parses the image into different parts using the input layer, looks for signs of dogs and wolves among those parts with the hidden layers and summarizes the result (is there a dog or a wolf in the photo) using the output layer.

A simplified model of artificial neural network operation

Of course, to teach our neural network to find dogs or wolves in photos, you will need a huge number of photos with and without dogs and wolves - the more of them there are, the more effective the NN training will be.

Main types of neural networks

Artificial neural networks are divided into several types according to their architecture, and each type is used for different purposes. Here are the most common types of neural networks that you are likely to encounter if you want to develop your own solution based on NNs.

Perceptron. The oldest neural network, created by Frank Rosenblatt back in 1957. It consists of just one neuron and represents the simplest form of artificial neural network.

Frank Rosenblatt's Perceptron
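To make the single-neuron idea concrete, here is a minimal sketch of the classic perceptron learning rule in Python. It is an illustration under stated assumptions, not a reconstruction of Rosenblatt's machine: the function names, the learning rate, the number of epochs and the toy data (the logical AND function) are chosen for the example.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Nudge weights and bias toward the target whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy data (assumed for the example): the logical AND of two binary inputs.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(predictions)  # prints [0, 0, 0, 1]
```

A single neuron like this can only separate classes with a straight line, so it learns AND but cannot learn a function such as XOR.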

Feedforward neural networks (networks with direct connections). Such networks consist of an input layer, an output layer and one or more hidden layers. Although they are also called multilayer perceptrons (MLPs), it is important to note that they consist of sigmoid neurons rather than perceptrons (the former cope better with nonlinear problems). Feedforward neural networks are used for computer vision, natural language processing and other similar tasks.

Feedforward (direct-connection) neural network architecture
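The remark about nonlinear problems can be illustrated with XOR, a function that no single perceptron can compute. The sketch below is a tiny network with one hidden layer of two sigmoid neurons. Its weights are hand-picked for illustration rather than learned, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def xor_network(x1, x2):
    # Hidden layer: two sigmoid neurons with hand-picked (not learned) weights.
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)     # close to 1 if at least one input is 1
    h_nand = sigmoid(-20 * x1 - 20 * x2 + 30)  # close to 1 unless both inputs are 1
    # Output layer: close to 1 only when both hidden neurons are active.
    return sigmoid(20 * h_or + 20 * h_nand - 30)


xor_outputs = [round(xor_network(a, b)) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)  # prints [0, 1, 1, 0]
```

In practice such weights are found by training, but the example shows that the hidden layer is what makes the nonlinear decision possible.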

Recurrent neural networks (RNNs). They are identified by feedback loops. These neural networks are used for sequential or time-dependent tasks, such as predicting future outcomes: stock market forecasts or sales predictions for a store chain. In addition, they are good at language translation, natural language processing (NLP) and speech recognition, so they have been used in products such as Siri and Google Translate.

The architecture of recurrent neural networks
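The feedback loop can be sketched with a single recurrent neuron whose output at one step is fed back in at the next. This is a deliberately tiny scalar version: the tanh activation, the weights `w_x` and `w_h` and the input sequences are assumptions made for the example.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """New hidden state depends on the current input AND the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence one element at a time, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


states_with_signal = run_rnn([1.0, 0.0, 0.0])
states_without_signal = run_rnn([0.0, 0.0, 0.0])

print(states_with_signal)     # the first input still influences later states
print(states_without_signal)  # prints [0.0, 0.0, 0.0]
```

Both sequences end with the same two inputs, yet the final states differ: the network "remembers" that the first sequence started with a signal, which is what makes this architecture suitable for ordered data.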

Convolutional neural networks (ConvNets or CNNs). They consist of convolutional layers, pooling layers and fully connected (FC) layers. These neural networks use the principles of linear algebra (e.g., matrix multiplication) to find hidden patterns in images, video or audio. CNNs can process such data very quickly, so they are often used for tasks where images need to be recognized in real time.
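The first two stages of such a network, a convolution followed by pooling, can be sketched in plain Python. This is a minimal illustration: the 5x5 "image", the 2x2 vertical-edge kernel and the function names are made up for the example, and real CNNs learn their kernels during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep only the strongest response in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Made-up image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)
pooled = max_pool2d(feature_map)

print(feature_map)  # four rows of [0, 2, 0, 0]: the edge is found in the second column
print(pooled)       # prints [[2, 0], [2, 0]]
```

The convolution highlights where the pattern occurs, and pooling shrinks the result while keeping the strongest responses. A fully connected layer would then turn these features into a final answer.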

Advantages of neural networks

Self-learning. This is the main feature and advantage of artificial neural networks, and the reason they are so popular with programmers and businesses all over the world. You just create a basic algorithm, feed it training examples (e.g. photos of people, if you want your neural network to look for people in a photo) and look at the results. The algorithm works out by itself how to reach the desired goal, often finding solutions that are not obvious to people.

Moreover, a neural network is not limited to its initial training: it can be designed to keep learning and improving its results. After the system is trained, the program or application gets better the more it is used. That is why Google Translate, the Netflix recommendation system and TikTok are getting better every year.

Effective noise filtering in the data. Think of any rather noisy place, like a market or a stadium. People are talking around you, music is playing loudly, cars are passing somewhere, birds are screeching - there is noise everywhere, but despite this you can talk calmly with the people next to you. Your ears pick up tons of unnecessary sounds, but your brain filters them out and you perceive only what the person you are talking to is saying. Artificial neural networks also have this property. After training, they are able to extract only the information they need from a huge continuous stream of data, ignoring all extraneous noise.

This is very useful if you need to look for patterns in huge amounts of heterogeneous data, such as non-clinical medical research, weather forecasts, economic market analysis or text translation.

Adaptation to changes. Another advantage of artificial neural networks is their ability to adapt to changes in the input data. Updating applications is a good analogy. Let's say you've been offline for a long time and in that time Instagram and TikTok have been updated and gained some new features. After taking a couple of minutes to study the instructions, you'll become familiar with all the new features and continue to use Instagram and TikTok. The same is true of a neural network. After a brief period of adapting to the changes, it will continue to work with the same efficiency.

Fault tolerance. Solutions based on neural networks remain functional even after some neurons fail. Of course, this can affect the accuracy and/or speed of the algorithm, but its answers will still be largely reasonable. This is a very useful property if a device with a neural network on board has to work in a harsh environment (radioactive zones, war, destroyed buildings or space).

Big opportunities. Another key advantage of ANNs is their wide range of applications. Neural networks are loosely modeled on the human brain, so after training they can perform a wide variety of tasks in a broad range of areas - from increasing conversions in an online store to finding Earth-like planets in space. The main thing is to have enough real or synthetic data sets for training.

Operating speed. One more important advantage of neural networks is their speed of operation, both in comparison with the human brain and, for many tasks, with conventional computer algorithms. Artificial neural networks never get tired and take no lunch breaks. How fast they work is determined mainly by the computing power available to them (video card, cloud server or data center). Typically, this means that a trained network delivers a solution almost instantaneously.

Disadvantages of neural networks

The black box problem. Perhaps the most well-known drawback of all NNs is their "black box" nature. Simply put, you don't know how or why your neural network arrives at a particular result. For example, when you put a picture of a cat into a neural network and the network tells you that it is an airplane, it is very difficult to understand what made it come to that conclusion. You simply have no idea what is going on inside the "brain" of the neural network.

Neither the creators of neural networks nor other experts can say how they come to one conclusion or another, so we can say there is "magic" going on

This is a huge problem for anyone trying to understand how a neural network works. It is also a big problem for integrating the technology into certain areas of business. For example, this is why many banks are cautious about using NNs to predict creditworthiness - they have to explain to their customers why they didn't get a loan, otherwise a person might feel unfairly treated or even discriminated against based on race, gender and/or nationality (such cases have happened with AI before, and more than once).

The same applies to sites such as YouTube, Facebook, TikTok or Quora. If a machine learning algorithm deletes a user account, the platform will have to explain why. The user is unlikely to be satisfied with the phrase "That's what the computer told us." This opens the door to lawsuits.

Probability of answers. But that's not all. If you input an image into a neural network and then ask it, "Is that a cat, a dog or something else?", you probably want to get a definite answer: it's a cat, or a dog, or something else. But in reality, even a very well-trained neural network won't give you such clear results. More likely, it will be something like this: a cat is 0.97, a dog is 0.01 and something else is 0.02. These results can be interpreted as probabilities. In our case, it means that the probability of the picture showing a cat is 97%, for a dog this probability is 1% and for something else it is 2%.

An example of a "black box" response of an artificial neural network
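One common way a classifier arrives at numbers like these is the softmax function, which turns raw output scores into values that sum to 1. The text does not say which method is used, so treat this as an assumed illustration: the raw scores below are made up, and the function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive values that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cat", "dog", "something else"]
raw_scores = [4.0, -0.5, 0.2]  # made-up outputs of the network's last layer

probabilities = softmax(raw_scores)
for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.2f}")

# Turning the probabilities into a single answer is a separate decision,
# for example "take the most likely label".
best_label = labels[probabilities.index(max(probabilities))]
print(best_label)  # prints cat
```

The network itself only produces the probabilities. Whether 0.97 is "sure enough" to act on is a threshold that the developer or the business has to choose.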

If you have features that a human can interpret, it is much easier to understand the cause of an error. By comparison, algorithms such as decision trees can be easily interpreted. This is important because interpretability is critical in many areas. You can imagine the CEO of a large company who has to make a million-dollar decision but can't, because the AI hasn't given an unambiguous answer to a seemingly easy question. Or perhaps you can imagine a general who has to make a decision about a missile strike but can't, because there is a 1% or 10% chance that the picture shows not terrorists but small children.

Duration of development. Although there are many libraries, such as NeuroLab, ffnet, SciPy, TensorFlow, Scikit-Neural Network, Lasagne, pyrenn, NumPy, Spark MLlib, Scikit-Learn, Theano, PyTorch and Keras, that help you save time and effort when developing artificial neural networks, they are not always suitable - for instance, when you need to create a new or quite complex solution that requires more control over the algorithm's details.

An algorithm for developing artificial neural networks

Moreover, the more unique and complex your task is, the more time and resources you will need to spend. And it's not just about writing the code for the neural network algorithm, but also about collecting data to train it. This data is often very difficult to collect, for example, if your task involves information about car accidents or the operation of a nuclear reactor under critical conditions. In some cases, the development process can be accelerated and made cheaper by using synthetic data, but this is not always applicable, and such data will always be only an approximation of reality.

The amount of data. The next disadvantage of neural networks is the fact that they usually require significantly more data to train than traditional machine learning algorithms. As we said before, if the data is unique or difficult to collect, this can be a serious challenge for developers - often a much bigger one than writing the neural network code itself.