Thursday, May 11, 2023

Jetson AGX Orin setup



Default stats on Jetson AGX Orin


 Auto Mounting M.2 SSD on Jetson AGX Orin

reference : https://gist.github.com/a-maumau/b826164698da318f992aad5498d0d934

sudo fdisk /dev/nvme0n1

You can list the available commands with "m".
Choose "n" to create a new partition, then "p" and "1" to create a new primary partition.
Just accept the defaults (press Enter) when you are asked about sector numbers.
Then "w" to write the partition table to the disk.
(In my case I didn't need "w".)

I didn't have to execute the instruction below:

sudo mkfs -t ext4 /dev/nvme0n1

The next set of instructions covers mount-point ownership/permissions and the fstab entry:

sudo mkdir /media/what_you_want
sudo chown -R <user>:<user> /media/what_you_want
sudo chmod 764 /media/what_you_want
sudo mount /dev/nvme0n1 /media/what_you_want

# check uuid and file system type
sudo blkid

# write /etc/fstab like
UUID=uuid_of_your_device /media/what_you_want ext4 defaults 1 1
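After a reboot (or "sudo mount -a" to test the fstab entry), you can also sanity-check the mount from a script. Below is a minimal Python sketch, assuming the placeholder mount point /media/what_you_want used above:

import os
import shutil

mount_point = "/media/what_you_want"  # placeholder path from the steps above

if not os.path.ismount(mount_point):
    raise SystemExit(f"{mount_point} is not a mount point; re-check fstab")

usage = shutil.disk_usage(mount_point)
print(f"total={usage.total / 1e9:.1f} GB, free={usage.free / 1e9:.1f} GB")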

Python3.9 installation

wget https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tgz
tar -xf Python-3.9.16.tgz
cd Python-3.9.16
./configure --enable-optimizations
sudo make altinstall
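A quick sanity check of the fresh build, run with the new interpreter (checking the ssl module catches the common case where the source build couldn't find the OpenSSL headers):

# run as: python3.9 check_build.py
import sys

print(sys.version)              # should report 3.9.16

try:
    import ssl                  # frequently missing when libssl-dev wasn't installed before the build
    print("ssl OK:", ssl.OPENSSL_VERSION)
except ImportError as err:
    print("ssl module missing - pip over HTTPS will not work:", err)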

Verify the CUDA and CUDNN installations

Verify that your CUDA installation exists at /usr/local/cuda* and then add the lines below to your ~/.bashrc:

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/Debugger/lib64 
export PATH=$PATH:$CUDA_HOME/bin

reboot and then execute:

nvcc --version
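Optionally, a small Python sketch can confirm that the environment variables took effect (it only checks visibility of nvcc and libcudart, not that CUDA actually runs; paths are the defaults assumed above):

import glob
import os
import shutil

print("CUDA_HOME    :", os.environ.get("CUDA_HOME"))
print("nvcc on PATH :", shutil.which("nvcc"))

# look for libcudart in the directories added to LD_LIBRARY_PATH
lib_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
hits = [p for d in lib_dirs for p in glob.glob(os.path.join(d, "libcudart.so*"))]
print("libcudart    :", hits[0] if hits else "not found")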

Tuesday, May 2, 2023

Jetson Nano Developer Kit essential steps

 
A Jetson Nano Developer Kit with 3rd party active heatsink, Netgear Wifi adapter and protective case

Stats

Install jetson_stats to get information similar to below:



Power Modes and CPU cores

Jetson Nano has 2 power modes. It follows the philosophy "If you need more, you gotta juice me up". The 5W power mode only enables 2 out of 4 CPU cores. The MAXN mode enables all 4 cores. The 5W mode surely feels sluggish GUI-wise. To enable all 4 cores: "sudo nvpmodel -m 0". A reboot isn't necessary.

jtop preview


To toggle back to 5W mode, execute "sudo nvpmodel -m 1"

jtop preview



See the conf file below:

sharath@sharath-desktop:~/Downloads$ cat /etc/nvpmodel.conf

..... // clocks, GPU information..

###########################
#                         #
# POWER_MODEL DEFINITIONS #
#                         #
###########################

# MAXN is the NONE power model to release all constraints
< POWER_MODEL ID=0 NAME=MAXN >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_A57 MIN_FREQ  0
CPU_A57 MAX_FREQ -1
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
GPU MIN_FREQ  0
GPU MAX_FREQ -1
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ 0

< POWER_MODEL ID=1 NAME=5W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 0
CPU_ONLINE CORE_3 0
CPU_A57 MIN_FREQ  0
CPU_A57 MAX_FREQ 918000
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
GPU MIN_FREQ 0
GPU MAX_FREQ 640000000
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ 1600000000

# mandatory section to configure the default mode
< PM_CONFIG DEFAULT=0 >
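As a small illustration of how these blocks map modes to cores, here is a toy Python parser for the POWER_MODEL sections shown above (just a sketch for this excerpt's line format, not an official tool):

import re

modes = {}
current = None

with open("/etc/nvpmodel.conf") as f:
    for line in f:
        header = re.match(r"<\s*POWER_MODEL\s+ID=(\d+)\s+NAME=(\S+)\s*>", line)
        if header:
            current = f"{header.group(1)} ({header.group(2)})"
            modes[current] = []
        elif current and line.startswith("CPU_ONLINE"):
            _, core, online = line.split()
            if online == "1":
                modes[current].append(core)

for mode, cores in modes.items():
    print(f"mode {mode}: {len(cores)} cores online -> {', '.join(cores)}")

On the conf file above this prints 4 online cores for mode 0 (MAXN) and 2 for mode 1 (5W).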

Installing Alpaca-Py



Enabling Wifi:

Find the necessary toolchain to install your Wifi Dongle's driver.

For my Netgear dongle, I referred to GitHub - jurobystricky/Netgear-A6210: AC1200 High Gain WiFi USB Adapter Linux kernel driver and installed the driver. You can check that the driver module is listed in the "lsmod" output.
Now that the Wifi dongle driver installation is verified, try lsusb and see if you can find your wifi dongle in the list. If lsusb doesn't show it, it could very well be that the driver is unable to get any DMA memory allocations due to a small coherent_pool size. Run "dmesg" to check for related errors. If this is indeed the case, the relevant kernel boot parameters need to be added.

The kernel boot parameters are in /boot/extlinux/extlinux.conf. Add vmalloc=512M cma=64M coherent_pool=32M to the end of the last line. The last line starts with APPEND and is part of the LABEL primary section of the config file. It is a very long line and probably wraps several times in your text editor. It should look like this after editing:

 LABEL primary
   MENU LABEL primary kernel
   LINUX /boot/Image
   ...
   ...
   APPEND fbcon:map0 console=tty0 ... ...
     ... vmalloc=512M cma=64M coherent_pool=32M

After the above step, reboot and retry lsusb. If you see your wifi dongle listed, proceed with the following instructions:

nmcli r wifi on
nmcli d wifi list
nmcli d wifi connect [SSID] password [PASSWORD]

Installing Kernel Headers:


What follows is just one long alternative way to obtain and configure the “equivalent” of headers. I use this sort of method because full source is almost always better than just headers (only it takes a lot more disk space…but not too much if you don’t build an actual kernel Image). Consider what follows to be a superset of headers, and more extensive than what the 2GB model source would provide. Keep in mind while looking at this that the key to the difference between 2GB and other modules is a combination of device tree and kernel configuration…the source itself is meaningless unless it is configured to match your running system.

Typically you would just download the full source for the kernel. This includes the headers, and if something wants to point at headers, then pointing at the full source also works. The trick is that the source needs to be configured for that particular config (including CONFIG_LOCALVERSION, which usually is “-tegra”).

L4T releases are listed here:
https://developer.nvidia.com/embedded/linux-tegra-archive

For R32.4.4 the URL is here:
https://developer.nvidia.com/embedded/linux-tegra-r3244 

For the Nano of that release the URL of source code shows as:
https://developer.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/T210/Tegra210_Linux_R32.4.4_aarch64.tbz2 
but this probably does not contain everything the non-Nano sources have. Specifically, the source code of the kernel itself should be the same for the non-Nano versions, and the source for the Nano would be only a subset of what is shown for the Nano. The kernel of the Nano would probably be ok, and there is even a “.deb” package listed which might do what you want, but I suggest looking at this (the non-Nano source):
https://developer.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/Sources/T186/public_sources.tbz2 

This is a tar archive, and the package path within this archive for “source->public->kernel_src.tbz2” would be the correct content once configuration is set to match the Nano’s configuration. Notice that you’d unpack the full kernel_src.tbz2 since for some NVIDIA components, when configured in the “kernel->kernel-4.9/”, that the config will require a relative path referring to the other subdirectories (most third party configurations would never need this).

If you have a running Nano, then you should see “/proc/config.gz”. If you save a copy of this somewhere (permanent for future use), and then edit just the “CONFIG_LOCALVERSION” (this would be set ‘="-tegra"’ in the file because “-tegra” is the suffix of the output of “uname -r”), then this should be an exact match to your running system and also be the correct “.config” to use with “make prepare” and “make modules_prepare”. Those prepare statements should make the kernel configure to the exact running system if the “.config” is in place, and this in turn should make it possible to treat the full source as if it were the headers only.

Once you have this, point a symbolic link from “/lib/modules/$(uname -r)/build” to the “source->kernel->kernel-4.9/” location and it should work as if it were headers. Example, assuming native build on the Jetson:

cd /lib/modules/$(uname -r)
sudo ln -sf /where/ever/it/is/source->kernel->kernel-4.9 build
# Then verify source exists at "ls /lib/modules/$(uname -r)/build/*".

Thursday, December 29, 2022

Beta cluster

I've prepared this cluster to consolidate all my individual SBCs in one spot. The left tower serves as a functional extension to the RPI 8GB variant through a breakout board and USB hub.







Monday, September 12, 2022

Statistics part-2: Regression Analysis

 

This article explains the fundamentals necessary for Regression Analysis and serves as a reference for the formulae that follow.

Average (or) Arithmetic Mean of the data:

A dataset consists of several data points and can have several properties described by several parameters. Every dataset has a central location around which all of its data points cluster or scatter. This central location is called the "Average" or "Arithmetic Mean" of that dataset. The "Arithmetic Mean", or simply "Mean" (for this discussion, we will call it the Mean), is one of the measures of central tendency. As all data points center around the Mean, it is considered a single-point general description of all data points in the given dataset. The Mean has to be computed, as it doesn't physically exist in the data, although sometimes one of the data points happens to have exactly the same value as the Mean of the entire dataset.

Consider a hurricane that has storm winds swirling around its center. Without its eye, a hurricane doesn't exist. We can think of the eye of a hurricane as its mean, around which its winds swirl. In other words, its winds tend to center or cluster around its eye. The eye is a calm zone with no chaos. Just like the Mean, which doesn't exist in the given dataset unless explicitly computed, a hurricane's eye has no visible presence if you plot the hurricane's destructive power across its body on a destruction scale. Despite lacking the visible destructive trait of a hurricane, the eye acts as its "Mean" and describes the hurricane's size, magnitude and other properties. The average (or mean) is calculated by dividing the sum of all data points in a dataset by its size n.

Mean = x̄ = (x₁ + x₂ + ... + xₙ) / n = (1/n) Σ xᵢ, for i = 1 to n
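As a quick illustration in Python (the numbers are just an example dataset):

data = [4, 8, 15, 16, 23, 42]

mean = sum(data) / len(data)      # (1/n) * sum of all data points
print(mean)                       # 18.0

from statistics import mean as stat_mean
print(stat_mean(data))            # 18, using the standard library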


Variance



Tuesday, September 6, 2022

Statistics Part-1 : Probability basics, Expectation value & Distributions

 



Combinatorics helps us count things. A simple operation of counting things can have several aspects, such as orderings, combinations and other concepts. Within this field of Combinatorics, the "Permutation" concept exists. A permutation is a way of ordering things. The notation P(n) denotes the total possible orderings for "n" things. The following understanding of concepts and ideas is adapted from "Probability: For the Enthusiastic Beginner" by David Morin. Please purchase his book for a good reference on these concepts and encourage him. He is awesome!

General ordering of n items:

Consider a case study of alphabet arrangement.

Case-1:
There is only one way to arrange a single alphabet A.
So, the total number of Permutations for 1 entity is P(1) = 1 as { A }

Case-2:
Two alphabets A & B can be arranged in 2 ways: AB & BA.
So, the total number of Permutations for 2 entities is P(2) = 2 as { AB, BA }
For each alphabet, there is only one way to arrange the other alphabet, and there are 2 alphabets in total. So it is 2xP(1). Using dot notation, case-2 is written as P(2) = 2.P(1).

Case-3:
The total Permutations of 3 alphabets are:
P(3) = 6 as { ABC, ACB, BAC, BCA, CAB, CBA }.
For each alphabet, the rest of the alphabets can be arranged in 2 ways. For A, BC & CB are the possible orderings of the remaining alphabets B & C, just as in Case-2, which further leads to Case-1. So ABC and ACB are the possible permutations if A is the first selected alphabet. For B as the first alphabet, CA & AC are the possible orderings of the remaining alphabets C & A, so BCA and BAC are possible. Similarly we get CAB & CBA if we select C as the first alphabet. Thus, we get 6 in total, and Case-3 can be written as P(3) = 3 times Case-2 = 3.P(2) = 3.2.P(1) = 3.2.1.

If we expand this to a dataset of 4 alphabets A, B, C & D, we get P(4) = 4.P(3). For 5, we get 5.P(4). To generalize, we have

P(n) = n.P(n-1)

Expanding n.P(n-1) all the way down gives the product of the first n numbers, which is notated as the factorial, n!
1! = 1
2! = 2.1=2
3!=3.2.1=6
4!=4.3.2.1=24
5!=5.4.3.2.1=120 & so on.
So we write above equation as 

P(n) = n.P(n-1)=n!

Analogy-wise, this experiment of arranging numbers or items can be phrased in another way: calculate the number of ways you can pick up n objects from n available objects.
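A quick Python check of P(n) = n! (a small illustration):

from itertools import permutations
from math import factorial

letters = "ABC"

orderings = list(permutations(letters))   # every way to order A, B, C
print(len(orderings))                     # 6
print(factorial(3))                       # 6 -> P(3) = 3!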

Ordering of outcomes of identical trials:

Let us consider drawing a ball from a bag as a trial. If the trial is repeated under the same conditions, the repeated draws are called identical trials, as the configuration and the sequence of actions don't vary.


Ordered sets with replacement & repetition allowed:

After picking up a ball from the bag, I note down the alphabet on it & return the ball to the bag. This is called replacement. While writing down the outcomes of both trials, if I get A from trial 1 and another A from trial 2, I allow it and write it down as {A,A}. This is called repetition allowed. Let us say I drew ball A in trial 1 and ball B in trial 2, giving {A,B}. Consider another combination, {B,A}. I write these down as distinct sets wherein order matters, and both outcome sets are counted. Following is the complete set of all possible outcomes, noted down as ordered sets (order matters for distinction) for n=2 trials. That means in trial 1 a ball is picked up from the bag, noted down & placed back into the bag before the next trial. For each ball drawn from the bag, there are 5 possibilities (equal to the total number of distinct balls in the bag). As there are 5 balls, it boils down to 5.5 = 5² = 25 ordered outcomes.

{A,A} {A,B} {A,C} {A,D} {A,E}
{B,A} {B,B} {B,C} {B,D} {B,E}
{C,A} {C,B} {C,C} {C,D} {C,E}
{D,A} {D,B} {D,C} {D,D} {D,E}
{E,A} {E,B} {E,C} {E,D} {E,E}

If I arrange a third trial, we get a total of 5.5.5 = 125 ordered outcomes. They look something like {A,B,D}, {A,C,E}, {B,B,B} and so on. So the total number of ordered outcomes for N unique objects in n trials can be given as Nⁿ.


If you think this in terms of slots to be filled, the result is the same. For example, I have 2 slots: 

X1 X2

These 2 slots can be filled by any of the alphabets {A,B,C,D,E}. If X1 is filled by A, there are 5 possible combinations, with X2 filled by A, B, C, D or E. Now rotate X1 through the alphabets other than A and repeat the same possible combinations for X2. You end up with 25 outcomes (Nⁿ, with N=5 & n=2).

Side notes:

Inherently, you can see that if A & B are chosen to fill these slots, we have {A,B} & {B,A}. This is again similar to the general ordering of 2 objects in 2! ways. Every outcome has its mirror reflection due to this inherent 2! ordering of the 2 slots. If you exclude the repeated outcomes ({A,A}, {B,B}, {C,C}, {D,D}, {E,E}), you have 20 outcomes, which include mirror copies of unique outcomes. So if you divide 20 by 2!, you get the 10 unique outcomes which you will see later. Repeated outcomes can't be considered mirror duplicates because {A,A} read left-to-right and {A,A} read right-to-left are not visibly distinguishable, and thus only a single copy of them is kept.

Number of possible outcomes = Nⁿ

You can also treat Nⁿ as the number of possible ways to pick up n objects from a complete set of N objects (with replacement), if you think of the n objects as slots to be filled, as in the previous slot analogy.
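In Python, itertools.product enumerates exactly these ordered outcomes with repetition (a small illustration):

from itertools import product

balls = "ABCDE"       # N = 5 distinct balls
n_trials = 2

outcomes = list(product(balls, repeat=n_trials))   # ordered, repetition allowed
print(len(outcomes))                               # 25 -> N**n
print(len(balls) ** n_trials)                      # 25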

Ordered sets with NO repetition allowed:

In this case, we are saying that we will not tolerate the exact same outcome seen in trial 1 in subsequent trials. Meaning, if trial 1 produced A, then trial 2 and further trials shouldn't get A as an outcome. We do this by not replacing the picked-up ball in any trial. So after picking up ball A from the bag in trial 1, only 4 (which is 5-1) outcomes are possible from the same bag in the next trial. So if N objects (outcomes) are possible in trial 1, (N-1) outcomes are possible in trial 2 for each of the N from trial 1. Mathematically, this is summarized for 2 trials as N(N-1) total possible outcomes WITHOUT repetition. In terms of the slot analogy, if the first slot is filled with A, the 2nd slot has to be filled by B, C, D or E but NOT A, which is N-1 possible combinations. Repeat this for the rest of the alphabets and we have N.(N-1) combinations.

  ---   {A,B} {A,C} {A,D} {A,E}
{B,A}   ---   {B,C} {B,D} {B,E}
{C,A} {C,B}   ---   {C,D} {C,E}
{D,A} {D,B} {D,C}   ---   {D,E}
{E,A} {E,B} {E,C} {E,D}   ---

(the 5 repeated outcomes are struck out, leaving 20)

So for 3 trials, we get 5.4.3 = 60 outcomes. This is generalized as N.(N-1).(N-2). For 4 trials, we have N.(N-1).(N-2).(N-3) = 5.4.3.2 = 120 ordered sets with NO repetition allowed. This goes on for n trials as N.(N-1).(N-2).(N-3).....(N-(n-1)), where n = number of trials and n<N. The last trial (the nth) has N-(n-1) choices because (n-1) objects have already been removed from the total N by its preceding (n-1) trials.


Let us try to derive a general formula for this expression. If you recollect the earlier n! definition for the first n things, it is applicable to any non-negative integer. Even 0! is defined as 1 (just go with it for now without thinking too much). For n=5, 5! = 5x4x3x2x1 = 120.

Now, 4 trials of the above experiment give 5.4.3.2 outcomes and 3 trials give 5.4.3 outcomes. What can be done to 5! (which is the product 5x4x3x2x1) to give the correct count for n=4 trials and n=3 trials? The answer is division. If 5! is divided by (5-3)! = 2! for 3 trials, we have 5x4x3 = 60. Similarly, if 5! is divided by (5-4)! = 1! for 4 trials, we have 5x4x3x2 = 120. So N! must be divided by (N-n)! to give the total outcomes for n identical trials over N objects with no repetition allowed.

Number of possible outcomes =N!/(N-n)!

This also means that there are N!/(N-n)! ways to pick up n objects from N total objects to form ordered set outcomes with no repetition.
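The same count, checked in Python (a small illustration):

from itertools import permutations
from math import factorial

balls = "ABCDE"       # N = 5
n_trials = 2

outcomes = list(permutations(balls, n_trials))   # ordered, NO repetition
print(len(outcomes))                             # 20
print(factorial(5) // factorial(5 - n_trials))   # 20 -> N!/(N-n)!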

Unordered sets with NO repetition allowed:

Unordered sets are those which treat {A,B} and {B,A} as equal. This means we shouldn't count the {B,A} outcome if {A,B} has already occurred. If you take the above total outcomes for N=5 objects and n=2 trials for ordered sets with NO repetition allowed, we must eliminate further outcomes to avoid double counting for unordered sets. This gives us 10 outcomes.

{A,B} {A,C} {A,D} {A,E}
      {B,C} {B,D} {B,E}
            {C,D} {C,E}
                  {D,E}

(the mirror copies and repeated outcomes are struck out, leaving 10)


We have already considered this unique-outcome derivation in the side notes of the "Ordered sets with replacement & repetition allowed" category above. As the number of slots (trials) to be filled is 2, and these slots can be arranged in 2! ways as per our first section on the general ordering of n items, we need to divide the ordered set outcomes without repetition by 2! to get the unordered set outcomes (all unique, with no repetition). For n trials, wherein each trial can be visualized as a slot to be filled with a picked-up object or ball, the division happens by n!

Number of possible outcomes = (N!/(N-n)!) / n! = N!/(n!(N-n)!)

This also means that there are N!/(n!(N-n)!) ways to pick up n objects from N total objects to form unordered (unique) set outcomes with no repetition.
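And the corresponding Python check (math.comb computes exactly N!/(n!(N-n)!)):

from itertools import combinations
from math import comb, factorial

balls = "ABCDE"       # N = 5
n_trials = 2

outcomes = list(combinations(balls, n_trials))        # unordered, NO repetition
print(len(outcomes))                                  # 10
print(factorial(5) // (factorial(2) * factorial(3)))  # 10 -> N!/(n!(N-n)!)
print(comb(5, 2))                                     # 10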

Probability:

Probability states the chance or "how likely?" a desired outcome can happen out of all possible outcomes of an event. The event in context can be something as trivial as tossing coins, rolling a die, picking up a playing card from a deck and so on. Probability concepts are expanded into more intricate concepts which are vital to Machine Learning.

Probability of a desired outcome = (total number of possible desired outcomes of an event)/(total number of all possible outcomes of an event)

Yes, it is expressed as a fraction. Take a coin for instance. There are two possible outcomes. "Heads" & "Tails". Let us pick "Heads" as our desired outcome. Then by formula, Probability (P) of Heads can be given as:

P(Heads) = 1/2

There can be only one Head in a single coin toss. Same goes for Tails.

P(Tails) = 1/2

Let us take another example. A die has 6 faces, each with a distinct number of dots. When the dots are counted, these faces can be labeled as the outcomes {1,2,3,4,5,6}, for a total of 6 possible outcomes.
Here the probability of occurrence of any one of these outcomes in a single event, "rolling the die once", is 1/6. This is because there are no duplicate numbers/faces on the die. This can be summarized as

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

If we add the probabilities of all possible outcomes, we get "1".

P(1)+P(2)+P(3)+P(4)+P(5)+P(6) = 1/6+1/6+1/6+1/6+1/6+1/6=1

"1" denotes the completeness that accounts the occurrence of all possible outcomes. We can think that each of possible outcome owns a probability slice in this large pizza, denoted as "1".


Apply the same to a "Single toss of a coin" event. We get:

P(Heads)+P(Tails)=1/2+1/2=1

If Heads shows up in a coin toss, we say the "Heads" outcome occurred & the "Tails" outcome didn't occur. The probability of an outcome NOT occurring is derived by subtracting the probability of that outcome from "1", the completeness. This gives the probability that the event does NOT produce the outcome you are interested in from the set of all outcomes. Mathematically,

P(NO Tails)=1-P(Tails)

Let us pick the probability of "3" not occurring in the event "a single roll of a die". We get:

P(NO 3) = 1-(Probability of outcome "3") = 1-P(3) = 1-(1/6) = 5/6

Literally, it says the chance of "3" NOT occurring is "5 out of 6 TIMES".
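The same arithmetic in Python, using exact fractions (a small illustration):

from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]            # a fair die
p_each = Fraction(1, len(faces))      # P(1) = ... = P(6) = 1/6

print(p_each)                         # 1/6
print(sum(p_each for _ in faces))     # 1 -> completeness
print(1 - p_each)                     # 5/6 -> P(NO 3)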


Normal Distribution:

Consider an event of flipping a single coin 5 times. Each flip has 2 possible outcomes: a Head or a Tail. So this event can produce a total of 2 to the power of 5 = 32 possible outcomes. To keep track of how many times the desired outcome occurs, we select a variable called the "Random Variable (X)".
Let us select Heads as our desired outcome. The probability of Heads not occurring at all can be given as P(X=0) = (number of outcomes with no Heads, i.e. all Tails)/(number of all possible outcomes) = 1/32.
We can visualize the slots to be filled with individual outcomes of flips in this event as below:

X1 X2  X3 X4 X5
If you think of no Heads at all, all these slots have to be filled by Tails as T T T T T, which is just a single possible outcome with a probability of 1 out of the 32 outcomes of flipping a coin 5 times. Let us now consider the probability of exactly 1 Head in 5 flips, indicated by P(X=1). That can be either the X1, X2, X3, X4 or X5 slot bearing the Head while the rest are Tails. For instance, we can pick the case where the X3 slot witnesses Heads as its outcome and the rest of the slots are all Tails. This is visualized as below:
T T  H T T
X1 as Tails has a probability of (1/2). The other slots X2, X4 & X5 have the same probability (1/2), as the coin flip in each turn is not affected by earlier outcomes and each flip is an independent event. X3 is also (1/2), as the probability of Heads out of the Heads & Tails outcomes of a single coin flip is again (1/2). To get the probability of this exact sequence, you multiply all these individual probabilities together: (1/2).(1/2).(1/2).(1/2).(1/2) = 1/32. So the probability of this particular arrangement (the single Head sitting in slot X3) is 1/32, and since the single Head could sit in any of the 5 slots,
P(X=1) = 5.(1/32) = 5/32
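The same counting argument extends to any number of Heads: each exact sequence of 5 flips has probability (1/2)⁵ = 1/32, and the number of sequences containing exactly k Heads is the unordered-sets count from earlier, C(5,k). A small Python sketch of the full distribution:

from fractions import Fraction
from math import comb

n_flips = 5
p_sequence = Fraction(1, 2) ** n_flips      # each exact sequence: 1/32

for k in range(n_flips + 1):
    # comb(n_flips, k) sequences contain exactly k Heads
    print(f"P(X={k}) = {comb(n_flips, k) * p_sequence}")

# prints 1/32, 5/32, 5/16, 5/16, 5/32, 1/32 (i.e. 1, 5, 10, 10, 5, 1 out of 32),
# and these probabilities sum to 1.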

Friday, July 29, 2022

Introduction to neural networks for everyone


Humankind has been inspired by Nature in several of its cutting-edge technologies. From airplanes and submarines to sonar communication, evident plagiarism (ahem.. inspiration!) of "Nature" is seen. Some measure the advancement of technology by how well we mimic the complexity of the natural order we all have been dwelling in. It is logical to think that to push the boundaries of our scientific evolution further, we should imitate the most complex computing system nature has ever provided. Our brains are those complex biological computing machines representing "Natural Intelligence." As this mimicry leads to an artificial product created by humankind, it is called "Artificial Intelligence."

Simplifying a complex architecture to learn basics: 



Let's not think about super complicated categorization or classification of the branches of Artificial Intelligence or Machine Learning. We came here to talk about Neural Networks & we shall diligently do that. Just think of Neural Networks as an utterly simplified model of the brain.

A general approach (from a handful of folks I talk to) to learning Neural Networks is not to confuse yourself with a complex mixture of mathematical terms, models & technical articles or publications all at once, especially if you are a beginner with a below-average mathematical background. Rather, try to visualize the brain as a huge network of fundamental nodes called Neurons that are interconnected, as seen from the above simplification of the brain down to a bunch of neurons. Using this simple notion of interconnected neurons, we shall weave in several concepts and simple math at a reasonable pace to establish an information processing mechanism in it. We shall use the terms "Neuron" & "Node" interchangeably.




Overview of the mechanism:

Input data is converted into several sub-questions, each given to a separate node to answer. As each node receives a simple "True" or "False" question (not always that simple, but play along for now!), a choice is made based on the learning it has gained. This assignment of sub-questions to several nodes happens in parallel.

Next, we shall explore how distinguished learning is imparted to each of these nodes & then examine the notion of layers (groups of nodes). As a layer attempts to answer all your sub-questions, only a handful of them are solved while the rest remain unsolved. So we need to pass them to the next layer. This process continues until a reasonable decision arrives at the output layer. This process is comparable to rainwater seeping through several layers of earth and reaching the water table.

The above explanation might have triggered this imagination of layers. This is partly true except that these layers are visualized from left to right, instead of top to bottom.


Neurons & types:

A neuron accepts input information through noodle-like structures called Dendrites on its head and outputs decisions through a long stem-like canal called the Axon. When a neuron wants to output info, it is said to be activated: it fires an electric impulse called an "Action Potential" through the Axon to its neighbors. Unless a neuron's input information reaches a considerable level (threshold), it can't output any action potential. All inputs below this threshold fail to get the neuron to activate and fire. This threshold is called the "Activation Threshold." An artificial neuron model called the "Perceptron" is modeled after this behavior.
A Perceptron neuron can output either 0 (False) or 1 (True). So for a question like "Is this a hat?", the Perceptron can say either "Yes" or "No." If you show it a deceptive headcloth/bandana fashioned into a hat, your Perceptron outputs "Yes, it's a hat" if it is not trained enough, (or) says "No, that ain't a hat!" if it has trained well enough to outsmart this deception. There isn't a benefit of the doubt expressed through "Maybe" or "Perhaps" answers, because neither "0" nor "1" can fit those words. The "Activation Threshold" of your Perceptron can be tuned by training it to decide when it outputs a "1." Geez, this Perceptron is all or nothing..



Due to its primitive functionality, the Perceptron model cannot have a middle ground, & it is not easy to train on every possible picture of a hat in this universe through individual or teamed human effort. So a reasonable benefit of the doubt can fill those gaps. You can refer to publicly available datasets such as https://scale.com/open-datasets.

Think of a rating system from "1" to "10", with "1" being the lowest (or NO) and "10" being the absolute highest (or YES). You now have the freedom to choose several points within this range to play middle ground for identifying a deceptive headcloth. This "1" to "10" rating allows you to guess close enough to the actual answer. A "Sigmoid" neuron does this "sort of" guess.


Its answer range has more intermediate values between and inclusive of "0" and "1," instead of just "0" & "1." For an input such as a deceptive-looking hat, it can provide a confidence score of "0.866" instead of "1" to say "most likely!". However, there is a limit to this freedom as well. Why? There should be consistency in the output behavior of any model. Without an output behavioral pattern, it is impossible to control the behavior of your models and train them; you would be in constant fear of how wild the guesswork could go, never knowing what it is about to say next. A Sigmoid neuron's output always follows an "S"-shaped (sigmoid) curve on a 2D graph. Can you guess the output graph of a Perceptron? An all-or-nothing kind of output always follows a step function.

https://ai-master.gitbooks.io/logistic-regression/content/weights.html

Does x mark the spot?

Let's try understanding the graph with more insight. The X-axis (horizontal line) represents the data input our neural network has been receiving. The Y-axis (vertical line) represents the result our neuron has been outputting. The above graph says that for X values below "0," the y value remains "0." In other words, until there is a "0" or positive-valued input to the neuron, its result is always "0." "0" and positive x-values are at or above the activation threshold for this neuron (if it is a perceptron and its threshold is tuned to "0").

The sigmoid curve shows positive Y values even for negative values of x. So it is a bit more forgiving than the Perceptron. The negative value of x from which the graph starts rising appreciably toward larger y values can be treated as the activation threshold for your sigmoid-modeled neural network!

Getting serious, but not that serious...


I introduced you to two neuron models. Please don't hurry yet to think of what those "x" values are in the real world. They are not the pictures to be identified. "x" values here represent intermediate/middle stage processed information derived from a piece of the complete picture that a given neuron has. We shall further explore other stages and then discuss them comprehensively at the end.

The output shape from the neuron determines its type. The output from a Perceptron neuron looks like a step function on a 2D graph, while a Sigmoid neuron outputs an "S"-shaped (sigmoid) function. There are mathematical expressions for all these graphs. These output functions are "Activation Functions." We shall follow the "σ(z)" notation for the Activation function.

You can glimpse (Don't seriously read its literature yet! You are not ready) https://vitalflux.com/different-types-activation-functions-neural-networks/ to get the feel of the wide variety of Activation Functions available in the world of the neural network.

We follow a method to train a neuron to process input information. And that method is from Master Oogway: 
"Through experience, you shall learn to give relative importance(s) to several facts you see of a situation before reaching a final conclusion"

Think about it. Oogway's wisdom is good advice for all scenarios in our lives (You are welcome for Philosophy 101). Let's think from a robot's eye perspective. Through its camera lens, it is trying to understand whether a birthday party is happening in the cafeteria corner of the office. 
There are several inputs ( for a total of "100" points, "100" is 100% surety) to be considered here:

a) Is there a moderate to a large group of people? Any event needs a group of people. ( "10" points )
b) A cake on a table. It could be a birthday cake or a successful launch celebration cake. ("30" points)
c) A center person near the cake who could be celebrating a birthday ("20" points)
d) Decorations or decorative writings with inclusive words "Happy Birthday!" ("40" points)

Do you see where I am going with this? I gave the highest points to decorations as they are the most important clues for our robot to deduce what's going on. In comparison to decorations, other hints come next in preference as these are present for any other celebration. If decorative writings are missing & other clues are present, those clues are accepted. That gives a 60% chance that it can be a birthday. If our robot is not programmed to engage in social conversations, it should conclude with a deduction of 60% probability of a birthday without speaking to people. So an error margin of 40% has crept here due to a lack of details. Algorithms such as these play on the level of probabilities all the time. 

In the neural world, we call this relative importance "Weights." Treat the above checklist as inputs for our robot. A checklist of inputs with weights is called "Weighted Inputs." For example, the input d (called "Decorations") has the maximum weight of "40" points! The weighted input d carries the value "d" multiplied by "40." Relatively, it has more weight than the others, as this is the most valuable input our robot can get to conclude whether it's a birthday party or not. If all inputs are present, we get a score of 100 points to confirm a birthday party. This score of "100" points is the "Weighted Sum" of all inputs. The Weighted Sum is the addition of all weighted inputs.

The calculation "ax10+bx30+cx20+dx40= Weighted Sum" provides a number to conclude the "Whole" situation by giving relative importance to "several clues" in the situation. Though these inputs are presented here in statements, their presence can be represented by either 0 (absent) or 1 (present) by logic. So "1x10+1x30+1x20+1x40 = 100" says all inputs present. No decorations? "1x10+1x30+1x20=60" is the adjusted weighted sum. 
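The same arithmetic as a tiny Python sketch (the clue names and point values are just the ones from the checklist above):

weights  = {"group_of_people": 10, "cake": 30, "center_person": 20, "decorations": 40}
observed = {"group_of_people": 1,  "cake": 1,  "center_person": 1,  "decorations": 0}   # 1 = present, 0 = absent

weighted_sum = sum(observed[name] * w for name, w in weights.items())
print(weighted_sum)   # 60 -> a decent chance it's a birthday party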


So the final score of "100" or "60" represents the input here but wait! We as humans know that "60" out of "100" is a decent probability. How should we tell the robot that this is a decent probability of being a birthday celebration? Our old friend "Activation function" comes to the rescue. So by equation, 

Activation function (Weighted Sum) = Result

This formula says that a weighted sum must feed the Activation function to produce the final result (yay or nay). This concludes the two stages of processing within any neuron. A few neuron models might employ a scheme other than the weighted sum. Concept-wise, they are all attempting the same thing: providing a single score, representative of the whole situation (multiple inputs), as input to the activation function. This "Weighted Sum" scheme, or any other scheme used to arrive at a representative score, is called the "Transfer Function". The weighted sum is represented as Z.

Below are important formulae:

Stage-1: Transfer function Z =  ΣXn.Wn
Stage-2: Activation function σ(z) = σ(ΣXn.Wn) 

where Xn is the nth input, ranging from 1 to N. N is the total number of inputs & Wn is the weight of Xn input. If N=1, then X1 is the first input and W1 is X1's weight. "." means dot product or multiplication of X & W. Ideally, this dot product should be used for vectors in a 2D or 3D space. I chose the dot product symbol in place of "x" to avoid confusion as there is already an "X" in the above formula.  "Σ" is the summation symbol. It denotes the addition of parameter(s) that follow it in the formula. 

Take an input X1, multiply it by its weight W1 and keep it aside. Take another input X2, multiply it by its weight W2 to get "X2.W2" and add it to the previous product "X1.W1". Repeat this procedure for all inputs and corresponding weights. This gives you "X1.W1 + X2.W2 + ...... + Xn.Wn". Now feed this as input to your activation function.
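Putting both stages into code, here is a minimal Python sketch of a single neuron (the birthday-clue inputs and the threshold of 50 are just illustrative choices, not fixed values):

def transfer(inputs, weights):
    """Stage-1: Z = sum of Xn * Wn (the weighted sum)."""
    return sum(x * w for x, w in zip(inputs, weights))

def step_activation(z, threshold):
    """Stage-2: Perceptron-style all-or-nothing output."""
    return 1 if z >= threshold else 0

inputs  = [1, 1, 1, 0]       # clues a, b, c, d (1 = present)
weights = [10, 30, 20, 40]   # their point values

z = transfer(inputs, weights)
print(z)                                  # 60
print(step_activation(z, threshold=50))   # 1 -> the neuron fires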



Above is an x-ray of the Perceptron neuron's internals. A Perceptron neuron has a threshold output function. Meaning, any weighted sum less than "1", when fed to the Perceptron, will give a "0" result. Any weighted sum greater than or equal to "1" will yield a "1". "≥1" is the threshold for this Perceptron neuron to fire the output 1. You might be wondering why a constant "1" input is fed in with weight W0. This is referred to as the Bias or Bias Shift (b). The weighted sum of that linkage is "1".W0 = W0 = b. This is a trainable weight that shifts your output graph in whichever direction you need in a 2D or 3D coordinate system (X, Y & Z axes), which is useful for fitting several similar cases in one neural network while training it. Don't think too much about bias now, as it is reserved for intermediate learners. If you are curious about a 3D-space graph with a bias, refer to the animation below, or skip over it.

https://stackoverflow.com/questions/2480650/what-is-the-role-of-the-bias-in-neural-networks

For the Sigmoid neuron, we have the standard formula below:

σ(x) = 1 / (1 + e⁻ˣ)

where e is a mathematical constant approximately equal to 2.71828. Just substitute your ΣX.W for "x" above and the output is derived. The value of the total weighted sum (whether a positive or negative value, depending on the scale of the x-axis you chose) at which the S-curve starts rising appreciably can be thought of as the threshold value for the sigmoid neuron. Irrespective of your weighted sum value, the output always follows this S-shaped curve.


As you see, the S-curve already has noticeably positive values for negative x (weighted sums). Now let's ask ourselves: if we use a sigmoid neuron to model our neural network, there are positive outputs even when the weighted sum equals "0", and the graph even shows negative-valued weighted sums on the X-axis. How is this possible in the real world? The answer lies in the scale of numerical values & polarities we choose for our weights. A weight can be "+0.5" or "-0.9" or "-20." It's your call.
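A few sample values make the point, assuming the standard σ(x) = 1/(1 + e⁻ˣ) given above:

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

for z in (-4, -1, 0, 1, 4):            # weighted sums, including negative ones
    print(f"sigmoid({z:+d}) = {sigmoid(z):.3f}")

# sigmoid(-4) ~ 0.018, sigmoid(0) = 0.500, sigmoid(+4) ~ 0.982:
# the output is already positive for negative weighted sums, as the graph shows.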

Layers:


So far, we have studied two legacy neuron models. Modern neural networks have neurons with far more sophistication than these models. A Deep Neural Network has several neurons (billions of them!). These neurons are arranged into several layers depending on either their role or the type of activation function they use. For example, the above diagram shows blue-colored neurons that accept input data. These blue neurons constitute the input layer. Neurons of the input layer convert input information to an understandable form and pass it to a hidden layer. Any layer between the input layer & output layer is called the "Hidden Layer." They perform several computations. There can be more than one hidden layer ( Deep Neural Networks: DNN ). All connections going from a neuron of the input layer to a neuron of an adjacent hidden layer carry corresponding weight values. Similar weighted connections happen among all hidden layers. The last hidden layer passes its results to the output layer. We have an output layer (green bubbles) to provide final results in a compatible format to an interface ( a software, a display, or other hardware ).

Hidden layers are created based on the type of neurons in them. Let's say you want a group of neurons acting as Perceptrons & another group as Sigmoids. A Perceptron hidden layer comprising perceptron nodes, followed by a Sigmoid hidden layer comprising sigmoid nodes, can be your total set of hidden layers between the input and output layers. The depth of a DNN is equal to its total number of hidden layers.

Let's say your neural network needs to identify a type of hat. Your Neural network divides the image of a hat into four quadrants (pieces) and feeds each piece to individual neurons in its input layer. Input layer neurons convert those pieces to a form understandable by neurons on the hidden layer. The topmost neuron in the first hidden layer may have the role of identifying whether there is a curved edge in the piece it received. Another neuron of the same layer identifies another part of the hat through the input it receives. This divide & conquer process of the actual problem is called "Decomposition." Finally, the result from the last hidden layer passes to the output layer.

Training:


Dr. Hermann Gottlieb from Pacific Rim 2013: Numbers don’t lie. Politics, poetry, promises… these are lies. Numbers are as close as we get to the handwriting of God

Dr. Hermann warned Stacker Pentecost of the double event (2 Kaiju monsters emerging from the tectonic-plate fissure portal on the ocean bed) occurring within 4 days, while being contested by Dr. Newton Geiszler at the Shatterdome. He relied on statistical data & in a way created a prediction model, similar to a neural network model, through meticulous calculations. With our neural network schema assembled, we can start training it with lots of data to make it reliable, similar to Hermann's model that went on to save humanity. Huge props to Newton too!

We begin by assigning random yet reasonably scaled numerical values to our weights & feed already-available data, with known results, to our neurons. They process the info & output results depending on their activation function. You can compare the model's output with the actual results in the data & get the difference, the error margin. Then we adjust the weights of our inputs to produce a different weighted sum; a different output comes out of our activation function, but with a smaller error margin than before. This process continues until the error margin is either negligible or saturates. We can then conclude that our training is complete & our model is ready to face the world.
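As a toy illustration of that loop, here is a minimal Python sketch (one sigmoid neuron, a made-up OR-like dataset, and plain gradient descent on the squared error; real frameworks do this at far greater scale and sophistication):

import math, random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# made-up labeled data: 2 inputs -> target output
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2)]   # start with random weights
b = 0.0
lr = 0.5                                        # learning rate (a hyperparameter)

for epoch in range(2000):                       # the training iterations
    for x, target in data:
        out = sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
        err = out - target                      # this example's error margin
        grad = err * out * (1 - out)            # slope of the squared error
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]   # nudge the weights
        b -= lr * grad

for x, target in data:
    out = sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
    print(x, "target:", target, "model:", round(out, 2))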


The training passes of a neural network are called "epochs" or "iterations." Depending on the other nuts & bolts you choose for your model, there may be a few parameters that shouldn't change during training & have to be fixed before the epochs begin. They are called "Hyperparameters" & the process of setting them is called "Hyperparameter tuning." Your neural network adapts with training. A neural network can undergo three types of learning. With supervised learning, your neural network model is given several "Hat" pictures with an indication, or label, that they are all hats. The network is supposed to adapt its weights to eventually recognize a hat in an unlabeled picture post-training. Unsupervised learning removes the labels from all the "Hat" pictures and lets the network model discover a pattern to identify a hat post-training. Semi-supervised learning is a combination of both styles.

That's a lot. I get it. But hey, you are now a graduate of Neural networks basics. In my eye, you are no longer a beginner. The next article shall be at an intermediate level. Maybe watch an episode of Friends on HBO Max or Youtube Movies and come back later for the upcoming article. But if you think you are done at this point, well, it's been nice having you here. Take care ...









