Let’s implement a regression example where the aim is to train a network to predict the value of a node given the values of all other nodes, i.e. each node has a single feature (a scalar value). The aim of this example is to leverage the relational information encoded in the graph to accurately predict a numerical value for every node. The key thing to note is that we input the numerical values of all nodes except the target node (we mask the target node’s value with 0) and then predict the target node’s value. For every data point, we repeat the process for all nodes. This might come across as a bizarre task, but let’s see if we can predict the expected value of any node given the values of the other nodes. The data used is simulation data corresponding to a series of sensors from industry, and the graph structure I have chosen in the example below is based on the actual process structure. I have provided comments in the code to make it easy to follow. You can find a copy of the dataset here (note: this is my own data, generated from simulations).

This code and training procedure is far from optimised, but its aim is to illustrate the implementation of GNNs and build an intuition for how they work. One issue with the way I have done it, which should definitely not be done this way beyond learning purposes, is masking a node’s feature value and predicting it from the neighbours’ features. Currently you have to loop over each node (not very efficient). A much better approach is to stop the model from including a node’s own features in the aggregation step, so you wouldn’t need to do one node at a time, but I thought it is easier to build intuition for the model with the current method :)
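As a rough illustration of that alternative, the sketch below estimates every node from its incoming neighbours only, in a single pass. It is not a GNN: there are no learned weights, just a plain mean, and the sensor readings are made up. In PyTorch Geometric a comparable effect would need self-loops disabled in the convolution (`GATConv` takes an `add_self_loops` argument), and with stacked layers a node’s own value can still flow back through cycles in the graph, so treat this purely as intuition.

```python
# Edge list copied from the article: (source, target) pairs.
SOURCES = [0, 1, 2, 2, 3, 3, 6, 2]
TARGETS = [1, 2, 3, 4, 5, 6, 2, 7]


def neighbour_mean(values, sources, targets):
    """Estimate every node from the mean of its incoming neighbours only.

    A node's own value never enters its estimate, so all nodes can be
    handled in one pass without masking them one at a time.
    """
    totals = [0.0] * len(values)
    counts = [0] * len(values)
    for src, dst in zip(sources, targets):
        totals[dst] += values[src]
        counts[dst] += 1
    # Nodes with no incoming edge fall back to 0.0 (a choice made for this sketch).
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # made-up sensor readings
estimates = neighbour_mean(values, SOURCES, TARGETS)
print(estimates)
```

Here node 2 is estimated from nodes 1 and 6, while node 0 has no incoming edges and simply receives the fallback value.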

**Preprocessing Data**

Import the necessary libraries and the sensor data from the CSV file, then normalise all data to the range 0 to 1.

```python
import pandas as pd
import torch
from torch_geometric.data import Data, Batch
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
# DataLoader lives in torch_geometric.loader in PyG 2.x
from torch_geometric.loader import DataLoader

# load and scale the dataset
df = pd.read_csv('SensorDataSynthetic.csv').dropna()
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
```
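To make the normalisation step concrete, here is a minimal pure-Python sketch of min-max scaling applied to one column, using the formula x' = (x - min) / (max - min). The function name and the readings are hypothetical and only serve as an illustration; the article itself relies on `MinMaxScaler`.

```python
def min_max_scale(column):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map everything to 0.0 (a choice made for this sketch).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


readings = [20.0, 25.0, 30.0, 40.0]  # made-up sensor readings
scaled = min_max_scale(readings)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because every sensor is scaled independently, predictions made by the model later on are in these scaled units; `scaler.inverse_transform` can map values back to the original units if needed.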

Define the connectivity (edge index) between nodes in the graph using a PyTorch tensor, i.e. this provides the system’s graph topology.

```python
nodes_order = [
    'Sensor1', 'Sensor2', 'Sensor3', 'Sensor4',
    'Sensor5', 'Sensor6', 'Sensor7', 'Sensor8'
]

# define the graph connectivity for the data
edges = torch.tensor([
    [0, 1, 2, 2, 3, 3, 6, 2],  # source nodes
    [1, 2, 3, 4, 5, 6, 2, 7]   # target nodes
], dtype=torch.long)
```
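One way to sanity-check the topology is to list, for each sensor, the sensors that send it information. The sketch below does this in plain Python with the same edge list; the `incoming` dictionary is a name introduced only for this illustration. By default PyTorch Geometric passes messages from source to target, so these are the neighbours each node aggregates from.

```python
from collections import defaultdict

nodes_order = [
    'Sensor1', 'Sensor2', 'Sensor3', 'Sensor4',
    'Sensor5', 'Sensor6', 'Sensor7', 'Sensor8'
]
sources = [0, 1, 2, 2, 3, 3, 6, 2]
targets = [1, 2, 3, 4, 5, 6, 2, 7]

# Map each node name to the names of the nodes with an edge pointing at it.
incoming = defaultdict(list)
for src, dst in zip(sources, targets):
    incoming[nodes_order[dst]].append(nodes_order[src])

for node in nodes_order:
    print(node, '<-', incoming.get(node, []))
```

Note that Sensor1 has no incoming edges, so once its own value is masked there is nothing in the graph for the model to predict it from.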

The data imported from the CSV has a tabular structure, but to use it in GNNs it must be transformed into a graph structure. Each row of data (one observation) is represented as one graph, so we iterate through each row to create a graph representation of the data.

A mask is created for each node/sensor to indicate the presence (1) or absence (0) of data, allowing for flexibility in handling missing data. In most systems there may be items with no data available, hence the need for this flexibility. Finally, split the data into training and testing sets.

```python
graphs = []

# iterate through each row of data to create a graph for each observation
# some nodes may not have any data; not the case here, but a mask is created
# to allow us to deal with any nodes that do not have data available
for _, row in df_scaled.iterrows():
    node_features = []
    node_data_mask = []
    for node in nodes_order:
        if node in df_scaled.columns:
            node_features.append([row[node]])
            node_data_mask.append(1)  # mask value of 1 to indicate presence of data
        else:
            # placeholder feature for a missing node, if necessary
            node_features.append([2])
            node_data_mask.append(0)  # data not present

    node_features_tensor = torch.tensor(node_features, dtype=torch.float)
    node_data_mask_tensor = torch.tensor(node_data_mask, dtype=torch.float)

    # Create a Data object for this row/graph
    # (edges are stored transposed, shape [num_edges, 2]; the model transposes them back)
    graph_data = Data(x=node_features_tensor, edge_index=edges.t().contiguous(), mask=node_data_mask_tensor)
    graphs.append(graph_data)

# splitting the data into train and test observations
# Split indices
observation_indices = df_scaled.index.tolist()
train_indices, test_indices = train_test_split(observation_indices, test_size=0.05, random_state=42)

# Create training and testing graphs
train_graphs = [graphs[i] for i in train_indices]
test_graphs = [graphs[i] for i in test_indices]
```

**Graph Visualisation**

The graph structure created above using the edge indices can be visualised using networkx.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
for src, dst in edges.t().numpy():
    G.add_edge(nodes_order[src], nodes_order[dst])

plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color='lightblue', edge_color='gray', node_size=2000, font_weight='bold')
plt.title('Graph Visualization')
plt.show()
```

**Model Definition**

Let’s define the model. The model contains 2 GAT convolutional layers. The first layer transforms node features to a 16-dimensional space, and the second GAT layer further reduces this to an 8-dimensional representation.

GNNs are highly susceptible to overfitting, so regularisation (dropout) is applied after each GAT layer with a user-defined probability to prevent overfitting. The dropout layer randomly zeros some of the elements of the input tensor during training.
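The effect of dropout can be sketched in a few lines of plain Python. This is only an illustration of the idea behind `F.dropout`, not its implementation: during training each element is zeroed with probability `p` and the survivors are rescaled by 1 / (1 - p), while at evaluation time the input passes through unchanged. The function name and feature values are made up.

```python
import random


def inverted_dropout(values, p, training=True, rng=None):
    """Zero each element with probability p and rescale the rest by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)  # dropout is a no-op at evaluation time
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


features = [0.2, 0.4, 0.6, 0.8]  # made-up node features
# training: each entry is either 0.0 or doubled (since 1 / (1 - 0.5) = 2)
print(inverted_dropout(features, p=0.5, rng=random.Random(0)))
# evaluation: values are returned unchanged
print(inverted_dropout(features, p=0.5, training=False))
```

The rescaling keeps the expected value of each element the same in training and evaluation, which is why the model needs no adjustment when dropout is switched off with `model.eval()`.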

The output of the GAT convolution layers is passed through a fully connected (linear) layer to map the 8-dimensional output to the final node feature, which in this case is a scalar value per node.

Masking the value of the target node: as mentioned earlier, the aim of this task is to regress the value of the target node based on the values of its neighbours. This is the reason for masking/replacing the target node’s value with zero.

```python
from torch_geometric.nn import GATConv
import torch.nn.functional as F
import torch.nn as nn


class GNNModel(nn.Module):
    def __init__(self, num_node_features):
        super(GNNModel, self).__init__()
        self.conv1 = GATConv(num_node_features, 16)
        self.conv2 = GATConv(16, 8)
        self.fc = nn.Linear(8, 1)  # Outputting a single value per node

    def forward(self, data, target_node_idx=None):
        x, edge_index = data.x, data.edge_index
        # the graphs store edge_index as [num_edges, 2]; GATConv expects [2, num_edges]
        edge_index = edge_index.T
        x = x.clone()

        # Mask the target node's feature with a value of zero!
        # Aim is to predict this value from the features of the neighbours
        if target_node_idx is not None:
            x[target_node_idx] = torch.zeros_like(x[target_node_idx])

        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.05, training=self.training)
        x = F.relu(self.conv2(x, edge_index))
        x = F.dropout(x, p=0.05, training=self.training)
        x = self.fc(x)
        return x
```
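The masking step in `forward` can be looked at in isolation. The sketch below, in plain Python with a hypothetical helper name and made-up values, builds the (masked input, label) pair for each target node of one observation, which mirrors what the training loop does one node at a time.

```python
def mask_target(features, target_idx, mask_value=0.0):
    """Return a copy of the features with one node masked, plus that node's true value."""
    masked = list(features)  # copy so the original observation is left untouched
    masked[target_idx] = mask_value
    return masked, features[target_idx]


observation = [0.2, 0.5, 0.9]  # made-up scaled values for three nodes
pairs = [mask_target(observation, i) for i in range(len(observation))]
for masked_input, label in pairs:
    print(masked_input, '->', label)
```

One thing to keep in mind: because the data is scaled to the range 0 to 1, zero is also a legitimate sensor value, so the model cannot tell a masked node apart from a genuine zero reading by its feature alone.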

**Training the model**

Initialise the model and define the optimiser, loss function and hyperparameters, including learning rate, weight decay (for regularisation), batch_size and number of epochs.

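```python
model = GNNModel(num_node_features=1)
batch_size = 8  # number of graphs whose loss is accumulated before each update
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002, weight_decay=1e-6)
criterion = torch.nn.MSELoss()
num_epochs = 200
# graphs are loaded one at a time; batching is handled by accumulating the loss
train_loader = DataLoader(train_graphs, batch_size=1, shuffle=True)
model.train()
```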

The training process is fairly standard: each graph (one data point) is passed through the forward pass of the model, iterating over each node and predicting the target node. The loss from the predictions is accumulated over the defined batch size before updating the GNN through backpropagation.

```python
for epoch in range(num_epochs):
    accumulated_loss = 0
    optimizer.zero_grad()
    loss = 0
    for batch_idx, data in enumerate(train_loader):
        mask = data.mask
        # the loop starts at node 1; node 0 (Sensor1) has no incoming edges in the edge list above
        for i in range(1, data.num_nodes):
            if mask[i] == 1:  # Only train on nodes with data
                # forward pass with the target node masked
                output = model(data, i)
                target = data.x[i]
                prediction = output[i].view(1)
                loss += criterion(prediction, target)

        # Update parameters at the end of each set of batches
        if (batch_idx + 1) % batch_size == 0 or (batch_idx + 1) == len(train_loader):
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            accumulated_loss += loss.item()
            loss = 0

    average_loss = accumulated_loss / len(train_loader)
    print(f'Epoch {epoch+1}, Average Loss: {average_loss}')
```
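The update condition in the loop above is easy to misread, so here is a small standalone sketch of when the optimiser actually steps. The helper name is hypothetical; it only reproduces the `if` condition for a given number of graphs and accumulation size.

```python
def update_steps(num_graphs, batch_size):
    """Return the loader indices after which the parameters are updated."""
    return [
        idx for idx in range(num_graphs)
        if (idx + 1) % batch_size == 0 or (idx + 1) == num_graphs
    ]


print(update_steps(10, 4))  # [3, 7, 9]
```

With 10 graphs and an accumulation size of 4, the last update covers only 2 graphs. Since the loss is summed rather than averaged over the accumulated graphs, that final step works with a smaller total loss than the others.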

**Testing the trained model**

Using the test dataset, pass each graph through the forward pass of the trained model and predict each node’s value based on its neighbours’ values.

```python
test_loader = DataLoader(test_graphs, batch_size=1, shuffle=True)
model.eval()

actual = []
pred = []

with torch.no_grad():  # no gradients are needed at test time
    for data in test_loader:
        mask = data.mask
        for i in range(1, data.num_nodes):
            output = model(data, i)
            prediction = output[i].view(1)
            target = data.x[i]
            actual.append(target)
            pred.append(prediction)
```
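Before plotting, it can help to summarise the test predictions with a few numbers. The sketch below computes MAE, RMSE and R² in plain Python; the function name and the demo values are made up, and these metrics are a suggestion rather than something the original experiment reports.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE and R^2 for two equal-length lists of floats."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float('nan')
    return {'mae': mae, 'rmse': rmse, 'r2': r2}


actual_demo = [0.0, 0.5, 1.0]  # made-up ground truth values
pred_demo = [0.1, 0.5, 0.9]    # made-up predictions
metrics = regression_metrics(actual_demo, pred_demo)
print(metrics)
```

With the lists collected in the test loop, the tensors would first be converted to floats, e.g. `[v.item() for v in actual]`. The resulting errors are in the scaled 0 to 1 units, not the original sensor units.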

**Visualising the test results**

Using iplot we can visualise the predicted values of nodes against the ground truth values.

```python
import plotly.graph_objects as go
from plotly.offline import iplot

actual_values_float = [value.item() for value in actual]
pred_values_float = [value.item() for value in pred]

scatter_trace = go.Scatter(
    x=actual_values_float,
    y=pred_values_float,
    mode='markers',
    marker=dict(
        size=10,
        opacity=0.5,
        color='rgba(255,255,255,0)',
        line=dict(
            width=2,
            color='rgba(152, 0, 0, .8)',
        )
    ),
    name='Actual vs Predicted'
)

line_trace = go.Scatter(
    x=[min(actual_values_float), max(actual_values_float)],
    y=[min(actual_values_float), max(actual_values_float)],
    mode='lines',
    marker=dict(color='blue'),
    name='Perfect Prediction'
)

data = [scatter_trace, line_trace]

layout = dict(
    title='Actual vs Predicted Values',
    xaxis=dict(title='Actual Values'),
    yaxis=dict(title='Predicted Values'),
    autosize=False,
    width=800,
    height=600
)

fig = dict(data=data, layout=layout)
iplot(fig)
```

Despite the lack of fine-tuning of the model architecture or hyperparameters, it has actually done a good job; we could tune the model further to get improved accuracy.

This brings us to the end of this article. GNNs are relatively new compared with other branches of machine learning, and it will be very exciting to see the developments in this field as well as its application to different problems. Finally, thank you for taking the time to read this article. I hope you found it useful for your understanding of GNNs and their mathematical background.

*Unless otherwise noted, all images are by the author*