
Graph Data Science for Tabular Data


Graph methods are more general than you might think

Photo by Alina Grubnyak on Unsplash

Graph data science methods are usually applied to data that has some inherently graphical structure, e.g., molecular structure data or transport network data. However, graph methods can also be useful on data that displays no obvious graphical structure, such as the tabular datasets used in machine learning tasks. In this article I'll demonstrate, simply and intuitively, without any mathematics or theory, that by representing tabular data as a graph we open up new possibilities for how to perform inference on this data.

To keep things simple, I'll use the Credit Approval dataset below as an example. The goal is to predict the value of Approval based on the values of one or more of the other attributes. There are many classification algorithms that could be used to do this, but let's explore how we might approach the problem using graphs.

Credit Approval dataset. Image by the author.
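Since the table is shown only as an image, here is a hypothetical Python reconstruction to make the sketches later in this article runnable. The entries are back-filled from the worked examples below; the fields marked as guessed are pure placeholders, not values from the original dataset, and the Employment and History columns are already dropped, as they will be in the graph below.

```python
# Hypothetical reconstruction of the Credit Approval table above.
# Entries marked "guessed" are placeholders chosen to stay consistent
# with the worked examples below, not original dataset values.
instances = {
    1: {"Income": "Low",    "Education": "College",  "Approval": "No"},
    2: {"Income": "Low",    "Education": "College",  "Approval": "No"},
    3: {"Income": "Medium", "Education": "College",  "Approval": "Yes"},  # Income guessed
    4: {"Income": "High",   "Education": "Graduate", "Approval": "Yes"},
    5: {"Income": "High",   "Education": "School",   "Approval": "Yes"},  # Education guessed
}
```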

Graph representation

The first thing to consider is how to represent the data as a graph. We would like to capture the intuition that the greater the number of attribute values shared between two instances, the more similar those instances are. We'll use one node to represent each instance (we refer to these as instance nodes), and one node for each possible attribute value (these are attribute value nodes). The connections between instance nodes and attribute value nodes are bi-directional and reflect the information in the table. To keep things really simple, we'll omit the attributes Employment and History. Here is the graph.

Graph representation for the Credit Approval dataset. Image by the author.
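As a minimal sketch, this bipartite representation can be stored as an adjacency map, building on the hypothetical `instances` dict above. Node labels are tuples: `("instance", 1)` for instance nodes and, e.g., `("Income", "Low")` for attribute value nodes.

```python
from collections import defaultdict

def build_graph(instances):
    """Build a bipartite graph linking instance nodes to attribute-value nodes.

    Edges are undirected, stored as adjacency sets in both directions,
    mirroring the bi-directional connections in the figure above.
    """
    adj = defaultdict(set)
    for i, row in instances.items():
        inst_node = ("instance", i)
        for attribute, value in row.items():
            attr_node = (attribute, value)
            adj[inst_node].add(attr_node)
            adj[attr_node].add(inst_node)
    return adj

graph = build_graph(instances)
```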

Message Passing

Given attribute values for some new instance, there are several ways in which we might use graph methods to predict an unknown attribute value. We'll use the notion of message passing. Here is the procedure we'll use.

Message-passing procedure

Initiate a message with value 1 at the starting node, and let this node pass the message to every node to which it is connected. Let any node that receives a message then pass it on (attenuated by a factor k, where 0 < k < 1) to every other node to which it is connected. Continue message passing until either a target node is reached (i.e., a node corresponding to the attribute whose value is being predicted), or there are no further nodes to pass the message to. Since a message cannot be passed back to a node from which it was received, the process is guaranteed to terminate.

When message passing has completed, each node in the graph will have received zero or more messages of varying value. Sum these values for each node belonging to the target attribute, and then normalize these sums so that they themselves sum to 1. Interpret the normalized values as probabilities. These probabilities can then be used either to predict the unknown attribute value or to impute a random value drawn from the distribution. Attenuating message values at each pass reflects the intuition that longer paths should contribute less to the probability estimate than shorter paths.
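Here is a minimal sketch of the procedure in Python, continuing from the `build_graph` snippet above. It makes one interpretive assumption beyond the text: a message is never passed to any node that has already received it (not merely the immediate sender), which guarantees termination and matches the message flow shown in Example 1 below.

```python
from collections import defaultdict

def message_pass(adj, source, target_attribute, k=0.5):
    """Flood a message of value 1 outward from `source`.

    The starting node passes the message at full value; every later hop is
    attenuated by a factor k. Nodes of the target attribute absorb messages
    without forwarding them, and a node that has already received the message
    is never sent it again (the assumption that makes the flood terminate).
    Returns a dict mapping each reached node to the total value it received.
    """
    received = defaultdict(float)
    frontier = {source: 1.0}   # nodes reached at the current hop -> value held
    visited = {source}
    factor = 1.0               # the starting node passes the message at value 1
    while frontier:
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            if node[0] == target_attribute:
                continue       # target-attribute nodes absorb, never forward
            for neighbour in adj[node]:
                if neighbour not in visited:
                    next_frontier[neighbour] += value * factor
        visited.update(next_frontier)
        for node, value in next_frontier.items():
            received[node] += value
        frontier = dict(next_frontier)
        factor = k             # all subsequent hops are attenuated by k
    return dict(received)

def conditional_distribution(adj, evidence_nodes, target_attribute, k=0.5):
    """Estimate the target distribution by flooding from every evidence node
    and normalising the totals received at the target-attribute nodes."""
    totals = defaultdict(float)
    for source in evidence_nodes:
        for node, value in message_pass(adj, source, target_attribute, k).items():
            if node[0] == target_attribute:
                totals[node[1]] += value
    norm = sum(totals.values())
    return {val: total / norm for val, total in totals.items()}
```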

Example 1

Suppose that we wish to predict the value of Approval given that Income is Low. The arrows on the graph below illustrate the operation of the message-passing procedure, with the thickness of each arrow representing the message value (attenuated by a factor k = 0.5 at each hop).

Estimating the distribution of Approval given Income is Low. Image by the author.

The message is initiated at node Income:Low (coloured green). This node passes messages of value 1 to both Instance 1 and Instance 2, which then each pass the message (with an attenuated value of 0.5) to nodes Education:College and Approval:No. Note that since Education:College receives messages from both Instance 1 and Instance 2, it must pass each of these messages on to Instance 3, with an attenuated value of 0.25. The numbers at each node of the target variable show the sum of message values received and (in parentheses) the normalized values as percentages. We have the following probabilities for Approval, conditional on Income being Low:

  • Prob (Approval is ‘Yes’ | Income is Low) = 20%
  • Prob (Approval is ‘No’ | Income is Low) = 80%
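Running the sketch above on the hypothetical table reproduces these figures:

```python
dist = conditional_distribution(graph, [("Income", "Low")], "Approval", k=0.5)
print(dist)  # {'No': 0.8, 'Yes': 0.2} with the hypothetical table above
```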

These probabilities differ from what we would have obtained from a count-based prediction from the table. Since two of the five instances have Income Low, and both of these have Approval No, a count-based prediction would lead to a probability of 100% for Approval No.

The message-passing procedure has taken into account that the attribute value Education College, possessed by Instances 1 and 2, is also possessed by Instance 3, which has Approval Yes, thus contributing to the total message value received at node Approval:Yes. If we had incorporated the additional attributes Employment and History in our graph, this would likely have further increased the number of paths connecting the starting node to the target nodes, thereby utilizing additional contextual information and improving the estimate of the probability distribution.

Example 2

The message-passing procedure can also be used when conditioning on more than one attribute. In that case we simply initiate a message at each of the nodes corresponding to the attribute values we are conditioning on, and follow the same procedure from each. The graph below shows the result of predicting the value of Approval given that Income is Low and Education is Graduate. The sequences of messages originating from each starting node are shown in different colours.

Estimating the distribution of Approval given Income is Low and Education is Graduate. Image by the author.

Instance 4 has Education value Graduate, and therefore contributes to the sum of message values received at node Approval:Yes. An additional contribution to Approval:Yes is made by Instance 5, which shares Income High with Instance 4.
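In the sketch above, conditioning on several attributes is just a matter of passing several starting nodes:

```python
dist = conditional_distribution(
    graph,
    [("Income", "Low"), ("Education", "Graduate")],
    "Approval",
    k=0.5,
)
print(dist)  # exact values depend on the guessed entries in the hypothetical table
```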

[Note: in each of these examples we have used Approval as our target variable; however, we could have estimated the probability distributions for Income or Education in exactly the same way].
