
Now You See Me (CME): Concept-based Model Extraction

Leveraging Semi-Supervised Concept-based Models with CME

CME builds on a similar observation highlighted in [3], where it was noted that vanilla CNN models often retain a substantial amount of concept-related knowledge in their hidden space, which can be used for concept information mining at no extra annotation cost. Importantly, that work considered the scenario where the underlying concepts are unknown and need to be extracted from a model's hidden space in an unsupervised fashion.

With CME, we make use of the above observation and consider a scenario where we know the underlying concepts, but only have a small number of sample annotations for each of them. Similarly to [3], CME relies on a given pre-trained vanilla CNN and this small set of concept annotations to extract further concept annotations in a semi-supervised fashion, as shown below:

CME model processing. Image by the author.

As shown above, CME extracts concept representations from a pre-trained model's hidden space in a post-hoc fashion. Further details are given below.

Concept Encoder Training: instead of training concept encoders from scratch on the raw data, as is done for CBMs, we set up concept encoder training in a semi-supervised fashion, using the vanilla CNN's hidden space (a minimal sketch follows the list below):

  • We start by pre-specifying a set of layers L from the vanilla CNN to use for concept extraction. This can range from all layers to just the last few, depending on available compute capacity.
  • Next, for every concept, we train a separate model on top of the hidden space of each layer in L to predict that concept's values from the layer's hidden space.
  • We then select the model and corresponding layer with the best predictive accuracy as the “best” model and layer for that concept.
  • Consequently, when making predictions for a concept i, we first retrieve the hidden space representation from the best layer for that concept, and then pass it through the corresponding predictive model for inference.
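
Here is a minimal sketch of this selection procedure in Python with scikit-learn. The layer names, array shapes, and helper structure are illustrative assumptions rather than the paper's actual code, and random arrays stand in for the CNN's activations:

```python
# Minimal sketch of CME's concept-encoder training (hypothetical helper
# names and data). For each concept we fit a simple predictor on every
# candidate layer's activations and keep the best-performing one.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for f(x) at each layer: activations of the labelled samples
# at each layer in L of the pre-trained vanilla CNN (random placeholders).
layers = ["conv3", "conv4", "fc1"]
hidden = {layer: rng.normal(size=(200, 64)) for layer in layers}

# Small set of concept annotations: k binary concepts for 200 samples.
k = 3
concept_labels = rng.integers(0, 2, size=(200, k))

best_model, best_layer = {}, {}
for i in range(k):                       # one model per concept...
    best_score = -np.inf
    for layer in layers:                 # ...evaluated on every layer in L
        g = LogisticRegression(max_iter=1000)
        score = cross_val_score(g, hidden[layer], concept_labels[:, i]).mean()
        if score > best_score:
            best_score, best_layer[i], best_model[i] = score, layer, g
    # Refit the winning model on all labelled data for this concept.
    best_model[i].fit(hidden[best_layer[i]], concept_labels[:, i])

def concept_encoder(hidden_reps):
    """p-hat: map per-layer activations to k concept predictions."""
    return np.array([best_model[i].predict(hidden_reps[best_layer[i]])
                     for i in range(k)]).T
```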

Overall, the concept encoder function can be summarised as follows (assuming there are k concepts in total):

CME Concept Encoder equation. Image by the author.
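
In LaTeX form, the equation in the figure can be reconstructed from the bullet points below roughly as (the exact symbols in the original image may differ):

$$\hat{p}(x) = \left( g_1^{\,l_1}\!\left(f^{\,l_1}(x)\right),\; \ldots,\; g_k^{\,l_k}\!\left(f^{\,l_k}(x)\right) \right)$$

where the superscript lᵢ denotes the “best” layer selected for concept i.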
  • Here, p̂ on the LHS represents the concept encoder function.
  • The gᵢ terms represent the hidden-space-to-concept models trained on top of the different layers' hidden spaces, with i denoting the concept index, running from 1 to k. In practice, these models can be fairly simple, such as Linear Regressors or Gradient Boosted Classifiers.
  • The f(x) terms represent sub-models of the original vanilla CNN, extracting the input's hidden representation at a particular layer.
  • In both cases above, the superscripts specify the “best” layers these two models operate on.

Concept Processor Training: concept processor training in CME is set up by training models that use task labels as outputs and concept encoder predictions as inputs. Importantly, these models operate on a far more compact input representation, and can consequently be represented directly via interpretable models, such as Decision Trees (DTs) or Logistic Regression (LR) models. A short sketch follows.
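
A matching sketch for the processor stage, continuing the example above; the placeholder task labels and tree depth are assumptions:

```python
# Sketch of the concept-processor stage: an interpretable model maps
# predicted concept values to task labels.
from sklearn.tree import DecisionTreeClassifier, export_text

task_labels = rng.integers(0, 4, size=200)           # placeholder task labels

# Inputs are the concept encoder's predictions, not raw pixels, so a
# shallow decision tree is usually expressive enough.
concept_preds = concept_encoder(hidden)              # shape (200, k)
processor = DecisionTreeClassifier(max_depth=3).fit(concept_preds, task_labels)

# Because the tree operates on named concepts, its rules are readable:
print(export_text(processor, feature_names=[f"concept_{i}" for i in range(k)]))
```

The printed rules read directly as concept-level decision logic, which is the interpretability payoff of operating on the compact concept representation rather than the raw input.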
