
Extending PAC Learning to a Strategic Classification Setting


Why Strategic Classification Is Useful: Motivation

Binary classification is a cornerstone of machine learning. It was the first topic I was taught when I took an introductory course on the subject; the real-world example we examined back then was the problem of classifying emails as either spam or not spam. Other common examples include diagnosing a disease and screening resumes for a job posting.

The basic binary classification setup is intuitive and easily applicable to our day-to-day lives, and it can serve as a helpful demonstration of the ways we can leverage machine learning to solve human problems. But how often do we stop to consider the fact that people often have a vested interest in the classification outcome of such problems? Spammers want their emails to make it through spam filters, not everyone wants their COVID test to come back positive, and job seekers may be willing to stretch the truth to land an interview. The data points aren’t just data points; they are active participants in the classification process, often aiming to game the system to their own benefit.

In light of this, the canonical binary classification setup seems a bit simplistic. However, reexamining binary classification without the implicit assumption that the objects we want to classify are uninfluenced by external stakes sounds unmanageably complex. The preferences that could affect the classification process come in so many different forms; how could we possibly take them all into account?

It turns out that, under certain assumptions, we can. Through a clever generalization of the canonical binary classification model, the paper’s authors demonstrate the feasibility of designing computationally tractable, gaming-resistant classification algorithms.

From Data Points to Rational Agents: Preference Classes

First, if we want to be as realistic as possible, we have to properly consider the wide breadth of forms that real-world preferences can take among rational agents. The paper mentions five increasingly general categories of preferences (which I’ll call preference classes). The names I’ll use for them are my own, but they are based on the terminology used in the paper.

  1. Impartial: No preferences, just like in canonical binary classification.
  2. Homogeneous: Identical preferences across all the agents involved. For example, within the set of people who are willing to fill out the paperwork necessary to apply for a tax refund, we can reasonably expect that everyone is equally motivated to get their money back (i.e., to be classified positively).
  3. Adversarial: Equally motivated agents aim to induce the opposite of their true labels. Consider bluffing in poker: a player with a weak hand (negatively classified) wants their opponents to think they have a strong hand (positively classified), and vice versa. For the “equally motivated” part, imagine all players bet the same amount.
  4. Generalized Adversarial: Unequally motivated agents aim to induce the opposite of their true labels. This isn’t too different from the plain Adversarial case. Still, it should be easy to see how a player with $100 on the line would be willing to go to greater lengths to deceive their opponents than a player betting $1.
  5. General Strategic: Anything goes. This preference class aims to encompass any set of preferences imaginable. All four of the previously mentioned preference classes are strict subsets of this one. Naturally, this class is the main focus of the paper, and most of the results demonstrated in the paper apply to it. The authors give the wonderful example of college applications, where “students [who] have heterogeneous preferences over universities […] may manipulate their application materials during the admission process.”

How can the canonical classification setup be modified to account for such rich agent preferences? The answer is astoundingly simple. Instead of limiting our scope to (x, y) ∈ X × { -1, 1 }, we consider data points of the form (x, y, r) ∈ X × { -1, 1 } × R. A point’s r value represents its preference, which we can break down into two equally important components:

  • The sign of r indicates whether the data point wants to be positively or negatively classified (r > 0 or r < 0, respectively).
  • The absolute value of r specifies how strong the data point’s preference is. For example, a data point with r = 10 would be much more strongly motivated to manipulate its feature vector x to ensure it ends up being positively classified than a data point with r = 1.

What determines the preference class we operate within is the set R. We can formally define each of the aforementioned preference classes in terms of R and see how the formal definitions align with their intuitive descriptions and examples (a short code sketch follows the list):

  1. Impartial: R = { 0 }. (This makes it abundantly clear that the strategic setup is simply a generalization of the canonical setup.)
  2. Homogeneous: R = { 1 }.
  3. Adversarial: R = { -1, 1 }, with the added requirement that all data points prefer to be classified as the opposite of their true labels.
  4. Generalized Adversarial: R ⊆ ℝ (and all data points prefer to be classified as the opposite of their true labels).
  5. General Strategic: R ⊆ ℝ.
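
To make this concrete, here is a minimal Python sketch (my own, not from the paper) of how we might represent strategic data points and check which preference class a given r value is compatible with:

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class StrategicPoint:
    x: Sequence[float]  # feature vector
    y: int              # true label, -1 or 1
    r: float            # preference: sign = desired label, magnitude = motivation

def fits_preference_class(point: StrategicPoint, preference_class: str) -> bool:
    """Check whether a point's r value is allowed by the given preference class."""
    if preference_class == "impartial":
        return point.r == 0                # R = {0}
    if preference_class == "homogeneous":
        return point.r == 1                # R = {1}
    if preference_class == "adversarial":
        return point.r == -point.y         # R = {-1, 1}, preferring the opposite label
    if preference_class == "generalized_adversarial":
        return point.r * point.y < 0       # any magnitude, still preferring the opposite label
    if preference_class == "general_strategic":
        return True                        # R ⊆ ℝ, anything goes
    raise ValueError(f"Unknown preference class: {preference_class}")

# A negatively labeled applicant who would gain 2 units of utility from a positive classification:
print(fits_preference_class(StrategicPoint(x=[0.5, 1.2], y=-1, r=2.0), "general_strategic"))  # True
```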

Giving Preference Magnitude Meaning: Cost Functions

Clearly, though, R by itself isn’t enough to construct a complete general strategic framework. The very idea of a data point’s preference having a certain magnitude is meaningless without tying it to the cost the data point incurs in manipulating its feature vector. Otherwise, any data point with a positive r, no matter how small, would have no reason not to manipulate its feature vector ad infinitum. This is where the concept of cost functions comes into play.

Let c: X × X → ℝ⁺. For simplicity, we will assume (as the paper’s authors do) that c is induced by seminorms. We say that a test data point (x, y, r) may transform its feature vector x into z ∈ X at cost c(z; x). It is important to note in this context that the paper assumes the training data is unmanipulated.

We can divide cost functions into two categories, with the former being a subset of the latter. An instance-invariant cost function is the same across all data points. To put it more formally:

∃ℓ: X → ℝ⁺ . ∀(x, y, r) ∈ X × { -1, 1 } × R . ∀z ∈ X . c(z; x) = ℓ(z − x)

I.e., there exists a function ℓ such that for all data points and all potential manipulated feature vectors, c(z; x) simply takes the value of ℓ(z − x).

An instance-wise cost function may vary between data points. Formally:

∀(x, y, r) ∈ X × { -1, 1 } × R . ∃ℓ: X → ℝ⁺ . ∀z ∈ X . c(z; x) = ℓ(z − x)

I.e., each data point can have its own function, ℓ, and c(z; x) takes the value of ℓ(z − x) for each individual data point.

As we will see in the final article in this series, while the difference between the two types of cost functions may seem subtle, instance-wise cost functions are significantly more expressive and harder to learn.
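
To illustrate the difference, here is a rough sketch of my own (not code from the paper), with ordinary norms standing in for the seminorms the paper allows and the per-point weights being a purely hypothetical way of letting the cost vary between data points:

```python
import numpy as np

def instance_invariant_cost(z: np.ndarray, x: np.ndarray) -> float:
    """c(z; x) = ℓ(z - x), with one shared ℓ (here, the Euclidean norm) for every data point."""
    return float(np.linalg.norm(z - x, ord=2))

def make_instance_wise_cost(weights: np.ndarray):
    """Build a per-point cost c(z; x) = ℓ(z - x), where ℓ is a weighted taxicab
    seminorm whose weights can differ from one data point to the next."""
    def cost(z: np.ndarray, x: np.ndarray) -> float:
        return float(np.linalg.norm(weights * (z - x), ord=1))
    return cost

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
z = np.array([1.0, 0.0])

# Instance-invariant: the same rule prices every point's manipulation.
print(instance_invariant_cost(z, x1), instance_invariant_cost(z, x2))

# Instance-wise: each point gets its own ℓ, so identical moves can cost different amounts.
c1 = make_instance_wise_cost(np.array([1.0, 2.0]))
c2 = make_instance_wise_cost(np.array([0.5, 0.5]))
print(c1(z, x1), c2(z, x2))
```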

Preference Classes and Cost Functions in Action: An Example

Let’s take a look at an example given in the paper to help hammer home the elements of the setup we’ve covered so far.

Image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from PAC-Learning for Strategic Classification (used under CC-BY 4.0 license).

In this example, we have a decision boundary induced by a linear binary classifier and four data points with individual preferences. General strategic is the only applicable preference class in this case.

The dotted perimeter around each xᵢ shows the manipulated feature vectors z to which it would cost the point exactly 1 to move. Since we assume the cost function is induced by seminorms, everything inside a perimeter costs the corresponding data point less than 1 to move to. We can easily tell that the cost function in this example varies from data point to data point, which means it is instance-wise.

As we can see, the leftmost data point (x₁, -1, -1) has no incentive to cross the decision boundary, since it is on the negative side of the boundary while also having a negative preference. (x₄, -1, 2), however, wants to be positively classified, and since the reward for manipulating x₄ to cross the boundary (which is 2) outweighs the cost of doing so (which is less than 1), it makes sense to go through with the manipulation. (x₃, 1, -2) is symmetric to (x₄, -1, 2), also deciding to manipulate its feature vector to achieve its desired classification outcome. Lastly, (x₂, -1, 1), whose cost function we can see is based on taxicab distance, opts to stay put despite its preference to be positively classified. This is because the cost of manipulating x₂ to cross the decision boundary would be greater than 1, surpassing the reward the data point would stand to gain by doing so.

Assuming the agents our data points represent are rational, we can very easily tell when a data point should manipulate its feature vector (benefits outweigh costs) and when it shouldn’t (costs outweigh benefits). The next step is to turn our intuitive understanding into something more formal.

Balancing Costs & Benefits: Defining the Data Point Best Response

This leads us to define the data point best response: the feature vector(s) z ∈ X that maximize the quantity 𝕀(h(z) = 1) ⋅ r − c(z; x).

So we’re looking for the feature vector(s) z ∈ X that maximize… what exactly? Let’s break down the expression we’re aiming to maximize into more manageable parts.

  • h: A given binary classifier (h: X → { -1, 1 }).
  • c(z; x): As stated above, this expresses the cost of modifying the feature vector x to be z.
  • 𝕀(h(z) = 1): Here, 𝕀(p) is the indicator function, returning 1 if the predicate p holds and 0 if it doesn’t. The predicate h(z) = 1 is true if the vector z under consideration is positively classified by h. Putting that together, we find that 𝕀(h(z) = 1) evaluates to 1 for any z that is positively classified. If r is positive, that’s good. If it’s negative, that’s bad.

The bottom line is that we want to find vector(s) z for which 𝕀(h(z) = 1) ⋅ r, which we can call the realized reward, outweighs the cost of manipulating the original x into z by as much as possible. To put it in game-theoretic terms, the data point best response maximizes the utility of its corresponding agent in the context of the binary classification at hand.
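
To ground this, here is a small sketch that brute-forces the best response over a finite grid of candidate manipulations. The linear classifier, the Euclidean cost, and the grid are all illustrative assumptions of mine; the paper’s definition ranges over the entire feature space X.

```python
import numpy as np

def linear_classifier(w: np.ndarray, b: float):
    """A toy linear binary classifier h: X -> {-1, 1}."""
    return lambda z: 1 if float(np.dot(w, z)) + b >= 0 else -1

def best_response(x: np.ndarray, r: float, h, cost, candidates):
    """Pick the z (among the candidates, or staying put at x) that maximizes
    the utility 𝕀(h(z) = 1) · r − c(z; x)."""
    def utility(z):
        return (r if h(z) == 1 else 0.0) - cost(z, x)
    return max(list(candidates) + [x], key=utility)

# A data point with true label -1 and preference r = 2: it wants to be classified positively.
h = linear_classifier(np.array([1.0, 0.0]), b=-1.0)   # positive iff the first coordinate >= 1
cost = lambda z, x: float(np.linalg.norm(z - x))      # instance-invariant Euclidean cost
x, r = np.array([0.2, 0.0]), 2.0
candidates = [np.array([a, 0.0]) for a in np.linspace(0.0, 2.0, 21)]

z_star = best_response(x, r, h, cost, candidates)
print(z_star, h(z_star))  # moves just across the boundary: realized reward 2 beats a cost of 0.8
```

In the example from the previous section, x₄ behaves exactly like this point: the realized reward of 2 outweighs a manipulation cost below 1, so crossing the boundary is the rational move.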

Putting It All Together: A Formal Definition of the Strategic Classification Problem

Finally, we’ve laid all the necessary groundwork to formally define the strategic classification problem.

A diagram illustrating the formal definition of the strategic classification problem. Image by author.

Given a hypothesis class H, a preference class R, a cost function c, and a set of n data points drawn from a distribution D, we want to find a binary classifier h’ that minimizes the loss as defined in the diagram above. Note that the loss is simply a modification of the canonical zero-one loss, plugging in the data point best response instead of h(x).
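
Assuming a best_response helper like the one sketched in the previous section, the empirical version of this modified loss can be written roughly as follows (my own paraphrase of the loss in the diagram, not code from the paper):

```python
def strategic_zero_one_loss(h, points, cost, candidates) -> float:
    """Empirical strategic loss: the fraction of points whose best response
    (rather than their original feature vector) is misclassified by h."""
    errors = 0
    for x, y, r in points:  # each point is a (feature vector, true label, preference) triple
        z = best_response(x, r, h, cost, candidates)
        errors += int(h(z) != y)
    return errors / len(points)
```

A learner in this setting then looks for the h’ in H with the smallest expected value of this loss over the distribution D.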

Conclusion

Starting from the canonical binary classification setup, we introduced the notion of preference classes. Next, we saw how to formalize that notion using an r value for each data point. We then saw how cost functions complement data point preferences. After that, we broke down an example before defining the key concept of the data point best response based on the ideas we explored beforehand. Lastly, we used the data point best response to define the modified zero-one loss used in the definition of the strategic classification problem.

Join me next time as I define and explain the strategic VC dimension, which is the natural next step from where we left off this time.

References

[1] R. Sundaram, A. Vullikanti, H. Xu, F. Yao. PAC-Learning for Strategic Classification (2021), International Conference on Machine Learning.
