Steven Davis speaks to Jens Ludwig about his recent work on "Predicting Police Misconduct." They delve into whether we can predict which police officers are most at risk of serious misconduct. The potential benefits of accurate prediction are large, but it is also challenging to develop and implement robust prediction models.

ABOUT THE SPEAKERS:

Jens Ludwig is the Edwin A. and Betty L. Bergman Distinguished Service Professor at the University of Chicago, Pritzker Director of the University of Chicago's Crime Lab, codirector of the Education Lab, and codirector of the National Bureau of Economic Research's working group on the economics of crime. His research is published in top scientific journals across various disciplines, including the American Economic Review and the New England Journal of Medicine.

Jens co-founded the Crime Lab and the Education Lab, which help government agencies use insights from behavioral science and data science to address real-world problems. Examples include working with the Chicago Police Department to implement data-driven management changes to reduce gun violence, and partnering with the Mayor's Office in New York City to help build and implement a new pretrial risk tool as part of the city's goal to close Rikers Island.

Ludwig holds a BA in economics from Rutgers and an MA and PhD from Duke University. He was a public policy professor at Georgetown University and is on the editorial board of the American Economic Review. He was elected vice president of APPAM in 2012, received the David N. Kershaw Prize in 2006, and was elected to the Institute of Medicine of the National Academies of Science in 2012.

Steven J. Davis is the Thomas W. and Susan B. Ford Senior Fellow at the Hoover Institution and senior fellow at the Stanford Institute for Economic Policy Research. He is a research associate with the National Bureau of Economic Research, consultant to the Federal Reserve Bank of Atlanta, advisor to the Monetary Authority of Singapore, past editor of the American Economic Journal: Macroeconomics, and an elected fellow of the Society of Labor Economists. He co-founded the Economic Policy Uncertainty project, the US Survey of Working Arrangements and Attitudes, the Global Survey of Working Arrangements, the Survey of Business Uncertainty, and the Stock Market Jumps project. He co-organizes the Asian Monetary Policy Forum, held annually in Singapore. Previously, Davis was on the faculty at the University of Chicago Booth School of Business, serving as both distinguished service professor and deputy dean of the faculty.

Steven Davis:

Can we predict which police officers are at high risk of serious misconduct on or off the job? In particular, can we develop a prediction model informed by data that accurately identifies which police officers are most likely to engage in misconduct in the future? That's the question for today.

Welcome to Economics Applied. My name is Steven Davis, host of the show and senior fellow at the Hoover Institution. Our guest today is Jens Ludwig. He's a distinguished professor at the Harris School of Public Policy at the University of Chicago, where he directs the university's Crime Lab. That's a research outfit, not a criminal enterprise. He also co-directs the working group on the economics of crime for the National Bureau of Economic Research. Welcome, Jens. It's really great to have you on the show.

Jens Ludwig:

Great. Thank you so much for having me.

Steven Davis:

I've been reading this really interesting recent working paper that you have. It's titled, Predicting Police Misconduct, co-authored with Greg Stoddard and Dylan Fitzpatrick. So there's a lot to get into in this paper and in discussing it, I'd like to adopt three stipulations, and this is just kind of to set the boundaries of our discussion so the audience knows what we're about today.

First, let's stipulate that police misconduct is a serious issue. We don't need to spend time arguing about that. Second, a premise of your analysis, which I think is pretty reasonable and I'd also like to kind of proceed with, is that it would be easier to reduce police misconduct if we could accurately predict which police officers were at the greatest risk of misconduct. So we're going to take that as a premise in today's discussion.

And the third stipulation is that whether and how to use prediction models in practice to reduce police misconduct is an important question. It raises a whole host of its own issues. We will largely leave that question aside from today's discussion, except as it intersects with the question of building models to predict police misconduct because your paper's really about advancing the ball on the prediction front. That's its contribution to the literature, and that's where I want to focus our discussion. That work for you?

Jens Ludwig:

That's terrific. Terrific. Thank you, Steven.

Steven Davis:

Okay, great. So maybe you can start by telling us about the context of your study. What's the setting in history? What's the focus? A particular city, which I'll let you tell us about. People can guess. What does police misconduct mean exactly in your analysis so we all have a clear common understanding of that going forward? And what data are available to you as you try to build a statistical model to predict police misconduct? So take it away, Jens.

Jens Ludwig:

Yeah, great, thank you. So let me provide a little bit of big-picture background context for starters, which is that the research center that I started, the University of Chicago Crime Lab, we started it 15 years ago after a graduate student was shot and killed a block off campus, right after defending his doctoral dissertation in chemistry. I think you were probably still here-

Steven Davis:

Yes, I was. I remember that episode. Very, very sad episode.

Jens Ludwig:

Yeah. Very unfortunate. It led to a lot of conversation about how the University of Chicago can be more helpful to the city of Chicago, how universities, in general, can be helpful to cities overall. All of our cities struggle with similar problems. And the thing that great research universities are really particularly good at is using data to solve problems.

So Chicago is a city like many American cities, no shortage of problems. We have a gun violence rate in Chicago that no other rich country outside of the US has anything like. We have longstanding concerns about police-community relations and the list goes on and on and on. Segregation of neighborhoods, poverty, you name it. So at some point as part of our work with the city to try and use data and be a helpful R&D partner, we went to the police department and said one of the things that we can use data to predict... you collect all of this administrative data on your 10,000 to 13,000 police officer force in Chicago, we think that there might be some signal in all of that data about which officers are on a trajectory that might lead them to harm other people and ruin their own careers. And to the extent that we can predict this, maybe we can...

Now, I don't think I need to tell you that being the superintendent of police in a city like Chicago is an extraordinarily difficult job. And I think it's like drinking from a crisis fire hose; it's always one crisis after another. And when we pitched them on this idea, they were like, this sounds like it could really be useful, but we are just so underwater on all sorts of other...

So they basically said, don't call us, we'll call you once we've got the bandwidth to do this. And then unfortunately at the end of 2015, a video was released that made clear the full depth of the misconduct problem here in Chicago: Chicago Police officer Jason Van Dyke shooting a teenager named Laquan McDonald, I think 16 times, here on the south side of Chicago, just a couple miles west of the University of Chicago campus.

And that led to a civil rights investigation of the Chicago Police Department by the Department of Justice, which eventually led to a consent decree between the Chicago Police Department and the Illinois Attorney General. And as part of that, a monitor of the consent decree, the Chicago Police Department, the Civil Rights Division of Department of Justice, they came to our research center and said, we have heard that you guys think that you might be able to predict police misconduct. Would you help us?

And I should just also provide some additional context: after the Jason Van Dyke shooting of Laquan McDonald, the Chicago Tribune, I think, published an article, very heartbreaking in hindsight, with Jason Van Dyke's wife, who talked about how she could see real changes in her husband year after year as he was on the job working in these really difficult circumstances.

And I think the big picture hope for what we've been trying to do with the Department is to give the Department some of the same visibility that family members and friends have when an officer is really struggling, that we can sort of step in and do something supportive and productive and hopefully preventive.

Steven Davis:

Okay. So that's really useful, Jens. So just to make sure I have the timing right, initially they told you, don't call us, we'll call you. But you'd been running the Crime Lab or its predecessor for about five years when they finally called you back. Is that right?

Jens Ludwig:

Yeah, we'd been going seven, eight years, something like that. We've done a bunch of different things with this.

Steven Davis:

Okay. So more than five years. But the point is, you were doing this research or working in this space, you and others for a long time, and the opportunity arose and you jumped on it.

Jens Ludwig:

The opportunity arose and we jumped on it. And as I mentioned, there was reason to believe... we can see in lots of other applications that data prediction tools are able to see patterns out in the world that humans can't. And then this interview with Jason Van Dyke's wife in the Chicago Tribune, I think, also highlighted that some of these signals are so stark that even people can see them. We just need a way to make it visible to the department.

So what we did as part of the consent decree between the police department and the Illinois Attorney General is we got officer-level data from 2010 through 2018. There are 10,000 to 13,000 police officers in the Chicago Police Department, depending on what year you look at. We have something like 110,000 person-years of data over that period. And what we basically do is, at a given point in time, we predict a police officer's risk of misconduct over the next two years using all of their on-the-job information from the past five years.

And so we're saying if you just look back over the last five years, can we tell which officers are at above-average likelihood of engaging in misconduct over the next two-year period?
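
To make that setup concrete, here is a minimal sketch of how such a rolling prediction panel could be built. All names here (officer_id, event_date, sustained, the helper itself) are illustrative assumptions, not the paper's actual code:

```python
import pandas as pd

def build_panel(events: pd.DataFrame, as_of_dates,
                lookback_years: int = 5, horizon_years: int = 2) -> pd.DataFrame:
    """For each officer and as-of date, compute features from the trailing
    window and a label flagging any sustained complaint in the future window.
    A real implementation would start from the full officer roster so that
    officers with no recorded events still appear with zero counts."""
    panels = []
    for as_of in pd.to_datetime(as_of_dates):
        past_start = as_of - pd.DateOffset(years=lookback_years)
        future_end = as_of + pd.DateOffset(years=horizon_years)
        past = events[(events["event_date"] >= past_start) & (events["event_date"] < as_of)]
        future = events[(events["event_date"] >= as_of) & (events["event_date"] < future_end)]
        # Feature: complaint count over the trailing five years.
        feats = past.groupby("officer_id").size().rename("n_complaints_past5y")
        # Label: any sustained complaint over the next two years.
        label = (future[future["sustained"]].groupby("officer_id").size() > 0)
        label = label.rename("misconduct_next2y")
        panel = pd.concat([feats, label], axis=1).fillna(0)
        panel["misconduct_next2y"] = panel["misconduct_next2y"].astype(int)
        panel["as_of"] = as_of
        panels.append(panel.reset_index())
    return pd.concat(panels, ignore_index=True)
```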

Steven Davis:

Okay. There's one thing I want to clarify about your out-of-sample prediction because it arose as I read the paper, and maybe I got it wrong, but as you just described, when you're trying to predict the misconduct likelihood over the next two years for a given officer, you only make use of that officer's data in the previous five years or something, correct?

Jens Ludwig:

Yeah.

Steven Davis:

But you're actually fitting the model, as I understand it, using data for other officers or in different partitions of your sample that includes the future. Is that correct or did I get it wrong?

Jens Ludwig:

No, yeah, no, sorry. So for people who are not used to these sorts of prediction exercises, one of the big concerns is that you inadvertently fool yourself about how accurate your prediction model is if you're accidentally using data from the future to predict the future. And so what we're basically doing is trying to be very careful in only using information from the past five years to predict what's going to happen in the next two years. So we are-

Steven Davis:

Including other... so the data you fit the model with for other officers, that's also only past data.

Jens Ludwig:

Yeah, exactly.

Steven Davis:

Okay, great. Yeah. Okay. So you can take it that at least one reader was struggling to figure out whether that was the case as I was going through the paper.

Jens Ludwig:

Yeah. So we're trying to be very careful in showing what happens with the model on new officers... we build a model with these officers in this time period, and then we take it over to a different set of officers, get a prediction from this five-year period, and then say what happens for these new officers that the algorithm hasn't seen before, in a time period that the algorithm hasn't seen before as well.
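
In code, the evaluation scheme he describes might look like the following sketch, continuing the hypothetical panel above: hold out both a set of officers and a later time period, so the test set is doubly unseen by the model. The split date and all names are illustrative:

```python
import numpy as np
import pandas as pd

# 'panel' is the hypothetical DataFrame from the earlier sketch.
rng = np.random.default_rng(0)
officers = panel["officer_id"].unique()
train_officers = set(rng.choice(officers, size=len(officers) // 2, replace=False))

split_date = pd.Timestamp("2015-12-31")  # illustrative cut, not the paper's actual split
is_train_officer = panel["officer_id"].isin(train_officers)

train = panel[is_train_officer & (panel["as_of"] <= split_date)]
test = panel[~is_train_officer & (panel["as_of"] > split_date)]
# Fit only on `train`; report accuracy only on `test`, so no information about
# the held-out officers, or from the later period, can leak into the model.
```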

Steven Davis:

That's the key thing. Okay. So yeah, my concern was exactly the one you mentioned earlier that it's easy to fool yourself about your out-of-sample predictive performance if you're not very careful to avoid using data from the out-of-sample period.

Jens Ludwig:

So I might be an unclear writer, but I am a careful statistician, so we didn't-

Steven Davis:

Okay, well you can go back... I didn't get that. I was struggling to get that from the way that you describe how you partitioned the sample. Maybe just a sentence to make that completely-

Jens Ludwig:

Yeah, perfect.

Steven Davis:

Okay. But let's go ahead, I interrupted you there. I want to get two things on the table at the outset. First, what exactly do you mean by police misconduct? And second, what are the baseline rates of police misconduct as you define it?

Jens Ludwig:

So there are two different outcomes, or two different types of misconduct, that we wind up looking at. And we look at these two different types of misconduct because both are socially important in different ways, and both potentially have different kinds of underlying causes or drivers. And so the right sort of prediction model or prediction pattern might be different.

And so the most obvious type of misconduct we look at is on-duty misconduct. And this can range from a minor technical violation, someone doesn't enter the hours they worked last week correctly or something, to conduct unbecoming an officer, if a police officer uses disrespectful, rude, profane language with a citizen, up to and including out-of-policy use of force. So anything that happens on the job that is a violation of department policy.

And we look at that in two ways. One is, we look at every sort of misconduct that generates a complaint against an officer, whether or not it turns out to be founded. And then we also look at just the complaints that wind up being sustained. The fraction of complaints that wind up being sustained is much lower, and it winds up taking a long time for these things to get sustained. And so if you are only looking at sustained complaints, one of the practical challenges is you wind up sitting on a lot of information about an officer while the very slow investigatory process plays out.

And then the other type of misconduct that we look at is off-duty misconduct. And that's something that we should care about because police officers themselves are people, and if they're struggling, we want to know that so we can give them the... this is a job that exposes people to an enormous amount of deeply traumatic experience over and over and over again. So we know that police officers are at very highly elevated risk for all sorts of mental health problems and suicide attempts and suicide completions.

It turns out in the data, we also see that off-duty misconduct winds up being correlated with on-duty misconduct. And maybe that's not surprising, right? If you have an officer with PTSD, for instance, untreated PTSD, undiagnosed PTSD, you wouldn't necessarily be surprised that that's going to manifest itself in off-duty things like drunk driving or whatever it is and also on the job as well.

And so in terms of the prevalence rates, if you look out over a two-year period, the prevalence rate for on-duty misconduct is about 2%. That is about 2% of the police officers on the Chicago force will have a complaint against them for on-duty misconduct. And that's about 5% for off-duty misconduct.

Steven Davis:

Is that the rate at which complaints are sustained, or is that the overall rate irrespective of whether the complaint is sustained?

Jens Ludwig:

Oh, sorry, that is the... so for off-duty, let me make sure that I get this right. I think for on-duty, that is the sustained rate and for off-duty, that is the overall rate. And so it's 2% on-duty prevalence rate over a two-year period and 5% off-duty over a two-year period.

Steven Davis:

Let's go back. I want to distinguish between prediction inputs into the statistical model, and you've already made something of a case for why we might want to use more than sustained complaints as inputs into the statistical model. I won't repeat your argument, but you've made that argument already and I suspect you'll say more about it. But then there's the outcome which you're predicting. Are you focused on predicting sustained complaints on-duty or all complaints on-duty?

Jens Ludwig:

So we do the exercise both ways... so there's on-duty and off-duty misconduct, and there's sustained and overall. Let me not say not sustained; there's sustained and all complaints, for either on-duty or off-duty misconduct. And so you have four possible combinations. We do the prediction exercise all four ways. The results are qualitatively similar in each of the four cases. And so I can talk a little bit more about the [inaudible 00:17:29] different ways.

Steven Davis:

We probably won't have time to go through all elements of the two-by-two.

Jens Ludwig:

Yeah, no, I'm not going to. But just the-

Steven Davis:

What do you want to focus on? What do you want to focus on?

Jens Ludwig:

I want to focus on sustained on-duty complaints. And that's the 2% prevalence rate over two-year period. And the reason that I want to focus on sustained on-duty complaints is sustained on-duty complaints wind up being much less strongly correlated with different measures of police officer activity than overall complaint rates are. And so it looks like, part of what's going on is that when police officers are assigned to busier police districts, they wind up accruing a bunch of complaints that are disproportionately likely not to be sustained. That seems like it's not really the thing that we're trying to identify.

And we don't have quite that same concern with off-duty complaints. In fact, there are fewer off-duty complaints against police officers. And so looking just at sustained complaints off-duty, just there aren't enough positive Y=1 cases there for us to successfully predict.

Steven Davis:

Okay.

Jens Ludwig:

So that's the 2% and 5% two-year prevalence rates.

Steven Davis:

Okay, great. All right. So how do you go about trying to predict sustained on-duty complaints and overall off-duty complaints?

Jens Ludwig:

And maybe one high-level point that I want to make here is that I think one of the big differences between what we're doing in this paper and what police departments normally do in practice is to recognize this question, which officers are at elevated risk for misconduct, as an empirical question. So a standard thing that the Department of Justice does with police departments when they adopt a consent decree is to require what's called an early intervention system that has prediction of misconduct as part of it.

But the way that prediction model often gets determined is that the police union and some negotiator for the police department will sit in a room and negotiate over what behavioral patterns in the past rise to the standard of raising a flag so that someone does something. Right.

And I think economists like us look at that and say, what in the world is happening here? And we can see in our data that that way of doing things just does an incredibly poor job of identifying which officers are really at elevated risk. And one of the interesting reasons for that, I think, is that there's a very deep behavioral science explanation, which is that human intuition about patterns in the world is very, very poor. If you think about it, even a police supervisor will only see a handful of serious police misconduct cases over their career, and then a bunch of low-level stuff that's just going to fade into the background. It's going to be very, very difficult for any police professional to see these patterns across the whole department over extended periods of time just from casual empiricism, looking around.

I think most people in policing, and you can see this in the negotiation of the quote-unquote prediction models that are used in these early intervention systems, gravitate towards the understandable but wrong intuition that the best predictor of a serious misconduct event in the future is a serious misconduct event in the past. Does that make sense?

And it turns out that when we use... so what we do instead is we build a statistical model using methods from a part of artificial intelligence called machine learning, where essentially we, the researchers, are not making any assumptions about what things in an officer's past are most predictive. We're just letting the data itself tell us what patterns are most predictive.
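
As a rough illustration of what "letting the data tell us" means mechanically, here is one way such a model could be fit, continuing the hypothetical sketch above. The paper does not commit to this particular learner or these particular features, so treat both as stand-ins:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature columns built from the trailing five-year window; the
# learner, not the researcher, decides which patterns carry predictive weight.
feature_cols = ["n_complaints_past5y"]  # a real model would include many more inputs

model = GradientBoostingClassifier(random_state=0)
model.fit(train[feature_cols], train["misconduct_next2y"])

# Predicted probability of misconduct over the next two years, per officer-period.
risk_scores = model.predict_proba(test[feature_cols])[:, 1]
```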

Steven Davis:

If I understand correctly, there is an assumption here which I want to put on the table. You can tell me if you think I've got it wrong. In your analysis, you are treating sustained findings of police misconduct as true misconduct, not all true misconduct, but it seems you aren't allowing for the possibility that some sustained complaints might be falsely sustained, that is, not really misconduct. Have I got it right?

Jens Ludwig:

Yeah, I think that you are right that there is an assumption there. I think that the sustained rate is... I'm trying to remember the last time I checked, the sustained rate is less than 5%.

Steven Davis:

Yeah, it's very low. That's quite striking. But the reason I flagged this is that it seemed to go unstated in the paper, and it may be a perfectly reasonable assumption in your context. But there are other contexts where it might not be such a great assumption. It's just important to recognize.

Jens Ludwig:

Fair enough. What I can say is that most people who've read the paper have been much more worried about false negatives than false positives. That is to say, they're much more worried about actual-

Steven Davis:

I'm sure there are many false... Here's a larger point I want to make on this. I'm sure there are many false negatives in your setting, but the prevalence of false negatives and false positives is going to depend on the incentive environment.

Jens Ludwig:

Yeah. Let me come back to that important point in one minute, because, as you know, that is something that we think about in the paper and try to address to the degree that we can. But here's another, deeper point that you're making, which of course is completely true: we don't know the truth, we only have the data. And the data are an approximation of reality, but not reality itself. And there are going to be false positives and there are going to be false negatives. And the only thing that we can do is interrogate the data as cleverly as we can to try to get some sense for the relative magnitudes of those things and account for them the best we can. But at the end of the day, that's what empirical work is.

Steven Davis:

I agree with all that; we can do all of that. But as economists, I think there's at least one more thing we should do, which is to emphasize to people that the prevalence of false positives and false negatives in a given data set, a given setting, may not necessarily be a good guide to another setting. And there are reasons to think that the incentives facing the police officers, their supervisors, whatever the accountability mechanism is, and even the local populace and how they view the police, are factors that can change over time and across space, and that can affect the extent to which the data accurately or inaccurately reflect reality.

Jens Ludwig:

Yeah-

Steven Davis:

That's another level of challenge in this-

Jens Ludwig:

I totally agree. So let me spend a minute talking about what we did to explore this false positive, false negative issue in the outcome variable. And then I'll circle back to what actually predicts misconduct. I think people who work in local government in Chicago, and not just Chicago, probably every city around the country, whatever agency they work in, they will almost surely say that they're underfunded. And the people who work in the agency, it's had different names over different times, but the agency that investigates allegations of police misconduct, they were very much in this camp of saying, we are understaffed, we're overworked, we need more resources, and so on.

And on the heels of the release of the video of Jason Van Dyke shooting Laquan McDonald on the south side of Chicago, the operation of that agency changed dramatically; you can see it in the data. There was a big increase in the sustained complaint rate. And I think most readers of the paper look at that and say that is, on net, a reduction in false negatives, though there could be some increase in false positives as part of that as well.

So it's clear that there was a change in the incentives for the investigatory agency, and a change in the incentives for the city to put more resources into that agency and help it investigate more quickly and more thoroughly than it had. We see this big increase in the sustained rate. And one way to get a sense for how sensitive our analysis is to false positives and false negatives is we can build statistical models in the low-sustained-rate environment and export them into the high-sustained-rate environment, and vice versa, and see to what degree there's overlapping signal.

Because I think a very reasonable sort of concern that you might have here is maybe what looks like predictive signal in the statistical model is really just an artifact of something, something going very wrong in the police administrative data. These are not constructed as research data. This is the actual operational data that the police department collects. And if there's something deeply wrong with that, there's going to be something deeply wrong with the statistical model.

And so as best we can tell, even when you predict in this low-sustained-rate environment, those predictions wind up capturing useful signal when the data quality, we believe, on net gets better. And that gives us a little bit of confidence in what we're doing.
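
A sketch of that cross-regime check, in the same hypothetical terms as the earlier snippets: fit in the low-sustained-rate era, score the high-sustained-rate era, and ask whether the ranking still separates officers (and vice versa). The cut date is approximate and illustrative:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

regime_change = pd.Timestamp("2016-01-01")  # approximate, for illustration only
pre = panel[panel["as_of"] < regime_change]
post = panel[panel["as_of"] >= regime_change]

model_pre = GradientBoostingClassifier(random_state=0)
model_pre.fit(pre[feature_cols], pre["misconduct_next2y"])
post_scores = model_pre.predict_proba(post[feature_cols])[:, 1]
# If officers the pre-era model ranks as risky still have elevated sustained
# complaint rates after the change (and vice versa), the predictive signal is
# not just an artifact of one investigatory regime's data quality.
```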

Steven Davis:

Yeah. So I thought that was one of the most interesting parts of the paper. The messages you took away from that... First of all, we should tell the audience that in this context, where the sustained complaint rate went up quite a bit, it doubled or more than doubled as I recall, this was happening in a context, as you make clear in the paper, where other measures of the use of force and police activity levels and so on were on a downhill slide.

So I think you make a compelling case in the paper that this increase in the sustained complaint rate is most likely due to a reduction in false negatives. Okay. So I think you make a good case for that, I buy all that. But there's a larger message that I took away from this episode, which is a reminder, and one that is worth stating explicitly, that both the environment, that is, actual misconduct, true misconduct, which we aren't likely to observe perfectly, and the relationship between measured misconduct, or the inputs that help us predict misconduct, and actual misconduct are changing over time. There's this non-stationarity. And this will always be a challenge when we try to implement a prediction model in practice.

Jens Ludwig:

Yeah, let me, in fact, double down on what you just said, and I think you mentioned this before as well: it's not just that the relationship between the inputs and the misconduct that we're predicting changes over time, it very well may differ across places as well. Let me come back to that important implementation point at the end, and let me just say a little bit more about what actually predicts misconduct.

So as I mentioned, a common feature of these consent decrees with police departments is that they have to implement these early intervention systems. A lot of this is sort of negotiated in a smoke-filled backroom or whatever it is. A lot of this winds up being based on the intuition that if we're trying to predict serious misconduct in the future, we should be counting the number of serious misconduct events that an officer has in the past.

And it turns out that the strongest predictor of serious misconduct in the future is simply the number of complaints of any kind against an officer in the past, right? And I think that is a really, really important insight and an important cautionary tale against this kind of purely political, human-intuition, whatever you want to call it, sort of standard practice that we have in policing.
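
That finding implies an extremely simple baseline any department could run, sketched below with the same hypothetical column names as above: rank officers by their total past complaints, sustained or not, and compare the top decile's future misconduct rate to everyone's:

```python
# Rank officers by raw complaint count over the trailing window (no model needed).
ranked = test.sort_values("n_complaints_past5y", ascending=False)
top_decile = ranked.head(len(ranked) // 10)

print("Misconduct rate, top decile by past complaints:",
      top_decile["misconduct_next2y"].mean())
print("Misconduct rate, all officers:", test["misconduct_next2y"].mean())
```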

I think one of the big lessons here is about taking the data, and the statistical opportunities the data give us, seriously. In an environment in which cities are looking at terrible budget problems in the future, and everybody's really worried about doing everything we can to improve policing, most police departments already have these big electronic data infrastructures. They're just not using the data. This is a dirt cheap, relatively speaking, way to try to get some more social good out of all the money we're already spending on police departments.

Steven Davis:

Yeah. So I take the point, but, and this is part of your analysis in the paper, which I appreciated, you're asking, okay, we have a bunch of potential predictors of police misconduct; which ones seem to do a lot of work? And you made the point that overall complaints actually outperform sustained serious misconduct. What I didn't see in the paper, and if it's there and I missed it, you can tell me, but what I didn't see that I wanted to see is a push to the next step, to say, okay, among the things that are in these administrative data sets that have high value as predictors, which of them are less susceptible to manipulation in another setting?

So you can imagine a setting where there's a hostile relationship between the police department and some segments of the local community, and we've lived through that rather intensively in the last few years, so this is not just some made-up hypothetical. If there's hostility and it's known that there's a prediction algorithm that the police department is using that will target certain officers if they're subject to a certain number of complaints, there are groups out there that will organize to make the complaints.

Now when I think about the complaints, there's everything from claims that the officer engaged in verbal harassment. That might be pretty hard to investigate with a clear definitive resolution as opposed to complaints about the excessive use of force with a firearm. Well, that strikes me as one that's harder to manipulate from the outside because at least you can figure out whether the firearm was fired and all that kind of stuff. So I didn't see that in the paper and I'd like to see it. I'd like to see you try to make this distinction between predictors that are resistant to manipulation and those that aren't.

Jens Ludwig:

It's a great point. And I would say that that is the frontier science question to be solved next, sort of beyond the scope of what we've done so far. So let me say a couple of things about that. One is, it's a super important point and I totally agree that this needs to be figured out as part of realizing the full potential of this. But as I say, right now, standard practice is very far from even recognizing that this is an empirical data question.

Steven Davis:

Understood, understood.

Jens Ludwig:

Let's try and move us in that direction.

Steven Davis:

One step at a time.

Jens Ludwig:

One step at a time. And then I think, unfortunately... there's a very large number of ways that most of these sorts of measures can be manipulated, and part of it is changes in the incentives and strategic behavior of people out in the community, but also a lot of these complaints are generated by supervisors internal to the department as well. And that also is gameable. So it's just a complicated sort of strategic mess. In a world in which they're not using the data for very much, we think that we've got a reasonable look. But you're right that moving forward, if you start to do this, you would worry.

My guess would be that the solution to this at the end of the day is probably going to involve making much greater use of the body-worn camera footage that police departments collect right now. Because that, at the end of the day, is probably the richest source of administrative information departments have that would genuinely be difficult to manipulate. You always have to worry about officers in police departments not turning on their cameras, but conditional on officers using their cameras the way they should, that's a great source of information, hard to manipulate. Artificial intelligence is just getting better and better over time at making use of that. I think that's where this whole thing has to go.

And I think I hope that our paper convinces the field that this is very much... that there's enough useful signal even in the lower dimensional data, the less rich data that we've been looking at to make it worthwhile to do something really I think potentially socially useful with the body-worn camera footage.

Steven Davis:

Right. So on that point, we should make it clear to the audience, as your paper does, that there is quite a bit of predictive content in the statistical models you build, in the sense that if you take the top 5 or 10% of officers with the highest risk assessment according to your model, depending on which of the misconduct measures we're looking at, on-duty or off-duty, those officers in that high-risk category are, as I recall, six to eight times as likely as the average officer to engage in some serious misconduct over the next two years. So that's the good news that comes out of this. And maybe I was too quick in jumping on some of the challenges going forward, but there is that.

Jens Ludwig:

There is that. Exactly. If you look at the top 1% of the predicted risk distribution, for on-duty, it's like 2% of all officers engaged in on-duty misconduct over a two-year period, and it's like 12% among the very top of the predicted risk distribution. And for off-duty, it's like 5% for all officers and 30% for the top 1%.
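
Those quoted figures imply roughly a six-fold lift at the very top of the risk distribution; a quick back-of-envelope check of the arithmetic:

```python
# Lift implied by the rates just quoted (illustrative arithmetic only).
base_on, top_on = 0.02, 0.12    # on-duty: all officers vs. top 1% of predicted risk
base_off, top_off = 0.05, 0.30  # off-duty: all officers vs. top 1%

print(f"On-duty lift:  {top_on / base_on:.0f}x")   # -> 6x
print(f"Off-duty lift: {top_off / base_off:.0f}x") # -> 6x
```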

And so in a world in which departments don't have enough money to provide support services to every police officer, making sure that we are at least giving those support services to the people at the very top of the predicted risk distribution... this is not Minority Report, where we're punishing people for the prediction of something in the future. Ideally, this would be giving people supportive services that would make their lives better, save their careers as police officers, and also prevent some behavior that's harmful to people in the community.

Steven Davis:

I see that, but to be fair, in some cases, the right resolution might be to pat these people on the back and tell them this is not the right line of work for them.

Jens Ludwig:

Yeah, well, I think legally you can certainly do that in a backward-looking way if they've engaged in behavior. I'm not a labor lawyer, so I don't know what you can do in a forward-looking sort of prediction way. I'll leave that to the-

Steven Davis:

Well, that's a legal question. It's a tricky matter. But when we are talking about someone who is empowered to engage in the lawful use of force with firearms and so on, it's not clear to me we want the same standard as we do in general. But that's a question for-

Jens Ludwig:

Maybe the only other thing that I would just say on that point, Steven, if I can: my dad worked at RCA in Camden, New Jersey, and when they wanted to rightsize their workforce, they offered people very generous severance packages. And that would be another way you could imagine handling someone with a 12% risk of engaging in on-duty misconduct in the future. There are lots of different ways-

Steven Davis:

They don't have to necessarily be punished.

Jens Ludwig:

Yeah. So different ways of solving the problem. I totally take the point, yeah.

Steven Davis:

Take that point. So we've stated the positive: there's a lot of predictive content, including predictive content in pretty simple versions of your models. You don't need the state-of-the-art rocket science machine learning model to get a lot of value. You make that point in your paper. You also make another important point we should get on the table: even though the model is successful at predicting a set of officers who are disproportionately likely to engage in misconduct in the future, most misconduct episodes, most sustained on-duty misconduct episodes, and I gather most off-duty misconduct complaints in general, are accounted for by officers who don't show up in this particularly high-risk category. So it's diffuse. The incidence of serious misconduct is diffuse.

Jens Ludwig:

Yeah, exactly. And the way that I might say that is, people in the top 5% or so of the predicted risk distribution account for about a fifth of all misconduct. So it's clear that they account for a very outsized share of misconduct. And this goes back to your first or second stipulation: it's clear that there's a huge amount of social good in making sure that we prioritize support services for those high-risk officers. It's also clear that this is not a panacea for the problem. This is one part of a larger effort to address the police misconduct challenge.
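
That "share of misconduct captured" statistic is easy to compute once officers are scored; a sketch using the hypothetical names from the earlier snippets:

```python
# What fraction of all future misconduct events do the top 5% of predicted
# risk account for? (The paper reports roughly a fifth.)
scored = test.assign(risk=risk_scores).sort_values("risk", ascending=False)
k = max(1, int(0.05 * len(scored)))

captured = scored.head(k)["misconduct_next2y"].sum() / scored["misconduct_next2y"].sum()
print(f"Top 5% of predicted risk accounts for {captured:.0%} of misconduct")
```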

Steven Davis:

So I'm curious, what has been the reaction? I presume you've been in extensive consultations with people in the Chicago Police Department before, during, and after your study. I'm curious what their reaction to it is.

Jens Ludwig:

Yeah, I think the Chicago Police Department, everyone here recognizes that this is the future because of the consent decree. I think one of the interesting questions is how other departments around the country think about this, where this is like opt-in for them if they want to prioritize this. And I think this goes back a little bit to some of the implementation points that you raised before.

So I think, to me, one of the most important practical findings in the paper is that even a very simple sort of predictive model, like ranking by complaints, winds up being as predictive as it is, because it means that even small police departments can take advantage of prediction. We built this prediction model for the Chicago Police Department, the third or fourth-biggest city in the country. It took us several hundred thousand dollars of private philanthropic support and a bunch of data scientists and lots of time to make sure that we got everything right.

You can do that for a city of 2.7 million people; you can't do that for Oak Park, which has a population of a few tens of thousands of people. And so the fact that something simple works is really important, because something like 60% of all police shootings in the US every year happen in these smaller departments all around the country. So we want to make sure that we have some way of identifying risk even in the large number of small departments. And I think this gives us a practical thing to do there.

And I think the other kind of thing that has come up over and over again in our conversations with different departments around the country goes to a different point that you raised before, which is to what degree is the world itself the same versus different across different cities and how fast is the world changing over time?

Because if you think about building a machine learning model, the great thing about technology is there's a big fixed cost, and once you've incurred that big fixed cost to build a piece of technology like an algorithm, the marginal costs of running it are super low, if not close to zero. And so if I can build this model once in Chicago and then use it, at the extreme, in every other city around the country forever, you amortize those fixed costs across a bunch of different cities in perpetuity. And this is an insanely cost-effective thing.

If, on the other hand, either the world is changing very rapidly, so we need to rebuild this very frequently, or the cities are very different from one another with respect to incentives or institutional context or whatever else, then every city needs to build its own model. And then I think the power to fully capitalize on what the technology in principle would let us do becomes a very different sort of cost-effectiveness calculation. And I think that's in the category of to be determined. I don't think that we know the answer to this. That's part of the next chapter to be written in this story.

Steven Davis:

Right. So I guess I'm still groping here. If I'm the police officer on the street and I know that I might get flagged by a prediction algorithm, it's going to really matter to me that this thing doesn't unfairly or inaccurately peg me as requiring some kind of intervention, because no matter what you tell me, I'm going to interpret that as a negative mark on my career progress. You can tell me that, no, it's just going to maybe feed... I'm not going to believe you.

So there's this. You had some discussion in the paper. It's back to this change in the data-generating process, I'm using jargon here, sorry for that. When the sustained misconduct rate went way up, that's what I meant by a change in the data-generating process. It's clear that the environment had changed in a way that affected the fraction of complaints that were sustained.

So in the course of that analysis, and you reviewed some of this earlier, you showed that the prediction models built on data before this change in the frequency of sustained complaints were much less successful in predicting actual misconduct than models built on the later data. And you took the good news, the glass half full perspective, and you said there was still a lot of predictive content even in the earlier period, with maybe lower quality data compared to, let me be careful here, later data that more accurately identified true misconduct. I think that was how you interpreted it.

But the glass half empty view is, if I were an officer that got flagged by this kind of crummier statistical model in the earlier period and inappropriately identified as needing an intervention, I'd be pretty upset, and I'd do everything in my power to avoid getting flagged that way. So I'm worried here: you need a really good statistical model, and you need people to be confident in it, if you don't want it to distort their behavior. What are your thoughts about that?

Jens Ludwig:

Yeah, I think I agree with everything that you just said, and maybe the third condition that I would add is that there's also a lot of value in trying to communicate what is going on underneath the hood to frontline police officers so they understand. And part of that is building an internal system that really, to the extent you genuinely can, makes it supportive rather than punitive. There will always be a question of what officers believe. But also, with respect to things like disincentives, which you should be worried about, making sure that officers understand that this is not a high-activity police officer flagging system. Your likelihood of being flagged is not driven by you making lots of arrests and getting lots of guns off the street. That is not what this is. And I think... Yeah-

Steven Davis:

That's a critical case to be made, and it could be a challenging one to make, because people-

Jens Ludwig:

It's a-

Steven Davis:

All kinds of rumors will float around as to what this algorithm's doing.

Jens Ludwig:

It is a factually true argument, but, exactly, whether you can effectively communicate that to frontline police officers and get them to really believe it, I think is one of the other open questions.

Steven Davis:

I want to close the conversation here, but before we do that, let me see if there's anything really important from your paper you don't think we've touched upon yet that you really want to get on the table.

Jens Ludwig:

No. Maybe the one other thing that I want to emphasize is... so I think the thing that's driving the policy conversation is concerns about on-duty misconduct. And the fact that in our data we're seeing that off-duty misconduct risk is correlated in a very meaningful way with on-duty misconduct, I hope gets us to think about all of this together: preventing misconduct and officer wellness are flip sides of the same coin.

And I think that whatever side you are on in the current sort of political debate, there really is an opportunity for a win-win here: to protect city residents from misconduct that nobody wants, and also to save officer careers. And in a city where it's so heartbreaking, every however many months, to read about the latest suicide by a Chicago police officer, preventing officers losing their lives and families being devastated, I think this really is something that should sit in a sort of bipartisan sweet spot in the criminal justice conversation.

Steven Davis:

Yeah, I think that's a great point, and that is a nice theme that comes out of the paper. And the way I think about it is, a lot of the cost of the off-duty misconduct falls on the officers themselves and their families. So if you care about that, which many people do, often maybe on a different part of the political spectrum, then this is another reason to try to predict the risk of this misconduct and head it off through interventions, which, as you say, in many cases are about officer wellness. It's not that these people necessarily entered the business to do bad things; it's just an extremely stressful job, and sometimes people do things that they wish they hadn't and that we wish they hadn't.

Jens Ludwig:

Yeah, exactly. Exactly.

Steven Davis:

All right. Jens, thanks so much. Really super interesting line of research. The paper's great. And I'm impressed by what you're doing along these lines. There's a lot more to do, but I'm sure you'll keep at it.

Jens Ludwig:

Yeah, thanks so much, and maybe version 2 in a couple of years we'll come back and report on how we've been able to push this forward.

Steven Davis:

Okay, that'll be great.

Jens Ludwig:

Thanks so much for having me on, Steven. A real pleasure.

Steven Davis:

Okay, thank you, Jens. Bye-bye.

Jens Ludwig:

Thank you. Take care. Bye-bye.
