Using Statistics to Advance Social Justice & Human Rights @StatMegan @hrdag (Episode 47) #DataTalk

– Hello and welcome to
Experian’s Weekly Data Talk, a show featuring some of the
smartest people working in data science today. Today we’re talking with
Dr. Megan Price about how data scientists are using statistics to advance social
justice and human rights. Just a little background
on Megan, she received her Bachelor of Science and
Master of Science degrees in Statistics from Case Western
Reserve University and then went on to get her PhD in
Biostatistics at Emory University. Megan, it is a pleasure to have
you on our broadcast today. – Oh, well thanks so much. It’s fun to be here. – So, I always like to start
off these episodes by asking our guests, how did you get
started with data science? What did your journey look
like, both academically and then what you do now? – Yeah, so I had a fairly linear
approach to becoming a data scientist because as you just
described all my degrees are in statistics or biostatistics
and I was always a math nerd. I was a math nerd my whole life. My fourth grade science
fair project was about math. – That’s awesome. (laughing) – Yeah, it was. I have to say, I still
think it’s kind of neat. And so I always knew I wanted
to do something with math and with data analysis. And I was very fortunate when
I was in undergrad to have some really great mentors
who exposed me to some statistical consulting
work in public health, looking at clinical trials,
looking at various treatments. And so when it came time to
look at more graduate school, they really advised me,
they said, we think that you might enjoy a public
health school experience more than just a straight PhD program. And then it was in
public health school that really everybody I
interacted with wanted to do some kind of social justice work. And so I was really
surrounded by this idea that that was the best use of our skills. And then it was there
in public health school that I first learned
about my organization, The Human Rights Data Analysis
Group, and specifically the work using statistics
to analyze human rights. And that was kind of the end of the story. I was very fortunate from there to go on and get the job that I have now. So I’ve been very lucky to
always get to work in this field. – That’s really cool. So I’m curious about when
you were working on your PhD what were some projects you worked on that were really exciting to you? – Yeah, so my advisor in my
PhD program worked primarily on clinical trials that
involved stroke victims and victims who’d had
traumatic brain injuries. – Hmmmm. – And so that was largely
what I was studying for my methodological work. I mean the substantive question
was, what kind of treatment are these patients getting
and is it working and how can we use our statistical analysis
to answer those questions? – Wow, it seems like there’s
obviously the math component to it but then there’s also,
as a researcher, there’s also the emotional side that you
wanna find out solutions, right? – Yeah, yeah, and I think
that grad school was really one of the first places
that I learned that lesson, both of how important it is to care about the substantive question
and to tackle important substantive questions but
then also how to balance that with the day-to-day technical
and methodological work. Because you do have to be
able to come in every day and analyze that data and
try and answer your question using your best technical skills. And so finding ways to balance
that and to cope with that I think started in grad school. – How, as you’re like
getting into the data, I mean these are people’s lives, so how did you kind of
separate, like not get too close, you know what I mean,
like not get too close and be objective during
that whole process? – Right, yeah, it is a tricky balance because you never wanna forget
that these are people’s lives but you do need to stay objective. And in grad school I would
say that I really just learned from my advisor and I sort
of modeled after her work. And then once I started my
current job, I come back again and again to this quote from my colleague and HRDAG’s co-founder, Dr. Patrick Ball, that we have a moral
obligation to do the best work that’s technically possible to do honor to these individual lives who
are represented in our data. And so for me that’s really
what I come back to every day when I’m kind of down in the
weeds of a technical problem or a methodological problem,
is remembering that focusing on that piece of the
work isn’t disrespectful and it isn’t putting distance between me and what the data is about,
it’s enabling me to do the best work that I can to respect the individual lives
that the data are about. – Was it difficult for
you to leave academia to then begin to work for
a non-profit or did you feel like what you were doing in
academia transferred over so easily you felt like you
were doing the same thing? – Yeah, so I would say
that it was not difficult for me to leave because I was
the kind of person who knew that I was never going
to stay in academia. – Okay, okay. – So I knew kind of from
the get-go that I was not on a path to become a professor. But on the other hand it was
not a seamless transition. I definitely did not feel like
I was doing the same thing. But the way that I
think about grad school, the skills that I think
grad school gives you, are ways to think and
ways to learn things. And so even though the work that I do now is methodologically quite
different from the work that I did in my graduate research, every day I’m using the skills
that I developed there, learning how to read papers
and figure out if some new technology or some new
method that someone has developed applies to my problem and then how to apply it to my problem. And so that, I think,
was a direct transfer. – And I think what’s also
interesting is that I think, you know, in this show
I’m able to interview lots of different data
scientists and I feel like, as a data scientist, you’re
constantly a student. – Yes. – Like you’re constantly learning. So you’re always in grad
school, it never ends. – Yes. No, I think that’s very true. And I think that that is part of what makes it a good fit for me. It was something that I
worried about a little bit when I was in grad school
because I did school for so long and I was really good at
it and I really liked it and I did kind of worry a
little bit if I was not going to formally stay in school
and be a professor, how was I gonna go do something else? And so, yeah, that feeling
that I’m always learning and that I’m always in
school is comforting. – That’s awesome. So you know, today’s
topic is around advancing social justice and human rights
and how you’re leveraging statistics and data science to do this. Can you share some different use cases, some things that your
organization has done or work you’ve done in the past to
help advance social justice? – Yeah, so I think that the most visible and directly linked way
that our work influences social justice is when
we’re asked to testify in court cases as expert witnesses. – Hmmm. – And so I think those are
some of the clearest examples and most of this work has been
done again by my colleague, Dr. Patrick Ball, he typically
is the one who testifies and presents our analysis in court cases. And so some of the ones
that have, I think, the best outcomes are in 2013,
he testified in Guatemala in the case against
General Efraín Ríos Montt, who was the de facto leader
of Guatemala in the early ’80s and was charged with acts of genocide. And Patrick presented some
analysis that our team did that showed specifically that members of the Mayan population in
Guatemala during this time period in some specific regions
had a five to eight times higher risk of being killed by the army than non-Mayan members of the population. And what the lawyers argued was that that statistical pattern to the violence is consistent with ethnic
targeting and with genocide. And ultimately, the judges agreed, the judges found Ríos Montt guilty. And in fact, they referenced Patrick’s analysis in their verdict. And again, this is one of
the most positive possible outcomes for us because what
the judges said was that the analysis Patrick presented
confirmed numerically the stories that the individual
victims were telling. And we really view that
as our role, is to affirm and to amplify the voices of
victims and victim communities. So we’re one piece of the
puzzle, but we’re hopefully a piece that strengthens
some of that advocacy work. So in the case of
Guatemala, unfortunately, that verdict only stood for 10 days. The Constitutional Court
in Guatemala overturned the verdict on a legal technicality. And a retrial is still working it’s way through the court system. So we’re all still waiting to
see what’s gonna come of that. – So, I’m curious as you are
working on a case like that, you’re dealing with a
lot of sensitive data. – Yes. – You’re dealing with people’s
lives, you’re dealing with. So I’m kind of curious about
how, as you’re working on these very sensitive issues, how
are you kind of protecting the privacy of that data, at the same time being transparent with the courts? Can you kind of share a
little bit about that process ’cause I think a lot of data
scientists who are listening to you would be very curious
like, okay, how does this work? – Yeah, so we have very
strict in-house protocols around keeping the data secure and they’re largely what you would expect. All of us have encrypted machines. We keep the data locally on our own server that’s also encrypted. All of the conversations, all
of the movement back and forth between our machines to do analysis and the server where the
data is stored are encrypted. And so that’s our in-house
data protection policy. And then in terms of transparency,
we do worry about that a great deal because all
of our work has to be transparent, replicable, auditable. But for us that’s much more
about the code than the data. And so we write all of our
code in open source platforms so that anyone can run it, can
access it, can interpret it. And we do all of our coding and all of our version controlling of
code through GitHub. But all of the data, again, stays locally. So we separate out those two pieces. And then when it comes to
things like court cases and things we might need to
share with lawyers or judges, I mean, that all has
very specific regulations depending upon the specific jurisdiction. And similarly, each of
our projects has different parameters depending upon, in most cases, we have not personally collected the data, in most cases we partner
with a local organization that’s done the data collection. And so then we’ll write
a data sharing agreement with that organization
that not only outlines the security measures
that we’re gonna follow to protect their data, but
also the terms under which we perhaps would share their
data in circumstances like if a lawyer or a judge needed to see it. Or in other circumstances
we might have negotiated our partners might be
interested in research that other groups we work with are doing. And so they might be comfortable
having their data used for research projects or they might not. So it really depends on the
specific case and the details of the data and what
they’re being used for. – Yeah, and I’m curious also,
how do you prep the attorney that you’re working with
because you’re working in statistics, you’re working
with algorithms and models and you have to be able to
translate what you’ve done to help the lawyer build the case who may not have a
strong background, right? In the maths, but he needs
to be able to defend it because the other lawyer,
the opponent, might have some criticism about what you’ve
done to manipulate the data or say anything to manipulate the data. – Mmm hmm, absolutely. That’s one of, I think, one
of our biggest challenges and something that I think over the years we have gotten significantly better at. And one piece of that
is just the conversation with whatever lawyer we’re
testifying for or with. And that conversation, I would
say, is a lot like any other consulting conversation
that any data scientist has. Where you have that sort
of iterative conversation where a partner or a
client or a colleague says, I think I have this question. Or I think have this data. And you say, well, you
could probably answer this other question or if this is
the question you wanna answer this is what the data
would have to look like. Or the kind of data you
would have to have access to. So that very familiar kind
of in depth conversation to help them understand
the link between their substantive expertise and
their substantive questions and our methodological expertise and to bring those together. So that’s one piece of it. And then another piece of
it obviously is the judge who we don’t have a chance to have those kinds of conversations with. And so preparing through
practice, ourselves and the lawyer we’re working with, to
present our analysis in a way that’s very accessible and
very readily understandable and very specifically applicable. Again I mean, legal questions
are so specific and so narrow so making sure that our
analyses are applicable to those questions is another
thing that we’ve really worked on and gotten better at I think. And then the last piece is
there is an ongoing challenge for us and for anyone
introducing evidence, specifically in a courtroom,
about what gets considered to be evidence and how expert witnesses are evaluated and are determined. And this was something that
we actually struggled with at the International Criminal Tribunal for the former Yugoslavia
where the defense brought in their own expert with a
different specialty who, in our opinion, was not
qualified to evaluate the analysis that Patrick was presenting. But that’s really up
to the judge to decide who is considered an expert witness and who gets kind of that weight. And that’s, I would say, slightly
outside of our wheelhouse, but it is something
that we worry about is, how can we help to strengthen
this community more broadly by helping judges and lawyers
to establish standards around what get introduced as evidence
and how it gets evaluated. – Yeah, I think just
hearing you talk, I mean, the terminology maybe doesn’t
translate so you have to do a really good job of kind of
explaining how all this works, maybe even providing
visuals, data visualizations to help explain and also
thinking like a lawyer. Like okay, what could be
arguments against my data, right? – Yes, yes. Yeah, absolutely. – So are there any other, oh,
I think I just lost Megan. Oh, there you are. You’re back. (laughing) – Technology. – I know, technology. So are there any other
use cases that you like, examples of how data science has been used to help resolve some of society’s ills? – Yeah, so actually, one
other example from our work and then a couple of others
that are not from our team that I’d love to talk about. One is, some new work
we’re doing where again this is work Patrick is
doing in Mexico using machine learning models to
predict municipios in Mexico that are likely to contain hidden graves that have not yet been discovered. – Wow. – Yeah, and the thing
that’s so interesting to me about this particular
project is for our partners working on the ground, when
we showed them the results of the model and said,
these are the municipios that we think probably
contain hidden graves, all the experts kind of said,
well that’s not surprising, that’s where the violence is happening. And in this case, that’s
actually the most valuable result is the fact that it was not surprising. Because it’s given the
advocacy groups another tool to go petition the
government to go investigate and look for these graves. Because understandably,
in the midst of upheaval, it can be very difficult and
often very dangerous to go look for and to go investigate
these potential hidden graves. And so we’re in the midst of that project. I don’t have sort of a good
conclusion for you about success on that but I think it is a
really interesting component to the advocacy when these groups can say, but the statistical model also
says we should go look here. – Yeah, no doubt. I love that.
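Editor's sketch: later in the conversation Megan describes the grave-prediction model as a classification problem, with municipio attributes fed into a random forest and checked against held-out testing data. The block below is a minimal illustration of that general technique, not HRDAG's code. Everything in it is an assumption made for illustration: the three unnamed attributes, the synthetic data, the tree depth, and the helper names (`fit_forest`, `predict_forest`, `make_rows`, and so on). The two synthetic classes are separated by construction, so the accuracy it prints says nothing about the real model.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint of neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=4):
    """Grow one tree; each split only considers a random subset of features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: the predicted class
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    children = [
        grow_tree([rows[i] for i in side], [labels[i] for i in side],
                  rng, n_try, depth + 1, max_depth)
        for side in (left, right)
    ]
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # features tried per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees: 1 = likely to contain, 0 = not likely."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


def make_rows(n, low, rng):
    """n synthetic municipios with three made-up attributes in [low, low + 1]."""
    return [[rng.uniform(low, low + 1) for _ in range(3)] for _ in range(n)]


# Hypothetical data: label 1 = graves were found, label 0 = believed to have none.
data_rng = random.Random(42)
train_x = make_rows(20, 0.0, data_rng) + make_rows(20, 3.0, data_rng)
train_y = [0] * 20 + [1] * 20
test_x = make_rows(10, 0.0, data_rng) + make_rows(10, 3.0, data_rng)
test_y = [0] * 10 + [1] * 10

forest = fit_forest(train_x, train_y)
accuracy = sum(predict_forest(forest, r) == y
               for r, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, the rows would come from partner records rather than `make_rows`, and the held-out rows would be municipios the model never saw while it was being fit.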
– Yeah. And so then the couple of
other examples that are not our work but that I would
love to draw attention to is Microsoft research actually
did some really cool stuff a couple of years ago
around flash flooding. And looking at the gauges
that are in rivers and streams and that sometimes can get
overwhelmed when there’s a flash flood and so instead
of triggering a warning, they instead say that the
water level has gone down because they’re overwhelmed
and they’re broken. And so Microsoft Research
developed these machine learning models to leverage the input
from neighboring streams and the sensors that were still
working in the other streams to get much better data and
much better warning systems about when there might be flooding. So I think Microsoft Research
is actually doing some really. – Awesome. – Yeah, like some unexpected social good. – Yeah, that’s wonderful. That’s wonderful. Oh, and we just got a question, let me put it up on the screen, from Christina asking about
the first use case about what are hidden graves? – Sure, so they are
basically any unofficial, so not inside a cemetery
or other sort of place that you would expect to find graves. And they are unmarked in
some way so essentially to talk about them kind of bluntly, it’s basically a place
where bodies get dumped. – Hmmmm. – It’s a place where
there’s more than one victim and it’s unmarked and it
is in some way undiscovered or hidden in some fashion. – Are a lot of those from
wars, from illegal activity? – Kind of all of the above. In Mexico in particular, it’s
very difficult to tease out drug related violence and other
sort of political violence and who the perpetrators might be. So yeah, it’s kind of all of the above. – And it’s fascinating
that you have leveraged machine learning to uncover that. Can you kind of explain how that happened because that’s amazing that you used a machine learning process
to help uncover that. – Yeah, I mean it’s one
of those applications that I think our team excels at. At recognizing that these
partner organizations, Data Cívica, which is a
Mexican, non-governmental organization and
Iberoamericana University, the human rights program at
a university there in Mexico. They had information about graves that had been found this way, so essentially locations of
bodies that had been found and attributes of those municipios. And this is kind of the thing that machine learning excels at, right? Is that humans could maybe look
at some of those attributes and subjectively try to
draw some conclusions but the machine learning
algorithm doesn’t really care what any of that information is. It’s just going to do a really
good job of classifying areas into likely to contain
or not likely to contain. And that’s essentially
what we did was we used the information we had from
these groups about graves that had been discovered
and the municipios that they were pretty sure
did not contain graves. And so that was our classification problem and then we fed all of the attributes that we could possibly think
of about these locations into a random forest and then
we let it predict for us. And it predicted with our testing data it had perfect results. – Wow. – It exactly predicted the testing data. So we’re continuing to update that. Every year we get more
information from Data Cívica and Iberoamericana so we’ll see. We may sort of tweak the
model, we may use a different classifier but for right
now that’s how it’s working. – And I’ve got one more
question about that. So int that particular
study, what percentage was structured data versus unstructured? – To be perfectly honest, I don’t know. I’d have to go back and look at it. My guess is that almost
all of it was structured but I don’t actually know. – Okay. So that’s a fascinating case study and then you mentioned the Microsoft one. Did you have another one
that you wanted to share? – No, I think that sort of covers it. – Okay, yeah, those are beautiful examples of how data is used. And I think it’s beautiful
when you see data scientists coming together to work on
problems in society, human rights issues, using data to help
solve it and uncover it. And the examples you just
gave and in Mexico using machine learning to help
find these unmarked graves is a beautiful example ’cause
it helps build the case for reasons why maybe
certain groups need to go in and protect a certain area
to uncover what’s going on. – Yes, for sure. – So I’m kinda curious about
are there any other issues that are kind of like on your
mind, things you’d like to work on in the future that
you think maybe there’s not enough data yet but
you’re thinking, hey, this is something on the horizon that
you would love to work on? – Yeah, a lot. It’s a weird way to
kind of think about this but my wish list is very long. And I would say that at
HRDAG a lot of our projects are in conflict and
post-conflict countries. – Hmmm. – And unfortunately there’s
no shortage of those. And so in terms of things
that I want to work on or things that I’ve been thinking about, basically anywhere in the
world where bad things are happening, I want us to be working. And we can’t for various reasons. But that’s sort of the short answer. The more specific answer is,
we do have a couple of projects where the data we’ve used has
specifically been archives. They’ve been documents left over by various bureaucratic institutions. And I happen to know that in
both Egypt and the Ukraine during various moments of
political upheaval recently documents were abandoned and
I have no connection to these. I know nothing about the
details of what happened to those documents or
where they’re being stored. But I’m so interested, I would
love to gain access to them. I think that that is somewhere
where some of the analysis that we’ve honed on these other projects could really help shed
some light on things that were happening in those countries. And then things that we don’t
have enough data on right now but that I, you know, if
I could wave my magic wand and get data, is really any
questions around sexual violence and human trafficking. I think those are really,
really hard questions to answer quantitatively and even with
current, really creative uses of ways to get data and creative
methods, I just don’t think those are problems that have
good analytical solutions yet. And I wish we did. – What are, I mean, I
know that we have to go in about four minutes, but what
are some of those challenges because that is a huge issue. What challenges right now do you have with gathering that really sensitive data? – Yeah, well you’ve gone right to it, especially in terms of sexual violence, one of the biggest
challenges is that not only is some of the data hidden
because data is always hidden. We always have incomplete or missing data. But specifically some of the
data is hidden because the victims themselves don’t want
to disclose what happened. For a wide variety of reasons. And there isn’t a good
solution to that right now. There’s isn’t a way that I know
of to use analytical methods to estimate that missing piece. So I think the sensitivity
of it is the big challenge. And there are a lot of methods
for handling sensitive data and for trying to reach what,
in public health is called, hard to reach populations. And so this is definitely a
problem folks are working on but it’s a hard one. – Yeah, we did an episode a
while back based on the book, Everybody Lies, and the premise
of that book was around how people use Google as a confessional. How, during surveys, they
won’t admit to things but to Google they will search for things. – Oh, interesting. – So it was very fascinating to hear how, through using Google data,
they were figuring out there was a lot of racism
where you wouldn’t expect. Because I always thought
that racism was like a northern southern issue in the U.S. And the racism that was
uncovered through Google was more of an east west
issue which was fascinating. And they were using, you
know, racist joke queries were happening in the
northeast and like, shocking, shocking things that
were being searched for. So now I’m kind of wondering
about people that are victims of sex crimes, are they maybe
searching for certain things in Google that might, you
know, help uncover that. – Yeah, yeah, that’s a
really interesting question. I have a colleague who’s
thinking about, she calls it, data exhaust, and thinking
about exactly that. These sort of breadcrumbs
that people might leave, yeah. – Yeah, that’s like the hardest thing. Those are issues no one
wants to talk about. No one wants to admit to
it and thankfully we have a MeToo movement happening
where things are starting to be discussed which is beautiful
but there are so many other things that are not
being uncovered, right? – Absolutely. – Just a small percentage right now that’s being put out there. So anyway, before we
go, we have kind of like the final four questions we ask everybody. – Sure. – The first one is, what is your favorite programing language and why? – Yeah, so I live in R and
Python and I probably should like Python better but R is the
first programming language I learned and so it’s like
any other native tongue, it’s the one that I think in. – Yeah, from what I’ve read,
everyone who is in statistics, R is the language. Is that pretty fair to say? – Yeah, I think so and it’s definitely what I used more in school. – Cool, okay. Second question is, what
advice do you have for people that want to become data scientists? – Yeah, I love that
question and I think I have a somewhat contradictory answer
because one of the reasons why I love statistics and
data science is because it can be used to answer so
many different questions. But I do actually think if
you’re looking to get into data science it’s worth
starting with your question. It’s worth starting with
what is it you wanna do? And not necessarily
like one specific thing, but what’s the category of
thing you’re interested in? Are you interested in
better understanding clients and customers or are you
interested in better understanding sports or some of these
social justice questions? Because those are gonna lead
you toward fairly different skillsets and as much as you
do always need to be learning, it’s useful, I think, to start in a place that’s related to your motivation. – Nice, I love that. Okay, and then last question
is, what advice do you have for leaders who are looking to
build a great data science team? – I would say that my advice to leaders is really very similar, is
to think about what your goal and your motivation is and
be really clear about that in recruiting and forming your team. Because, at least in our experience, it’s often quite possible to
teach the technical skills, but it’s very difficult to
get the commitment to mission. And so starting with someone
who understands what it is you’re trying to achieve
and then spending the time to make sure that they
have the technical skills to help you achieve that I
think is the right way to go. – Wonderful. Well, Megan, thank you so much for being our guest on Data Talk. Where can everyone learn
about you and your work? – and of course
we’re also on Facebook and we’re HRDAG on Twitter as well. – Okay, wonderful. So, for those that are watching the video, either on Facebook or YouTube, we’ll put the URLs in the
comments so you can go there. And also, if you’re
listening to the podcast, we’ll have a full transcription
along with links on our Experian blog and the short
URL is just I wanna thank everyone for tuning in to this week’s broadcast of Data Talk. We’ll be back next week. If you want to learn
about upcoming episodes and also past episodes, you can
always go to Dr. Price, thank you
again for your time today. – Awesome, thank you so much. – Thank you. – (chuckling) Bye.

3 thoughts on “Using Statistics to Advance Social Justice & Human Rights @StatMegan @hrdag (Episode 47) #DataTalk”

  1. As the Executive Director of the Human Rights Data Analysis Group, Dr. Megan Price designs strategies and methods for statistical analysis of human rights data for projects in a variety of locations including Guatemala, Colombia, and Syria. Her work in Guatemala includes serving as the lead statistician on a project in which she analyzes documents from the National Police Archive; she has also contributed analyses submitted as evidence in two court cases in Guatemala. Her work in Syria includes serving as the lead statistician and author on three reports, commissioned by the Office of the United Nations High Commissioner of Human Rights (OHCHR), on documented deaths in that country.

  2. Dr. Megan Price is a member of the Technical Advisory Board for the Office of the Prosecutor at the International Criminal Court, on the Board of Directors for Tor, and a Research Fellow at the Carnegie Mellon University Center for Human Rights Science. She is the Human Rights Editor for the Statistical Journal of the International Association for Official Statistics (IAOS) and on the editorial board of Significance Magazine.

    Megan earned her doctorate in biostatistics and a Certificate in Human Rights from the Rollins School of Public Health at Emory University. She also holds a master of science degree and bachelor of science degree in Statistics from Case Western Reserve University.

  3. It was awesome learning about ways Dr. Price is using data to improve our world. You can learn more about her data philanthropy at
