Social Sciences: Researching Us


So now we’re going to have a round table
on the impact of technology on social science disciplines and research, and we’re very pleased to have
four wonderful scholars: Santiago Berreda, Assistant Professor
in Linguistics; Amanda Guyer, Associate Professor in Human Ecology,
Faculty in the Center for Mind and Brain, and Chancellor’s Fellow;
Duncan Temple Lang, Professor of Statistics and Director
of the Data Science Initiative; and Cecelia Su, Associate Professor
in History. So, I’ve asked each one to lead off
by describing some aspects of how technology has changed their
discipline and research, or how it has come up in
their interactions with others. My provocation to them included
the following. First, I suggested things like changes in research methods, where new
questions can be asked due to computing power, large databases, and new types of analytics. Second, changes in evidence indexed by new
techniques, leading to new subfields or fights within larger fields. Third, new forms of presentation,
such as visualization, which in some fields have led to
questions of authorship and even tenure. One of my projects as an anthropologist
of science showed that in biology, the person who did the visualizations
often becomes an author on the paper even if they don’t have any wet lab
experience, and then this has shifted to, you know,
can that person get tenure in biology? And fourth, struggles over resources
due to things like fMRI machines. In a school of education
where I once collaborated, it was said that one could only get
funding for projects that involved brain imaging, leading to resentment and
grudging use of it. Just the power of what granting
agencies are interested in can then drive research questions or
shape them. Curriculum changes in undergrad and
grad, as these technologies and other tools supplant older ones or
require new space in curricula where there is no space. And new types of interdisciplinary
collaboration emerging, with both big opportunities and big challenges
in grants, publications, and promotion. So, Cecilia, we will start with you and
just move down the line.
>>Okay, well, thank you. I am really delighted to be here and
it might be appropriate to start with History, because compared to many
of the other social science disciplines, I think historians are unabashedly
low tech, or on the low-tech side. Historians are mostly self-taught
when it comes to using technology and statistical analysis and so forth. And Ph.D. programs in History generally
do not provide training in this. And our work does not, for the most part,
require specialized software or equipment or lab space, though
there are exceptions. And there have been pockets of scholars
within the field of History who have long been interested in
technology and quantitative methods, like demographic historians, for
example. And things are changing. So, I think it is an exciting time for
historians to incorporate new methodologies in their research, and
I like what David Henkin said earlier at the beginning of his presentation
about the revolutionary promise of technology for historians, and how, you
know, we don’t want to miss that party. And so I think there are ways
in which this is a good time to have this conversation from
the discipline of History. So how I thought I’d approach this
topic is to talk briefly about technologies that I think facilitate our
research as historians versus technologies that change the kinds of
questions we pose in our research. And then I’ll just provide some
examples from my own research that illustrate this. So in the first category I think about
technologies that have made doing historical research easier, basically. So things as simple as digital cameras, so
we can take images of documents in archives, which allow us to go
through materials faster and not rely on prohibitive photocopy requests. And so these are technologies that
are important for our work and widely used across the field. They allow us basically to be
more efficient and productive. And I would put in this category also
things like searchable digitized databases, which include census
records and which are really vital for social historians like myself
who are trying to recover basic information about people who did
not leave much of a written record, like the Asian immigrant farmers and farm laborers in the Santa Clara Valley
who are the subject of my first book. And I’m thinking of a tool like
ancestry.com, which you might have heard of. It’s used by genealogists;
it was developed for them in large part. And that came of age when I
was in graduate school, and it revolutionized how I collected data on
Asian immigrants in Santa Clara County. I was looking at the period
roughly 1870 to 1930. And I started out in grad school going
through microfilm of what’s called the manuscript census, the decennial
census taken by the federal government. And so I had two reels that
pertained to Santa Clara County, and I would go through them and
look by country of birth. I would look for
Asian last names, and I created this database of
Asian residents of the county in that way. And this includes also information like
occupations. I was interested in farming, and sometimes it would list also
what crops they were farming and so forth. And so obviously the advent
of something like ancestry.com made things a lot faster and easier
to get through once that was digitized. And it also opened up the possibility of
searching for specific individuals and families by name, which would have
been virtually impossible with microfilm. And so, using the census and then other sources like lease records,
crop mortgages, property records, and certain Japanese-language sources like
ethnic newspapers and directories, I was then able to construct very
detailed family histories and chronologies to examine family formation, patterns of
farming, and how they changed over time in response to legislation and
changing concepts of race and so forth. And then another example of digitized databases that have
facilitated historical research, and that I think are used very widely by historians,
is digitized print media databases. These are things like
ProQuest Historical Newspapers. So you’re able to then search for exact terms as they appeared
in the original sources. And so I could plug in Santa Clara
County or Japanese immigrants or something like that and
then come up with all the search results. And the range of periodicals that
are covered in these databases is really growing by the moment, and
my new research is on the 1970s and 1980s, and so
I’ve definitely taken advantage of sources, along with ProQuest,
like Access World News for local and regional US newspapers,
LexisNexis Academic, and Factiva. And so, with these databases, historians can cover a wider
array of sources and find more relevant documents and not
be chained to microfilm as in the past. So, a couple of caveats about this. I think the qualitative
aspect of historical research, in my opinion, still cannot
be replaced by technology. So just to give a quick example, I find that I’m still
missing things in these targeted digitized searches, where the results
are only as good as the search engine. And I know reference librarians
especially have various tricks for optimizing searches, but I still find that
I often miss things, and I have no idea why certain obvious pieces do not come up in
search results, and then I find them some other way, like in an archive
or referenced in another source. And then,
also, I think there are ways that the old-fashioned method of combing
through newspapers, on microfilm or in bound volumes, is still beneficial for
getting a feel for the locale and the time period, for
coverage of other issues, and for seeing things that you
didn’t know that you were looking for. So, I know when I was looking through
these non-digitized trade journals, which were published by California
farmers in the 19th and 20th centuries, I was looking through them for mentions
of Santa Clara Valley or Chinese and Japanese labor. And in that, I came across
multiple references to Hawaii. That led to another research project,
a side project, on the connection between the
anti-Japanese movement on the mainland and the depiction of Hawaii as the prototype
of the ominous, majority-Asian place. And so I probably would not have found
these references if I’d only done targeted searches in volumes and gotten results
where these terms appeared. I wouldn’t have been looking for Hawaii. So anyway,
that’s the first category: technology that I think doesn’t
necessarily change the type of research we do or how we approach our subjects,
but certainly makes our work easier. Then the second category I
want to talk about is technology that I think has
changed the kinds of questions historians ask in our research and allows
us to ask different kinds of questions. And there are a number
of these technologies. And I don’t think there’s one that
everyone is using or moving towards using. And I just wanna give two examples that
I’m currently exploring in my new project, but that I don’t have much experience
with, actually, and I’m looking to other social
scientists for guidance and advice. So basically, the book project
I’m currently working on examines the evolution of Southeast Asian
resettlement policy from 1975, the fall of Saigon, to 1990, and how that intersects
with the rise of modern conservatism. And there’s been a lot more
scholarship on US refugee policy and debates over admissions of refugees and
ideologies of humanitarianism. There’s also some great new work on refugee camps as
these in-between, transitional spaces. But I found that there’s very
little about what happens to refugees after they are admitted into the US. And obviously you’ve heard about refugee
issues that are prominent in the news today, about Central American child migrants or
North Africans arriving in Europe. And there’s a lot of discussion over
whether the U.S. or these various European nations have an obligation to admit, or
should admit, these individuals. But in the case of Indochinese
refugees in the 1970s and ’80s, for the most part, there was widespread
consensus among American officials and the public that the U.S. needed to
accept these refugees, who were U.S. allies during the Vietnam War,
and an acknowledgment that there was American responsibility towards refugees
from Vietnam, Cambodia, and Laos. And so accepting the refugees wasn’t
the main debate; it was resettlement, or how to integrate or incorporate
these refugees into the nation. How to not have them remain perpetual
burdens on American taxpayers was the big question. So I’m looking at how Southeast Asian
refugee resettlement policy fit into a period marked by conservatism and mounting skepticism towards
the role of government. This is after Vietnam, after Watergate,
with the election of Reagan in 1980, and how resettlement
became a contested and convoluted undertaking at the federal,
state, and local levels. And so one approach using
technology that I wanted to kind of integrate into this project is
mapping, or spatial history. And so, GIS, or geographic information
systems, as many of you know, is a system for marrying data sets with
geography, and it does fundamentally change how historians, I think,
can conceptualize spatial information. So in my research,
where this would apply is in examining patterns of migration. So the current scholarship is
based on this assumption of the failure of the federal government’s
policy of dispersing refugees. So this was known as the scatter or
broadcasting approach: the federal government
would place people in various places all across the country.
So the Hmong, for example, were resettled, or were assigned, to
53 cities in 25 different states. And so scholars have noted
that this was a failed approach because what happened is you get secondary
migration, and refugees would then go to other places
where they wanted to be, to reunite with family members and members of
their community, and so you would get these concentrations of
refugees in places like California, the Central Valley, or
in Minnesota in the case of the Hmong, and they were pointed to as
these problem spots. But the exact nature of
patterns of secondary migration, how it worked over time,
is not well understood. So I think that’s where mapping could then reveal new insights on the geographies
of resettlement and migration. There is a possibility of mapping other
data besides population. I’m thinking of, for example,
mapping welfare payments by state, or areas where the Office of
Refugee Resettlement planned secondary resettlement programs, and
how that changed patterns of migration, or industries in various states
that provided better opportunities for
refugees to settle. And so anyway, I’m
thinking about that as one possibility, and, well, there are certain logistics
that would have to be overcome. I would have to construct
the appropriate data sets to do this mapping myself. And then finally, the second
approach that I’m thinking of for this new project is
lexicographic analysis. So I’m sure the linguists can
definitely explain this better, but it’s a form of humanities computing, and
so digital lexicography would allow for tracing discursive evolutions and
shifts over time. And so
scholars can create these customizable, searchable databases of digital texts, so
they can either draw from millions of texts that libraries have already
been digitizing, or you could make your own texts via scanners,
digital cameras, and OCR text recognition software, and
then use statistical software to see broad discursive trends in
language. And so you can then quantify
the rise and fall of certain words and phrases, geographically and temporally,
and so this has been done in comparative literature and
by literary studies scholars. And I don’t think
it would replace historians’ close, nuanced readings of complex,
often paradoxical texts. That’s really at the heart
of what historians do, but it’s an adjunct
to this discursive analysis. And so applied to my research,
I think what I would focus on is this term, self-sufficiency, which
comes up over and over again in the stated goals of resettlement officials
at the federal, state, and local level, and of private agencies,
these volags that would resettle refugees. And what they mean by
self-sufficiency is finding remunerative employment for refugee
families so that they would not become or remain welfare dependent, and so I could
make an argument in this project that the principle of economic self-sufficiency
as the overarching goal of refugee resettlement produced increasingly
narrow views of successful outcomes. And so resettlement policy became
stymied by these fears of wasting government resources on social programs. And so I think an analysis of the rhetoric
surrounding self-sufficiency would be fruitful. So it’d be possible with these
technologies to compare rates of usage of self-sufficiency among
private and government agencies, and also how refugees
themselves use the term. I’ve come across many letters
written by refugees to agencies, their own newsletters, and their own mutual assistance associations
that use the term self-sufficiency. I could see it being juxtaposed with
other terms like welfare dependence, economic independence, self-support. So I think this would be a useful tool
to analyze rhetoric and to quantify the prevalence of it beyond just saying
that it was a term that was used often. So I could try to trace
whether it peaks during the 1980s, whether it declines over time, and so forth. So those are just some of the things
that I’m thinking about, and there are other examples of technologies
that change the nature of questions historians pose that
I don’t cover in my own research. There’s a recent issue of
the American Historical Review that talked about how history
meets biology, the intersection of
history with evolutionary biology, with neuroscience,
these other disciplines. But I’ll stop there and,
yeah, thank you.
>>[APPLAUSE]
>>I wanna talk more [INAUDIBLE].
>>[LAUGH]
>>Okay, so I’ll just plug this in. Well, I just got the talk
that I wanted to hear today. [LAUGH] This is exactly what
we are trying to do: to reach out to people in different disciplines. That way we can actually
make a difference, and this will come up in a second, to actually
help people do different things. Okay, I’m Duncan Temple Lang,
I’m the Director of the Data Science Initiative. So what’s data science, very quickly? It’s everything from accessing data,
acquiring data, cleaning data, visualizing data,
exploring data, modeling data (a small part, but a very significant part), to conveying the results. Okay,
it’s the whole pipeline. Absolutely everything. So it’s computer science, and
it’s statistics and domain sciences. So that’s kind of
what we’re trying to do. And it’s challenging. But it’s beautiful stuff
that’s going on here, too. It makes me very excited. So
I’m gonna take a much bigger-picture view, because I’m not a social scientist. I’m
not going to tell you what you know, how things matter in
social science, but more like, what can we do to actually get
more interdisciplinary work? And what are some of the questions
that I hope will provoke a little bit of thought, cuz we’re facing some big
challenges in the university, every university. So what, you know,
what do we actually think we can do? We can ask, I believe we can ask
new questions, brand new questions. They’re new durs,
there are new research agendas to be actually had because of
availability of new data sources. Okay, we have all the same, we can do all
the same things, but we can also, so, but we can do new things. And we can do them in,
we can do existing things and new things in qualitatively new ways, and
that’s so just exactly like digitizing documents and being able to
search them in different ways. And being able to map things and be able to visualize things rather than
actually just describe them in prose. And so we can convey results differently. How do we do this? These are, I mean, again,
this is a very short overview, but the, it’s interdisciplinary collaborations. This is the, this is at the heart of so
many different things. It’s so hard to do on, on campuses. It’s so hard to do with
funding agencies and so forth. But it’s, it’s exactly what’s
going to actually happen to actually make these
new things realities. Unfortunately these involve computing,
data technologies, data analysis, inference, visualization. When I wrote these I was very careful
not to use any term that referred to anybody’s field. There’s no Statistics,
there’s no Computer Science in here. These are very applied skills that
everybody needs. They are not actually academic skills, you know,
where you are going for PhD degrees and so forth. These are actually skills that many,
many people need. And we actually think that these are akin
to reading, writing, and arithmetic. Okay?
They haven’t been taught, so we’ve got a really serious issue. That may change in about 10 or
15 years as things get into K-12,
but for the moment we’ve got a real problem with
people not knowing how to do these things. And this is an enormous opportunity
for actually fusing different data sources together and integrating auxiliary data to be able
to answer qualitatively new questions. Okay, huge. Very happy to hear you just
mentioned biology at the end. You know, there’s text,
there’s images, there’s movies, there’s all sorts of different sources,
genetics and phylogenetics, and all sorts of information that we need
to be able to pull in to actually get a richer sense. Maybe they won’t have an answer,
or maybe they won’t change the results of
our questions. But they might, and they certainly will lead to more
confidence in what we actually have. So one very simple
thing: people need to know about computing and so forth, but
they also need to know what is possible. Okay, this is a hard problem. People have come to me and sort of said,
oh, I didn’t know you could do that [LAUGH] at the end of the conversation.
It’s like, we need to have those conversations. We can do an awful lot
about digitizing text and being able to make searches better and being able to find outliers in clever
ways, and we can customize these things to different applications. So,
I’m here to pitch something else, as well. So how do we solve this? We can’t solve the training problem now,
[LAUGH] but we have to start. Okay? But we can actually do
interdisciplinary research, okay? And that means we actually
have to be open to it. And fortunately enough, I’m the director
of this Data Science Initiative, whose task is to do this, and we have some minimal
funding to be able to do this kind of for free with you, and we are looking for active collaborations
for doing data science related tasks. But at the same time,
we actually do need to train social science researchers
to actually be capable. Not experts. We do not want them
building the next Facebook; we want them to be able
to manipulate data, and not deal with massive scales and
engineering problems, but just be able to be self-sufficient
in a very different sense: to be able to answer their
own questions. That’s the longer term. We can actually use interdisciplinary problem
solving and collaboration to get there. We can do the education at the same time. Okay, so they need statistics,
broadly, you know, computing and technologies. Most importantly,
they need to learn how to problem solve. This is a really different thing. We were just talking about this,
and it’s a very, very different skill in some regards
when it comes to technical work. In data science, we talk about
pi-shaped research structures. These are people who
have legs in two camps. Okay,
rather than a single domain, they actually have to have multiple domains and
multiple skills. Okay?
One thing, so I’m going to sort of mix things
around a little bit, okay? One of the things I was thinking about
from the perspective of statistics is there’s statistics, there’s
machine learning, there’s data mining. These terms have come and
gone and changed. Statistics has been around for
a long time. Machine learning and data mining
sort of came in through CS. There is a danger I’m concerned
about just with some being taught, and looking at Mark and
Hildreds talk today. I’m worried that Google is gonna be able
to do social science better than we are, okay, or than you are, actually,
technically, okay?
>>[LAUGH]
>>But we, collaboratively, because remember the collaboration.
And the danger is that they won’t do it well,
but they’ll do it. And they’ll actually be heard more loudly. And they may be actually
quite superficial. And in statistics,
people didn’t actually respond to the needs of the changing
world of machine learning. And I’m just sort of saying, if
social science doesn’t do this, you know, what happens? Any answer is better than no answer. Knowing that you could
get the answer in theory, but not doing it, leaves the door open
to many other people to do it. And
we can collaborate with these people, that’s ideal, because they
have the skill sets and so on. It’s not just the Googles, it’s also the
computer scientists who are saying, I can find relationships here. But the depth of the relationships may
not actually be very significant, and that’s an important thing. So you do need the entire pipeline, and this is where data science fits in, so
there’s opportunities for collaboration. These are all the things. I mean, these are
some of the details of what people actually need to learn to do some basic
domain science, in history, linguistics, and so forth,
that I think come up over and over again. These are common themes. The big thing is
the one in the green. These are just details. The real thing is getting experience,
a sense of data, of data analysis. It’s, unfortunately, not very well taught
in statistics. I can say this, I’m in statistics, okay?
It takes a huge amount of experience and an enormous amount of time to develop, but
we can actually do a little bit better, and
people have to get exposed to this stuff. So, if we’re going to teach them how
to do computing, a bit of statistics, not massive amounts of statistics, if we’re going to do this,
as Joe was mentioning, what can we do to put
it into the curriculum? Okay, this is my question
to you, slightly trying to provoke a conversation. What can we do to
actually get more space in the curriculum? Okay. Who’s prepared to
actually throw stuff out? [LAUGH] Okay? I’ve been at faculty meetings. It just doesn’t go very well. [LAUGH] Okay, and again, we may actually have a sort of
a potential solution for this, okay? We may have special students who are just
sort of, you know, in a different track, there’s only three or four of them, then
they’re left to learn it by themselves. That’s unsatisfactory. There may be three or four in every
department with common needs and now there’s 25 of them and
that actually means, that means a quorum to
actually get a class going. Okay, but we need to actually interact in
that way and find the commonalities, okay? And the other real thing is how do we
actually let people go off and do all sorts of wonderful things, that are brand
new, very risky, and don’t stifle them? Karl Stamer, who some of you may actually
whose who, he’s the director for digital scholarship in the library,
just recently. He was here telling us about when he was
doing his English, his PhD in English, and he was doing all this internet stuff,
and they told him it was a fad. [LAUGH] And, and he should move on and
do some serious stuff. And he’s doing some absolutely gorgeous
things these days in data science. Okay so I just wanted to mention the DSI
because it is a resource on campus that I want you guys to know about and
leverage for, for, for free. Okay? So basically, it is,
we have a very strong focus on broad, interdisciplinary research. So we’re not academic statisticians or
computer scientists. It’s not just research,
it’s also education. Okay? We’re actively seeking
collaborations very soon, okay? Like, tomorrow. If you come to me tomorrow,
we will start then, okay? Summer is a great time, okay? On both existing and new projects, okay? And they can be massive,
brand new, you know, earth-shattering projects,
or, you know, reasonable problems that you can’t solve
but that you know that somebody else can. Okay?
And that would make life much better for you. Okay? So in terms of actually
getting things done, [INAUDIBLE] okay? So basically I’ll say we’re very
sort of applied in orientation. But the research end of things is
an enormous component. Okay? We want to help people
develop new curricula. We want to help people develop
new components in data-driven courses. We’ll provide workshops.
We need to hear from you. We really do, and
we’re prepared to actually go out and do an awful lot of work because
we believe so much in this, okay? We may even develop designated
emphases and minors in programs, okay,
if there’s enough interest. And finding the commonality amongst these
different disciplines is important. Okay? I’ll just
quickly say one thing, okay? There’s a lot of buzz about big data. A lot of nonsense, [LAUGH] okay? I mean,
we just don’t want to be lured into this. Okay?
It’s very exciting. There’s real opportunities, yet we need
to actually have wisdom to look at this. There’s no getting around it. You have to read the data, if you will. You actually have to be with it and
experience it. And there’s a real danger that people
will get lost in technologies, okay? There’s a huge surge of this, okay, well,
let’s all go out and learn Python or R or whatever it is, and forget the question
that we’re actually asking. So, and the other thing is,
there’s a big difference. With big data, there’s a lot of talk about prediction. And I think, in social sciences, and somebody mentioned this earlier today,
which I think is really important, we’re trying to understand things,
not predict them. Okay, we want to know what will
happen in an intervention. We want to actually be able
to understand the mechanisms. And this is qualitatively
different from what Facebook and Google are doing in terms of actually
predicting what ads you’ll click on. Okay, that’s a different question. Okay, so
the one thing I’ll just say is, I’ve seen this in other disciplines. Embrace it or not, it’s a decision you
guys are gonna have to actually make. It’s gonna take time to actually
develop the curriculum or not. But we are here to help. Sorry for taking so long.
[APPLAUSE]
>>Okay, it’s really exciting to be here, and to sort of contemplate these
items that I feel like, sometimes, I’m just sitting in my office
thinking about by myself, so it’s nice to have
a chance to articulate them. I conduct research that looks
at adolescent brain development. And I try to link patterns
that we see in brain function and brain structure with, most typically,
symptoms of psychopathology such as anxiety,
social anxiety, depression, and substance use. And as well, I look at age-related
differences, so sometimes I’m comparing groups of adolescents
with groups of adults, for example. And this has been really interesting for
me, because in my graduate training,
I didn’t do neuroscience at all. I was trained as
a developmental psychologist, but I found myself with questions that really involved trying to get
a better understanding of biology. And so, I took this sort of
new turn with my work to learn about the brain and to go and
learn how to conduct neuroimaging. So I spent almost like
another PhD, a six-year postdoc, gaining some of these skills to try
to do this in my work. So it’s been very exciting and
has absolutely raised lots of, you know, problems I need
to tackle every day in terms of things like
the struggles over resources, for example. Doing this kind of work
using fMRI is costly. Just the hourly fee alone
costs a lot of money. There’s also issues around being able
to access the machine that you need, having staff to help run the machine and
people to help you collect the data, and then how to train your students
and your trainees on these
methods that really draw on exactly these kinds
of skills that Duncan was mentioning: trying to process data,
visualize data, and solve problems in the patterns
that you’ve got in your data. So it’s really exciting to be somebody who’s been able to
integrate technology into my work, because it was driven by new research
questions that I, that I had at the time. So that’s been a very exciting thing. I think that in my field, you, you might call it developmental
cognitive neuroscience. Really, one of the first studies
was published in about 1995. So, it’s, it’s relatively new
to use this kind of technology in the kind of work that I do in
terms of developmental psychology. And a lot of these studies,
the initial ones relied on really small samples, about 12 kids,
12 adults being compared. And so one of the exciting things that
I’ve been working on in the projects I’ve had over the last few years is
actually drawing much larger samples. Not nearly as large as other
disciplines might, might draw on. But for example, I’ve been acquiring data
on about 200 adolescents at this point. And the other interesting thing that,
at least to me, is I’ve been also trying to
track change within person, within individuals over time. So, a few of my projects now
are collecting fMRI data at multiple time points so that I can see, okay in this, you know in this set of kids what’s
going on in their, in their brain and, in relationship to particular symptoms,
at age 16, at age 17, at age 18, and trying to model those,
those patterns over time. So we might be able to see
correspondence between brain and behavior over time within individuals. So that’s been an exciting
new direction as well. That I feel like now there,
there have been a lot of advances in the software programs that
are required to, to process and analyze fMRI data that now allow
us to do those kinds of analyses. When I first started out, for example,
I really, I knew I really needed to do a, a statistical analysis called
a repeated measures ANOVA, but the software I used didn’t
have a way to do that at all. So instead I had to run just
a set of separate t-tests. And that, and it wasn’t necessarily
the best way to approach the data but it was, I was limited by the,
the software. That issue has now been addressed. You know, I can now run a nice
repeated measures ANOVA in the same software program that I
had used, you know, four years prior. So everyday I’m finding these
software programs, for example, to be changing, to be updated,
to be responding to the needs of those who are, you know, asking these
different kinds of questions over time. The other exciting piece
to me is that I feel like, my work has been moving in,
in more of an inter-disciplinary sort, sort of direction, in the sense that
I’ve also become very interested in trying to utilize existing data
available on the participants I’ve been studying because they’ve
been involved in other studies. And other studies in which the data
collected on them is different. It’s observations of their behaviors
when they were babies or toddlers. It’s questionnaires that they’re,
they’ve filled out about themselves, their parents have filled
out about themselves. It’s census tracts, so we can understand
what neighborhoods they’re in. And trying to pull all this together to
see, well, what kind of influences are adolescents’ environments having
in relationship to, to brain? So that’s another exciting direction
my work has been taking, and I’ve been lucky to get to know
different faculty, for example, in the economics department,
through different initiatives on campus. So we come together and talk to each
other about poverty, for example, how do we measure socioeconomic status. And how can we do this in
the kind of data set that I have. How do they do it in the dataset
that they have, and what and what can we gain from understanding
these different approaches to, to better understand how kids develop,
and in what context, and what matters the most for their
healthy development and, and outcomes. The, the last piece I, I wanted to talk
about and it’s not something that I have used in my work, but it’s a,
it’s an area that I have seen sort of burgeoning in the, in the study
of adolescent development, for example. And it’s,
it’s generally referred to as a sort of, sampling exper, daily experiences, right. So ecologically, you know,
momentary assessments, and these are done using daily diaries,
for example. So kids are asked to report every single
day, at a certain time of day on, you know, things that have happened for
them, how they feel, so on and so forth. Or they carry beepers around with them and
they’re beeped and then they respond to that
beep with whatever question. Or you pull their Facebook pages and you relate that to their moods,
to their brains, to all sorts of things. So it’s been really exciting also to
try to get, a little bit more of that micro picture of their daily experience,
and doing that with technology. And I’ve seen, in different colleagues
I have in the field, I’ve seen them incorporate these different methods within
their studies to try to get this you know, sort of more, more micro time scale,
if you would, of their daily experiences, to, to integrate that
into mental health, for example, and, and
other indices of, of well-being. So that’s also a very exciting direction
that I’ve seen technology altering and shaping the course of research
questions being asked in the field. I think that’s all I’ve got,
but I’m happy to answer more.>>[APPLAUSE]>>Hi. So, I’m here to talk about linguistics. Linguistics is a,
the study of human language. It’s very broad. Here it encompasses things from phonetics,
which is just sounds, you know, like the kind of
sounds I’m producing right now. So for example, I can tell you something
like, North Americans tend to be nasal for vowels before an n. So if you say something like man. Man, man. You can try plugging your nose and
see if you sound different. If you sound different,
you have nasal speech. It’s not something most people
are consciously aware of, but its something that most people know. There’s thing like, things we are more
aware of like the dog and not dog the. So that’s something like syntax and then more complicated than that
is something like pragmatics. So if I say boy it’s cold in here, but what I really mean is,
will you close the window, you know. So that’s again not something
directly evident in the signal, but it represents language knowledge. So, really technology’s affecting
all levels of linguistic analysis. And it’s doing so in a couple of ways. First, it’s making data
collection way easier. Obviously, you know, recording people,
making video recordings, audio recordings, taking pictures. All this is much,
much easier than it used to be. People’s whole lives are being
documented these days. Before it was hard. I mean, a lot of us prob,
might not have video recordings of us when we were younger, because that
technology wasn’t really widely available. Now people’s telephones do that. So, documenting things has got much,
much easier. But this actually leads to a kind of
different problem in linguistics, it’s kind of like the dog chasing the car. You know, what do you do once
you do have all of this data? You know, I was thinking about this lunch
we just had and language is a weird thing, because, how many words were probably
said in total during that lunch? Tens of thousands? So we’re constantly bombarded and
surrounded by language, which is the output of the system
we’re trying to describe. And yet, what can we actually do with
all this language that’s happening all around us? So, one of the things that
people are doing with this is things like building speech corpora. So massive collections of written or
spoken data, hundreds of millions of words for example. So things like corpora tell us,
what do people actually say, not just what could they
hypothetically say. So things like how words go in and out of
style like why is everyone here, probably, more likely to say,
cool than, groovy, you know. We know that they may mean
roughly the same thing. Is there like a separate piece of
knowledge that tells you not to use certain words that mean
roughly the same thing. Corpora, looking at what people actually
do instead of what people could hypothetically do, give us all kinds of
information about language knowledge. For example,
there’s something called semantic prosody. I learned this when I took my
first speech corpora class. Semantic prosody, so for
example, the verb cause. Is it bad or is it good, to cause? That’s a weird question. I think most people
would say it’s neither. It’s making something happen. But I think most people also agree
that someone caused an accident. Someone caused mayhem. It’s weird to cause pride. No? She caused me to win first place. She caused me to be disqualified. So there’s a negative connotation with
this that’s somehow secret and hidden. These things come out when you look
at massive amounts of data and you look at what people
are actually doing with language. Instead of what people
might hypothetically do. The thing is you only have access to
this when you actually have this data. Previous generations of linguists
could really only speculate. At this point when this data
is actually available, there is really a lot less of
an excuse to just speculate. So that’s one way that
linguistics is being changed. Now you’re actually confronted with having
to test your hypotheses and dealing with. But the other thing is, you know, you have a corpus with
several hundred million words. What do you even do with that? So that’s one of the major problems of
kind of having your wish come true and getting access to all this data. On a more low level,
there’s things like actual computer, speech representation which involves
digital signal processing techniques that rely on advanced computer power. Like I learned signal processing using
a book from the early 90s which wasn’t that long ago, and a big focus of the book
was, how can you minimize RAM usage during what today is an operation that
like, would be a joke for a watch? So, for example, now we can do things
like signal, digital signal processing. We can really communicate with
computers in real time. Things like Siri exist, although they’re
not really as good as you might think they are. Apparently Siri relies on
sending speech to a server farm and that does the processing. So that kind of leads to another way
linguistics is changing which is the different kind of data
that’s being generated. So, things like Twitter,
which gets brought up a lot. There’s Twitter corpora. What is, what are tweets like? Or text messaging? How does text messaging affect languages? So a lot of people might see text message
language or chat language as like a sort of broken or
disorderly manner of language use. But it’s not, it’s, it’s just,
you know, a different kind of order. The other thing is I think when we
watch movies about the future, or Star Trek, or whatever, a lot of people’s
conception of what the future means, whatever the future is, you know,
talking machines. I think in a lot of movies, especially
like if you have sentient robots or whatever, usually you want
those robots to talk. You really see how in a science fiction
movie where a computer gets sen, sentience and is running around
killing people and actually can speak. Speaking is like the signifier
the filmmaker uses to tell you that this is an intelligent
thing on par with us. Something to be feared, or whatever. But apart from that, I think we’d all
like to see a future where we can actually build talking machines. And have interaction with
artificial intelligences. And you know, when you are trying
to solve a problem like flight. Whether something flies or not,
the constraints on that are physical. You know, you have to understand
the physical universe and overcome the obstacles imposed
by that to make something fly. To make something talk though,
the answer’s in here, you know, the constraints
are placed by human psychology and humans almost definitely present the
optimal solution to this speech problem. So that’s sort of one thing I answer to
people that are particularly pragmatically minded when they say you know,
why, why study linguistics. I sort of just respond that wouldn’t you
like to get in like Star Trek style to your house and say you know, kitchen,
boil some water for me, you know. So in closing, I’d just like to say not
only is it affecting the research and the data but it’s really opening, it, it’s
going to more and more open up more and more avenues for useful research in linguistics that has
practical implications for everyday life. [APPLAUSE]>>Okay, thank you all very much for
these great provocations. So we had these presentations
specifically short so that there could be more time for
dialogue among you all. So we have some microphones in the room, and I’m sure they have more to
say if there aren’t questions. It silenced them.
[LAUGH] Sorry, let’s just get the microphone to you.>>Excuse me, sorry.>>Drew Hoffman from Sociology. So I’ve been doing some stuff
sort of like content analysis, discourse analysis, things like that. You know, I use these corpora,
I search them. Again, the searches are always messed up, there’s a bunch of weird shit
that’s not where it should be. So that’s one problem. But then another problem is like you know, I start counting references to a certain
term etcetera, and then at the end of the, and when I’m done with it,
I’m kinda like, yeah, so what. Right? And so I’m, I’m really much
more qualitatively inclined. Right? So I guess what I really,
the question, this is a long preface. But, the question is really like you know,
what are the tools that can allow. I’m really excited about all this data. But I kinda wanna use much more still
qualitative ways of, of interacting with it, and so, so I’m interested in,
in how technology could aid that.>>[LAUGH]
>>So can you, can you hear me?>>Yes.>>So one, one way it’s hard this is one of the reasons I mentioned about data
analysis actually taking a lot of time. You actually have to know what the
questions are and then you actually have to think very hard about what
the data are, are going to admit to. But one of the things it can
actually deal with is, which you, both of you sort of said, these searches
that things end up in the wrong place, they’re just crazy searches,
you’re finding outliers, you are actually finding anomalies in the data
>>And if you can find a good way of, of identifying these, you may actually
be able to eliminate those documents, or, or
actually correct those documents, or finding out something interesting
about those documents. So, in that regard actually just using,
using data summarization to explore your data, and get the quality correct,
or to find things you didn’t expect. That’s a big, big issue.>>And I fully under, I mean, at the end
of the day on the qualitative front, I mean, everything is
qualitative to a large extent. We, a lot of statistics is very boring
because we estimate parameters and then this is a big difference. And Joe and
I are working on this in, in class. We just, it’s what do you do with the
numbers that the, that people estimate? That’s the interesting part. And why did you estimate
them in that particular way? There are a lot of statistical
techniques and models that will a, that will allow you to address
different questions and then get the qualitative answers that are
a little more objective, than, than that. So you can actually say this
is entirely subjective. And that, that might be the case, but
again, it’s very much framed in what, what the questions are and
what makes sense to ask of your data.>>[INAUDIBLE]
>>So, my question was raised by something
Cecilia was talking about. In terms of using technology to access
this data, and you were describing the, the evidence of the data that you
had been using previously, and how it’s idiosyncratic, and you know,
census records taken by hand or just made by fallible human beings. And I wonder, both for you and for others, to what degree does the technology
highlight these qualitative elements that go into shaping the very stuff you’re,
you’re then quantifying [LAUGH] and to what degree does it, so
does it hide it or illuminate it? In other words,
does applying these technologies reveal or conceal the messiness and, and
the kind of act of faith you’re taking anyway of pretending that these are units
that can be, not pretending but, deciding that these are units [LAUGH] that can
be measured and manipulated and, right. That the technology itself, what role
does it play in your understanding of the original material, I guess?>>Yeah, I think that’s a,
it, that’s a good question. In the case of the census databases,
what I mean, I’m not sure exactly the tools, say,
for example, Ancestry.com has for transcribing the handwritten censuses,
but there’s somebody who’s doing that. Or there might be software that’s
doing that and and so, it could be, so it’s like if I, whether I’m doing that by
looking at microfilm or the software is doing that, I’m not really sure what’s
more accurate or what it illuminates. I think the rate of error could
be [LAUGH] comparable, or it, we could be making different
kinds of errors, so. So there, it’s like, I think it’s
the same issue whether you’re using and same, similar with, with this with OCR. I, I’m, I don’t know if you’ve used that. But it seems like there’s, yeah,
there’s this accuracy issue too. Like, is it accurately
digitizing what the text says? And I haven’t had experience with that so
I don’t, but I would like to hear how that works. So I think there’s yeah,
I don’t have a good answer to that but I think that there, definitely, it,
it’s, the technology is not the answer [LAUGH] to the problem of idiosyncratic
data, errors that are inherent to it. So. So.>>Question over there.>>Sorry.
Just there is one aspect where it can help and that is actually, well, the, OCR or whatever automated technique is actually
being used, we can then find the errors. By actually having humans go back in and
check the errors, we can actually then retrain the system,
okay, and actually improve those errors. So in an awful lot of cases, you, you, off
the shelf, OCRs and other, other things worked as well as they work, and
that’s the end of it the, at the moment.>>[CROSSTALK]
>>Which is the, one case where it’s actually,
it’s blatantly clear what the, how it’s, how it’s failing. And we get data as how it’s, how it’s
failing and then actually go back and change the algorithm where we can
actually get it pretty, pretty well. So the,
the combination of the two is cool. [SOUND]
>>Hi, Georgia Zellou, linguistics. So Santiago touched on a really
interesting point that in linguistics is really big where access to technology,
access to corpora, access to big data has basically
revolutionized the field. Has basically taken, allowed scientists
to ask different questions, to examine questions that were previously asked and
led to sort of a revolution in the field. And I wonder if the specialists
from the other areas could speak about this in their, in their fields. Whether technology and this kind of, the,
the, the growth of technology has led to sort of theoretical paradigmatic
revolutions within their, their fields. Thank you.>>I think one, one outcome in my field
is the, the data that has been collected using neuroimaging, for
example, particularly of structural brain regions and, and, and how they
change over the course of development. We, we had a real shift in the idea that,
you know, originally a lot of scientists, you know, in, in developmental psychology,
would sort of say okay, well, most of the brain development
is done by age three to five. But some of the findings from some of
these, these large scale longitudinal studies of changes in brain structure
really actually show that the brain has, is continuing to develop
into the early twenties. For example, the prefrontal cortex. So that, that finding alone has
dramatically shifted the way people think, has really opened up new doors at least
in my field, in, in terms of, you know. It’s, it’s not,
it’s not all done very early on. There’s, and, and, and
then that has implications for, you know, the, the legal implications, for example, of adolescent’s behavior,
and, and things like that. So that’s been sort of a, a revolution,
I would say, in my field with technology.>>Yeah. Maybe I can just.>>I just wanted to add something to this. Just, I was really struck, I mean,
it came, came up in all the talks, this, you know, when Thomas Kuhn defined scientific revolution like a paradigm
shift, he said what it really came down to is that new things counted as evidence
that didn’t get counted before. And what all of you touched on and Duncan
emphasized is, you know, what does it take to retrain our colleagues or train our
graduate students or train the next generation in understanding what it means
to make a claim based on machine learning? What it means to evaluate
someone else’s GIS map. What it means to show someone, you know, to, like, cuz there’s these two things
that kind of work side by side. One is we develop, like, new journals. So you get a journal of, like,
computational linguistics. And then everyone in that knows this. But then the other part of linguistics
might not know in any way how to even read those papers or
evaluate the findings in them. And so there’s, you know, in some cases, historically those disciplines
split over things like that. Or, you know, in the case of anthropology, they just don’t talk to each
other very much anymore.>>[LAUGH]
>>But I’m interested because we, this one, the hype kind of leads the,
the, the innovation. It’s interesting, cuz there’s, like,
more of an attempt to dialogue about it. There’s more of an attempt to say I’m,
I’m very curious and suspicious at the same time. And so
I think kind of reaching out to that, this shift that’s kind of, we can name. You’re not supposed to be able to name
paradigm shifts when they’re happening, but the temporal flow here is
getting disrupted a little bit. And so I think there are these ways in
which, you know, each, each of you has been narrating how you kind of were pulled
into thinking differently about evidence. And then the second challenge is then
how do you kind of push that out to help your colleagues understand
the nature of your new evidentiary claims? I don’t know if that, so. I don’t know if there’s any more,
something more you can add on those lines, but I’d love to hear it. [LAUGH]
>>I guess what I would say quickly is that I still think,
for me as a historian at least, the, the qualitative nature of the work is,
is still primary. I think what some of these technologies lend, whether
they have revolutionized the field, I don’t know if I can,
>>[LAUGH]>>If I feel comfortable [LAUGH] making that claim, but I think it’s definitely allowed
projects to be more multi-dimensional, and the conclusions that historians can draw
to be much more layered, so especially if we’re thinking about something like
mapping, and seeing spatially. So, a great example is that we just hired
a new colleague, Greg Downs, who has a new book out on Reconstruction, and, the,
or the aftermath of the Civil War, and he’s got this great site that look,
that looks at mapping occupation, and shows the, that presents
the spatial history of the U.S. Army after the Civil War, and so
you can really see, the kind of the, the reach of the U.S. Army in the south,
and I think it, it does lead to, yeah, just more nuanced understanding of what
happened immediately after the Civil War. And, yeah so anyways,
I think there are ways in which that these technologies really
enrich what we’re doing. You, even if, what we’re, the arguments
that we’re making rely still on qualitative readings of the sources.>>The change in, in terms of,
linguistics with technology is not really splitting it, so much as, you could see it
as, like, progressing up since language, linguistic fields are sound on one level. You know, that’s directly observable,
it’s easy to deal with. Basically, since digital signal processing
in the 70s, especially with computers, people who were studying speech sounds
have no excuse for not directly observing things and analyzing them empirically
because sounds are out here, but, you know, as we can collect more data and
have more computational power, more sophisticated analysis techniques,
slightly more complicated linguistic problems can be
dealt with computers and things like that. But, you know, it’s going to be a very,
well, you know, I wouldn’t have, like, said 100 years ago
that we’ll never put a guy on the moon but it feels like it’s going to be a very
long time before we have like a simple concise computational explanation for
something like semantics. You know, how you extract subtle
meaning from a sentence. So for people in linguistics in
those fields it’s completely understandable that they’re not
being changed dramatically by technology. So in linguistics I think
it’s it’s more split like, in terms of how accessible the
observations are and how, how, how much you can realistically be expected to
involve technology and data collection and more sophisticated analysis techniques
in the thing you’re actually doing.>>So I have sort of a pragmatic question. It seems like collaboration
and interdisciplinarity is one of the recurring themes. And, you mentioned a six-year
post doc to learn the technology. Learning these technologies is,
>>[LAUGH]>>It’s time intensive, right? And so, if in your PhD training you’re,
you’re learning the technology, it means you’re sacrificing some
of the disciplinary knowledge. Right?
I think that’s intrinsic, or we’re doing long postdocs,
or whatever. So I’m wondering if [LAUGH],
in a pragmatic sense in collaborative teams, how advantageous it
can be to have like a high variance in the technological versus
disciplinary focus of the researchers. Would you rather have a bunch of
pi-shaped researchers on a team versus having some more specialized technologists
and some more disciplinary oriented folks? Both for the quality of the research, and
then for professional advancement too. Are there,
are there cracks to watch out for with, with regard to adoption of technology?>>I loved my post doc
it was really great. [LAUGH] But anyway, it’s great here too. So I think that it’s, it’s, it’s interezz,
this is an issue that comes up a lot. Particularly as I’ve been participating in
some different interdisciplinary groups, and it, and it can become hard I think for
trainees to know where their place is in a field, and
know where they should be publishing, and. And what they should be,
creating them, themselves as, in terms of getting known for,
for a topical area, versus some kind of technological,
skill set. You know, but so long as you’re
marrying those two, I think that’s, that’s likely to be a very
strong way to go so that one, one isn’t in opp,
in isolation of the other. Particularly along an academic path. In, in some ways. So I think that it’s, you know, they are real issues that the trainees
I work with kind of come up against. But I think, also, if you think about if
you’re, if you’re starting on projects and you need people to work with you
on these different projects, there are ways to support those projects
by pulling different people in. Different disciplines,
different skill sets. So, it’s sort of a different level,
perhaps, than the kind of direct, you know, you could work with your trainees
on specific fiel, within field kinds of things, and then have some outer supports
on, on specific projects, for example. That might be one way to do that.>>Do you wanna?>>Well I think it’s time [INAUDIBLE]
>>My experience in, in linguistics, it’s
>>I mean, I’m really. I’m probably extremely biased in this
because I lean more the one way. But. It’s just.
I’ve found it’s just much, much harder to acquire the technological
or statistical analysis for data collecting techniques
than the theoretical aspects. Which you can just read, you know? Most people can.>>[LAUGH]
>>No, but I mean, really. Like, I think. Most people can [CROSSTALK].
>>Just read.>>[LAUGH]
>>Well>>It’s much it’s things that you can do on your own sort of, where the other things are hands-on,
you have to it’s trial-and-error. And sometimes no amount of reading,
at least in my experience, no amount of reading helped me
understand some of these concepts. And it just took time more than anything,
whereas the theoretical aspects, I think that most grad students
can pick these things up, even, even on their own, a lot of research
is done alone, without direction. So just, in my, from my own perspective, I’d rather have a couple people working
on a project who can handle these complicated things that can’t be picked
up, you know, in a short amount of time. Or at least most of us don’t have
the ability to pick these things up in a short amount of time.>>[LAUGH]
>>Just, this is, this is r, music to my ears. We do actually teach a lot of
courses that are in textbooks. And then this stuff, and this stuff,
the computing and the statistics, the statistical practice is
actually not in textbooks. It’s just the methods and the actual how-to’s so it’s, so it’s,
you know, bizarre that we invert this. We can actually try this again,
I’m making this pitch for. We will teach people these, these
skills if you come to us or, you know, if I can come to you and identify
commonalities so we can amortize it over more than three people, you know, and I,
and I think they are very common skills. As for the pi-shaped people,
the, you know, I, it goes back to what I was saying about this is the same
as reading and writing and arithmetic. We want people to have basic
computational skills so that they can actually self-start,
that they can problem solve and not have to wait for three days for
somebody to come in. But we do not need everybody to be
a data engineer or a statistical guru. But they have to have, they should have
some facility so that they actually know what’s possible and what’s, and, and how
to get themselves out of trouble enough. Or know when to go and talk to somebody. That’s actually a big deal.>>If they don’t have the exact
same vocabulary that’s a real, that’s a real issue, so
they have to actually learn enough of that. And that’s just, one last thing is there
was, there’s these RISE grants on, on, on campus, and
there was a RISE Symposium, and one of the messages coming out of
that was, this is suicide for, as a young faculty member, to actually
go and be a very interdisciplinary person, and that’s a real shame but
it’s a reality that we have to deal with. At the same time it also makes you very
unique [LAUGH] and makes you very, very attractive towards other people. And if it’s an academic job
that’s one thing, but if it’s a, if it’s a, if it’s a, an industrial
job you better know these skills, so there’s a real tension at the moment that
we’re gonna have to actually deal with.>>Colin Cameron, Economics. I’d like to follow up on that. I think in medicine, medicine was willing to subcontract
out to statistics, all right. And you know the issue, and
Duncan now is trying to reach out. Right, an issue, I think, is whether
statistics can do a similar thing for social sciences, and, you know, I
speculate, for example, whether the sorts of people who get into statistics come
from more of a, a sciences background, and that social science is a bit
more of a stretch.>>Two things. It’s not just statistics. This is all of data science, so this is
the whole computational pipeline, so just, just, and Colin, you know this is just,
more than anyone, and, so that, you know, that’s good. I mean, I, I do, I do worry that,
that, that statistics, there’s a lot of people involved in social
sciences and, in statistics discipline, but we have, it’s much easier to actually
be deal, dealing with harder science, and that’s, and that’s a, so
we need to actually attract and, and educate statisticians who,
to go after some of these problems. Personally, I think there’s awful lot of
low-lying fruit, to actually be had, and that’s, that’s why I think we can,
we can attract. But it is, it’s really a meeting,
a meeting of the two minds of people saying I, I want to do more,
more quantitative stuff. And again,
not to lead to quantitative results but to lead to qualitative results through
quantitive methods and then, and then the statisticians meeting in the
middle, and then the computer scientists.>>Yeah, one of the things that
fascinates me about this, and I could say it of all the projects represented
here, is that in this kind of quantitative-qualitative divide, which usually was
defined with statistics as the quantitative side of things, data science actually
seems to sit much more in between. Maybe it’s even a third type of literacy, in that it’s incredibly obvious to
anyone who struggles with a corpus that if you’re not qualitatively approaching
it at the same time that you’re quantitatively approaching it,
it’s useless. And so, you know,
as an anthropologist, when I hang out with people doing
data science in various ways, I hear a very different type of
talk, in which the qualitative-quantitative divide doesn’t hold for
any length of time. It’s very different from the way in
which statistical analysis often ends with a number, and then that’s the
end; data science often doesn’t. And so that makes it very interesting,
I think, for thinking about what interdisciplinary work means when multiple
people are in a conversation, and what other types of future forms of
evidence and argument making look like. I think we have a question here. Here, oh.>>Hi, I’m Xiaoling Shu from Sociology. The traditional model of research
is a primarily confirmatory model. That is, you have a theory and you come up
with hypotheses and you build the model. And the focus is on model building,
a focus on a few predictors. But then the new type of
research gives data more power. So, you have data mining and
unsupervised learning. And with those types
of new techniques, traditionally in the social sciences
many people would have serious doubts about that sort of, quote
unquote, data-driven type of research. So
how do we persuade the people who are trained in the more
traditional paradigm of research, persuade
them to accept that the findings are valid and
powerful and not just some patterns that emerge
from this one data set? That’s probably my question.>>Well, I don’t know. Luckily or
unluckily, depending on how you look at it, purely descriptive
work has been a part of linguistics, really, from the start. So there isn’t really
that problem. You know, a lot of people only describe dialects or
social aspects of language or sound, how people convey identity.
You know, the way we choose to speak really is mostly to tell
other people what kind of person we are. So, when you look at it like that, with
language as a social tool, just describing
is not just describing anymore. It’s actually explaining how this
message is being transmitted from one social group to another,
from one dialect to another, et cetera. So actually I think that for some aspects
of linguistics that’s an issue, but generally in the field a lot of people
do exclusively descriptive work or qualitative work without even testing
any hypothesis at all, by design. So yeah.>>Well I would just say real quick,
it struck me what Santiago is saying about how in linguistics, it’s like, this data
is there, so you can’t ignore it. It’s out there, and you have to test these hypotheses,
and I kind of see that within history. I mean, we don’t have so
many ready-made data sets for people to manipulate or mine,
but I think increasingly, as data becomes available,
it just makes sense for historians that if this is in your field and
it relates to your work, you have to do something with it. And so
I think just at that really basic level, incorporating these new methods
and technologies is something that we’re just not
going to be able to overlook anymore.>>So I have mixed feelings about this, because
there’s this whole fourth paradigm, data-driven discovery, and
it is very exciting, and there’s no doubt about it,
it’s very legitimate, but it doesn’t mean that all findings are correct, and
there are a lot of problems with this. So,
as statisticians, we’re trained to be incredibly skeptical, and that should
only get heightened in everybody. You know, with some of the stuff we
saw earlier on today, the relationships that Google finds, we’ve
got to be very careful about what the actual real
confounding variables are, and we’ve got to be able to validate these
conclusions on another data set, okay. And, you know,
we will always find the biggest correlation; there has to be
a biggest correlation, period, okay. And it may or may not be significant, but
how many tests have we done? So one of the things that we talk about
in statistics is multiple testing. So, you know, you were talking about doing
multiple, you know, the reason we have these repeated measures is because
you can’t do a thousand tests at the 5% level and
expect to get a 5% answer. And if this is all mumbo jumbo, it should be. Okay?
That’s just nonsense. You know, in terms of,
this is all how we do things. It’s not qualitative. So you have to be able to figure out,
you know, can we validate this somewhere else? So they can be
hypothesis-generating techniques, not validation techniques, unless you go to a different,
and we’ve got all of these other issues. And one of the biggest problems we
have is where all of this data is coming from. And I’m just gonna pick on
this because it’s easy. Okay?
It’s Facebook or history. And when you read that,
you read certain texts. These are biased samples. These are not the entire population. You’d better know who you’re
making inferences about. So everyone should be skeptical, but
at the same time that’s how we’re going to actually find
new things, and then we go and study them. So, instead of us coming up with
a conjecture and finding data and then figuring things out,
we can find the conjecture in data, then go find other data. And then maybe confirm things. In my opinion. [LAUGH]
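The multiple-testing point can be made concrete with a small sketch. It is an illustration only, not something shown at the round table: it assumes 1,000 independent tests run on pure noise, so that every p-value is uniform on [0, 1]. The function names are hypothetical, and the Bonferroni cutoff is one standard correction used here as an example, not a method the speaker named.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test cutoff that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


def simulate_null_tests(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Count tests that look significant when the data are pure noise.

    Assumption: under a true null hypothesis a p-value is uniform on [0, 1],
    so each simulated test is just one uniform random draw.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


if __name__ == "__main__":
    n, alpha = 1000, 0.05
    print("expected false positives:", expected_false_positives(n, alpha))
    print("chance of at least one:", familywise_error_rate(n, alpha))
    print("simulated false positives:", simulate_null_tests(n, alpha))
    print("Bonferroni per-test cutoff:", bonferroni_threshold(n, alpha))
```

Under these assumptions roughly 50 of the 1,000 noise-only tests clear the 5% cutoff, which is why a pattern found this way is better treated as a hypothesis to check on a second data set than as a confirmed result.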
>>I guess this is a question about the data science initiative. So one of the things that we find
is that different disciplines have different measures of what
they consider success. So, in some areas, you know,
it’s x number of publications. In some areas it’s
a book written and published. So, when you bring together people
from different disciplines, what are your thoughts on
how you would measure, say,
a student’s success in this initiative? How do you determine that, and
more importantly, if you decide it’s, you know, publications,
where do you send those papers? I guess it’s more a question of whether you have thoughts on how to handle this.>>[INAUDIBLE]
[LAUGH]>>So you asked about the data science initiative. I’ll tell you,
I’ve come with extreme prejudice. I was at Bell Labs,
which was the research arm of AT&T and Lucent Technologies, which was a big monopoly for many
years in the telephone industry. We had a wonderful research lab,
where the laser and all these things, and the
transistor, came out of. We were explicitly told our aim was to
actually look for ten-year results. Ten years in advance. And, you know, if you wrote
a paper by yourself, that was good. If you wrote a paper with a colleague in your
department, that was better. If you wrote a paper with a colleague in
a different department, that was success. Okay, so that’s a very different culture
in academia than in Bell Labs, and IBM and other places like this,
and we’ve got to deal with that. It’s a real concern. A very, very real concern. It goes back to what Michael
was saying, like, you know, do I want these different people. It depends where they’re going. If they’re going into industry
then they’re successful if they become very broad. And that’s useful. So,
I can’t answer that because it’s somebody else who’s gonna evaluate
these students and faculty and so forth. But it is one of those
things I’d like to get past. I’d like to get past these ridiculously
blunt measures, these metrics. Hey, we can do that with data. Okay, and we can.
We’ve gotta start doing this. Data metrics and all metrics,
all these different things can be done. But it’s a cultural change
in what we actually appreciate. But I’m much more concerned that almost
everyone I know wants to do good work and is less concerned
about being evaluated. And I just think if we do that,
good things will actually happen. But what we can do is, the data science initiative
has actually volunteered, and Carl Steiner was pioneering this: we can help evaluate students
who you can’t evaluate because they’re doing different things. So if you are in history and somebody is doing
research that is very quantitative, we can actually help to say that person
is not trying to pull your leg and just hasn’t done anything for
the last year. With that, I can help a little bit.>>And different
departments can try to rise to the challenge of actually writing
out standards of scholarship that include broader recognition
of what counts as production. You know?
I mean, in anthropology and cultural
anthropology we actually did this, where we defined things like how a book or
foreign language publication counts as scholarship in
a way that in other fields it might not. And this was partly because, as
a discipline that really spanned this qualitative-quantitative divide, and single-author
and multi-authored work, it became very important to speak across the aisle and
establish what the expectations were. And this has all sorts of really
interesting effects, to actually say that, like, producing a film can count as a full
manuscript in our department; producing a media piece can, if it can be
evaluated and argued that it’s of the length of a monograph; otherwise it
might only count, say, as an article. But we can innovate in terms of
recognizing that producing software, that actually producing and
maintaining a data set, could count as fundamental research
in history, not just service. But these things take the work
to think ahead about this and to actually then start trying to codify
it, to recognize it, because in this way, you can save the person three years down
the line when they come up for tenure. And they can point to this document
that they saw when they arrived, and you know, contractually speaking,
now they’re right. I mean, this is, you know,
kind of an interim solution. But I think it’s worth it, as we all talk and get excited about new
forms of scholarship, to think about where we can
also put that on paper: that, you know,
in our discipline now this is research. It might have looked like service before,
but now it is research, etc. Those kinds of things I think can be
really interesting to try to explore. You might meet lots of resistance. But then at least the conversation’s on the
table explicitly, rather than implicitly. And especially for junior scholars, having
it there explicitly would be very helpful, because
they might be getting mixed signals and as we know,
that can cause distress [LAUGH]. Time for one more question. Or let’s have it offline and
have a small break.>>Till 3:45.>>Till 3:45.>>[APPLAUSE]
