In The Field: National Digital Newspaper Program


The National Digital Newspaper Program is a joint program between the National Endowment for the Humanities and the Library of Congress, as well as state agencies that participate in the program.

Historic newspapers are an important resource for any kind of research and education purposes. They’re often referred to as the first draft of history. The National Digital Newspaper Program was begun in 2004 as a joint program between the National Endowment for the Humanities and the Library of Congress to build a sustainable database of digitized historic newspapers to make available to the general public for free and open access. Today we provide 11 million pages in that open database, and this comes from a website called Chronicling America at chroniclingamerica.loc.gov.

The National Digital Newspaper Program is making our history available. It’s a rich and diverse history. Since we have equal content from each state partner, people across the country can truly see themselves represented in our nation’s history, regardless of where they grew up or where they live now. They may even be able to find out what their grandmother was doing in Iowa in 1890. Chronicling America is an open resource that’s available on the web to anyone, at any time, around the world. They can access it in their school, in their library, or in their living room.

NDNP shows a rich public sphere with many voices. The fact that it’s free and open access means that it’s available to members of the public, community college professors, K through 12 students, genealogists, really anybody with an internet connection. And we know that it’s not just used within the United States; it’s used around the world by people in countries like Great Britain, Australia, Canada, and India. Those are our largest international users.

Each state that is participating is funded by the National Endowment for the Humanities to select and digitize important newspapers from its historic collections that represent the state in a national collection.

The National Digital Newspaper Program was really built on something called the United States Newspaper Program, which was a cataloging and microfilming program. Just like the NDNP, it was a nationwide program, and it was enormously successful, because you’re talking about all 50 states and trust territories involved. So you have thousands upon thousands of catalog records, but you also had all this microfilm that was created during this program. So the potential energy for the NDNP came from microfilm, which is really neat, because people are usually very frustrated by microfilm, but those little pictures created the possibilities for the NDNP.

You want to have geographical representation, you want to have political representation, you want to have a sort of social and cultural representation, and you do the best you can to cast the widest possible net over your particular state, Virginia of course being the one that I’m involved with.

We recently expanded the date range of Chronicling America to include any public domain newspapers from 1690 to 1963. So after state partners digitize newspapers, you’ll be able to look at stories about the American colonies, the Constitutional Convention, the Great Depression, the Second World War, the early years of the civil rights
movement, really anything from that span of years.

The Library of Congress plays the role of setting the technical specifications for the program, managing the data, aggregating it, and making it available to the general public. For the most part, the newspapers that are included in the National Digital Newspaper Program are already preserved on microfilm, and we work with microfilm primarily, as it’s easier to scan and more efficient for creating digital images than paper, although some of the material in the collection is actually from paper where microfilm doesn’t exist. The materials on film are scanned, usually by external contractors, on a high-speed microfilm camera; the images are then collated and metadata is added.

Our work starts before any materials arrive here at Creekside Digital. We will work with our NDNP awardees to identify which particular titles they’re going to be sending out to us for digitization. If it’s microfilm, we will slot that into the queue for digitization. For each batch, we will first scan a technical target that is on microfilm, and we will analyze that target with special software that ensures that our microfilm scanner is operating correctly. For direct capture materials (these are source paper newspapers that we receive here at Creekside Digital), we do the same thing. We use a different type of target, a reflective target, and we will image that with our camera system and again run it through special software that will ensure that our camera system is operating at the required level of performance as mandated by these projects.

Then we will actually start to perform the metadata markup. We have special software that we will run these page images through. It will analyze the layout of each page. It will show us where all the columns of text are, where the headlines are, where any advertisements are, where any illustrations are, and all of those elements on the page, down to the individual word level, will be marked up. Essentially, the computer will create a
series of bounding boxes that encapsulate these individual page elements. Text will be identified separately. It will have optical character recognition performed on each individual element of text, which will make the text machine searchable and therefore keyword searchable.

OCR, optical character recognition, is of course the program that allows the computer to read a newspaper. That’s really where the rubber meets the road. It’s great having pretty pictures of a newspaper, but it’s the text searchability, if I may use that term, that really makes the NDNP an exciting resource. You’re now able to search text not only within articles but within advertisements, within auction ads, whatever, and that allows people to really dig into the data, if you will.

The NEH and the Library of Congress consider Chronicling America to be a significant humanities data set, and so the Library of Congress has created a well-documented API, or application programming interface, as a way for researchers to access all the data of the entire corpus of newspapers here. So far, researchers have done some really interesting projects. They’ve studied the way that newspapers reported on the Spanish flu epidemic of the early 20th century. They’ve studied the way that news went viral across the country before the age of the internet. And we’re really excited to see what other projects folks can do accessing the data by the API.

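
Editorial illustration (not part of the spoken transcript): the word-level markup described above, where each recognized word is tied to a bounding box on the page image, can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `WordBox` type, the `find_keyword` helper, and the coordinates are hypothetical and do not reflect the actual NDNP file formats or the Library of Congress API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One OCR'd word plus its bounding box on the page image (hypothetical layout, in pixels)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def find_keyword(words, keyword):
    """Return the boxes whose text equals the keyword, ignoring case and edge punctuation."""
    target = keyword.casefold()
    return [w for w in words if w.text.strip(".,;:!?\"'()").casefold() == target]


# Made-up fragment of a page; real OCR output would hold thousands of words.
page = [
    WordBox("Influenza", 120, 340, 210, 38),
    WordBox("spreads", 340, 340, 160, 38),
    WordBox("in", 510, 340, 40, 38),
    WordBox("Richmond.", 560, 340, 220, 38),
]

hits = find_keyword(page, "richmond")
for hit in hits:
    # Each hit carries the coordinates needed to highlight the word on the page image.
    print(hit.text, (hit.x, hit.y, hit.width, hit.height))
```

Keeping the coordinates next to the text is what lets a search result point back to the exact spot on the scanned page, rather than only to the page as a whole.
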
The NDNP is improving access, and it’s actually doing some preservation work as well, because we’re using preservation-quality microfilm, and preservation-quality microfilm can last literally centuries if properly stored. So that’s really a nice feature.

Chronicling America has newspapers in German, French, Spanish, Finnish, and Italian, and these are languages that most newspaper repositories don’t collect, with more languages to come. You can see all kinds of things in these newspapers. For example, you can see the German immigrants, who were the most prolific publishers of non-English-language newspapers in the country until the First World War. You can see Spanish-language newspapers from the southwest borderlands in New Mexico and Texas, and you can even see what Italian anarchists were saying in Vermont in 1917.

Once the content is digitized, it is usually sent to the individual awardees who are sponsoring the digitization of that material, and they perform quality review on it and make sure it meets the specifications that the Library of Congress has already set. It’s then stored on hard drives and delivered to the Library of Congress in hard drive form, where we ingest it into our systems and then into the website and make it available to the general public. All the content is stored on-site at the Library of Congress. Currently that’s about 600 terabytes of data; that is the 11 million pages that we’ve made available to the public.

By digitizing the material, we’re able to provide much greater access to it, and much greater access in new ways. Providing that keyword access allows much deeper exploration of the content in the newspapers than has ever been possible when the material was only in print or on microfilm. We can do different kinds of searching, we can do different kinds of limiting of the search results by state or time, and we get this fascinating aggregate and cross section of information about particular events and places in time that we would not be able to have in
any other format, and it has really changed the nature of certain kinds of research.

The great thing about NEH was that they came up with the United States Newspaper Program, and then as a result of that they came up with the National Digital Newspaper Program. These are nationally funded programs that never would have worked otherwise. That is where a collaborative effort is born: out of a centralized agency providing funding and support and know-how and a vision of how this program should be designed and implemented. People all over the world are benefiting from it, and I’m telling you, it would not have happened any other way without the support of NEH.

Obviously the National Digital Newspaper Program would not be possible without the participation of the National Endowment for the Humanities. It helps us to work with our academic institutions and other participating institutions to get this work done properly. When the NEH supports these efforts, they absolutely make participation possible for all 50 states and ensure that all the states have their content represented on Chronicling America.

A lot of the materials we work with every week are very fragile. They’re in the process of deteriorating. Without NEH’s support, we may not get an opportunity to digitize them, and the information they contain might be lost forever.

The interesting thing about newspapers is that they’re not a book and they’re not exactly a periodical, and so in libraries and archives they are often treated as ephemeral: here today, gone tomorrow. And so you think, should we keep this, should we toss it? And then a hundred years goes by and you’ve got a pile of this stuff. So the great thing about the NDNP, and previous to that the USNP, is that it really focused on a format that was sometimes treated as ephemera and that could possibly disappear forever.

Historic newspapers contain information on so many different subjects in American history, so everything
from politics to sports to health to the arts, music, poetry, etc. And it’s not just news stories that are contained in these newspapers. We have poetry and sermons that were reprinted frequently in the newspapers. We have ads that are really interesting to look at. We have a lot of different kinds of visual representation, so whether it’s drawings or photographs, you can really see the way people saw their world in these newspapers.

The wonderful thing about this project and the Chronicling America website is that it brings together history from all over the country. We have the same events covered by papers in California, by papers in Texas, papers in Minnesota, papers in Philadelphia, and you get this excellent view of different perspectives on the same things across the country.

Thanks to NEH and LC and the NDNP, we were able to develop a certain amount of sustainability and develop our own program. We have contributed at least 400,000 pages, I think, to date to Chronicling America, but through that we also developed our own database called Virginia Chronicle, and in that we’re allowed to add out-of-scope papers. By out of scope, I mean these are papers we wouldn’t be able to contribute to Chronicling America for one reason or another. And so now we have about 800,000 pages, and we’re pushing to a million very soon. That includes all of the Chronicling America titles and more, and this would not have happened if it wasn’t for the NDNP.

Every facet of American life is captured in these newspapers, so we see so many different use cases where people leverage this content for a variety of different things once it’s digitized and made available. It’s interesting to see how even the concept of a newspaper itself has changed over the course of the materials that we get in here. We get in everything from modern newspapers that are still being printed all the way back to some of our nation’s earliest newspapers, and everything in between. And to see how
the country’s changed: you can literally see the country going through changes as you work through digitizing some of these longer-running titles, and it’s very, very interesting.

The advertisements are interesting. You can see what people bought a hundred years ago and what it cost, and stuff that’s not available anymore, that’s obsolete or just not made, or not healthy and not sold anymore. All those things, again, become a touch point to where we are now in the course of time and what life was like a hundred years ago.

You can see what underwear looked like in 1912 from those ads. You can look up different holiday celebrations, which were really different in other eras in American history, like how people attributed occult powers to cabbage at Halloween in the 19th century. You can even look up sports history, like when the Cubs first won the World Series in 1907.

A perfect example of the power of Chronicling America is that it can be used as a primary source by teachers, K through 12, and college and university professors. Teachers particularly can direct students to it as a primary source, and it also gives them a sense of news as it happened. I think that makes it a great resource.
