Web archiving
As part of the Magazzini Digitali service, the National Central Library of Florence (BncF) gathers, conserves and provides permanent access to internet content of Italian cultural and historical interest.
The project
Websites and the documentation they contain are considered digital “ephemera”, and anyone who has surfed the internet will have experienced “broken links” and the resulting 404 error. However, it is undeniable that they have also become an essential source of information for contemporary history and culture.
On this basis, in 2018, as part of its broader long-term service for the conservation of and access to digital publications, the BncF launched a web archiving programme similar and complementary to those run by major memory institutions around the world.
On the basis of the provisions of the law regarding the “Legal deposit of documentation of cultural interest destined for public use” (Law 106/2004 and Presidential Decree 252/2006), the main focus is on the gathering of:
- documentation and websites that ensure continuity for the previously established collections, including on traditional media and via traditional forms of technology.
- documentation and websites concerning the scientific output of universities, research centres and cultural institutions.
- documentation and websites created and published on the internet by public entities.
The library uses the Archive-it platform for harvesting and accessing archived websites.
Save for specific requirements, harvesting usually takes place a couple of times a year.
How to participate
In Italy, the legal deposit of documents disseminated via digital network is not a mandatory requirement and therefore participation in the programme is on a voluntary basis.
Requests to participate should be sent by email to bnc-fi.magazzinidigitali@cultura.gov.it. If the resources concerned are deemed suitable for archiving, applicants will be asked to fill in the dedicated online form.
The library reserves the right to subsequently contact participating organisations and institutions to define harvesting and assess technical requirements.
Technical requirements for harvesting
In order to allow harvesting (automatic collection), websites must:
- grant access to the Archive-it crawler: archive.org_bot.
- if the robots.txt exclusion protocol is in use, provide an exception for the aforementioned bot.
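As an illustration only, the following minimal Python sketch shows one way a website owner might verify that a robots.txt file makes an exception for archive.org_bot. The sample robots.txt content, the example.org URL and the helper name crawler_allowed are assumptions made for this example, not part of the BncF requirements.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is blocked except archive.org_bot.
SAMPLE_ROBOTS = """\
User-agent: archive.org_bot
Allow: /

User-agent: *
Disallow: /
"""


def crawler_allowed(robots_txt: str, url: str,
                    user_agent: str = "archive.org_bot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines rather than one string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/publications/"
    print("archive.org_bot allowed:", crawler_allowed(SAMPLE_ROBOTS, page))
    print("other bots allowed:",
          crawler_allowed(SAMPLE_ROBOTS, page, user_agent="SomeOtherBot"))
```

To check a live site, the text of its own robots.txt file can be passed to the same function in place of the sample.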
The following measures are also recommended:
- Bring together publications of cultural interest on a single page and/or directory within the website (e.g. “Publications”, or in uniform subsections, e.g. “Mobility” > “Documentation”, “Social services” > “Documentation”), which not only facilitates searches and access by general website users, but also speeds up the selection, harvesting and application of metadata to material for conservation purposes.
The Sitemap protocol may also be used to give the Archive-it crawlers more precise indications of which pages should be crawled.
- Use uniform file naming that reflects the content and/or other related documentation (e.g. different series of a particular magazine, issues in a series…).
- Avoid publishing multiple versions of files in different areas of the website, favouring the use of internal links.
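The Sitemap recommendation above can be sketched as follows. This is a minimal example, assuming a hypothetical site at example.org with its publications grouped under a single directory. The URLs, dates and the function name build_sitemap are illustrative assumptions; only the urlset, url, loc and lastmod elements and the namespace come from the Sitemap protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document from (url, last_modified) pairs.

    last_modified is a date string such as "2023-05-01", or None to omit it.
    Returns the XML as a string, without an XML declaration line.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if last_modified:
            ET.SubElement(url, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages that the site owner wants crawlers to find.
    pages = [
        ("https://example.org/publications/", "2023-05-01"),
        ("https://example.org/publications/magazine-2023-01.pdf", None),
    ]
    print(build_sitemap(pages))
```

The resulting file is usually published at the root of the website as sitemap.xml and can be referenced from robots.txt with a Sitemap: line.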
Limits to harvesting
- Websites or sections of websites with restricted access can be harvested if the BncF is provided with the relevant credentials; harvesting is not possible if the website uses CAPTCHA.
- Websites and/or sections of websites built with Flash or JavaScript, which are notoriously difficult to index for search engines that recognise only HTML, cannot, for the same reasons, be harvested with current technology. The use of these technologies is not recommended.
- Documentation provided for viewing via a viewer integrated into the website (e.g. Sfogliami.it, PressReader, etc.) may be harvested but is almost never available for viewing with the current replay systems used by Archive-it.
In cases in which, for reasons of access, these platforms need to be maintained, a downloadable version of the documentation or an alternative deposit method should be provided.
Website archivability
The library has drawn up a list of criteria for website archivability, drawing on good practices widely used by memory institutions around the world.
These criteria will become required measures with the implementation of legislation regarding the legal deposit of documentation disseminated via digital network.
Access to collections
Archived websites are organised into collections by theme, as part of the wider-ranging BncF collection on Archive-it:
- Association
- Domain .it (2006)
- Research organisations and institutions
- Cultural organisations and institutions
- Institutions belonging to the Ministry of Culture (previously the Ministry of Cultural Heritage and Activities)
- Open Access Books
- Open Access Journal
- Professional registries and associations
- Public administration
- Local history
- News publications and websites
When filling in the form to request participation in the service, website owners can choose whether to allow public access from any online terminal or to restrict access exclusively to the BncF internal network.
Useful links
Online contributions in Italian
The following list is partial and is constantly growing.
2023
- Storti, Chiara (2023). “Resource not found”: cultural institutions, interinstitutional cooperation and collaborative projects for web heritage preservation. JLIS.It, 14(2), 39–52. https://doi.org/10.36253/jlis.it-533
- Allegrezza, Stefano. 2023. “Web e social media come nuove fonti per la storia.” Umanistica Digitale, January, 137–162. https://doi.org/10.6092/ISSN.2532-8816/15665.
2022
- Luigi Giungato, Memorie dal sottosuolo digitale: frontiere e prospettive del social web archiving in Agenda Digitale, 28 July 2022
2020
Web archiving and the pandemic
- Lorenzana Bracciotti, Pandemia e web archiving. Conservare le fonti online #igiornidellapandemia in Il mondo degli archivi, 2 May 2020
- Archiviazione permanente dei siti italiani sul Coronavirus: call to action – BNCF, 31 March 2020
2019
- Chiara Storti, Web archiving, “sfida culturale”: il servizio della Biblioteca Nazionale Centrale di Firenze in Forum PA, 12 June 2019
- Costantino Landino, Lina Marzotti, Perché dovremmo pensare al web archiving in Forum PA, 20 March 2019
- Lorenzana Bracciotti, Il Web Archiving. Conservazione e uso di una nuova fonte in OS – Officina della Storia, 10 January 2019
2018
- Costantino Landino, Strumenti per il Web Archiving: alcune soluzioni in Il mondo degli archivi, 6 July 2018
2006
- Bergamin, Giovanni. 2006. “La raccolta dei siti web: un test per il dominio ‘punto it.’” DigItalia 2 (0): 170–74. http://digitalia.sbn.it/article/view/306.
Contacts
Enquiries can be made by writing to or calling:
Chiara Storti | Head of Magazzini Digitali and Web Archiving
bnc-fi.magazzinidigitali@cultura.gov.it
chiara.storti@cultura.gov.it
tel. 055 24919 73