An atlas of orchids distribution in the Campania region (Italy), a citizen science project for the most charming plant family

A database of the orchids of Campania has been built up since 2000 with data collected for academic purposes and research projects and, more recently, with the contributions of enthusiastic amateurs and scholars. It has thus become a real citizen science project, leading to the creation of an online Atlas (http://www.floracampana.unina.it/Orchidee/index.html). This paper presents the collection and storage of the data and a synthesis of them. On 31 December 2016, the database held 14680 records from more than 30 contributors, relating to 126 taxonomic entities (species, subspecies, hybrids and a few “sensu lato”). The bibliographic records number 3663 (24.9%) and cover a time range of four centuries (from 1616 to 2016). Amongst the 11017 field records (observations), more than 99% are geo-referenced and of the “punctual” type (precision of less than 100 m). The spatial and temporal distribution of the data has been analysed and biases have been highlighted. The observations show clear differences in study effort from year to year, but always with a significant contribution from citizen scientists. The analysis of the spatial distribution shows that the observations are preferentially collected in protected areas, around main roads and on the roadsides. Many cells of the grid still lack information and these should be the object of future research.


Antonio Croce, Roberto Nazzaro

Introduction
Although the expression "citizen science" is becoming a hot topic and the phenomenon itself is undergoing great development worldwide (Bonney et al. 2009, Silvertown 2009, Conrad and Hilchey 2011, Gura 2013), it has never diminished in the study of orchids, as generations of enthusiastic amateurs and non-professional researchers have always contributed significantly to the knowledge of this charming plant family (van der Cingel 2001) in many fields (cultivation, morphometry, taxonomy, systematics, biogeography, plant-pollinator relationships etc.). The contribution of citizen scientists is becoming essential for projects that require the collection of large amounts of data over long periods or in large areas, such as conservation ecology projects which need mapping and monitoring.
In Italy, projects involving researchers and volunteers in the collection of distributional data on wild plants and in the production of online atlases have been started, for example the Wikiplantbase project in Tuscany, Sardinia, Liguria and Sicily (Bedini et al. 2016, Peruzzi et al. 2017). For decades, many associations and groups of non-professional researchers and volunteers have been collecting data on the distribution of plants in large areas, especially in north-eastern Italy. Over a long period, the Trentino Province has been explored and the cartography of the native orchids has been significantly advanced by taking advantage of the participation of more than 200 contributors (Perazza and Perazza 2005). The methodology adopted in that project can be regarded as a real guideline for in-field data collection.
Starting from the last years of the 20th century, many research projects on the distribution of native orchids have taken place in the Campania region (Southern Italy), from the grid maps prepared by Büel 1982, followed by Nazzaro et al. 1996, to the recent surveys, which base the cartographic restitution on higher-precision GPS data (Nazzaro et al. 2002, Croce 2012; Croce and Nazzaro 2012). The cartography of the orchid family is important both for obtaining more complete base knowledge and for acquiring distributional and ecological data for conservation purposes. Finally, the orchids living in an area are useful biological indicators of environmental quality (Bianco 2012).
The compilation of a catalogue with the distribution data for orchids in Campania started at the beginning of 2000. In 2005, it accounted for 6960 records (Nazzaro et al. 2006) from field surveys in the Cilento National Park (Prov. Salerno, southern Italy), the Taburno-Camposauro Regional Park (Prov. Caserta, southern Italy), Roccamonfina volcano, Vesuvius and other small areas. Since then, many contributors have taken part in the implementation of the database. The development of the internet and of devices such as smartphones and tablets, which have allowed faster communication, data collection and sharing, together with the increasing number of nature lovers using websites, forums and social networks, has led to a significant rise in the number of contributors. Contributors share their data in very different ways, from a single photograph of a plant needing help for its identification to complete private observation datasets collected over several years of research in the field (e.g. Sorgente and Croce 2016).
In this study, the structure and main features of the database and a synthesis of the data collected are described. Since sampling bias is a problem affecting large databases concerning plant and animal distribution (Kadmon et al. 2004), simple analyses to evaluate the temporal and spatial bias in the sampling patterns have been undertaken.

Aims and geographical limits
The database gathers occurrences of native orchids mainly in the Campania region, although data falling within a few kilometres outside the administrative boundaries of the region have also been accepted. The Campania region stretches along the southern Tyrrhenian Sea and covers an area of 13595 km². The landscape varies from large coastal plains to the sub-Apennine ridges of the Lattari and Cilento mountains, which extend from the coast inland, and the Apennines, which rise to 2000 m a.s.l. in the Matese Mountains. The large volcanoes of Roccamonfina, Campi Flegrei and Vesuvius lie in the northwest. The islands of Ischia, Capri and Procida also belong to the region. The territory is divided into 5 provinces: Avellino, Benevento, Caserta, Naples (from 2014 "Metropolitan City of Naples") and Salerno.

Taxonomic scheme
The identification of the recorded entities is given following a taxonomic scheme mainly derived from GIROS 2016. The nomenclature is also checked on "The Plant List" site (The Plant List 2013). When the subspecies is not identified or its identification is doubtful, records are labelled as "s.l." (sensu lato). The original names are however kept and the nomenclature will be an object of study for a future regional checklist.

Structure of the database
The first version of the database was a simple set of MS Excel sheets. Recently the records have been stored in an MS Access database, allowing fast and easy queries on the data, the update of the nomenclature, the management of the bibliographic records and data export in a large number of formats.
The database is made up of 4 related tables (Suppl. material 1). Two kinds of data are recorded in the database: observation and bibliographic.
An observation datum refers to a unique record of presence of a species observed on the ground. Contributors specify the Coordinate Reference System adopted in order to project the data properly on the map. The best precision level (preferably measured with a GPS) is requested from the volunteer contributors. The coordinates are nevertheless checked and validated (always converted to metric units) and observations are classified into 4 kinds of data according to the level of precision:
- Punctual: the record has a precision of less than 100 m (derived from GPS or topographic maps);
- 1 km² grid cell centroid: data has a medium precision, allowing its position in a 1×1 km cell of the adopted grid;
- 10 km² grid cell centroid: data has a low precision, allowing its position in one of the 10×10 km cells of the adopted grid;
- Not geo-referenced: data is too vague or has wrong attributes (e.g. it is in the sea or it is far from the region boundaries if projected) and it is not possible to provide a geo-reference.
The adopted grids are referred to the UTM WGS84 33N Coordinate Reference System (EPSG code 32633).
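The classification above can be illustrated with a minimal Python sketch. It assumes coordinates already expressed in metres in EPSG:32633 and a grid whose origin coincides with the origin of the projected coordinates. The grid origin, the function names and the thresholds separating the two grid classes are illustrative assumptions, not details taken from the database.

```python
import math

CELL_10KM = 10_000  # cell side in metres
CELL_1KM = 1_000


def grid_cell(easting, northing, size):
    """Return the (column, row) index and the centroid of the square cell
    containing a point, assuming the grid origin is at (0, 0)."""
    col = math.floor(easting / size)
    row = math.floor(northing / size)
    centroid = ((col + 0.5) * size, (row + 0.5) * size)
    return (col, row), centroid


def precision_class(uncertainty_m):
    """Map a positional uncertainty (metres) to one of the four classes.
    Only the 100 m threshold comes from the text; the others are assumed."""
    if uncertainty_m is None:
        return "not geo-referenced"
    if uncertainty_m < 100:
        return "punctual"
    if uncertainty_m <= CELL_1KM:
        return "1 km grid cell centroid"
    if uncertainty_m <= CELL_10KM:
        return "10 km grid cell centroid"
    return "not geo-referenced"
```

For example, a hypothetical point at easting 437250 m and northing 4520900 m would fall in the 10×10 km cell with index (43, 452), whose centroid is (435000, 4525000).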
The identification of the observed taxon is often validated with the aid of the referees for the project on the basis of photographs. The original identification name is kept for possible future re-attributions to other taxa. The full date of the observation, the number of plants in the site and, optionally, other information such as locality, altitude, substrate and vegetation are requested from each contributor. To encourage data sharing, each contributor can send an observation as a message (e.g. a WhatsApp message with the position and the photograph of the observed taxon) or by compiling an Excel sheet providing all the requested information. The method explained in Perazza 1994, adopted in most of the research projects, is recommended as a protocol for data gathering.
Another kind of data derives from published literature. Bibliographic records include the original taxon name (the one given by the author), locality and date (year) of publication. Any other information, such as date of observation, altitude, habitat or rarity provided by the authors, is also recorded.
The taxonomy of bibliographic records was revised and updated to the adopted taxonomic scheme. Each bibliographic record was then linked, when possible, to a unique 10×10 km grid cell following two steps:
1. A polygon is drawn representing the extent of the locality reported by the author, according to the method used by Santangelo et al. 2008;
2. When the polygon is completely included in a single cell, the record is assigned to that cell.
When the polygon extends, even partially, over two or more cells, the record is assigned to the cell covered to the greatest extent by the polygon. When the polygon is too large or too vague (e.g. it covers two or more cells completely), or the locality is impossible to locate, the record is treated as "not geo-referenced". Bibliographic records that merely catalogue the existing literature (i.e. when the author reports a presence on the basis of a previously published record) are also treated as "not geo-referenced".
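The assignment rules above can be sketched in a few lines of Python. This is a hypothetical stand-in for the GIS operations actually used. It assumes simple polygons given as lists of (x, y) vertices in metres and a grid with origin at (0, 0). The overlap is computed by clipping the polygon against each cell (Sutherland-Hodgman) and measuring the clipped area with the shoelace formula.

```python
import math


def shoelace_area(poly):
    """Absolute area of a polygon given as a list of (x, y) vertices."""
    if len(poly) < 3:
        return 0.0
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_halfplane(poly, axis, bound, keep_greater):
    """Sutherland-Hodgman clip against one axis-aligned half-plane."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    out = []
    for i, cur in enumerate(poly):
        nxt = poly[(i + 1) % len(poly)]
        if inside(cur):
            out.append(cur)
        if inside(cur) != inside(nxt):  # the edge crosses the boundary
            t = (bound - cur[axis]) / (nxt[axis] - cur[axis])
            out.append((cur[0] + t * (nxt[0] - cur[0]),
                        cur[1] + t * (nxt[1] - cur[1])))
    return out


def overlap_area(poly, col, row, size):
    """Area of the part of the polygon falling inside cell (col, row)."""
    x0, y0 = col * size, row * size
    clipped = clip_halfplane(poly, 0, x0, True)
    clipped = clip_halfplane(clipped, 0, x0 + size, False)
    clipped = clip_halfplane(clipped, 1, y0, True)
    clipped = clip_halfplane(clipped, 1, y0 + size, False)
    return shoelace_area(clipped)


def assign_cell(poly, size=10_000, tol=1e-6):
    """Return the cell with the largest overlap, or None when the polygon
    covers two or more cells completely (treated as not geo-referenced)."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    cols = range(math.floor(min(xs) / size), math.floor(max(xs) / size) + 1)
    rows = range(math.floor(min(ys) / size), math.floor(max(ys) / size) + 1)
    overlaps = {(c, r): overlap_area(poly, c, r, size)
                for c in cols for r in rows}
    full = [k for k, a in overlaps.items() if a >= size * size * (1 - tol)]
    if len(full) >= 2:
        return None
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None
```

A polygon lying entirely inside one cell is simply the special case in which a single cell has a non-zero overlap, so both steps of the procedure are handled by the same function.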
Cartographic operations (grid cell assignment, counts, etc.) and map drawing have been carried out with QUANTUM GIS 2.14 "Essen" (QGIS Development Team 2016).

Bias analysis
Since the presence of the orchids has not been recorded following a randomised sampling design, the database could be affected by sampling bias. Thus, the punctual geo-referenced observations were analysed in order to assess their bias in time and space and, specifically, whether they fit two possible distribution patterns: a. the observations were collected preferentially in areas perceived as relevant for their natural heritage (e.g. protected areas); b. contributors explored mainly the areas around the roads, so that observation sites fall into areas that are easy to reach and to explore by car ("road effect"), mainly located near the roads or on the roadsides.
To test these hypotheses, a set of localities (with unique coordinates) was extracted from the punctual type observations. A null model was then created by generating the same number of points as the localities, randomly distributed over the area of the region.
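As an illustration of such a null model, the following sketch draws uniformly distributed random points inside a region polygon by rejection sampling. It is not the QGIS tool used in the study. The function names, the fixed seed and the representation of the region as a single list of vertices are assumptions made for the example.

```python
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def unique_localities(points):
    """Collapse observations sharing the same coordinates into localities."""
    return sorted(set(points))


def random_points(poly, n, seed=0):
    """Draw n points uniformly inside the polygon by rejection sampling
    from its bounding box."""
    rng = random.Random(seed)
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            points.append((x, y))
    return points
```

Calling `random_points(region, len(unique_localities(observations)))` would give a null set of the same size as the observed localities, where `region` and `observations` are hypothetical inputs.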
The differences in position between the observation localities and the randomly distributed ones, inside or outside protected areas (parks, reserves and Sites of Community Importance, SCI, sensu Directive 92/43/EEC), were tested using the Chi-Square test.
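A comparison of this kind can be sketched as a Pearson chi-square test on a 2×2 table (actual versus random localities, inside versus outside). The exact table layout and any continuity correction applied in the original analysis are not stated, so the version below, without correction, is an assumption. For one degree of freedom the p-value follows from the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = actual/random, columns = inside/outside.
    Returns the statistic and the p-value for 1 degree of freedom."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For DF = 1, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical counts of 30 actual and 10 random localities inside, and 10 actual and 30 random outside, `chi_square_2x2(30, 10, 10, 30)` gives a statistic of 20 with p well below 0.001.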
The road effect was analysed from two points of view. To test the "highway effect" (Soberón et al. 2000), a 2 km wide buffer was drawn around the State highways and the motorways, and the differences in the number of localities falling inside or outside the buffer, actual versus randomised, were tested. This buffer width is the same as that used in surveys analysing roadside bias (Hijmans et al. 2000) and, moreover, the buffer covers about one third of the total regional surface.
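The buffer test amounts to asking whether each locality lies within a given distance of the nearest main road. A minimal sketch follows, with roads represented as polylines in metric coordinates. The reading of "2 km wide" as a 2000 m distance on each side of the road is an assumption, as are the function names.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p on the segment, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_roads(p, roads):
    """Distance from p to the nearest of several polylines."""
    return min(dist_point_segment(p, line[i], line[i + 1])
               for line in roads for i in range(len(line) - 1))


def count_in_buffer(points, roads, buffer_m=2000.0):
    """Return (inside, outside) counts for a buffer of buffer_m metres."""
    inside = sum(1 for p in points if dist_to_roads(p, roads) <= buffer_m)
    return inside, len(points) - inside
```

The inside and outside counts obtained for the actual and the random localities form the 2×2 table to which the Chi-Square test is applied.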
On the other hand, another source of bias is the so-called "roadside effect" (Kadmon et al. 2004), due to the distribution of the observations along the roads accessible by car. Therefore, from the shapefile of the roads, a distance raster was built and used to give each point (actual or randomised) a distance from the nearest road. The distances were then grouped in buffers of different widths and, for each buffer, the number of observation localities falling inside it was tested against the number of randomly distributed localities falling inside the same buffer, using the Chi-Square test. In addition, the distribution of distances of all localities was compared with that of the randomly distributed points using the Kolmogorov-Smirnov test. The highways, motorways and other roads were extracted from the shapefile of the roads of the Campania region (Geoportale Regione Campania: http://sit.regione.campania.it/portal/portal/default/Cartografia). The statistical analyses were undertaken with PAST (Hammer et al. 2001). Geographic analyses have been carried out with QUANTUM GIS 2.14 "Essen" (QGIS Development Team 2016).
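The two steps described above (grouping distances into bands and comparing the two distance distributions) can be sketched as follows. The band edges are hypothetical, since only the 0-100 m band is named in the text. The Kolmogorov-Smirnov part computes only the two-sample statistic D, not its p-value.

```python
from bisect import bisect_right


def band_counts(distances, edges):
    """Count distances falling in each band [edges[i], edges[i + 1]).
    Distances outside the outermost edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for d in distances:
        i = bisect_right(edges, d) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    difference between the two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max
```

Applying `band_counts` to the distances of the actual and of the random localities with the same edges gives, band by band, the pairs of counts that are compared with the Chi-Square test.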

Results and discussion
On 31 December 2016, the database accounted for 14680 records.
Although the present paper does not aim to discuss taxonomy or nomenclature, which will be the object of a forthcoming critical checklist, the 14680 records can be provisionally referred to 126 taxonomic entities, amongst which are 94 specific or infraspecific entities (including some species sensu lato) and 32 hybrids (Suppl. material 2).

Literature data
Bibliographic records account for 3663 data (24.9% of the total archive) derived from 68 different bibliographic sources dated from 1616 to 2016 (Suppl. material 3). They report a total of 83 entities including only 4 hybrids (Suppl. material 2). Amongst them, some are errors according to other authors (e.g. Orchis militaris), doubtful citations (e.g. Dactylorhiza majalis or Orchis patens) or species reported for larger areas only partially falling within the present boundaries of the Campania Region (e.g. "Lucania") and will therefore probably be excluded from the Campania Checklist. The most cited species are Anacamptis morio, A. papilionacea, Serapias lingua, A. pyramidalis, Orchis italica and O. provincialis.
Only 128 bibliographic records were not geo-referenced. The 3535 geo-referenced bibliographic records cover 99 grid cells out of the 183 covering the Campania region (Figure 1), with a maximum of 339 and a mean of 35.7 records per cell. Most citations refer to the southern areas (Cilento, Alburni and Vallo di Diano National Park) and to the Sorrento Peninsula. Considering the distribution of the species reported in literature in the 99 grid cells, the most common species are Serapias lingua, Anacamptis pyramidalis and Ophrys apifera.
The richness of taxa (Figure 2) in the cells reaches 51 species on the Alburni Mountains. It is also relatively high in the cells corresponding to areas studied over a long period, such as Capri Island, or to areas where special attention has been paid to the orchid flora, such as the Lattari Mountains and the Taburno-Camposauro complex.
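The per-cell figures quoted above (records per cell, mean per occupied cell, richness of taxa) are simple aggregations. A minimal sketch follows, assuming each geo-referenced record has been reduced to a hypothetical (cell identifier, taxon name) pair.

```python
from collections import Counter, defaultdict


def cell_summary(records):
    """Summarise (cell_id, taxon) pairs.
    Returns records per cell, richness (distinct taxa) per cell and the
    mean number of records per occupied cell."""
    counts = Counter()
    taxa = defaultdict(set)
    for cell, taxon in records:
        counts[cell] += 1
        taxa[cell].add(taxon)
    richness = {cell: len(names) for cell, names in taxa.items()}
    mean = sum(counts.values()) / len(counts) if counts else 0.0
    return counts, richness, mean
```

As a consistency check on the literature data, 3535 geo-referenced records over 99 occupied cells give a mean of about 35.7 records per cell, as reported.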

Observations
Observations account for 11017 records from more than 30 different contributors (academic researchers, scholars working on bachelor/PhD theses, postdoctoral researchers, members of naturalistic associations and regular or occasional volunteer contributors). The quality of the observation data is very good, since 99.3% are "punctual" type data, even if they sometimes lack some important features: the number of plants is missing in 31.7% of the total observations while, conversely, the full date, which is very important for validating the identification of some species (e.g. Ophrys taxa), for monitoring activities and for describing the phenology of the species, is present in 95.5% of the records.
A total of 110 taxa have been observed and 33 of them are hybrids (Suppl. material 2). The most observed species are Orchis italica, Anacamptis morio and Dactylorhiza maculata subsp. saccifera.
Out of the 183 grid cells covering the Campania region, 109 cells have at least one observation (Figure 3). In these cells, the number of records varies from 1 to 2228, with a mean of 101.3, and there is a high number of cells with fewer than 10 records. The most widespread species, according to their distribution in the grid cells, are Orchis italica, Serapias vomeracea subsp. vomeracea, Anacamptis morio and Anacamptis pyramidalis.
The richness is higher in many cells for which no data were available in the literature and reaches a maximum value of 63 entities, including hybrids (Figure 4).

Study efforts and bias
Looking at the collecting effort over time, observations are very heterogeneously distributed (Figure 5), with a peak in 2001-2002. The impact of the volunteer contributors (citizen science projects) is always significant and accounts for 39% of the total field records.
The spatial distribution is also biased, since the records are heterogeneously distributed amongst the five provinces (Table 1). More than 75% of the observations fall in the Caserta and Salerno provinces. This is mainly due to the presence of important study areas (Matese Mountains, Roccamonfina volcano and Cilento-Vallo di Diano National Park) but, when the extent of the provinces is considered, the province of Salerno is surprisingly still not sufficiently explored, with less than one record per square kilometre. Conversely, for the small province of Naples, there is an average of 1.51 records/km². The bias in the spatial distribution of the data can also be a consequence of the well-known "botanist effect" phenomenon (Moerman and Estabrook 2006), since the most explored provinces are also the most inhabited and those where most contributors to the project actually live.
The observation records were clumped in 4037 different localities (points with different coordinates) and the same number of points was randomly generated using the QGIS random point tool. The distribution of actual localities was biased, since 67% of them fell inside the 27% of the region included in protected areas (χ² = 1973.65, DF = 1, p<0.001; Figure 6). On the other hand, less than 26% of the records were located inside the 2 km wide buffer around the main roads (Figure 7). The value of χ² (83.25, DF = 1, p<0.001) confirmed that the observation localities were preferentially distributed far from the highways and that there was no clear "highway effect". By contrast, the "roadside effect" was an important bias, since the number of observations near roads was greater than the number expected from a spatially random distribution (Table 2). The differences were highly significant in the 0-100 m buffer (χ² = 19.72, DF = 1, p<0.001). The Kolmogorov-Smirnov test confirmed this bias, since the observed distribution was significantly different from the random distribution (D = 0.044, p<0.001).

The online atlas
The database information has been used to produce an online atlas (http://www.floracampana.unina.it/Orchidee/index.html) which includes distribution maps on a UTM grid with 10 km × 10 km cells, some photographs and other information about the presence of native orchids in Campania. The site is periodically updated and, at present, it considers 75 entities observed at least once and 2 others whose presence is reported for the region in the literature but for which there are no records in the database, i.e. Orchis patens (Del Guacchio 2010) and Epipactis meridionalis (Acta Plantarum 2007, GIROS 2016).

Conclusions
The database and the online atlas are intended as a means of bringing together people interested in nature and of avoiding (or limiting) the dispersion of distribution data on orchids. These will also represent a useful tool for the scien- Unifying the large amount of data collected in research projects with the numerous but sporadic contributions from volunteers may contribute to the avoidance of data dispersion and may place information in a wider time and geographic context.
Nevertheless, the project has so far revealed some critical issues, which should be resolved in the future because they may be a source of bias in the data. Many records are incomplete, although the more structured research projects for Vesuvius, Roccamonfina or Cilento used a protocol for data collection (e.g. Perazza 1994). Volunteer contributions, as a matter of fact, often lack important data about ecological or conservation features (habitat, number of plants, phenology etc.). Sharing the progress and the results of the analyses could contribute significantly to the volunteers' awareness of the importance of recording complete data. The bias analysis performed on the spatial distribution of the data could give important directions about where (and when) to sample. Indeed, the distribution of the observation localities is highly biased, and the question of whether the "biodiversity hotspots" are really the richest in species or simply the richest in observations still needs an answer. Contributors tend to sample inside the protected areas, but often not too far from their car, so sampling is easier and faster but biased. Finally, the systematic scheme can be a source of conflict, since there is no unique and dominant point of view. For this reason, and to allow an easy switch from one scheme to another, the documentation of the observations (i.e. photographs and, secondarily, accurate descriptions of the plants) is essential for the correct identification or re-identification of the records.
In addition to the intrinsic value of distribution data, the following potential of the project can be highlighted:
- the development of a naturalistic and scientific culture;
- the improvement of the knowledge of rare and protected species and the use of orchids as environmental indicators;
- referring also to the previous point, the coordinates collected with high precision and accuracy can be useful in the monitoring activities required for the species listed in the Annexes of the Habitats Directive (Council Directive 92/43/EEC on the conservation of natural habitats and of wild fauna and flora). Amongst the plant species listed, Himantoglossum adriaticum has a very widespread distribution throughout Italy and would require significant monitoring efforts (Gargano et al. 2016, Ercole et al. 2017);
- the networking of people sharing their interest in the orchid family and in nature can be a model for a sustainable use of the landscape;
- a structured database can be integrated into other collections of data both in "horizontal" networks (e.g. floristic or biodiversity databases) and in "vertical" networks (e.g. national and international orchid databases).

Figure 5. Distribution of the observations over time, collected in research projects or by volunteer contributors.

Figure 6. Distribution of the localities of the observations (punctual type) inside (black circles) and outside (open circles) protected areas (grey areas).

Figure 7. Distribution of the localities of the observations (punctual type) inside (black circles) and outside (open circles) a 2 km buffer around the main roads (bands).

Table 1. Observations and data for each of the 5 provinces of Campania Region and for outer areas.

Table 2. Number of actual observation localities and randomly placed localities (null model) falling inside the buffers around the roads and the calculated chi-square value. *** p<0.001

Nowadays, admittedly, the knowledge of orchid distribution in the Campania region is far from being satisfactory, since large areas (or grid cells) remain unexplored. However, this database and the related atlas may represent the first step towards the increase in fine-scale knowledge of orchid distribution in this important Mediterranean region.