Contributions from iNaturalist national sites

Data analyses

Author

Florencia Grattarola

Published

July 1, 2025

The iNaturalist Network is a localised experience that is fully connected to the global iNaturalist community. Network members are local institutions that promote local use and facilitate the use of data from iNaturalist to benefit local

The aim of this report is to assess the importance of the iNaturalist Network members by analysing the number of records for each country.

Code
library(MASS)
library(randomForest)
library(stargazer)
library(httr)
library(jsonlite)
library(knitr)
library(rgbif)
library(ggrepel)
library(extrafont)
library(tidyverse)  

options(knitr.kable.NA = '') 
options(scipen=999)

iNaturalist National Sites

Code
iNat_network <- 
  tribble(~'site', ~'site_name', ~'site_id',
        'Global', 'iNaturalist', 1,
        'Mexico', 'iNaturalistMX', 2,
        'New Zealand', 'iNaturalistNZ', 3,
        'Canada', 'iNaturalist.ca', 5,
        'Colombia', 'NaturalistaCO', 6,
        'Portugal', 'BioDiversity4All', 8,
        'Australia', 'iNaturalistAU', 9,
        'Panama', 'iNaturalistPa', 13,
        'Ecuador', 'iNaturalistEc', 14,
        'Israel', 'iNaturalistil', 15,
        'Argentina', 'ArgentiNat', 16,
        'Costa Rica', 'NaturalistaCR', 17,
        'Chile', 'iNaturalistCL', 18,
        'Finland', 'iNaturalistFi', 20,
        'Sweden', 'iNaturalist.Se', 21,
        'Spain', 'Natusfera', 22,
        'Greece', 'iNaturalistGR', 23,
        'Guatemala', 'iNaturalistGT', 24,
        'United Kingdom', 'iNaturalistUK', 25,
        'Luxembourg', 'iNaturalist.LU', 26,
        'Taiwan', 'iNaturalistTW', 27,
        'Uruguay', 'NaturalistaUY', 28)

iNat_network %>% 
  mutate('#'= row_number()) %>% relocate('#') %>% 
  rename(`Site` = site,
         `Name`=site_name,
         `ID`=site_id) %>% 
  kableExtra::kbl(digits=1, format.args = list(big.mark = ',')) %>% 
  kableExtra::kable_material('striped') %>% 
  kableExtra::row_spec(row = c(2,5,8,9,11,12,13,18,22), bold = T, color = "white", background = "#228A22")
Members of the iNaturalist Network. Sites from Latin America are shown in green.
# Site Name ID
1 Global iNaturalist 1
2 Mexico iNaturalistMX 2
3 New Zealand iNaturalistNZ 3
4 Canada iNaturalist.ca 5
5 Colombia NaturalistaCO 6
6 Portugal BioDiversity4All 8
7 Australia iNaturalistAU 9
8 Panama iNaturalistPa 13
9 Ecuador iNaturalistEc 14
10 Israel iNaturalistil 15
11 Argentina ArgentiNat 16
12 Costa Rica NaturalistaCR 17
13 Chile iNaturalistCL 18
14 Finland iNaturalistFi 20
15 Sweden iNaturalist.Se 21
16 Spain Natusfera 22
17 Greece iNaturalistGR 23
18 Guatemala iNaturalistGT 24
19 United Kingdom iNaturalistUK 25
20 Luxembourg iNaturalist.LU 26
21 Taiwan iNaturalistTW 27
22 Uruguay NaturalistaUY 28

Methods

We tested several explanatory variables to find the model that best explains each country's level of activity on iNaturalist. Country-level indicators were extracted from the World Bank's World Development Indicators (WDI) on 23 May 2025.

Response variables:

  • total number of records on iNaturalist per country: n_records_inat.
  • number of records from iNaturalist on GBIF per country (proxy for quality, research grade records): n_records_gbif_iNat.
  • number of users recording in the country (this is not exactly the users from the country, but users that have generated records in the country): n_users.

Explanatory variables:

  • population of the country: population (SP.POP.TOTL).
  • area of the country in km²: area (AG.SRF.TOTL.K2).
  • country’s centroid latitude (as a proxy of expected biodiversity): latitude (rnaturalearth).
  • GDP per capita: gdp_per_capita (NY.GDP.PCAP.CD).
  • % of the GDP of the country dedicated to research: gdp_research (GB.XPD.RSDV.GD.ZS).
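The model-selection code is not shown at this point in the report. As a minimal sketch of one way such a comparison could be run, the example below fits a log-log linear model to simulated data and lets `stepAIC()` from MASS drop uninformative terms. The data frame `toy`, the simulated coefficients, and the choice of backward AIC selection are all assumptions made for illustration, not the report's actual procedure; only the column names follow those used in the report's code.

```r
library(MASS)

set.seed(42)
n <- 80

# Simulated country-level predictors (illustrative values only, not real data)
toy <- data.frame(
  pop      = exp(rnorm(n, mean = 16, sd = 1.5)),
  area     = exp(rnorm(n, mean = 12, sd = 1.5)),
  gdp      = exp(rnorm(n, mean = 9,  sd = 1.0)),
  latitude = runif(n, min = 0, max = 65)
)

# Simulated response: by construction it depends on population and GDP only
toy$n_records_inat <- exp(1 + 0.6 * log(toy$pop) + 0.5 * log(toy$gdp) +
                            rnorm(n, sd = 0.5))

# Full model with every candidate explanatory variable
full_model <- lm(log10(n_records_inat) ~ log10(pop) + log10(area) +
                   log10(gdp) + latitude, data = toy)

# Backward selection: drop terms while doing so lowers the AIC
best_model <- stepAIC(full_model, direction = 'backward', trace = FALSE)

formula(best_model)
```

The same pattern could be repeated with `n_records_gbif_iNat` or `n_users` as the response to compare which variables are retained for each.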

Data extraction

Functions

Code
source('R/national_sites.R')
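The helper functions live in `R/national_sites.R`, which is not reproduced in this report. As a rough sketch of how a helper such as `populationPerCountry()` might obtain an indicator, the example below builds a World Bank API request URL and extracts the most recent non-missing value per country from a response. The function names `wb_indicator_url()` and `latest_wb_value()` are hypothetical, and the mock response contains invented values so that the parsing step can run offline.

```r
library(jsonlite)

# Hypothetical helper: request URL for one indicator and several ISO2 codes
wb_indicator_url <- function(country_codes, indicator) {
  sprintf('https://api.worldbank.org/v2/country/%s/indicator/%s?format=json&per_page=1000',
          paste(country_codes, collapse = ';'), indicator)
}

# Hypothetical helper: keep the most recent non-missing value per country
latest_wb_value <- function(json_text) {
  parsed <- fromJSON(json_text)
  dat <- parsed[[2]]  # element 1 holds paging metadata, element 2 the records
  out <- data.frame(country_code = dat$country$id,
                    year = as.integer(dat$date),
                    value = dat$value,
                    stringsAsFactors = FALSE)
  out <- out[!is.na(out$value), ]                   # drop years without data
  out <- out[order(out$country_code, -out$year), ]  # newest year first
  out[!duplicated(out$country_code), ]              # first row per country
}

# Mock response in the shape of an API reply (values are invented)
mock <- '[{"page":1,"pages":1,"per_page":1000,"total":4},
 [{"country":{"id":"UY","value":"Uruguay"},"date":"2024","value":null},
  {"country":{"id":"UY","value":"Uruguay"},"date":"2023","value":100},
  {"country":{"id":"CL","value":"Chile"},"date":"2024","value":250},
  {"country":{"id":"CL","value":"Chile"},"date":"2023","value":240}]]'

latest <- latest_wb_value(mock)
url <- wb_indicator_url(c('UY', 'CL'), 'SP.POP.TOTL')
```

In a live run, the text returned by a GET request to `url` (for example via httr) would replace `mock`.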

Data download

Code
America <- tibble(country_name= c('Canada', 'Mexico', 'Brazil', 'Costa Rica', 'Colombia', 'Peru', 'Argentina', 'Ecuador', 'Panama', 'Chile', 'Venezuela', 'Belize', 'Honduras', 'Bolivia', 'Guatemala', 'Cuba', 'Nicaragua', 'Paraguay', 'Bahamas', 'Jamaica', 'Trinidad and Tobago', 'Guyana', 'Dominican Republic', 'El Salvador', 'Suriname', 'Uruguay', 'Haiti'))

America <- America %>% 
  mutate(country_code = countrycode::countrycode(country_name,
                                                 origin = 'country.name',
                                                 destination = 'iso2c'))

America <- left_join(America, iNat_network %>% rename(country_name=site))

n_inat_gbif_country <- recordsPerCountryGBIF(America$country_code)
n_inat_country <- recordsPerCountryiNat(America$country_name)
n_users_country <- usersPerCountryiNat(America$country_name)

area_country <-areaPerCountry(America$country_code)
population <- populationPerCountry(America$country_code)
gdp_per_capita <- gdpPerCapitaCountry(America$country_code)
gdp_research <- gdpResearchPerCountry(America$country_code)
latitude <- latitudePerCountry(America$country_code)

data_variables_America <- left_join(left_join(left_join(
  left_join(left_join(left_join(left_join(left_join(
  America, n_inat_gbif_country),
  n_inat_country), 
  n_users_country),
  area_country), 
  population), 
  gdp_per_capita), 
  gdp_research), latitude)

saveRDS(data_variables_America, 'data/America_data_variables.rds')

########################################################################

Europe <- tibble(country_name = c('Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'United Kingdom', 'Iceland', 'Liechtenstein', 'Norway', 'Switzerland', 'Albania', 'Bosnia and Herzegovina', 'Georgia', 'Moldova', 'Montenegro', 'Macedonia', 'Serbia', 'Turkey', 'Ukraine'))

Europe <- Europe %>% 
  mutate(country_code = countrycode::countrycode(country_name,
                                                 origin = 'country.name',
                                                 destination = 'iso2c'))

Europe <- left_join(Europe, iNat_network %>% rename(country_name=site))

n_inat_gbif_country <- recordsPerCountryGBIF(Europe$country_code)
n_inat_country <- recordsPerCountryiNat(Europe$country_name)
n_users_country <- usersPerCountryiNat(Europe$country_name)

area_country <-areaPerCountry(Europe$country_code)
population <- populationPerCountry(Europe$country_code)
gdp_per_capita <- gdpPerCapitaCountry(Europe$country_code)
gdp_research <- gdpResearchPerCountry(Europe$country_code)
latitude <- latitudePerCountry(Europe$country_code)

data_variables_Europe <- left_join(left_join(left_join(
  left_join(left_join(left_join(left_join(left_join(
  Europe, n_inat_gbif_country),
  n_inat_country), 
  n_users_country),
  area_country), 
  population), 
  gdp_per_capita), 
  gdp_research), latitude)

saveRDS(data_variables_Europe, 'data/Europe_data_variables.rds')

########################################################################

Asia <- tibble(country_name = c('India', 'China', 'Indonesia', 'Pakistan', 'Bangladesh', 'Japan', 'Philippines', 'Vietnam', 'Iran', 'Turkey', 'Thailand', 'Myanmar', 'South Korea','Iraq', 'Afghanistan', 'Yemen', 'Uzbekistan', 'Malaysia', 'Saudi Arabia', 'Nepal', 'North Korea','Syria', 'Sri Lanka','Kazakhstan', 'Cambodia', 'Jordan', 'United Arab Emirates', 'Tajikistan', 'Azerbaijan', 'Israel', 'Laos', 'Turkmenistan', 'Kyrgyzstan', 'Singapore', 'Lebanon', 'Palestine','Oman', 'Kuwait', 'Georgia', 'Mongolia', 'Qatar', 'Armenia', 'Bahrain', 'Timor Leste', 'Cyprus', 'Bhutan', 'Maldives', 'Brunei', 'Taiwan'))

Asia <- Asia %>% 
  mutate(country_code = countrycode::countrycode(country_name,
                                                 origin = 'country.name',
                                                 destination = 'iso2c'))

Asia <- left_join(Asia, iNat_network %>% rename(country_name=site))

n_inat_gbif_country <- recordsPerCountryGBIF(Asia$country_code)
n_inat_country <- recordsPerCountryiNat(Asia$country_name)
n_users_country <- usersPerCountryiNat(Asia$country_name)

area_country <-areaPerCountry(Asia$country_code)
population <- populationPerCountry(Asia$country_code)
gdp_per_capita <- gdpPerCapitaCountry(Asia$country_code)
gdp_research <- gdpResearchPerCountry(Asia$country_code)
latitude <- latitudePerCountry(Asia$country_code)

data_variables_Asia <- left_join(left_join(left_join(
  left_join(left_join(left_join(left_join(left_join(
  Asia, n_inat_gbif_country),
  n_inat_country), 
  n_users_country),
  area_country), 
  population), 
  gdp_per_capita), 
  gdp_research), latitude)

data_variables_Asia <- data_variables_Asia %>% 
  mutate(area = ifelse(country_name == 'Taiwan', 36197, area),
         pop = ifelse(country_name == 'Taiwan', 23365274, pop))

saveRDS(data_variables_Asia, 'data/Asia_data_variables.rds')

########################################################################

Oceania <- tibble(country_name = c('Australia', 'Papua New Guinea', 'New Zealand', 'Fiji', 'Solomon Islands', 'Federated States of Micronesia', 'Vanuatu', 'Samoa', 'Kiribati', 'Tonga', 'Marshall Islands', 'Palau', 'Tuvalu', 'Nauru'))

Oceania <- Oceania %>% 
  mutate(country_code = countrycode::countrycode(country_name,
                                                 origin = 'country.name',
                                                 destination = 'iso2c'))

Oceania <- left_join(Oceania, iNat_network %>% rename(country_name=site))

n_inat_gbif_country <- recordsPerCountryGBIF(Oceania$country_code)
n_inat_country <- recordsPerCountryiNat(Oceania$country_name)
n_users_country <- usersPerCountryiNat(Oceania$country_name)

area_country <-areaPerCountry(Oceania$country_code)
population <- populationPerCountry(Oceania$country_code)
gdp_per_capita <- gdpPerCapitaCountry(Oceania$country_code)
gdp_research <- gdpResearchPerCountry(Oceania$country_code)
latitude <- latitudePerCountry(Oceania$country_code)

data_variables_Oceania <- left_join(left_join(left_join(
  left_join(left_join(left_join(left_join(left_join(
  Oceania, n_inat_gbif_country),
  n_inat_country), 
  n_users_country),
  area_country), 
  population), 
  gdp_per_capita), 
  gdp_research), latitude)

saveRDS(data_variables_Oceania, 'data/Oceania_data_variables.rds')

########################################################################

# Combine the four continental tables built in this session
variables_global <- bind_rows(data_variables_America %>% 
                                mutate(continent = 'America'),
                              data_variables_Europe %>% 
                                mutate(continent = 'Europe'),
                              data_variables_Asia %>% 
                                mutate(continent = 'Asia'),
                              data_variables_Oceania %>% 
                                mutate(continent = 'Oceania')) %>% 
  unique()

# Equivalent rebuild from the saved .rds files, for use when the downloads above are skipped
variables_global <- bind_rows(readRDS('data/America_data_variables.rds') %>%
                                mutate(continent = 'America'),
                              readRDS('data/Europe_data_variables.rds') %>%
                                mutate(continent = 'Europe'),
                              readRDS('data/Asia_data_variables.rds') %>%
                                mutate(continent = 'Asia'),
                              readRDS('data/Oceania_data_variables.rds') %>%
                                mutate(continent = 'Oceania')) %>%
  unique()

saveRDS(variables_global, 'data/global_data_variables.rds')
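The nested `left_join()` calls used for each continent can be written more compactly. The sketch below shows one possible alternative, folding a list of tables with `purrr::reduce()`. The toy tables and their values are invented for illustration; like the original code, the joins rely on dplyr matching on whichever columns the two tables share.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for the per-country tables (values are invented)
countries <- tibble(country_name = c('Uruguay', 'Chile'),
                    country_code = c('UY', 'CL'))
records   <- tibble(country_code = c('UY', 'CL'), n_records_inat = c(10, 20))
users     <- tibble(country_name = c('Uruguay', 'Chile'), n_users = c(1, 2))
area      <- tibble(country_code = c('UY', 'CL'), area = c(100, 200))

# Fold the list from left to right: ((countries + records) + users) + area
joined <- list(countries, records, users, area) %>%
  reduce(left_join)
```

Passing an explicit `by` argument to each join would silence the join messages, at the cost of handling tables keyed by `country_code` and by `country_name` separately.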

Summary of the data for the Network members

Total number of records on iNaturalist per country

Code
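# Assumption: data_variables is the combined table saved above
data_variables <- readRDS('data/global_data_variables.rds')

ids <- data_variables %>% 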
  select(country_name,site_name, n_records_inat) %>% 
  arrange(desc(n_records_inat)) %>% 
  with(which(!is.na(site_name)))

data_variables %>% 
  select(country_name,site_name, n_records_inat) %>% 
  arrange(desc(n_records_inat)) %>% 
  mutate('#'= row_number()) %>% relocate('#') %>% 
  rename(`Country` = country_name,
         `iNat site` = site_name,
         `Records on iNat`=n_records_inat) %>% 
  kableExtra::kbl(digits=4, format.args = list(big.mark = ',')) %>% 
  kableExtra::kable_material('striped') %>% 
  kableExtra::row_spec(ids, bold = T, color = "white", background = "#228A22") %>% 
  kableExtra::scroll_box(height = '600px')
# Country iNat site Records on iNat
1 Canada iNaturalist.ca 17,900,009
2 Australia iNaturalistAU 10,849,285
3 Mexico iNaturalistMX 8,412,995
4 United Kingdom iNaturalistUK 6,943,926
5 Germany 5,091,498
6 France 4,932,213
7 Spain Natusfera 4,604,659
8 Taiwan iNaturalistTW 3,666,074
9 Italy 3,662,680
10 India 3,571,023
11 Brazil 3,496,740
12 New Zealand iNaturalistNZ 2,981,047
13 Austria 2,476,537
14 Portugal BioDiversity4All 2,107,143
15 Colombia NaturalistaCO 1,907,160
16 Ecuador iNaturalistEc 1,874,438
17 Argentina ArgentiNat 1,833,740
18 Denmark 1,657,667
19 China 1,646,241
20 Costa Rica NaturalistaCR 1,580,872
21 Ukraine 1,526,147
22 Czechia 1,301,754
23 Poland 1,254,562
24 Finland iNaturalistFi 1,170,974
25 Thailand 996,118
26 Malaysia 973,288
27 Japan 965,864
28 Indonesia 955,846
29 Netherlands 954,360
30 Bolivia 940,781
31 Switzerland 905,895
32 Chile iNaturalistCL 874,666
33 Peru 785,071
34 Singapore 746,405
35 Panama iNaturalistPa 716,270
36 Greece iNaturalistGR 709,163
37 Sweden 659,853
38 Belgium 647,042
39 South Korea 532,394
40 Hungary 481,094
41 Philippines 479,525
42 Lithuania 464,960
43 Croatia 427,593
44 Luxembourg iNaturalist.LU 375,167
45 Norway 371,783
46 Turkey 367,680
47 Turkey 367,680
48 Israel iNaturalistil 358,153
49 Honduras 333,491
50 Ireland 263,299
51 Romania 247,772
52 Slovakia 226,580
53 Slovenia 205,635
54 Sri Lanka 203,493
55 Kazakhstan 198,800
56 Vietnam 189,863
57 Guatemala iNaturalistGT 169,504
58 Bulgaria 162,215
59 Belize 159,063
60 Uruguay NaturalistaUY 156,551
61 Dominican Republic 145,886
62 Serbia 144,065
63 Trinidad and Tobago 127,655
64 Nicaragua 122,378
65 Iceland 115,563
66 Nepal 110,868
67 Mongolia 94,205
68 El Salvador 92,405
69 Venezuela 89,298
70 Cuba 87,892
71 Cambodia 83,930
72 Estonia 75,483
73 Jamaica 73,804
74 Albania 73,194
75 Armenia 73,171
76 United Arab Emirates 71,277
77 Latvia 69,627
78 Fiji 66,619
79 Cyprus 65,867
80 Maldives 61,266
81 Bahamas 59,704
82 Iran 59,489
83 Montenegro 57,086
84 Bhutan 46,283
85 Uzbekistan 42,320
86 Bosnia and Herzegovina 41,590
87 Pakistan 36,960
88 Paraguay 35,851
89 Bangladesh 35,612
90 Papua New Guinea 35,464
91 Kyrgyzstan 33,781
92 Myanmar 32,490
93 Palestine 31,040
94 Saudi Arabia 30,342
95 Oman 29,897
96 Suriname 29,312
97 Kuwait 29,189
98 Malta 27,991
99 Guyana 27,703
100 Macedonia 27,626
101 Laos 26,952
102 Syria 24,597
103 Marshall Islands 24,568
104 Vanuatu 23,271
105 Jordan 22,029
106 Palau 18,512
107 Azerbaijan 18,467
108 Haiti 16,634
109 Solomon Islands 14,516
110 Moldova 14,322
111 Iraq 14,125
112 Lebanon 13,122
113 Brunei 11,386
114 Cyprus 10,894
115 Qatar 10,678
116 Yemen 9,808
117 Tajikistan 9,560
118 Tonga 8,175
119 Georgia 6,820
120 Georgia 6,820
121 Federated States of Micronesia 6,153
122 Samoa 5,247
123 Liechtenstein 4,717
124 Bahrain 3,592
125 Tuvalu 2,885
126 Kiribati 2,550
127 North Korea 1,884
128 Afghanistan 1,050
129 Turkmenistan 626
130 Nauru 103
131 Timor Leste
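Turkey, Georgia and Cyprus each appear twice in this table (131 rows for 128 distinct countries) because they are included in both the Europe and the Asia country lists, and `unique()` keeps rows that differ only in `continent`. The sketch below shows one way to collapse such duplicates, under the assumption that keeping the first occurrence is acceptable. Which continent to keep, and which of the two differing Cyprus counts is correct, are decisions this sketch does not settle.

```r
library(dplyr)

# Small example using three rows' worth of values from the table above
toy <- tibble(
  country_name   = c('Turkey', 'Turkey', 'Spain', 'Japan'),
  continent      = c('Europe', 'Asia', 'Europe', 'Asia'),
  n_records_inat = c(367680, 367680, 4604659, 965864)
)

# unique() keeps both Turkey rows because the continent column differs
n_after_unique <- nrow(unique(toy))

# distinct() on country_name keeps only the first row for each country
deduplicated <- toy %>%
  distinct(country_name, .keep_all = TRUE)
```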

Number of records from iNaturalist on GBIF per country

Code
ids <- data_variables %>% 
  select(country_name, site_name, n_records_gbif, n_records_gbif_iNat) %>% 
  mutate(proportion=n_records_gbif_iNat*100/n_records_gbif) %>% 
  arrange(desc(proportion)) %>% 
  with(which(!is.na(site_name)))

data_variables %>% 
  select(country_name, site_name, n_records_gbif, n_records_gbif_iNat) %>% 
  mutate(proportion=n_records_gbif_iNat*100/n_records_gbif) %>% 
  arrange(desc(proportion)) %>% 
  mutate('#'= row_number()) %>% relocate('#') %>% 
  select(-n_records_gbif_iNat) %>% 
  rename(`Country` = country_name,
         `iNat site` = site_name,
         `Records on GBIF` = n_records_gbif,
         `Proportion from iNat (%)` = proportion) %>% 
  kableExtra::kbl(digits=4, format.args = list(big.mark = ',')) %>% 
  kableExtra::kable_material('striped') %>% 
  kableExtra::row_spec(ids, bold = T, color = "white", background = "#228A22") %>% 
  kableExtra::scroll_box(height = '600px')
# Country iNat site Records on GBIF Proportion from iNat (%)
1 Maldives 102,600 28.5780
2 Ukraine 3,305,090 27.4630
3 Albania 113,283 26.0198
4 Italy 7,401,549 22.7511
5 Singapore 1,664,886 22.4301
6 Kazakhstan 435,373 22.2200
7 Montenegro 142,401 19.2155
8 Tuvalu 9,554 18.7565
9 Croatia 1,078,021 18.6975
10 Malta 81,373 18.3243
11 Lithuania 1,232,602 17.2011
12 Bosnia and Herzegovina 82,290 16.6764
13 Uzbekistan 117,367 15.7259
14 Fiji 275,858 15.1440
15 Marshall Islands 109,599 14.1853
16 Hungary 1,882,300 12.8489
17 New Zealand iNaturalistNZ 14,957,900 12.3202
18 Czechia 4,357,996 12.2621
19 Greece iNaturalistGR 3,065,179 12.0781
20 Indonesia 3,076,242 11.8549
21 Austria 12,772,943 11.5651
22 Timor Leste 91,370 11.3823
23 Slovenia 790,292 11.2033
24 Armenia 192,968 11.1687
25 Romania 1,070,706 10.6347
26 Mexico iNaturalistMX 31,246,248 10.4663
27 Iraq 72,541 8.8419
28 Macedonia 118,106 8.1918
29 Malaysia 3,341,240 8.0139
30 Mongolia 571,726 7.9358
31 Cyprus 522,317 7.5020
32 Cyprus 522,489 7.4995
33 Bahrain 22,772 7.4697
34 Argentina ArgentiNat 14,442,370 7.3516
35 Serbia 865,944 7.2042
36 Dominican Republic 784,391 7.1786
37 Slovakia 1,688,147 7.0103
38 Taiwan iNaturalistTW 21,421,162 6.9228
39 Kyrgyzstan 203,073 6.7690
40 Yemen 91,962 6.7506
41 Vietnam 783,215 6.7150
42 Jordan 140,953 6.5376
43 Brunei 49,391 6.4060
44 Vanuatu 165,949 6.0513
45 Luxembourg iNaturalist.LU 3,354,337 5.9892
46 Philippines 2,079,909 5.8888
47 Thailand 6,122,726 5.7396
48 Japan 8,472,177 5.1061
49 Palau 193,504 5.1053
50 Canada iNaturalist.ca 178,493,991 5.1031
51 Solomon Islands 185,161 5.0497
52 Portugal BioDiversity4All 19,715,797 5.0231
53 Uruguay NaturalistaUY 1,681,236 4.9915
54 China 9,433,222 4.9189
55 Bulgaria 1,835,892 4.7743
56 Samoa 45,966 4.7731
57 Sri Lanka 2,146,804 4.4949
58 Trinidad and Tobago 1,127,557 4.4920
59 Ecuador iNaturalistEc 11,650,113 4.4398
60 Georgia 1,181,861 4.3536
61 Georgia 1,181,992 4.3531
62 Germany 62,252,275 4.3359
63 Australia iNaturalistAU 135,376,182 4.1283
64 Turkey 3,309,188 4.1221
65 Turkey 3,309,436 4.1218
66 Bolivia 2,200,010 4.1167
67 Myanmar 333,389 4.0925
68 Poland 14,780,502 4.0749
69 Latvia 604,673 4.0703
70 Jamaica 680,756 3.9274
71 Moldova 121,784 3.8347
72 Haiti 189,174 3.7553
73 Brazil 26,616,164 3.7527
74 Kuwait 311,359 3.6989
75 Qatar 135,904 3.6460
76 South Korea 5,403,467 3.6406
77 Laos 242,832 3.5642
78 Iran 818,764 3.4275
79 Chile iNaturalistCL 10,730,394 3.3230
80 Azerbaijan 242,165 3.2565
81 Honduras 3,459,120 3.2295
82 Tonga 133,783 3.1147
83 Lebanon 147,284 3.0723
84 Tajikistan 84,825 3.0380
85 Bahamas 987,306 3.0325
86 Spain Natusfera 72,566,816 2.9929
87 Cambodia 950,623 2.9868
88 Syria 310,449 2.9493
89 Panama iNaturalistPa 7,850,521 2.9067
90 Cuba 1,845,461 2.8590
91 Palestine 581,583 2.8202
92 Suriname 463,888 2.7455
93 Saudi Arabia 501,270 2.7037
94 Nicaragua 2,045,924 2.6963
95 North Korea 47,941 2.6950
96 El Salvador 1,250,974 2.6841
97 Iceland 2,138,710 2.6692
98 Liechtenstein 86,720 2.5023
99 Nepal 1,290,694 2.4486
100 Oman 523,644 2.4125
101 Peru 8,881,610 2.3764
102 Ireland 5,073,295 2.2550
103 Bhutan 564,497 2.2252
104 Bangladesh 653,738 2.0881
105 United Kingdom iNaturalistUK 179,386,804 2.0854
106 Israel iNaturalistil 7,023,671 2.0637
107 India 50,945,385 2.0362
108 Federated States of Micronesia 130,557 2.0068
109 Costa Rica NaturalistaCR 31,581,986 1.8071
110 Switzerland 28,174,048 1.6532
111 Pakistan 572,636 1.5437
112 Colombia NaturalistaCO 32,226,610 1.4854
113 United Arab Emirates 1,968,642 1.4806
114 Guatemala iNaturalistGT 4,598,258 1.2107
115 Guyana 830,244 1.2107
116 France 192,598,919 1.2021
117 Kiribati 157,257 1.1618
118 Belize 6,646,850 1.0838
119 Turkmenistan 22,062 1.0742
120 Denmark 60,344,753 1.0540
121 Finland iNaturalistFi 45,743,721 1.0521
122 Papua New Guinea 1,704,726 0.8613
123 Nauru 4,714 0.8061
124 Venezuela 4,189,542 0.7974
125 Belgium 39,976,677 0.7507
126 Paraguay 1,412,769 0.7230
127 Afghanistan 65,829 0.6806
128 Estonia 7,446,938 0.5402
129 Netherlands 123,944,227 0.3671
130 Norway 53,254,830 0.3433
131 Sweden 141,826,712 0.2148

Number of users recording in the country

Code
ids <- data_variables %>% 
  select(country_name, site_name, n_users) %>% 
  arrange(desc(n_users)) %>% 
  with(which(!is.na(site_name)))

data_variables %>% 
  select(country_name, site_name, n_users) %>% 
  arrange(desc(n_users)) %>%
  mutate('#'= row_number()) %>% relocate('#') %>% 
  rename(`Country` = country_name,
         `iNat site` = site_name,
         `Users recording on iNat`=n_users) %>% 
  kableExtra::kbl(digits=4, format.args = list(big.mark = ',')) %>% 
  kableExtra::kable_material('striped') %>% 
  kableExtra::row_spec(ids, bold = T, color = "white", background = "#228A22") %>% 
  kableExtra::scroll_box(height = '600px')
# Country iNat site Users recording on iNat
1 Canada iNaturalist.ca 249,842
2 Mexico iNaturalistMX 166,175
3 United Kingdom iNaturalistUK 145,086
4 France 128,279
5 Australia iNaturalistAU 116,817
6 Italy 85,798
7 Germany 79,925
8 Spain Natusfera 75,563
9 Denmark 74,527
10 Brazil 69,968
11 Taiwan iNaturalistTW 57,783
12 Colombia NaturalistaCO 54,068
13 New Zealand iNaturalistNZ 46,991
14 India 46,458
15 Portugal BioDiversity4All 36,431
16 Ecuador iNaturalistEc 36,348
17 Costa Rica NaturalistaCR 35,332
18 Czechia 33,922
19 Finland iNaturalistFi 29,066
20 Netherlands 29,007
21 Bolivia 28,962
22 Austria 28,649
23 Argentina ArgentiNat 24,765
24 Belgium 24,564
25 Japan 23,220
26 Thailand 22,958
27 Chile iNaturalistCL 22,833
28 Switzerland 21,953
29 Greece iNaturalistGR 21,690
30 Indonesia 21,514
31 Sweden 19,000
32 Malaysia 18,659
33 China 17,797
34 Panama iNaturalistPa 17,553
35 Poland 17,070
36 Peru 16,733
37 Philippines 15,739
38 Turkey 13,594
39 Turkey 13,594
40 Croatia 12,772
41 Ireland 12,725
42 Norway 12,583
43 Ukraine 12,060
44 Singapore 9,985
45 Lithuania 9,104
46 Honduras 8,713
47 South Korea 7,949
48 Guatemala iNaturalistGT 7,816
49 Hungary 7,081
50 Luxembourg iNaturalist.LU 6,843
51 Iceland 6,417
52 Israel iNaturalistil 6,374
53 Slovenia 6,300
54 Romania 6,116
55 Dominican Republic 6,106
56 Vietnam 6,037
57 Slovakia 5,431
58 Belize 4,653
59 Sri Lanka 4,259
60 United Arab Emirates 4,161
61 Bahamas 4,159
62 Uruguay NaturalistaUY 4,041
63 Bulgaria 3,473
64 Nepal 3,177
65 Jamaica 3,138
66 Nicaragua 2,966
67 Kazakhstan 2,958
68 El Salvador 2,832
69 Cuba 2,707
70 Estonia 2,686
71 Cyprus 2,437
72 Serbia 2,409
73 Cambodia 2,407
74 Trinidad and Tobago 2,326
75 Latvia 2,273
76 Venezuela 2,197
77 Montenegro 2,118
78 Albania 1,842
79 Malta 1,672
80 Fiji 1,650
81 Maldives 1,568
82 Pakistan 1,509
83 Bhutan 1,452
84 Bosnia and Herzegovina 1,404
85 Iran 1,404
86 Armenia 1,339
87 Saudi Arabia 1,330
88 Paraguay 1,280
89 Myanmar 1,260
90 Jordan 1,255
91 Laos 1,254
92 Mongolia 1,241
93 Macedonia 1,100
94 Palestine 1,093
95 Oman 1,041
96 Uzbekistan 969
97 Bangladesh 904
98 Kyrgyzstan 898
99 Lebanon 703
100 Azerbaijan 702
101 Papua New Guinea 655
102 Guyana 598
103 Cyprus 532
104 Suriname 485
105 Vanuatu 478
106 Qatar 461
107 Haiti 436
108 Palau 414
109 Moldova 388
110 Iraq 377
111 Liechtenstein 375
112 Brunei 346
113 Federated States of Micronesia 345
114 Kuwait 326
115 Tajikistan 300
116 Georgia 293
117 Georgia 293
118 Bahrain 291
119 Solomon Islands 255
120 Samoa 202
121 Syria 172
122 Tonga 168
123 Yemen 129
124 Afghanistan 128
125 Marshall Islands 88
126 Turkmenistan 84
127 North Korea 76
128 Kiribati 35
129 Tuvalu 24
130 Nauru 11
131 Timor Leste

Associations between variables

Population of the country

Code
## records
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(pop/100000, n_records_inat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Population of the country (hundred thousand)',
       y='Number of records on iNaturalist (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of records on iNaturalist vs Population of the country
Code
## records in GBIF
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(pop/100000, n_records_gbif_iNat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Population of the country (hundred thousand)',
       y='Number of iNat records on GBIF (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of iNat records on GBIF vs Population of the country
Code
## users
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(pop/100000, n_users, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', col= 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Population of the country (hundred thousand)',
       y='Number of users recording on iNaturalist') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of users recording on iNaturalist vs Population of the country
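The plots in this report differ only in the x and y variables and the axis labels, so the shared layers could be wrapped in a function. The sketch below is one possible refactoring, not the author's code: `plot_association()` is a hypothetical name, the toy data are invented, and the helper expects columns that have already been rescaled (for example `pop / 100000`).

```r
library(ggplot2)
library(ggrepel)

# Hypothetical helper wrapping the layers shared by the association plots
plot_association <- function(data, x_var, y_var, x_lab, y_lab) {
  # Flag countries that have a national iNaturalist site
  data$site_on_iNat <- ifelse(!is.na(data$site_id), 'yes', 'no')

  ggplot(data, aes(.data[[x_var]], .data[[y_var]], label = site_name)) +
    geom_point(aes(colour = site_on_iNat), size = 2, show.legend = FALSE) +
    scale_colour_manual(values = c(no = 'black', yes = '#74AC00')) +
    geom_smooth(method = 'lm', formula = y ~ x, colour = 'black') +
    geom_label_repel(aes(fill = site_on_iNat), colour = 'black',
                     segment.color = 'black', show.legend = FALSE,
                     max.overlaps = Inf) +
    scale_fill_manual(values = c(no = '#F7F7F7', yes = '#74AC00')) +
    labs(x = x_lab, y = y_lab) +
    scale_x_log10() + scale_y_log10() +
    theme_bw()
}

# Invented toy data with the columns the helper expects
toy <- data.frame(
  site_name = c('Site A', 'Site B', NA, NA),
  site_id   = c(1, 2, NA, NA),
  pop_scaled     = c(400, 1300, 830, 2100),
  records_scaled = c(18000, 8400, 5100, 3500)
)

p <- plot_association(toy, 'pop_scaled', 'records_scaled',
                      'Population of the country (hundred thousand)',
                      'Number of records on iNaturalist (thousand)')
```

Naming the colour and fill values (`no`, `yes`) makes the mapping explicit instead of relying on the alphabetical order of the two levels.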

Area of the country in km²

Code
## records
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(area/1000,n_records_inat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Area of the country (thousand km2)',
       y='Number of records on iNaturalist (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of records on iNaturalist vs Area of the country
Code
## records in gbif
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(area/1000, n_records_gbif_iNat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Area of the country (thousand km2)',
       y='Number of iNat records on GBIF (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of iNat records on GBIF vs Area of the country
Code
## users
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(area/1000, n_users, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Area of the country (thousand km2)',
       y='Number of users recording on iNaturalist') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of users recording on iNaturalist vs Area of the country
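Because both axes in these plots are log-scaled, the straight line drawn by `geom_smooth(method = 'lm')` corresponds to a linear fit of log10(y) on log10(x), that is, a power-law relationship between the two variables. The sketch below illustrates how the slope of such a fit can be read, using synthetic data that follow an exact power law; the numbers are invented and are not taken from the report.

```r
# Synthetic data following an exact power law: n_users = 2 * area^0.5
toy <- data.frame(area = c(1e3, 1e4, 1e5, 1e6, 1e7))
toy$n_users <- 2 * toy$area^0.5

# The same model that geom_smooth(method = 'lm') fits on log-log axes
fit <- lm(log10(n_users) ~ log10(area), data = toy)

# The slope is the exponent of the power law
slope <- unname(coef(fit)[2])

# Multiplicative change in users for a tenfold increase in area
tenfold_factor <- 10^slope
```

A slope below 1 means the response grows less than proportionally with the explanatory variable; a slope above 1 means it grows faster.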

Country’s centroid latitude

Code
## records
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(abs(latitude), n_records_inat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Absolute decimal latitude of the country\'s centroid',
       y='Number of records on iNaturalist (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of records on iNaturalist vs country’s centroid latitude
Code
## records in gbif
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(abs(latitude), n_records_gbif_iNat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Absolute decimal latitude of the country\'s centroid',
       y='Number of iNat records on GBIF (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of iNat records on GBIF vs country’s centroid latitude
Code
## users
ggplot(data_variables %>% 
         mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(abs(latitude), n_users, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='Absolute decimal latitude of the country\'s centroid',
       y='Number of users recording on iNaturalist') +
  #scale_x_log10() + 
  scale_y_log10() +
  theme_bw()

Number of users recording on iNaturalist vs country’s centroid latitude

GDP per capita

Code
## records
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(gdp/1000, n_records_inat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='GDP per capita (thousand USD)',
       y='Number of records on iNaturalist (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of records on iNaturalist vs country’s GDP per capita
Code
## records in gbif
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(gdp/1000, n_records_gbif_iNat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='GDP per capita (thousand USD)',
       y='Number of iNat records on GBIF (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of iNat records on GBIF vs country’s GDP per capita
Code
## users
ggplot(data_variables %>% 
         mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(gdp/1000, n_users, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='GDP per capita (thousand USD)',
       y='Number of users recording on iNaturalist') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of users recording on iNaturalist vs country’s GDP per capita

% of the GDP of the country dedicated to research

Code
## records
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(gdp_research, n_records_inat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='GDP of the country dedicated to research (%)',
       y='Number of records on iNaturalist (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of records on iNaturalist vs % of the country’s GDP dedicated to research
Code
## records in gbif
ggplot(data_variables %>% mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(gdp_research, n_records_gbif_iNat/1000, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='GDP of the country dedicated to research (%)',
       y='Number of iNat records on GBIF (thousand)') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of iNat records on GBIF vs % of the country’s GDP dedicated to research
Code
## users
ggplot(data_variables %>% 
         mutate(site_on_iNat = ifelse(!is.na(site_id), 'yes', 'no')), 
       aes(gdp_research, n_users, label = site_name)) +
  geom_point(aes(col=site_on_iNat), size=2, show.legend = F) +
  scale_color_manual(values = c('black', '#74AC00')) +
  geom_smooth(method='lm', colour = 'black') +
  geom_label_repel(aes(fill=site_on_iNat), 
                   colour = "black", #fontface = "bold",
                   segment.color = 'black',
                   show.legend = F, max.overlaps= Inf) +
  scale_fill_manual(values = c('#F7F7F7', '#74AC00')) +
  labs(x='GDP of the country dedicated to research (%)',
       y='Number of users recording on iNaturalist') +
  scale_x_log10() + scale_y_log10() +
  theme_bw()

Number of users recording on iNaturalist vs % of the country’s GDP dedicated to research
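
The nine scatterplots above differ only in the x variable, the y variable and the axis labels, so the shared layers could be wrapped in one function. The following is a minimal sketch, not the code used to produce the figures: `plot_vs_predictor()` and the toy data frame are hypothetical names introduced here, and the function expects columns that are already rescaled (for example, records in thousands).

```r
library(ggplot2)
library(ggrepel)

# Hypothetical helper: one scatterplot of a response against a predictor,
# coloured by whether the country has a national iNaturalist site.
plot_vs_predictor <- function(data, x, y, x_lab, y_lab, log_x = TRUE) {
  data$site_on_iNat <- ifelse(!is.na(data$site_id), 'yes', 'no')
  p <- ggplot(data, aes(.data[[x]], .data[[y]], label = site_name)) +
    geom_point(aes(col = site_on_iNat), size = 2, show.legend = FALSE) +
    scale_color_manual(values = c(no = 'black', yes = '#74AC00')) +
    geom_smooth(method = 'lm', formula = y ~ x, colour = 'black') +
    geom_label_repel(aes(fill = site_on_iNat), colour = 'black',
                     segment.color = 'black', show.legend = FALSE,
                     max.overlaps = Inf) +
    scale_fill_manual(values = c(no = '#F7F7F7', yes = '#74AC00')) +
    labs(x = x_lab, y = y_lab) +
    scale_y_log10() +
    theme_bw()
  # the latitude plot of users keeps a linear x axis, hence the switch
  if (log_x) p <- p + scale_x_log10()
  p
}

# Toy data standing in for data_variables (values are made up)
toy <- data.frame(
  site_name    = c('SiteA', NA, 'SiteC', NA),
  site_id      = c(1, NA, 3, NA),
  gdp_research = c(0.4, 1.2, 2.5, 0.8),
  n_users      = c(1500, 300, 42000, 900)
)

p_example <- plot_vs_predictor(toy, 'gdp_research', 'n_users',
                               x_lab = 'GDP of the country dedicated to research (%)',
                               y_lab = 'Number of users recording on iNaturalist')
```

Naming the colour values (`no`, `yes`) makes the mapping explicit instead of relying on the alphabetical order of the two levels.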

Modelling

Code
data_regressions <- data_variables %>% 
  # flag countries with (1) or without (0) a national iNaturalist site
  mutate(has_site = as.factor(ifelse(!is.na(site_name), 1, 0)),
         n_gbif_inat = n_records_gbif_iNat) %>% 
  dplyr::select(country_code, n_records_inat, n_gbif_inat, n_users,  
                area, gdp, gdp_research, pop, latitude, has_site) %>% 
  # keep only countries with research spending and latitude data
  filter(!is.na(gdp_research) & !is.na(latitude)) %>% 
  # natural-log transform the responses and the skewed predictors
  mutate(log_n_gbif_inat = log(n_gbif_inat),
         log_n_records_inat = log(n_records_inat),
         log_n_users = log(n_users), 
         log_area = log(area),
         log_pop = log(pop),
         log_gdp = log(gdp))
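
Because `log(0)` is `-Inf` in R, a country with zero records or users would produce a non-finite response, and `lm()` stops with an error when the response contains infinite values. A quick check before modelling can flag such rows. This is a minimal sketch on hypothetical toy data; the column names mirror those above but the values are made up.

```r
# Toy data standing in for data_variables (values are made up)
toy <- data.frame(
  country_code = c('AA', 'BB', 'CC'),
  n_users      = c(120, 0, 35),
  pop          = c(3.5e6, 1.2e5, 4.7e7)
)
toy$log_n_users <- log(toy$n_users)
toy$log_pop     <- log(toy$pop)

# Logical matrix: TRUE where a log-transformed value is finite
log_cols  <- c('log_n_users', 'log_pop')
is_finite <- sapply(toy[log_cols], is.finite)

# Rows with at least one non-finite log value
bad_rows <- toy[rowSums(!is_finite) > 0, ]
bad_rows$country_code
```

If such rows turned up in the real data, one option would be to drop them explicitly, another to model counts with a different transformation; which is appropriate depends on the analysis.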

Total number of records on iNaturalist per country

Code
fit_n_records <- lm(log_n_records_inat ~ log_area + log_gdp + gdp_research + log_pop + latitude + has_site, data=data_regressions)
step_n_records <- stepAIC(fit_n_records, direction = 'both')
Start:  AIC=52.97
log_n_records_inat ~ log_area + log_gdp + gdp_research + log_pop + 
    latitude + has_site

               Df Sum of Sq    RSS    AIC
- latitude      1    0.1881 144.26 51.098
- log_area      1    1.4312 145.50 51.922
<none>                      144.07 52.973
- gdp_research  1    3.7472 147.82 53.438
- log_gdp       1   19.8464 163.92 63.362
- log_pop       1   28.4500 172.52 68.273
- has_site      1   31.1315 175.20 69.754

Step:  AIC=51.1
log_n_records_inat ~ log_area + log_gdp + gdp_research + log_pop + 
    has_site

               Df Sum of Sq    RSS    AIC
- log_area      1     1.534 145.79 50.113
<none>                      144.26 51.098
- gdp_research  1     3.562 147.82 51.439
+ latitude      1     0.188 144.07 52.973
- log_gdp       1    19.663 163.92 61.365
- log_pop       1    28.826 173.09 66.586
- has_site      1    38.820 183.08 71.975

Step:  AIC=50.11
log_n_records_inat ~ log_gdp + gdp_research + log_pop + has_site

               Df Sum of Sq    RSS    AIC
- gdp_research  1     2.700 148.49 49.875
<none>                      145.79 50.113
+ log_area      1     1.534 144.26 51.098
+ latitude      1     0.291 145.50 51.922
- log_gdp       1    21.329 167.12 61.221
- has_site      1    42.573 188.37 72.708
- log_pop       1    87.294 233.09 93.159

Step:  AIC=49.87
log_n_records_inat ~ log_gdp + log_pop + has_site

               Df Sum of Sq    RSS     AIC
<none>                      148.49  49.875
+ gdp_research  1     2.700 145.79  50.113
+ log_area      1     0.672 147.82  51.439
+ latitude      1     0.004 148.49  51.872
- has_site      1    40.945 189.44  71.253
- log_gdp       1    66.355 214.85  83.336
- log_pop       1   118.799 267.29 104.304
Code
step_n_records$anova # display results 
Stepwise Model Path 
Analysis of Deviance Table

Initial Model:
log_n_records_inat ~ log_area + log_gdp + gdp_research + log_pop + 
    latitude + has_site

Final Model:
log_n_records_inat ~ log_gdp + log_pop + has_site

            Step Df  Deviance Resid. Df Resid. Dev      AIC
1                                    89   144.0720 52.97261
2     - latitude  1 0.1880533        90   144.2600 51.09784
3     - log_area  1 1.5337988        91   145.7938 50.11314
4 - gdp_research  1 2.7002138        92   148.4940 49.87487
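
The AIC column can be checked by hand. For linear models, `stepAIC` relies on `extractAIC`, which computes n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients (this differs from `AIC()` by an additive constant, so only differences between models matter). The sketch below plugs in the residual sums of squares from the table above; `aic_lm` is a helper name introduced here, and n = 96 is inferred from the 92 residual degrees of freedom plus the 4 coefficients of the final model.

```r
# AIC as computed for lm fits during stepwise selection
aic_lm <- function(rss, n, p, k = 2) n * log(rss / n) + k * p

n <- 92 + 4  # residual df of the final model + its 4 coefficients

aic_full  <- aic_lm(rss = 144.0720, n = n, p = 7)  # intercept + 6 predictors
aic_final <- aic_lm(rss = 148.4940, n = n, p = 4)  # intercept + 3 predictors

round(c(full = aic_full, final = aic_final), 2)
```

Both values agree with the printed 52.97 and 49.87 to two decimals. Dropping three predictors raises the RSS only slightly, while the penalty falls by 6, so the smaller model has the lower AIC.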

Number of records from iNaturalist on GBIF per country

Code
fit_gbif <- lm(log_n_gbif_inat ~ log_area + log_gdp + gdp_research + log_pop + latitude + has_site, data=data_regressions)
step_gbif <- stepAIC(fit_gbif, direction = 'both')
Start:  AIC=48.5
log_n_gbif_inat ~ log_area + log_gdp + gdp_research + log_pop + 
    latitude + has_site

               Df Sum of Sq    RSS    AIC
- latitude      1     0.175 137.69 46.622
- log_area      1     1.285 138.80 47.392
- gdp_research  1     2.033 139.55 47.908
<none>                      137.51 48.499
- log_pop       1    24.130 161.64 62.020
- log_gdp       1    27.362 164.88 63.920
- has_site      1    32.035 169.55 66.603

Step:  AIC=46.62
log_n_gbif_inat ~ log_area + log_gdp + gdp_research + log_pop + 
    has_site

               Df Sum of Sq    RSS    AIC
- log_area      1     1.213 138.90 45.464
- gdp_research  1     2.502 140.19 46.351
<none>                      137.69 46.622
+ latitude      1     0.175 137.51 48.499
- log_pop       1    23.971 161.66 60.030
- log_gdp       1    28.214 165.90 62.517
- has_site      1    35.531 173.22 66.660

Step:  AIC=45.46
log_n_gbif_inat ~ log_gdp + gdp_research + log_pop + has_site

               Df Sum of Sq    RSS    AIC
- gdp_research  1     1.855 140.76 44.737
<none>                      138.90 45.464
+ log_area      1     1.213 137.69 46.622
+ latitude      1     0.104 138.80 47.392
- log_gdp       1    30.091 168.99 62.289
- has_site      1    38.791 177.69 67.108
- log_pop       1    72.071 210.97 83.588

Step:  AIC=44.74
log_n_gbif_inat ~ log_gdp + log_pop + has_site

               Df Sum of Sq    RSS    AIC
<none>                      140.76 44.737
+ gdp_research  1     1.855 138.90 45.464
+ log_area      1     0.566 140.19 46.351
+ latitude      1     0.469 140.29 46.417
- has_site      1    37.546 178.30 65.437
- log_gdp       1    82.612 223.37 87.069
- log_pop       1    96.957 237.71 93.045
Code
step_gbif$anova # display results 
Stepwise Model Path 
Analysis of Deviance Table

Initial Model:
log_n_gbif_inat ~ log_area + log_gdp + gdp_research + log_pop + 
    latitude + has_site

Final Model:
log_n_gbif_inat ~ log_gdp + log_pop + has_site

            Step Df  Deviance Resid. Df Resid. Dev      AIC
1                                    89   137.5124 48.49911
2     - latitude  1 0.1754503        90   137.6878 46.62151
3     - log_area  1 1.2131426        91   138.9010 45.46365
4 - gdp_research  1 1.8551255        92   140.7561 44.73731

Number of users recording in the country

Code
fit_users <- lm(log_n_users ~ log_area + log_gdp + gdp_research + log_pop + latitude + has_site, data=data_regressions)
step_users <- stepAIC(fit_users, direction = 'both')
Start:  AIC=32.78
log_n_users ~ log_area + log_gdp + gdp_research + log_pop + latitude + 
    has_site

               Df Sum of Sq    RSS    AIC
- latitude      1    0.2131 116.95 30.950
- log_area      1    0.7316 117.47 31.375
- gdp_research  1    2.1621 118.90 32.537
<none>                      116.74 32.775
- log_gdp       1   21.8258 138.56 47.230
- has_site      1   22.2782 139.01 47.543
- log_pop       1   22.7042 139.44 47.836

Step:  AIC=30.95
log_n_users ~ log_area + log_gdp + gdp_research + log_pop + has_site

               Df Sum of Sq    RSS    AIC
- log_area      1    0.8075 117.76 29.611
- gdp_research  1    1.9581 118.91 30.544
<none>                      116.95 30.950
+ latitude      1    0.2131 116.74 32.775
- log_gdp       1   21.6167 138.57 45.233
- log_pop       1   23.0462 140.00 46.218
- has_site      1   28.2007 145.15 49.689

Step:  AIC=29.61
log_n_users ~ log_gdp + gdp_research + log_pop + has_site

               Df Sum of Sq    RSS    AIC
- gdp_research  1     1.496 119.25 28.823
<none>                      117.76 29.611
+ log_area      1     0.808 116.95 30.950
+ latitude      1     0.289 117.47 31.375
- log_gdp       1    22.978 140.74 44.724
- has_site      1    30.635 148.39 49.810
- log_pop       1    66.009 183.77 70.335

Step:  AIC=28.82
log_n_users ~ log_gdp + log_pop + has_site

               Df Sum of Sq    RSS    AIC
<none>                      119.25 28.823
+ gdp_research  1     1.496 117.76 29.611
+ log_area      1     0.345 118.91 30.544
+ latitude      1     0.032 119.22 30.797
- has_site      1    29.639 148.89 48.132
- log_gdp       1    63.604 182.86 67.858
- log_pop       1    88.145 207.40 79.948
Code
step_users$anova # display results 
Stepwise Model Path 
Analysis of Deviance Table

Initial Model:
log_n_users ~ log_area + log_gdp + gdp_research + log_pop + latitude + 
    has_site

Final Model:
log_n_users ~ log_gdp + log_pop + has_site

            Step Df  Deviance Resid. Df Resid. Dev      AIC
1                                    89   116.7371 32.77534
2     - latitude  1 0.2131326        90   116.9503 30.95046
3     - log_area  1 0.8075393        91   117.7578 29.61106
4 - gdp_research  1 1.4958103        92   119.2536 28.82281

Best models

Code
# Final models selected by stepAIC above: the same three predictors were
# retained for every response (log_area, gdp_research and latitude were dropped).
# log_n_records_inat ~ log_gdp + log_pop + has_site
# log_n_gbif_inat    ~ log_gdp + log_pop + has_site
# log_n_users        ~ log_gdp + log_pop + has_site

modelo_n_records <- lm(log_n_records_inat ~ log_gdp + log_pop + has_site, data=data_regressions)
modelo_gbif <- lm(log_n_gbif_inat ~ log_gdp + log_pop + has_site, data=data_regressions)
modelo_users <- lm(log_n_users ~ log_gdp + log_pop + has_site, data=data_regressions)

Total number of records on iNaturalist per country

Code
stargazer::stargazer(modelo_n_records,
          ci = T, digits=1,
          type='html',
          title = 'Total number of records on iNaturalist')
Total number of records on iNaturalist

                      Dependent variable:
                      log_n_records_inat
log_gdp               0.8*** (0.5, 1.0)
log_pop               0.7*** (0.6, 0.9)
has_site1             1.7*** (1.0, 2.4)
Constant              -7.6*** (-11.6, -3.5)
Observations          96
R²                    0.6
Adjusted R²           0.6
Residual Std. Error   1.3 (df = 92)
F Statistic           47.1*** (df = 3; 92)
Note: *p<0.1; **p<0.05; ***p<0.01

Number of records from iNaturalist on GBIF per country

Code
stargazer::stargazer(modelo_gbif,
          ci = T, digits=1,
          type='html',
          title = 'Number of records from iNaturalist on GBIF')
Number of records from iNaturalist on GBIF

                      Dependent variable:
                      log_n_gbif_inat
log_gdp               0.9*** (0.6, 1.1)
log_pop               0.7*** (0.5, 0.8)
has_site1             1.6*** (1.0, 2.3)
Constant              -8.1*** (-12.0, -4.1)
Observations          96
R²                    0.6
Adjusted R²           0.6
Residual Std. Error   1.2 (df = 92)
F Statistic           48.0*** (df = 3; 92)
Note: *p<0.1; **p<0.05; ***p<0.01

Number of users recording in the country

Code
stargazer::stargazer(modelo_users,
          ci = T, digits=1,
          type='html',
          title = 'Number of users recording')
Number of users recording

                      Dependent variable:
                      log_n_users
log_gdp               0.8*** (0.5, 1.0)
log_pop               0.6*** (0.5, 0.8)
has_site1             1.4*** (0.9, 2.0)
Constant              -9.3*** (-13.0, -5.7)
Observations          96
R²                    0.6
Adjusted R²           0.6
Residual Std. Error   1.1 (df = 92)
F Statistic           46.8*** (df = 3; 92)
Note: *p<0.1; **p<0.05; ***p<0.01
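
Since the responses are natural logs, the `has_site1` coefficient is easier to read after exponentiating: it becomes the multiplicative difference between countries with and without a national site, holding GDP per capita and population fixed. The sketch below back-transforms the estimates and interval bounds copied from the three tables above. These are rounded to one decimal, so the results are approximate, and they describe an association, not a causal effect of joining the network.

```r
# has_site1 estimates and interval bounds, copied from the rounded tables above
has_site <- data.frame(
  response = c('records on iNaturalist', 'iNat records on GBIF', 'users'),
  estimate = c(1.7, 1.6, 1.4),
  lower    = c(1.0, 1.0, 0.9),
  upper    = c(2.4, 2.3, 2.0)
)

# Back-transform from the natural-log scale to a multiplicative factor
fold <- transform(has_site,
                  fold_estimate = round(exp(estimate), 1),
                  fold_lower    = round(exp(lower), 1),
                  fold_upper    = round(exp(upper), 1))

fold[, c('response', 'fold_estimate', 'fold_lower', 'fold_upper')]
```

For the first model, for instance, exp(1.7) is about 5.5, so countries with a national site have roughly five to six times as many records as comparable countries without one. The `log_gdp` and `log_pop` coefficients are log-log slopes and can be read directly as elasticities: a 1% increase in population goes with an increase of about 0.7% in records.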

Random forest

Total number of records on iNaturalist per country

Code
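rf_n_records <- randomForest(log_n_records_inat ~ log_area + log_gdp + 
                               gdp_research + log_pop + 
                               latitude + has_site, 
                             data = data_regressions,
                             na.action = na.omit,
                             mtry = 2,    # predictors tried at each split
                             nperm = 10,  # OOB permutations per tree (permutation importance only)
                             importance = TRUE) 

# type = 2 plots the node-impurity importance (IncNodePurity)
varImpPlot(rf_n_records, main = 'Total number of records on iNaturalist', type = 2)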

Number of records from iNaturalist on GBIF per country

Code
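rf_gbif <- randomForest(log_n_gbif_inat ~ log_area + log_gdp + 
                          gdp_research + log_pop + 
                          latitude + has_site, 
                        data = data_regressions,
                        na.action = na.omit,
                        mtry = 2,    # predictors tried at each split
                        nperm = 10,  # OOB permutations per tree (permutation importance only)
                        importance = TRUE)

# type = 2 plots the node-impurity importance (IncNodePurity)
varImpPlot(rf_gbif, main = 'Number of records from iNaturalist on GBIF', type = 2)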

Number of users recording in the country

Code
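rf_users <- randomForest(log_n_users ~ log_area + log_gdp + 
                           gdp_research + log_pop + 
                           latitude + has_site, 
                         data = data_regressions,
                         na.action = na.omit,
                         mtry = 2,    # predictors tried at each split
                         nperm = 10,  # OOB permutations per tree (permutation importance only)
                         importance = TRUE)

# type = 2 plots the node-impurity importance (IncNodePurity)
varImpPlot(rf_users, main = 'Number of users recording', type = 2)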
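
With `importance = TRUE`, a regression forest stores two importance measures: the permutation-based `%IncMSE` (`type = 1`) and the node-impurity `IncNodePurity` (`type = 2`, the one plotted above). The two can rank predictors differently, and forests are random, so fixing a seed helps when comparing runs. The sketch below illustrates both points on simulated data; the data, seed and object names are hypothetical and unrelated to the country dataset.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Simulated data: y depends strongly on x1, weakly on x2, not at all on x3
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 * sim$x1 + 0.5 * sim$x2 + rnorm(n)

rf_sim <- randomForest(y ~ x1 + x2 + x3, data = sim,
                       mtry = 2, importance = TRUE)

imp_perm   <- importance(rf_sim, type = 1)  # %IncMSE (permutation-based)
imp_purity <- importance(rf_sim, type = 2)  # IncNodePurity (node impurity)

cbind(imp_perm, imp_purity)
```

Comparing the two columns for the country models (for example with `importance(rf_users)`) would show whether the ranking in the plots is robust to the choice of measure.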