A near-real-time global landslide incident reporting tool demonstrator using social media and artificial intelligence


Introduction
When landslides occur, their impacts are usually not discovered beyond the attention of first responders or government agencies until the news media are able to attend the scene or, for example in remote areas, once satellites have been able to collect imagery and their responding communities have activated the Disaster Charter (https://disasterscharter.org) and processed the data. There is currently an estimated time lag, or data latency, ranging from several hours to several days between a disaster happening and reliable spatial data becoming available to users, particularly with respect to satellite data [1][2][3][4]. Data latency is associated with the satellite return path and the route that it takes, image quality and processing time. Landslides can be associated with rainfall or volcanoes, meaning that satellite data acquisition can be delayed by poor image quality caused by cloud cover, or by whether the satellite passes over the area by day or by night [5,6]. Interpretation of these images also requires considerable effort by specialists, although recent work aims to speed this process up using automatic image recognition [7][8][9][10][11][12]. This means that locating and assessing the disaster can take considerable time, and most studies are based on post-event analysis.
Social media data, while inherently imperfect, provide near-real-time information in large quantities and at spatial densities that [...]. Perceptions of what constitutes reliable information are evolving to include unstructured data, such as that published on social media. Such data are now becoming more valued as a tool to record hazard and hazard impact information, particularly as they can include eyewitness accounts and facilitate the reconstruction of events [68,73]. To add weight to this, it has become increasingly recognised that news media sources have reporting biases, such as issues with factual accuracy or not reporting at all due to prioritisation of other news [74][75][76]. Despite this, the bulk of data collection and interpretation still involves time-consuming work by specialists searching the Internet for news and social media reports, directly engaging in communications with those submitting information and then interpreting the data received [38,54,60,68].
Under-representation in landslide databases can feed through to emergency planning and preparedness for landslide response when natural disasters occur, particularly in areas for which regional landslide susceptibility mapping has not been completed. In such regions, especially if they are remote with poor access to communication technologies, international responses to natural disasters are supported by attempts to better understand the distribution of triggered landslides, e.g. Nepal where over 4,000 landslides were mapped using satellite imagery after the Gorkha earthquake in 2015 [77]. It is important to understand both where landslides may have impacted communities with potential damage or loss of life, and where they may affect transport routes and impede emergency response activities. For the latter, even small landslide events can block major transport routes so these are particularly important to identify. In multi-hazard scenarios timely understanding of impacts such as damming of rivers may be important in terms of protecting communities from consequential flooding for example.

Aims
With an aim to tackle the aforementioned issues, this paper explores a well-known microblogging platform, Twitter, to identify landslide-related posts, specifically those with images containing landslides. Twitter allows users to read and post short messages called 'Tweets'. Tweets are limited to 280 characters and photos or short videos can be included. Tweets are posted to a publicly available profile or can be sent as direct messages to other users. In 2019, Twitter had 330 million monthly active users and 145 million daily active users; a total of 500 million Tweets were sent by Twitter users every day, equivalent to 5,787 Tweets per second [18].
In this paper, a new methodology is presented that harvests landslide photographs from Tweets automatically and in real-time. To do this, different types of noise and irrelevant content that can be associated with landslide-related social media imagery data are identified. A further aim is the annotation and release of a dataset for the community to develop image filtering and landslide detection tools. Since the focus of this study is to establish a methodology for the landslide dataset creation, a technical paper, conducted in conjunction with this study, describes the underpinning theory and presents a detailed experimental approach to the model development step [78]. The specific objectives of this paper are:
1. Novel qualitative analysis of a non-traditional data source (Twitter) for capturing landslide reports
2. Image labelling methodology for landslide classification
3. Expert-labelled dataset consisting of 11,737 images
A similar study was undertaken by Ref. [79], who used a smaller image dataset from different sources and recommended further work with a larger dataset before their algorithm can be used without manual intervention. The work presented here involves more extensive model training experiments and a larger dataset.
We suggest that the methodology and labelled dataset will help the disaster management community build tools to detect landslide images automatically from social media, with potential for incorporation in multi-hazard impact assessment workflows alongside other established methods. Moreover, we anticipate that such a tool will improve response times for first responders. It will enable responders to add to their understanding of what is happening on the ground in near-real-time, providing data from those affected as soon as it is published on social media. This interdisciplinary work is the result of collaboration between computer scientists and earthquake, social media and landslide hazard specialists. The initial objective was for earthquake-triggered landslides to be reported to the European Civil Protection Unit, as this hazard can hamper rescue operations. The objective was then extended to incorporate all landslides regardless of their trigger. This tool will be open to any institute wanting to speed up social media harvesting on this topic.

Supervised Machine Learning approach
The process of using Artificial Intelligence (AI) or Machine Learning (ML) for the identification of landslides in photographs typically requires two steps: (1) create a large, labelled dataset for the task at hand, and (2) train a ML model to achieve the desired classification task. Fig. 1 shows a graphical representation of the workflow. This training dataset contains a collection of photographs showing particular characteristics associated with landslides. To create a diverse dataset, we curated a total of 11,737 images from three data sources: Google, Twitter and BGS's image database: GeoScenic [80] all of which contained images of landslides, and non-landslides. 6,284 images were downloaded from Google by querying landslide-related keywords such as landslide, landslip, earth slip, mudslide, rockslide and rock fall. We developed a multi-lingual list currently comprising 339 keywords in 32 languages: English, Albanian, Arabic, Basque, Bengali, Bosnian, Catalan, Chinese, Croatian, Dutch, French, Georgian, German, Greek, Hindi, Hungarian, Indonesia, Iranian, Italian, Japanese, Korean, Malaysia, Philippines, Polish, Portuguese, Romanian, Russian, Sesotho, Slovenian, Spanish, Swedish, and Turkish (Appendix 1). A total of 1,153 images were collected from Twitter through its Streaming API using the same keywords. In addition, 4,300 photographs were donated by the GeoScenic database that were known to be associated with fieldtrips involving landslides. Three landslide specialists, co-authors of this paper, then carried out an independent yes/no landslide interpretation on the 11,737 photographs using the methodology described below. Fig. 2 shows examples of collected photographs divided into 'landslides' and 'not landslides' that demonstrates the kind of noise associated with image harvesting.
Although manually curated keywords were used to acquire images from Twitter and Google, the resultant images were not always related to landslides and often contained irrelevant and noisy content. This demonstrates why text-based data collection alone is not enough to gather landslide-related reports from social media or the Internet. While the images from the GeoScenic database were known to be associated with fieldtrips involving landslides, the set included both landslide and non-landslide photographs. Therefore, the collected images needed to be evaluated manually by the landslide specialists. Since the AI task is "given an image, recognise landslide" without any other external information or expert knowledge available to the AI model, the landslide specialists were tasked to devise a labelling methodology while keeping this "computer vision" perspective in mind.
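The limitation of keyword-only harvesting can be illustrated with a minimal sketch. It assumes a simple case-insensitive phrase match against the six English keywords quoted above; the function names and example posts are hypothetical and are not taken from the deployed system.

```python
import re
import unicodedata

# English keywords quoted in the text; the full multilingual list is in Appendix 1.
KEYWORDS = ["landslide", "landslip", "earth slip", "mudslide", "rockslide", "rock fall"]


def normalise(text):
    """Case-fold, apply Unicode normalisation and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords found in the text, in keyword-list order.

    A keyword must start at a word boundary, so plural forms such as
    'landslides' also match (an assumption made for this sketch).
    """
    clean = normalise(text)
    hits = []
    for keyword in keywords:
        pattern = r"(?<!\w)" + re.escape(normalise(keyword))
        if re.search(pattern, clean):
            hits.append(keyword)
    return hits


# Hypothetical posts: the second matches a keyword but is not a landslide report.
posts = [
    "Road closed after a LANDSLIDE near the village",
    "The party won the election by a landslide",
    "Lovely weather on the coast today",
]
for post in posts:
    print(matched_keywords(post), "<-", post)
```

The second post passes the text filter even though it has nothing to do with slope failure, which is the kind of noise that motivates classifying the attached image rather than relying on text alone.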

Expert-labelling methodology
The decision-making process carried out for the purpose of training the computer model to identify landslide features in photographs differs from the conventional desk- or field-based landslide identification familiar to the geologist. Expert assessment of photographs involved the application of several assumptions, as outlined in the following methodology.
1. There is no contextual knowledge or previous understanding of the landslide. A data-gathering exercise would usually be carried out by landslide specialists to gain as much ground information as possible before any interpretations are made. This task requires a different approach. Here, information such as any landslide nomenclature, ground conditions, antecedent meteorological context or geographic region is excluded from the decision-making process.
2. Each photograph must be treated in isolation. It may show all or part of a landslide and is confined to one viewpoint. Ideally, conventional landslide analysis involves viewing the landslide from several different perspectives and scales before an interpretation is made.
3. The model does not discriminate landslide 'type' (i.e. [81,82]), but aims to recognise zones of depletion (where the material has come from) and accretion (where it has been deposited). This excludes, therefore, events where the landslide debris has been removed by coastal or fluvial erosion or where a landslide has been remediated.
4. The model aims to show contemporary landslides. This means older but perhaps still active or dormant landslides are omitted from the model. Examples of this may include landslides that are slow moving or cyclic but are nonetheless active. Fully vegetated landslides may also fall into this category if there is no exposure of geological materials (e.g., rock or earth; Fig. 3).
5. In order to train the model there was a requirement for a clear representation of a landslide as the major component of the image (e.g., Fig. 4).
6. Where representation was borderline, consideration was given to whether the end user would be concerned by the image being returned as a landslide, e.g., in the situation where another feature such as a retaining wall or a sinkhole might be returned as a landslide. Borderline cases are broadly grouped as (Fig. 5A) backscarps and extensional features that could be faults, (Fig. 5B) material engulfing buildings that could be the landslide deposit but could also have formed through other natural or manmade processes, (Fig. 5C) debris falling onto roads that could be a landslide deposit or vegetation or mixed debris not associated with landsliding and (Fig. 5D) rivers in flow or flood channels that have a similar appearance to debris flow channels.
Once the dataset creation and model training stages were completed, the demonstrator model was run using Twitter images in real-time. Fig. 6 illustrates the workflow involved in collecting, tagging and classifying images as 'landslide' and 'not landslide'.

Expert-labelling results
Using the methodology outlined above, the three landslide specialists carried out independent yes/no interpretations of 11,737 photographs. To ensure the reliability of the final labels, their agreement was measured using two statistical measures: Fleiss' Kappa [83] and percentage agreement (observer agreement). Despite the inherent difficulty of the labelling task, the three landslide specialists achieved good overall agreement. The overall Fleiss' Kappa score was 0.58, which sits just below the 0.61 threshold commonly used for 'substantial' inter-annotator agreement. The percentage agreement is 76%, which is only slightly below the 80% mark set as a rule-of-thumb by Ref. [84].
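For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss' Kappa can be computed for a fixed number of raters. The six toy items are invented for illustration and are not drawn from the dataset. The helper `full_agreement_rate` assumes that percentage agreement means the share of items on which all raters agree; the source does not state which definition was used.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' Kappa for a list of items, each a list of labels (one per rater)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]  # raters per category, per item

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


def full_agreement_rate(ratings):
    """Share of items on which every rater gave the same label (assumed definition)."""
    return sum(len(set(item)) == 1 for item in ratings) / len(ratings)


# Toy example: three raters, six photographs (invented labels).
toy = [
    ["yes", "yes", "yes"],
    ["no", "no", "no"],
    ["no", "no", "no"],
    ["yes", "yes", "no"],
    ["no", "no", "yes"],
    ["no", "no", "no"],
]
print(f"kappa = {fleiss_kappa(toy):.2f}")                    # kappa = 0.50
print(f"full agreement = {full_agreement_rate(toy):.2f}")    # full agreement = 0.67
```

In the toy example the raters agree fully on four of six items, yet Kappa is only 0.50 because much of that agreement would be expected by chance given the dominance of the 'no' class.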
Since the ultimate goal is to develop a system that will monitor the noisy social media streams continuously to detect landslide reports in real-time, negative (i.e., not-landslide) images were also retained in the dataset for model training purposes. These represent completely irrelevant cases (e.g., cartoons, advertisements, selfies) as well as difficult scenarios (i.e., those which may look similar to landslides), such as post-disaster images from earthquakes and floods and other natural scenes without landslides. The distribution of the images in the final dataset across different categories and data sources is summarised in Table 1.
As suggested by the table, only about 23% of the images are labelled as landslide in the final dataset. This imbalanced class distribution presents a challenge in model training, simply because the model may decide to always predict not-landslide and achieve 77% accuracy (because of the skew in the distribution), which would not be useful at all. Solutions to problems like this (i.e., finding a needle in the haystack) always need to deal with the class imbalance issue, meaning the training set presented here reflects this realistic scenario.
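The point about the trivial classifier can be checked with a short sketch. The label counts below are synthetic and chosen only to mirror the reported 23%/77% split. The inverse-frequency class weights are one common mitigation, included as an assumption for illustration; the source does not say how imbalance was handled during training.

```python
# Synthetic labels mirroring the reported class balance (about 23% landslide).
N_TOTAL = 1000
N_LANDSLIDE = 230
labels = ["landslide"] * N_LANDSLIDE + ["not-landslide"] * (N_TOTAL - N_LANDSLIDE)

# A degenerate classifier that never predicts the positive class.
predictions = ["not-landslide"] * N_TOTAL

accuracy = sum(t == p for t, p in zip(labels, predictions)) / N_TOTAL
true_positives = sum(t == p == "landslide" for t, p in zip(labels, predictions))
recall = true_positives / N_LANDSLIDE


def inverse_frequency_weights(labels):
    """Weight each class by N / (n_classes * n_class); a common heuristic,
    not necessarily what was used for the demonstrator model."""
    classes = sorted(set(labels))
    return {c: len(labels) / (len(classes) * labels.count(c)) for c in classes}


weights = inverse_frequency_weights(labels)
print(f"accuracy = {accuracy:.2f}, landslide recall = {recall:.2f}")
print({c: round(w, 2) for c, w in weights.items()})
```

The baseline reaches 77% accuracy while finding no landslides at all, which is why precision, recall and F1-score for the landslide class are more informative than accuracy alone.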

Demonstrator model results
The demonstrator model presented here is developed by training a convolutional neural network on the dataset introduced in this paper. Specifically, the dataset was split into training, validation, and test sets following a 70%, 10%, and 20% ratio, respectively. Then, a model based on the ResNet-50 architecture was trained using the Adam optimizer with an initial learning rate of 10^-4 and a weight decay of 10^-3. On the test set, the trained model achieves an overall Accuracy = 87% and Precision = 74%, Recall (Sensitivity) = 67%, and F1-score = 70% for the landslide class. For more details about the extensive model development experiments, we refer the reader to Ref. [78].
We deployed the system online in February 2020 to monitor the live Twitter data stream and it has collected more than 54 million Tweets and 15 million image URLs. Only about 2.5 million of these image URLs were deemed unique and downloaded for further analysis. The system identified about 38,000 landslide reports (including near- and exact duplicates) worldwide, which corresponds to less than 1% of the collected image URLs and highlights the challenging nature of the problem. More details about this system deployment can be found in Ref. [85].
To verify the system performance in the real world, 3,600 images that were deemed relevant and non-duplicate by the system were randomly sampled and labelled by the system as landslide and non-landslide images. System-predicted labels were then compared with expert annotations to evaluate the performance of the demonstrator model. There were 123 images correctly labelled by the system as landslides (true positives) and 3,395 images correctly labelled as non-landslides (true negatives). On the other hand, there were 39 images that were incorrectly labelled as landslides (false positives) and 43 images that were incorrectly labelled as non-landslides (false negatives).
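The four counts above are sufficient to derive the standard classification metrics. A minimal sketch follows, using the textbook definitions; the function name is hypothetical.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score for the positive class."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts from the verification exercise on 3,600 sampled images.
verification = metrics_from_counts(tp=123, tn=3395, fp=39, fn=43)
for name, value in verification.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these counts give 98% accuracy, 76% precision, 74% recall and a 75% F1-score. The high accuracy is driven largely by the 3,395 true negatives, so the landslide-class precision and recall are the more telling figures.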
This quantitative verification exercise showed that the demonstrator model can detect landslide reports with Accuracy = 98%, Precision = 76%, Recall (Sensitivity) = 74%, and F1-score = 75%. Below, we further evaluate the performance of this demonstrator model qualitatively on a few example images with the help of heat maps or class activation maps [86], which highlight the discriminative parts of a photograph that the model is paying attention to (Tables 2 and 3). The confidence scores are computed by the demonstrator model using the SoftMax function.
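As a brief illustration of how a softmax confidence score is obtained, the sketch below converts a pair of raw network outputs (logits) into class probabilities. The logit values and class order are hypothetical; only the use of the softmax function is taken from the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ("landslide", "not-landslide")   # assumed output order
logits = [2.0, 0.5]                        # hypothetical raw model outputs

probabilities = softmax(logits)
label, confidence = max(zip(CLASSES, probabilities), key=lambda pair: pair[1])
print(f"predicted: {label} (confidence {confidence:.2f})")
```

The confidence is therefore a relative score between the two classes for a given image, rather than a calibrated probability that a landslide is present.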

Discussion
The aim of this work was to develop a system that monitors social media continuously and in real-time for general landslide-related content, using the landslide classification model to identify and retain the most relevant information. The system harvests photographs from these data and tags each image as landslide or not-landslide. A training model was developed through interdisciplinary working by the authors to establish a large image dataset that has then been applied to the live Twitter data stream.
The demonstrator model is currently running live and landslide images are being harvested in real-time (https://landslide-aidr.qcri.org/service.php). This is publicly available and users can filter by date and country as well as being able to explore data spatially via a map interface. The map interface uses a range of factors to geolocate data markers. The current version prioritises location-based text within Tweets over geolocation data or stated location of the user. While not perfect, this allows the map to display landslide sites and prevents it from becoming purely a representation of user locations. If geolocation data are used, the location is downgraded to adhere to rules around viewing geodata [87,88] and to protect user privacy. Future improvements to locating data are discussed below. Also available on the demonstrator model website is the list of keywords from Appendix 1 used to initially extract Tweets. We invite users to provide feedback on both the demonstrator itself and the list of keywords via the link above. Once feedback has been collated, we plan to carry out future iterations to move this work from a demonstrator model to an operational service.
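The location priority described above can be sketched as a simple fallback chain. This is an illustration under stated assumptions, not the deployed implementation. The class and function names are hypothetical, the relative order of the geotag and the user's stated location is assumed (the text only says that Tweet text comes first), and rounding coordinates to one decimal place is an assumed stand-in for the precision downgrade.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


@dataclass
class LocationHints:
    """Candidate locations for one Tweet; every field is optional (hypothetical)."""
    text_place: Optional[Coordinate] = None     # place name found in the Tweet text
    geotag: Optional[Coordinate] = None         # device geolocation attached to the Tweet
    profile_place: Optional[Coordinate] = None  # location stated on the user profile


def coarsen(coordinate, decimals=1):
    """Reduce coordinate precision; the one-decimal default is an assumption."""
    latitude, longitude = coordinate
    return (round(latitude, decimals), round(longitude, decimals))


def choose_location(hints):
    """Return (source, coordinate) using the assumed priority order, or None."""
    if hints.text_place is not None:
        return ("text", hints.text_place)
    if hints.geotag is not None:
        return ("geotag", coarsen(hints.geotag))  # downgraded for privacy
    if hints.profile_place is not None:
        return ("profile", hints.profile_place)
    return None


example = LocationHints(geotag=(27.7172, 85.3240), profile_place=(51.5072, -0.1276))
print(choose_location(example))
```

Placing the Tweet text first reflects the rationale given above: a place named in the post is more likely to be the landslide site than the position of the person posting.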
The image interpretation process used by the three landslide specialists was iterative in the initial phase of work. To maintain consistency of agreement, the methodology described above was established through much interdisciplinary discussion, which led to a phase of reinterpretation. While this put demands on the landslide specialists, the combined understanding produced this novel methodology with high levels of agreement.
The methodology aimed to identify landslide features, but the task was not to discriminate scale. As a result, features in images labelled as landslides may be very small (<1 m and not strictly a landslide), while aerial photographs including multiple landslide events are not captured by the model (e.g., Fig. 7). Further iterations of this work could use more sophisticated object detection or image segmentation techniques to solve this issue.
Future work will include a Geolocation Inference module that will use Tweet metadata to geolocate images, following the approach used in Ref. [89] for spatial analysis of various factors associated with the COVID-19 pandemic. An automated real-time geographic representation of landslide locations will be developed. Understanding the location of landslides is an important element of this work, as the volume of data may be smaller than for other hazards such as earthquakes. However, there are ethical considerations in this location-based work, such as those addressed by the UK government through the Data Ethics Framework [90] and by Ref. [91]. The work described in this paper could also be adapted to complement other hazard inventories, such as snow avalanches.
It is important to reiterate that this work is not intended to be used in isolation during a disaster scenario. As well as the inherent noise within the data content itself, there are inaccuracies that could, for example in the worst case, hinder rescue operations if not combined with other data sources. Disaster managers should note that this work does not take into account:
1. Areas without mobile or internet coverage (even if temporary). As natural hazards cause damage to infrastructure, this may lead to mobile phone or internet outages, meaning information cannot be published to social media.
2. The geographical variation in population density. In densely populated areas, there are likely to be more relevant Tweets due to numbers of people, which could skew the data away from less densely populated areas that may have suffered greater damage.
3. Variations in use of social media (i.e. Twitter) as a result of trends in national or regional uptake or demographics.
4. Photographs that are embedded as thumbnails in web page links in Tweets. For example, an article published by the news media with photographs that was included in a Tweet is currently excluded.
For these reasons, the authors recommend that this work is used as a tool to provide additional information to established workflows for disaster management.
For landslide research, such as that involving national or regional landslide databases, it is hoped that this work will introduce considerable efficiency savings for institutions responsible for maintaining such databases. Images of landslide events and impacts will be available automatically, and social media will be trawled in a systematic and continuous way. The approach has been adapted to the terminologies used in different countries through the list of keywords. The authors would like to improve this list to make the operational model more accurate.

Conclusion
This paper demonstrates the potential application of artificial intelligence for landslide recognition in images harvested from social media. In this study, we aimed to develop a model that can detect landslides in social media image streams automatically and in real-time. For this purpose, we created a large image collection from multiple sources with different characteristics to ensure data diversity. The collected images were assessed by three landslide specialists independently to attain high quality labels with almost substantial inter-annotator agreement. The assessment methodology is described and is the result of interdisciplinary working between geologists, computer scientists and social media specialists. The resulting model achieved high performance in terms of accuracy scores, which can be deemed sufficient for the purpose. The demonstrator model is publicly available and running in real-time, and the authors invite feedback. There are a number of potential applications for this research. In this account, image processing has been focused on "fresh" landslides as evidenced by the exposure of geological materials, which lends itself to the focus on the potential for Disaster Risk and Resilience. This paper is published in association with a technical paper that describes the model in detail.

Funding
This article was partially funded by the European Union's (EU) Horizon 2020 Research and Innovation Program under Grant Agreement RISE Number 821115. Opinions expressed in this article solely reflect the authors' views; the EU is not responsible for any use that may be made of information it contains.
The British Geological Survey (UK Research and Innovation) granted supporting research funding through National Capability (Shallow Geohazards) and Innovation funding streams.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
This paper presents results from interdisciplinary research carried out by computer scientists at the Qatar Computing Research Institute (QCRI), earthquakes and social media specialists at the European-Mediterranean Seismological Centre (EMSC) and landslide hazard expertise from the British Geological Survey (BGS). The authors would like to thank those who donated photographs from their archives for the training model and for discussions around the applicability of this tool: Professor Dave Petley at Sheffield University for advice; Jo Walsh and Brian McIntyre at the British Geological Survey for collating photographs from the UK National Landslide Database and GeoScenic photograph archive. This work is about Twitter, not by Twitter; the terms 'Twitter', 'Tweet' and the Twitter Bird logo featured in Fig. 6 are trademarks of Twitter Inc. or its affiliates.