Fighting SMS-based phishing attacks
Luxembourg has been experiencing a substantial increase in SMS traffic, including malicious SMS-based attacks.
SMS is making a comeback. Banking apps use it for authentication, commercial websites advertise by texting their customers, governments use text messaging to inform citizens, airlines use it to notify passengers about flight changes…
Moreover, with mobile virtual network operators (MVNOs) leveraging cloud computing and the infrastructure of existing network operators, sending SMS has never been easier or cheaper.
However, this widespread use of SMS also benefits cyber-attackers, who are carrying out more and more SMS-based attack campaigns for spam, invading your privacy, threats, and phishing.
What phishing is
Phishing is an attack technique that consists of sending you, the rich and curious target audience, a message containing malicious URLs and luring you into following them. You are then asked to provide sensitive information, such as banking credentials or personal data. This technique is effective because subscribers tend to trust SMS more than other means of communication.
At POST Luxembourg, we are committed to fighting SMS phishing through cutting-edge machine learning and real-time big-data technologies.
How machine learning automatically detects SMS phishing
The challenge we face is large-scale, real-time SMS phishing detection, a problem that is as interesting as it is difficult. Some of the obstacles are:
– Short content
The short SMS only features a URL and a few words inviting you to open the link.
– The link cannot be inspected
Opening the link would invalidate it, just as it would with a password reset link.
– Inspection has to be 100% automatic
Manual inspection is neither allowed, due to privacy concerns, nor feasible, given the amount of data involved.
How to train the machine
From a machine learning perspective, we have to distinguish good URLs, like google.com or yourcompany.xx, from bad URLs, such as apple-iforget.com or gooole.com.
This classification problem involves natural language processing. To solve it, we need to ask ourselves: What constitutes a bad URL? What tricks do attackers use? It turns out that over 30 features characterize a bad URL. Below are some examples:
– Unusual top-level or second-level domain
For example, .landing.myjino.ru or .abira.hokkaido.jp, as opposed to very common ones such as .com, .org, or .co.uk. Thankfully, Mozilla maintains the Public Suffix List, with 8,000+ entries we can use for this purpose.
– Repetitive characters
Misspellings with repeated characters, like googgle or amaazon, are easy to detect and a common feature of bad URLs.
– Word concatenation
A string like GreatTechnology has to be split logically in order to extract words from it. GreatTechnology can be interpreted as (great, technology), but it can also be broken into (eat, tech, nolo, ology, log, logy, great…). We therefore match substrings against a dictionary of real words to find the intended split.
– Randomness in the domain name and words used
In machine-generated URLs used for phishing, random strings of letters are commonly used.
To measure the randomness of a string, we train the machine on a large corpus of real words, which allows it to estimate the likelihood that a given sequence of characters is a real word. So h-a-p-p-y gets a high probability and y-p-p-a-h a low one.
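The four features above can be illustrated with a short Python sketch. This is not POST's implementation, only a minimal illustration under stated assumptions: `COMMON_SUFFIXES`, `DICTIONARY` and `CORPUS` are tiny hypothetical stand-ins for the Mozilla-maintained suffix list, a real dictionary and a large word corpus. The segmentation simply prefers the split with the fewest dictionary words, and the randomness score is a character-bigram model with add-one smoothing, one simple way (among several) to estimate how word-like a string is.

```python
import math
import re
from collections import Counter
from functools import lru_cache

# Hypothetical, tiny stand-ins for illustration only. A real system would load
# the full public suffix list, a large dictionary and a large text corpus.
COMMON_SUFFIXES = {"com", "org", "net", "co.uk", "lu"}
DICTIONARY = {"great", "technology", "tech", "eat", "log", "logy",
              "apple", "i", "forget", "google", "amazon", "happy"}
CORPUS = "happy great technology google amazon apple forget log eat happy"


def has_common_suffix(domain: str) -> bool:
    """True if some proper suffix of the domain (e.g. 'co.uk', 'com') is a common one."""
    labels = domain.lower().split(".")
    return any(".".join(labels[i:]) in COMMON_SUFFIXES for i in range(1, len(labels)))


def repeated_pairs(token: str) -> int:
    """Count non-overlapping doubled characters, e.g. 'oo' and 'gg' in 'googgle'."""
    return len(re.findall(r"(.)\1", token.lower()))


def segment(token: str):
    """Split a concatenated token into the fewest dictionary words, or None if impossible."""
    s = token.lower()

    @lru_cache(maxsize=None)
    def best(i: int):
        if i == len(s):
            return ()
        candidates = []
        for j in range(i + 1, len(s) + 1):
            if s[i:j] in DICTIONARY:
                rest = best(j)
                if rest is not None:
                    candidates.append((s[i:j],) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


def train_bigrams(corpus: str):
    """Count character bigrams, with '^' and '$' marking word start and end."""
    pair_counts, context_counts = Counter(), Counter()
    for word in corpus.lower().split():
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts


NEXT_SYMBOLS = 27  # 26 lowercase letters + end marker; assumes letters-only tokens


def word_likeness(token: str, model) -> float:
    """Average log-probability per bigram; higher means more word-like, lower more random."""
    pair_counts, context_counts = model
    padded = "^" + token.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    # Add-one (Laplace) smoothing gives unseen bigrams a small non-zero probability.
    log_prob = sum(
        math.log((pair_counts[(a, b)] + 1) / (context_counts[a] + NEXT_SYMBOLS))
        for a, b in pairs
    )
    return log_prob / len(pairs)


MODEL = train_bigrams(CORPUS)
print(segment("GreatTechnology"))  # ['great', 'technology']
print(word_likeness("happy", MODEL) > word_likeness("yppah", MODEL))  # True
```

In practice such values would be fed to a classifier as features rather than used as rules on their own: google.com also contains a doubled letter, so `repeated_pairs` is only informative in combination with other signals.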
To sum up, we have engineered in-house infrastructure and machine learning technologies that cope with SMS traffic volumes and detect SMS phishing in real time. We also use public sources of SMS and phishing URLs, such as the Phishtank.com database, for training.
The battle goes on
POST has detected approximately one phishing campaign per week since the deployment of its solution.
Our SMS phishing detection is up to 96% accurate: we detect almost all phishing attacks while keeping the false positive rate negligible.
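Accuracy, recall and false positive rate are distinct measures, and with rare events like phishing a high accuracy figure does not by itself guarantee good detection or few false alarms. The Python sketch below computes all of them from a confusion matrix; the helper `detection_metrics` and the traffic counts are hypothetical and are not POST's data.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Safe division: 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics from a confusion matrix; 'positive' means 'flagged as phishing'."""
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": ratio(tp, tp + fn),               # share of phishing SMS caught
        "false_positive_rate": ratio(fp, fp + tn),  # share of legitimate SMS wrongly flagged
        "precision": ratio(tp, tp + fp),            # share of alerts that are real phishing
    }


# Hypothetical traffic: 1,000,000 SMS of which 100 are phishing.
# A "detector" that never flags anything is still 99.99% accurate, yet catches nothing.
never_flags = detection_metrics(tp=0, fp=0, tn=999_900, fn=100)
print(never_flags["accuracy"], never_flags["recall"])  # 0.9999 0.0

# A useful detector on the same hypothetical traffic: high recall, very few false alarms.
useful = detection_metrics(tp=95, fp=50, tn=999_850, fn=5)
```

This is why detection results are usually reported as recall and false positive rate alongside accuracy.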
Nevertheless, attackers frequently change their tactics, which is why our team keeps innovating to build machine learning models that can efficiently fight the newest phishing methods.
I have 15+ years of broad and deep experience in machine learning, computer security, and secure software development. I received my Ph.D., with a distinguished dissertation in the field of Artificial Intelligence and Software Engineering, from the University of Trento (Trento, Italy) in 2009. Before joining POST Luxembourg as a Data Scientist and Security Expert, I was a researcher at the University of Luxembourg, and I have published 50+ scientific papers in prestigious international conferences and journals. Many of these works aimed to address practical problems in industry using advances in machine learning and software security research.