PhishAri: Real-Time Phishing Detection on Twitter

At PreCog, we not only do research but also try to build products for end users based on our work. More often than not, developing a scalable, real-world system is a much harder task than developing the underlying algorithm. It feels good to be part of a research group that has given me the perspective to understand the need for a bridge between research and real-world solutions. Here is my first PreCog blog entry, on one such product that we are developing (and which I lead): a system that aims to detect phishing on Twitter.

There has been a lot of research, and many publications, on spam detection in online social media, but few real-world products use these intelligent solutions. When we started working on phishing detection on Twitter, we decided to build a real-time system for Internet users based on our research, which we named PhishAri. Before we move on to how we built PhishAri, any guesses as to what the name means? Well, it’s a combination of two words: Phish + Ari. “Phish” is short for “phishing” and “Ari” means “enemy” in Sanskrit; PhishAri combats phishing by detecting phishing URLs spread through Twitter.

From our previous studies and some prior work in this area, we identified various features that we could use for phishing detection on Twitter. These include attributes of the URL, properties of the tweet, and properties of the Twitter user who posts the tweet. We thought that the best way to reach most Internet users would be a browser extension. So now, once someone installs the PhishAri browser extension, whenever they log on to Twitter they see a small color-coded indicator in front of every URL in the tweets in their timeline or Twitter search results: green indicates that the URL is safe and red indicates a phishing URL. Since the solution is built seamlessly into the browser, it is hassle-free and requires no software or packages other than the browser you use and the PhishAri extension. Currently, the PhishAri extension is available only for Chrome, but we’ll soon launch it for Firefox and other browsers too.
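To make the idea of URL, tweet, and user features a little more concrete, here is a minimal Python sketch. The specific features below (URL length, dots in the host name, hashtag and mention counts, follower ratio) and all function names are assumptions chosen for illustration; they are not PhishAri's actual feature set.

```python
from urllib.parse import urlparse


def url_features(url):
    """Illustrative URL attributes (assumed, not PhishAri's real features)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_path_segments": len([s for s in parsed.path.split("/") if s]),
        "uses_https": parsed.scheme == "https",
    }


def tweet_features(text):
    """Illustrative tweet properties (assumed)."""
    words = text.split()
    return {
        "tweet_length": len(text),
        "num_hashtags": sum(1 for w in words if w.startswith("#")),
        "num_mentions": sum(1 for w in words if w.startswith("@")),
    }


def user_features(followers, following):
    """Illustrative properties of the posting user (assumed)."""
    return {
        "followers": followers,
        "following": following,
        # +1 avoids division by zero for accounts that follow nobody
        "follower_ratio": followers / (following + 1),
    }


def build_feature_vector(url, text, followers, following):
    """Merge the three feature groups into one dictionary."""
    features = {}
    features.update(url_features(url))
    features.update(tweet_features(text))
    features.update(user_features(followers, following))
    return features
```

Calling `build_feature_vector(url, tweet_text, followers, following)` returns a single dictionary that a classifier could consume once it is converted to a numeric vector.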

Now, let’s dive into the nitty-gritty of PhishAri. The browser extension (written in JavaScript) is the front end of the system; it does very little processing and only shows the appropriate indicator beside every URL. The meat of the solution is a web application hosted on a separate server, which the extension relies on to decide which indicator to show in front of each URL. The web application is written in Python using the web.py framework and is hosted on an Apache server. The extension takes the URL and the tweet ID from a tweet and sends them to the web application in a GET request. The web application uses the URL and tweet ID to build a feature vector from the attributes of the URL and the tweet that are used for phishing detection, and then applies a machine learning classifier to label the URL as phishing or legitimate. The extension then makes another GET request to the web application to receive a JSON string indicating the class of the URL; accordingly, the extension shows a red indicator if the class is ‘phishing’ and a green indicator if it is ‘legitimate’.
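The request and response flow described above can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not our production code: the endpoint `https://phishari.example/check`, the parameter names `url` and `tweet_id`, and the rule inside `classify` are all hypothetical stand-ins, and the sketch collapses the two GET requests into one for brevity. The real web application runs on web.py and uses a trained classifier.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_request_url(base, url, tweet_id):
    """What the extension would send: a GET URL carrying the tweet's URL and ID.
    The parameter names are assumptions for this sketch."""
    return base + "?" + urlencode({"url": url, "tweet_id": tweet_id})


def classify(url, tweet_id):
    """Hypothetical placeholder for the machine learning classifier.
    It flags hosts with many dots; the real system uses a trained model."""
    host = urlparse(url).hostname or ""
    return "phishing" if host.count(".") >= 3 else "legitimate"


def handle_get(request_url):
    """Server side: read the query parameters, classify, reply with a JSON string."""
    query = parse_qs(urlparse(request_url).query)
    url = query.get("url", [""])[0]
    tweet_id = query.get("tweet_id", [""])[0]
    return json.dumps(classify(url, tweet_id))


def indicator_colour(json_body):
    """Extension side: map the returned class to the indicator colour."""
    return "red" if json.loads(json_body) == "phishing" else "green"
```

For example, `indicator_colour(handle_get(build_request_url("https://phishari.example/check", tweet_url, tweet_id)))` gives the colour the extension would show for that tweet under this toy rule.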

Currently, PhishAri works with an accuracy of 87.2%, and we are still in the process of making it stronger and more effective. The extension can be downloaded from the Chrome Web Store. We are trying to add more features and strengthen the underlying classifier to make PhishAri more efficient. Any feedback is warmly welcomed. If you use Twitter, do give it a try!

anupama