Security Notes WWW Conference 2017

Who controls the internet?

“The Internet is built on top of intertwined network services, e.g., email, DNS, and content distribution networks operated by private or governmental organizations. Recent events have shown that these organizations may, knowingly or unknowingly, be part of global-scale security incidents including state-sponsored mass surveillance programs and large-scale DDoS attacks.

For example, in March 2015 the Great Cannon attack has shown that an Internet service provider can weaponize millions of Web browsers and turn them into DDoS bots by injecting malicious JavaScript code into transiting TCP connections. While attack techniques and root cause vulnerabilities are routinely studied, we still lack models and algorithms to study the intricate dependencies between services and providers, reason on their abuse, and assess the attack impact.

To close this gap, we present a technique that models services, providers, and dependencies as a property graph. Moreover, we present a taint-style propagation-based technique to query the model, and present an evaluation of our framework on the top 100k Alexa domains.” – WWW Conference Session Description

Presented by Milivoj Simeonovski

Nobody owns the internet, and there is no centralized control. It was designed to be decentralized and distributed.

Today the internet is more complex, yet there is still no central authority. It is governed collaboratively by a number of operators and governments.

How important are these organizations?

These operators have either acted as attackers or been attacked themselves.

Example hacks:

  • PRISM (coercion attack)
  • The Great Cannon (DDoS)
  • DynDNS (DDoS)

What can we do?

  • We understand techniques and vulnerabilities
  • Analysis techniques tend to focus on a single core service

New study

  • Framework to model and reason about global-scale threats
  • Technique for quantifying the impact, plus an assessment of the technique

Modeling Framework

Methodology

  • Framework for identifying dependencies on the internet
  • Labeled property graph
  • Once you understand the relationships, you can do what-if analysis.
  • Propagation rules define how the roles are compromised
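To make the property-graph idea concrete, here is a minimal Python sketch of taint-style propagation. It is not the authors' implementation: the node names, the relationship types (RESOLVED_BY, LOADS_JS_FROM), and the single propagation rule are assumptions made up for illustration, since the actual schema and rules were not captured in these notes.

```python
from collections import deque

# Toy labeled property graph. Every name below is hypothetical.
NODES = {
    "example.com": {"label": "Domain"},
    "shop.example": {"label": "Domain"},
    "news.example": {"label": "Domain"},
    "cdn.jshost.example": {"label": "JsProvider"},
    "ns1.dnshost.example": {"label": "NameServer"},
}

# Edges are (source, relationship, target), read as "source depends on target".
EDGES = [
    ("example.com", "RESOLVED_BY", "ns1.dnshost.example"),
    ("cdn.jshost.example", "RESOLVED_BY", "ns1.dnshost.example"),
    ("shop.example", "LOADS_JS_FROM", "cdn.jshost.example"),
    ("news.example", "LOADS_JS_FROM", "cdn.jshost.example"),
]

# Assumed propagation rule: a compromise spreads from a target back to
# every source that depends on it through one of these relationships.
PROPAGATING = {"RESOLVED_BY", "LOADS_JS_FROM"}


def propagate_taint(compromised, edges=EDGES, propagating=PROPAGATING):
    """Return all nodes reachable by the propagation rule from the seeds."""
    tainted = set(compromised)
    queue = deque(tainted)
    while queue:
        node = queue.popleft()
        for src, rel, dst in edges:
            if dst == node and rel in propagating and src not in tainted:
                tainted.add(src)
                queue.append(src)
    return tainted


# What-if: the JS provider is compromised.
print(sorted(propagate_taint({"cdn.jshost.example"})))
# -> ['cdn.jshost.example', 'news.example', 'shop.example']
```

Changing the seed set or the PROPAGATING set is the what-if part: compromising the name server instead taints every node in this toy graph.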

Alexa top 100k domains were evaluated

  • DNS lookup
  • publicly available resources
  • 8 m nodes

Distribution of malicious JS content (the Great Cannon attack)

  • Players on the internet
  • Identify attack victims and/or attackers
  • based on popularity and potential influence
  • 6 metrics were measured

First-order metrics

  • Providers of hosted JS libraries
  • The vast majority of JS providers are in the United States
  • Attacker uses in-path content injection to achieve their goal.
  • Around 83% of JS is distributed via http, not https.
  • Google and Cloudflare were the top providers.

Conclusion

  • Framework that supports what-if analysis
  • Evaluated the framework on the top 100k Alexa domains
  • Assessed the impact of attackers and identified the most influential internet players
  • Countries can influence each other by JS injection on in-path TCP connections

Tools for Automated Analysis of Cybercriminal Markets

We have developed an automated, top-down approach for analyzing underground forums. Using natural language processing and machine learning, our approach can automatically generate high-level information about underground forums with high accuracy.
Automated Analysis of Cybercriminal Markets

Social networks are the backbone of underground/criminal activity. These forums were analyzed for the sale of stolen information, and they provide valuable insight into cybercrime. The December 2013 Target data breach was found by monitoring the forums.

Analyzing forums is challenging

Manual analysis is time-consuming: there are hundreds of thousands of posts, and reading them requires domain expertise. Sample forum posts:

  • “hai hai, I wan na buy 1000 clean US installs”
  • “i’m buying people stuff from amazon for 90% of the price”

Analysis framework

  • SVM classifier detects buy/sell
  • latent SVM model extracts product name
  • SVM classifier extracts price/payment amount
  • SVM extracts the currency exchange rate and denomination

Pipeline

  • Annotate
  • Extract Features
  • Train Model
  • Test Model

Sample Query used in presentation:

“I need someone to make me a FUD auto run every week”

Translation:

They need someone to create a Fully UnDetectable build of their malware every week

Categorized as a Buy. Features used by the classifier:

  • frequency of n-grams
  • length
  • part of speech
  • dependency relations
  • user information, e.g. ranking (noob)
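As a rough illustration of what a feature vector for the buy/sell classifier could look like, here is a small stdlib-only Python sketch. It covers only n-gram frequency, post length, and user rank; part-of-speech and dependency features need an NLP parser and are left out. The function names and feature keys are hypothetical, and the tokenization (lowercase, split on whitespace) is an assumption, not what the speakers described.

```python
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count word n-grams for n = 1..n_max (lowercased, whitespace tokens)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def post_features(text, user_rank):
    """Build a sparse feature dict for one forum post."""
    feats = dict(ngram_counts(text))
    feats["__length__"] = len(text.split())   # post length in tokens
    feats["__rank__=" + user_rank] = 1         # one-hot user ranking
    return feats


features = post_features(
    "I need someone to make me a FUD auto run every week", "noob"
)
print(features["__length__"], features["i need"], features["fud auto"])
# -> 12 1 1
```

A dict like this would then be vectorized and handed to a classifier such as an SVM, trained on annotated posts.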
Product extractor
  • “a keylogger coded completely in ASM”
  • “need somebody too mod DCIBot for me add… Update Cmd”

Back to the original example, where “FUD auto run” is the product:

“I need someone to make me a FUD auto run every week”

  • frequency of n-grams
  • part of speech
  • dependency relations
  • word position
  • line position

Price extractor

Bootkit:

  • price: up to 15k
  • payment: WM, LR
  • PM with offers if any

Extract price

  • Frequency of n-grams
  • part of speech
  • word position
  • line position

End-to-end assessment

The tools never all made an error on the same post.

Tracking Phishing Attacks over Time

Unique phishing sites are growing at a steady rate

Hypothesis

Instead of building new phishing sites, blocked ones are modified and released again.

Reality

This is relatively slow, as it may require human intervention

Methodology

  • Extract tag vector
  • Calculate proportional distance
  • Clustering Algorithm
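The first two steps above can be sketched in Python as follows. This is a sketch under assumptions: the tag vector is taken to be a count of each HTML tag on the page, and the proportional distance is taken to be the fraction of tags (among those present in at least one page) whose counts differ. The notes do not record the exact definitions used in the talk, so treat both as illustrative.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag occurs in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_vector(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def proportional_distance(v1, v2):
    """Share of tags, among those seen in either page, whose counts differ."""
    tags = set(v1) | set(v2)
    if not tags:
        return 0.0
    differing = sum(1 for t in tags if v1[t] != v2[t])
    return differing / len(tags)


page_a = "<html><body><form><input><input></form></body></html>"
page_b = "<html><body><form><input><input><input></form><p>hi</p></body></html>"

# 5 distinct tags overall; 'input' and 'p' differ, so the distance is 2/5.
print(proportional_distance(tag_vector(page_a), tag_vector(page_b)))
```

A clustering step would then group pages whose pairwise distance falls below some threshold, so that a lightly modified, relaunched phishing page lands in the same cluster as its original.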

Data Set

Phishing sites: PhishTank (a community-maintained site)

  • 21,000 verified sites
  • Compared to legitimate sites in Alexa
  • Repeatedly reported attacks vs relaunched attacks

Security challenges in an increasingly tangled web

The web is getting more challenging. The LA Times’ home page makes 1,597 total requests

What is the state of web complexity?

The median number of dependencies has almost doubled in the past 6 years: roughly 2x for both overall and external resources since 2011. Who do sites depend on?

  • Google (82.2% of top million web sites)
  • Facebook
  • Amazon
  • Cloudflare
  • Akamai

Smaller providers

  • MaxCDN (Bootstrap CDN) 19%
  • Edgecast
  • Fastly
  • SoftLayer
  • Twitter

Websites are increasingly dependent on common providers

Type of resource

  • analytics/tracking 75%
  • CDN
  • Advertising
  • Social Media
  • API/service

Why do we care?

Attackers care!

2013 – Bootstrap CDN was compromised and served malware

How does a complex web impact users’ trust?

Trust

Website -> AppNexus, Google, Rubicon, AOL (trusted ad delivery) -> talk925.pw, trackmytraffic.biz (implicitly trusted providers)

What happens when a site no longer knows what will be provided by secondary resources with implicit trust?

33% of sites load at least one implicitly trusted resource

  • tv loads 103
  • ru loads implicit resources at depth of 17
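One way to picture implicit trust is as depth in a resource-loading chain. The Python sketch below assumes that depth 1 means a resource the site includes directly (explicit trust) and depth 2 or more means a resource pulled in by another resource (implicit trust). That reading matches the chain shown earlier, but the depth convention, the function name, and every host except talk925.pw and trackmytraffic.biz are assumptions for illustration.

```python
from collections import deque

# Hypothetical loading graph: host -> hosts it causes the browser to load.
LOADS = {
    "site.example": ["adnetwork.example", "cdn.example"],
    "adnetwork.example": ["talk925.pw", "trackmytraffic.biz"],
    "talk925.pw": ["deeper.example"],
}


def implicit_resources(loads, site):
    """Return {host: depth} for hosts first reached at depth >= 2."""
    depths = {}
    queue = deque((child, 1) for child in loads.get(site, []))
    while queue:
        host, depth = queue.popleft()
        if host == site or host in depths:
            continue  # breadth-first order keeps the shallowest depth
        depths[host] = depth
        queue.extend((child, depth + 1) for child in loads.get(host, []))
    return {host: d for host, d in depths.items() if d >= 2}


implicit = implicit_resources(LOADS, "site.example")
print(len(implicit), max(implicit.values()))
# -> 3 3
```

Counting the entries gives the number of implicitly trusted resources for a site, and the maximum value gives the deepest chain, which are the two kinds of figures quoted in the bullets above.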

Advertisers are primarily loading implicit trust resources

How a complex web impacts widespread https deployment

To use https, a website’s dependencies must also be available over https. When they are not, the user is alerted about the mixed security state. Therefore, many websites continue to load over http instead of https.

An http website requests resources from ad and tracking providers.

To move to https, those secondary resources must also be served over https. Google ads are available over https, but ScorecardResearch is not. Most top websites still don’t use https because of dependent resources that are http-only (https blockers).

  • 65% of sites loaded over http
  • 45% of sites can immediately upgrade
  • 55% are blocked
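The upgrade check behind those numbers boils down to a set difference. The sketch below is a simplification with hypothetical host names: it assumes we already know which third-party hosts serve content over https, which in practice has to be measured by probing each host.

```python
def https_blockers(dependencies, https_capable):
    """Return the dependencies that are not available over https."""
    return sorted(set(dependencies) - set(https_capable))


def can_upgrade(dependencies, https_capable):
    """A site can move to https only if nothing it loads is http-only."""
    return not https_blockers(dependencies, https_capable)


# Hypothetical data: which third-party hosts are reachable over https.
HTTPS_CAPABLE = {"ads.example", "cdn.example"}

site_a = ["ads.example", "cdn.example"]
site_b = ["ads.example", "tracker.example"]

print(can_upgrade(site_a, HTTPS_CAPABLE))      # -> True
print(https_blockers(site_b, HTTPS_CAPABLE))   # -> ['tracker.example']
```

A single http-only ad or tracking host is enough to put a site in the blocked group.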

Biggest offenders

40% are ads, 32% are analytics/tracking

When it comes to security, we must always remember the weakest link. Measure the web regularly and call out problems when you see them. Build and deploy mechanisms that enable widespread resource integrity.
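These notes do not record which integrity mechanism the speaker had in mind. One existing option is Subresource Integrity (SRI), where a page pins the expected hash of a script in an integrity attribute. As a hedged sketch, the Python below computes and checks such a value; the helper names sri_hash and verify are made up for this example.

```python
import base64
import hashlib
import hmac

# Hash algorithms permitted in an SRI integrity value.
ALLOWED = {"sha256", "sha384", "sha512"}


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify(content: bytes, integrity: str) -> bool:
    """Check fetched content against a previously pinned integrity value."""
    algorithm = integrity.partition("-")[0]
    return hmac.compare_digest(sri_hash(content, algorithm), integrity)


script = b"console.log('hi');"
pinned = sri_hash(script)

print(verify(script, pinned))                    # -> True
print(verify(b"console.log('evil');", pinned))   # -> False
```

With a pinned value in place, a compromised CDN serving a modified script would fail the check instead of running in the visitor's browser.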
