Who controls the internet?
“The Internet is built on top of intertwined network services, e.g., email, DNS, and content distribution networks operated by private or governmental organizations. Recent events have shown that these organizations may, knowingly or unknowingly, be part of global-scale security incidents including state-sponsored mass surveillance programs and large-scale DDoS attacks.
For example, in March 2015 the Great Cannon attack has shown that an Internet service provider can weaponize millions of Web browsers and turn them into DDoS bots by injecting malicious JavaScript code into transiting TCP connections. While attack techniques and root cause vulnerabilities are routinely studied, we still lack models and algorithms to study the intricate dependencies between services and providers, reason on their abuse, and assess the attack impact.
To close this gap, we present a technique that models services, providers, and dependencies as a property graph. Moreover, we present a taint-style propagation-based technique to query the model, and present an evaluation of our framework on the top 100k Alexa domains.” – WWW Conference Session Description
Presented by Milivoj Simeonovski
Nobody owns the internet; there is no centralized control. It was designed to be decentralized and distributed.
Today the internet is more complex, yet there is still no central authority. The internet is governed collaboratively by a number of operators and governments.
How important are these organizations?
These operators have at times played the role of attacker, and at other times been attacked themselves.
Example hacks:
- PRISM (coercion attack)
- the Great Cannon (DDoS)
- Dyn DNS (DDoS)
What can we do?
- We understand techniques and vulnerabilities
- Analysis techniques tend to focus on a single core service
New study
- Framework to model and reason about global-scale threats
- Technique for quantifying attack impact
- Evaluation of the framework
Methodology
- Framework for identifying dependencies on the internet
- Labeled property graph
- Once you understand the relationships, you can do what-if analysis.
- Propagation rules define how a compromise spreads from one node to its dependents
The Alexa top 100k domains were evaluated
- DNS lookups
- publicly available resources
- ~8 million nodes in the resulting graph
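The taint-style propagation over the property graph amounts to a reachability computation: once a seed node is compromised, everything that depends on it is tainted too. A minimal sketch, with an illustrative graph whose node names are made up and not from the paper's dataset:

```python
from collections import deque

# Hypothetical miniature dependency graph: node -> list of dependents.
# An edge A -> B means "B depends on A", so compromising A taints B.
deps = {
    "dns_provider": ["cdn", "site_a"],
    "cdn": ["site_a", "site_b"],
    "site_a": [],
    "site_b": [],
}

def propagate(graph, seeds):
    """Taint-style propagation: starting from compromised seed nodes,
    mark every node reachable along dependency edges as compromised."""
    tainted = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in tainted:
                tainted.add(dependent)
                queue.append(dependent)
    return tainted
```

This is what enables the what-if analysis: `propagate(deps, {"dns_provider"})` answers "what is impacted if this DNS provider is compromised?"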
Case study: distribution of malicious JS content (the Great Cannon attack)
- Players on the internet
- Identify attack victims and/or attackers
- based on popularity and potential influence
- 6 metrics were measured
First order metrics
- providers hosting JS libraries
- The vast majority of JS providers are in the United States.
- Attackers use in-path content injection to achieve their goal.
- Around 83% of JS is distributed via HTTP, not HTTPS.
- Google and Cloudflare were the top providers.
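The ~83% figure comes from classifying each script URL by scheme, since only plain-HTTP fetches are injectable by an in-path attacker. A minimal sketch of that measurement, using made-up URLs rather than real data:

```python
from urllib.parse import urlsplit

def http_share(script_urls):
    """Fraction of script URLs fetched over plain HTTP (injectable in transit)."""
    schemes = [urlsplit(u).scheme for u in script_urls]
    http = sum(1 for s in schemes if s == "http")
    return http / len(schemes) if schemes else 0.0

# Illustrative URLs, not real measurement data.
urls = [
    "http://cdn.example.com/jquery.js",
    "http://ads.example.net/track.js",
    "https://secure.example.org/app.js",
    "http://static.example.io/lib.js",
]
```

Here `http_share(urls)` would report 0.75, i.e. three of the four scripts could be tampered with on the wire.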
Conclusion
- Framework that supports what-if analysis
- Evaluated the framework on the Alexa top 100k domains
- Assessed the impact of attacks and identified the most influential internet players
- Countries can influence each other via JS injection into in-path TCP connections
Tools for Automated Analysis of Cybercriminal Markets
We have developed an automated, top-down approach for analyzing underground forums. Using natural language processing and machine learning, our approach can automatically generate high-level information about underground forums with high accuracy.
Automated Analysis of Cybercriminal Markets
Social networks and forums are the backbone of underground criminal activity, including the sale of stolen information, and they provide valuable insight into cybercrime. The December 2013 Target data breach was discovered by monitoring such forums.
Analyzing forums is challenging
Manual analysis is time consuming – forums contain hundreds of thousands of posts and require domain expertise. Sample forum posts:
- “hai hai, I wan na buy 1000 clean US installs”
- “i’m buying people stuff from amazon for 90% of the price”
Analysis framework
- An SVM classifier detects buy/sell posts
- A latent SVM model extracts the product name
- An SVM classifier extracts the price/payment amount
- An SVM extracts the currency exchange rate and denomination
Pipeline
- Annotate
- Extract Features
- Train Model
- Test Model
Sample Query used in presentation:
“I need someone to make me a FUD auto run every week”
Translation:
They need someone to create a Fully UnDetectable build of their malware every week
Categorized as a Buy
- frequency of n-grams
- length
- part of speech
- dependency relations
- user information: forum rank (e.g., “noob”)
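A toy version of the lexical features above (n-gram counts plus post length; the part-of-speech, dependency-relation, and user features are omitted for brevity, and the feature names are my own):

```python
def ngrams(tokens, n):
    """All contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(post):
    """Sparse feature dict for a forum post: unigram and bigram
    counts plus the post length in tokens."""
    tokens = post.lower().split()
    feats = {}
    for gram in ngrams(tokens, 1) + ngrams(tokens, 2):
        key = "ng=" + " ".join(gram)
        feats[key] = feats.get(key, 0) + 1
    feats["len"] = len(tokens)
    return feats
```

Such a dict is what an SVM (e.g., with a linear kernel over a vectorized version of these features) would consume for the buy/sell decision.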
Product extractor
- “a keylogger coded completely in ASM”
- “need somebody too mod DCIBot for me add… Update Cmd”
Back to the original example, where “FUD auto run” is the product:
I need someone to make me a FUD auto run every week
- frequency of n-grams
- part of speech
- dependency relations
- word position
- line position
Price extractor
Bootkit:
- price: up to 15k
- payment: WM, LR
- PM with offers if any
extract price
- Frequency of n-grams
- part of speech
- word position
- line position
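As a rough stand-in for the learned price extractor, a regex over common forum price formats (“$300”, “15k”) illustrates what is being extracted; the actual system is a statistical model over the features above, not this pattern:

```python
import re

# Simplified, hand-written price pattern: an optional "$", a number,
# and an optional "k" multiplier. Purely illustrative.
PRICE_RE = re.compile(r"\$?(\d+(?:\.\d+)?)\s*(k)?", re.IGNORECASE)

def extract_price(text):
    """Return the first price-like number in the text,
    expanding a trailing 'k' suffix (x1000), or None."""
    m = PRICE_RE.search(text)
    if not m:
        return None
    value = float(m.group(1))
    if m.group(2):
        value *= 1000
    return value
```

On the bootkit example above, `extract_price("price: up to 15k")` yields 15000.0. A regex like this fails on posts where the first number is not a price, which is exactly why the paper learns the extractor instead.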
End-to-end assessment
- No post was misanalyzed by all of the tools at once
Tracking Phishing Attacks over Time
Unique phishing sites are growing at a steady rate
hypothesis
Instead of building new phishing sites, blocked ones are modified and released again.
reality
This is relatively slow, as it may require human intervention
Methodology
- Extract tag vector
- Calculate proportional distance
- Clustering Algorithm
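The three methodology steps can be sketched end to end: extract the page's tag sequence, compare two sequences with a proportional distance, and group near-duplicates. The distance definition and the greedy clustering below are simplified assumptions for illustration, not necessarily the paper's exact algorithm:

```python
from html.parser import HTMLParser

class TagVector(HTMLParser):
    """Collect the sequence of opening tags on a page."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_vector(html):
    parser = TagVector()
    parser.feed(html)
    return parser.tags

def proportional_distance(a, b):
    """Fraction of positions (over the longer vector) where the tags
    differ; 0.0 = identical structure, 1.0 = completely different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / longest

def cluster(pages, threshold=0.3):
    """Greedy single-pass clustering: a page joins the first cluster
    whose representative is within the threshold, else starts a new one."""
    clusters = []  # list of (representative_vector, member_names)
    for name, vec in pages:
        for rep, members in clusters:
            if proportional_distance(rep, vec) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]
```

Two relaunched copies of the same phishing kit end up in one cluster because their tag sequences barely change, which is what lets relaunches be tracked over time.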
Data Set
Phishing sites: PhishTank (community-maintained database)
- 21,000 verified sites
- Compared to legitimate sites in Alexa
- Repeatedly reported attacks vs. relaunched attacks
Security challenges in an increasingly tangled web
The web is getting more complex. The LA Times’ home page makes 1,597 total requests
- only 21 from http://www.latimes.com/
- 80 external networks
- 8 countries
What is the state of web complexity?
The median number of dependencies has almost doubled in the past six years: 2x overall and external resources since 2011. Who do sites depend on?
- Google (82.2% of top million web sites)
- Amazon
- Cloudflare
- Akamai
Smaller providers
- MaxCDN (Bootstrap CDN) 19%
- Edgecast
- Fastly
- SoftLayer
Websites are increasingly dependent on common providers
Type of resource
- analytics/tracking 75%
- CDN
- Advertising
- Social Media
- API/service
Why do we care?
Attackers care!
2013 – Bootstrap CDN was compromised and served malware
How does a complex web impact user’s trust?
Trust
Website -> AppNexus, Google, Rubicon, AOL (trusted ad delivery) -> talk925.pw, trackmytraffic.biz (implicitly trusted providers)
What happens when a site no longer knows what will be provided by secondary resources it implicitly trusts?
33% of sites load at least one implicitly trusted resource
- one .tv site loads 103 implicitly trusted resources
- one .ru site loads implicit resources at a depth of 17
Advertisers are the primary loaders of implicitly trusted resources
How does a complex web impact widespread HTTPS deployment?
To use HTTPS, a website’s dependencies must also be available over HTTPS. When they are not, the user is alerted about the mixed security state. Therefore, many websites continue to load over HTTP instead of HTTPS.
An HTTP website requests resources from ads/tracking providers. To move to HTTPS, those secondary resources must also be served over HTTPS: Google ads are available over HTTPS, but Scorecard Research is not. Most top websites still don’t use HTTPS because of dependent resources (HTTPS blockers).
- 65% of sites are loaded over HTTP
- 45% of those can immediately upgrade
- 55% are blocked
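Classifying a site as immediately upgradable vs. blocked reduces to checking whether each of its HTTP resources is available from a host known to serve HTTPS. A sketch of that check; the hosts and URLs below are hypothetical examples, not measurement data:

```python
from urllib.parse import urlsplit

def https_blockers(resource_urls, https_capable_hosts):
    """Return the HTTP-only resources whose hosts are not known to serve
    HTTPS. An HTTP site with no blockers could upgrade immediately;
    otherwise these third parties would trigger mixed-content warnings."""
    blockers = []
    for url in resource_urls:
        parts = urlsplit(url)
        if parts.scheme == "http" and parts.hostname not in https_capable_hosts:
            blockers.append(url)
    return blockers

# Illustrative inputs only.
capable = {"pagead2.googlesyndication.com"}
resources = [
    "http://pagead2.googlesyndication.com/ads.js",  # HTTPS available: upgradable
    "http://b.scorecardresearch.com/beacon.js",     # no HTTPS: blocks upgrade
    "https://cdn.example.com/lib.js",               # already HTTPS
]
```

A site whose blocker list is empty falls in the "can immediately upgrade" 45%; any non-empty list puts it in the blocked 55%.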
Biggest offenders
40% are ads, 32% are analytics/tracking
When it comes to security, we must always remember the weakest link. Measure the web regularly and call out problems when you see them. Build and deploy mechanisms that enable widespread resource integrity.