Spam Filtering with gzip
Loosely speaking, the LZ (Lempel-Ziv) family of compression algorithms, which underlies both zip and gzip, looks for repeated strings within a text and replaces each repeat with a reference to an earlier occurrence. The compression ratio achieved therefore reflects how many repeated fragments, words, or phrases occur in the text.
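To make the idea of a compression ratio concrete, here is a minimal Python sketch using the standard-library `gzip` module. It defines the ratio as original size divided by compressed size (our choice of definition, not necessarily the author's) and contrasts a highly repetitive input with random bytes, which offer essentially no repeats to exploit.

```python
import gzip
import os

def compression_ratio(data: bytes) -> float:
    """Original size divided by gzip-compressed size (higher = more repetition)."""
    compressed = gzip.compress(data, compresslevel=9)
    return len(data) / len(compressed)

repetitive = b"the quick brown fox " * 50   # 1000 bytes built from one repeated phrase
random_bytes = os.urandom(1000)             # 1000 bytes with essentially no repeats

print(f"repetitive: {compression_ratio(repetitive):.1f}")
print(f"random:     {compression_ratio(random_bytes):.2f}")
```

The repetitive input shrinks dramatically, while the random input typically comes out slightly larger than it went in, because the gzip header and trailer add bytes that no matches pay for.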
A related technique lets us measure how much a given "test" text has in common with a corpus of possibly similar documents. If we concatenate the corpus and the test text and gzip them together, the test text will get a better compression ratio if it shares more fragments, words, or phrases with the corpus, and a worse ratio if it is dissimilar. Because the LZ algorithm searches the previously seen input for repetitions, it tends to map pieces of the test text to earlier occurrences in the corpus, thereby achieving a high "appended compression ratio" when the test text is similar to what it is appended to. Note that gzip only looks back 32 KB, so in practice only the most recent 32 KB of the corpus can be matched.
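The appended compression ratio can be sketched as follows. The exact formula is an assumption: here it is the length of the test text divided by the number of extra compressed bytes it adds when appended to the corpus. The names `gz_size` and `appended_ratio`, and the toy corpus, are hypothetical.

```python
import gzip

def gz_size(data: bytes) -> int:
    """Size in bytes of the gzip-compressed data."""
    return len(gzip.compress(data, compresslevel=9))

def appended_ratio(corpus: bytes, test: bytes) -> float:
    """Hypothetical 'appended compression ratio': bytes of test text per extra
    compressed byte it costs when appended to the corpus. Higher = more similar."""
    extra = gz_size(corpus + test) - gz_size(corpus)
    # Guard against division by zero if appending costs no extra compressed bytes.
    return len(test) / max(extra, 1)

corpus = b"buy cheap meds online now, limited offer, click here " * 20
similar = b"limited offer, click here buy cheap meds online now"
dissimilar = b"minutes of the quarterly budget meeting attached"

print(f"similar:    {appended_ratio(corpus, similar):.2f}")
print(f"dissimilar: {appended_ratio(corpus, dissimilar):.2f}")
```

The similar text is a substring of the corpus, so it is encoded as a short back-reference and costs only a few compressed bytes; the dissimilar text must be stored mostly as literals. A real corpus would be an archive of actual messages; the repeated one-line corpus here only keeps the example small.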
In this case, we wish to compare an incoming email message against two possible corpora: spam and non-spam (ham). If we maintain archives of both, we can compare the appended compression ratios relative to each, to judge how similar a new message is to spam or ham.
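Putting the two comparisons together gives a simple classifier. The sketch below, again in Python with hypothetical names, labels a message as spam if it compresses better against the spam archive than against the ham archive. The tie-breaking rule and the toy corpora are assumptions made for illustration.

```python
import gzip

def gz_size(data: bytes) -> int:
    """Size in bytes of the gzip-compressed data."""
    return len(gzip.compress(data, compresslevel=9))

def appended_ratio(corpus: bytes, message: bytes) -> float:
    """Message bytes per extra compressed byte when appended to the corpus."""
    extra = gz_size(corpus + message) - gz_size(corpus)
    return len(message) / max(extra, 1)

def classify(message: bytes, spam_corpus: bytes, ham_corpus: bytes) -> str:
    """Label the message after whichever corpus it compresses better against.
    Ties go to 'ham' so that an ambiguous message is not discarded."""
    spam_score = appended_ratio(spam_corpus, message)
    ham_score = appended_ratio(ham_corpus, message)
    return "spam" if spam_score > ham_score else "ham"

spam_corpus = b"buy cheap meds online now, limited offer, click here " * 20
ham_corpus = b"please find attached the minutes of the quarterly budget meeting " * 20

print(classify(b"limited offer, click here buy cheap meds online now", spam_corpus, ham_corpus))
print(classify(b"attached the minutes of the quarterly budget meeting", spam_corpus, ham_corpus))
```

Each message is compressed twice, once against each archive, so the cost per message grows with the size of the archives that gzip actually has to process.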
Related items
- Software: MailScanner
- Software: Xyria:DNSd
- Software: Super Webscan
- Software: SpamPal
- Software: Revelation
- Software: Mail Snoop Pro
- Article: WorldCom Announces their Anti-Spam Solution (9 December 2002)
- Article: Network Associates Fights Spam (30 October 2002)
- Article: Spam Wars - Rise of the Spam (16 May 2002)
- Article: Spam: The problems with junk e-mail (8 April 2002)
- Article: The six headed spam monster (1 April 2002)