Exploring Linguistic Features for Web Spam Detection: A Preliminary Study
We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and are made publicly available to other researchers. Preliminary analysis indicates that certain linguistic features may be useful for spam detection when combined with features studied elsewhere.
SYDOW Marcin;
WEISS Dawid;
PISKORSKI Jakub;
2008-07-08
ACM
JRC45828