Call for Papers (Archived)
The ubiquitous use of search engines to discover and access internet content clearly shows the success of information retrieval algorithms. However, unlike controlled collections, the vast majority of Web pages lack an authority asserting their quality. This openness of the Web has been the key to its rapid growth and success, but it is also a major source of new adversarial challenges for information retrieval methods.
Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a good ranking on them is strongly correlated with more traffic, which often translates to more revenue.
This, the third AIRWeb workshop, builds on the previous successful meetings in Chiba, Japan, as part of WWW2005, and in Seattle, USA, as part of SIGIR2006. This workshop provides a focused venue for both mature and early-stage work in web-based adversarial IR.
We solicit both full and short submissions on any aspect of adversarial information retrieval on the Web. Particular areas of interest include, but are not limited to:
- Link spam: nepotistic linking, collusion, link farms, link exchanges and link bombing.
- Content spam: keyword stuffing, phrase stitching, and other techniques for generating synthetic text.
- Cloaking: sending different content to a search engine than to regular visitors of a web site, which is often used in combination with other spamming techniques.
- Comment spam in legitimate sites: in blogs, forums, wikis, etc.
- Spam-oriented blogging: splogs, spings, etc.
- Click fraud detection: including forging clicks for profit, or to deplete a competitor's advertising funds.
- Reverse engineering of ranking algorithms.
- Web content filtering: as used by governments, corporations or parents to restrict access to inappropriate content.
- Advertisement blocking: developing software for blocking advertisements during browsing.
- Stealth crawling: crawling the Web while avoiding detection.
- Malicious tagging: for injecting keywords or for self-promotion in general.
Papers addressing higher-level concerns are also welcome, e.g., whether "open" algorithms can succeed in an adversarial environment, whether permanent solutions are possible, how the problem has evolved over the years, what the differences are between "black-hat", "white-hat", and "gray-hat" techniques, and where the line lies between search engine optimization and spamming.
Each paper will receive at least three anonymous reviews and will be judged on the basis of relevance, originality, quality, and presentation. Proceedings of the workshop will be included in the ACM Digital Library. Substantially longer papers may also be submitted to the ACM Transactions on the Web (TWEB) special issue on adversarial issues in Web search.
Full papers are limited to 8 pages; work-in-progress papers are limited to 4 pages. See submission instructions for details.
Web Spam Challenge
This year, we are introducing a novel element: a Web Spam Challenge for testing web spam detection systems. This challenge is supported by the EU Network of Excellence PASCAL Challenge Program, and by the DELIS EU-FET research project. Participation is open to all.
We will be using the WEBSPAM-UK2006 collection for Web Spam Detection, which comprises a large set of web pages, a web graph, and human-provided labels for a set of hosts. To reduce the amount of work due to data processing, we provide a set of features extracted from the contents and links in the collection, which may be used by the participant teams in addition to any automatic technique they choose to use.
We ask that participants of the Web Spam Challenge submit predictions (normal/spam) for all unlabeled hosts in the collection. Predictions will be evaluated and results will be announced at the AIRWeb 2007 workshop. See the Web Spam Challenge website for details.
Participation in the challenge does not require a paper submission, and researchers submitting papers are not required to participate in the challenge. However, we encourage participants of the Web Spam Challenge to also submit research articles describing their systems, and we encourage authors submitting research articles on Web Spam detection to participate in the challenge.
Students who author or co-author accepted papers at AIRWeb 2007 are eligible for a grant to support their travel or registration. These grants are made possible by a sponsorship from Microsoft.
Up to three students will be supported, with an expected level of support of USD 500 each. If you are a student authoring or co-authoring an accepted paper at AIRWeb 2007, please send an e-mail to the organizers by 30 March 2007, indicating your name, school, the purpose of the grant (travel/registration) and the amount requested.
- 7 February 2007: E-mail intention to submit (optional, but helpful)
- 26 February 2007: Extended deadline for submissions
- 20 March 2007: Notification of acceptance
- 30 March 2007: Camera-ready copy due
- 30 March 2007: Challenge submissions due
- 8 May 2007: Date of workshop
2007 Organizing Committee
- Carlos Castillo, Yahoo! Research
- Kumar Chellapilla, Microsoft Live Labs
- Brian D. Davison, Lehigh University
2007 Program Committee
- Einat Amitay, IBM Research
- András Benczúr, Hungarian Academy of Sciences
- Andrei Broder, Yahoo! Research
- Soumen Chakrabarti, Indian Institute of Technology Bombay
- Paul-Alexandru Chirita, University of Hannover
- Tim Converse, Powerset
- Nick Craswell, Microsoft Research
- Matt Cutts, Google
- Ludovic Denoyer, University Paris 6
- Aaron D'Souza, Google
- Dennis Fetterly, Microsoft Research
- Tim Finin, University of Maryland
- Edel García, Mi Islita.com
- Natalie Glance, Nielsen BuzzMetrics
- Antonio Gulli, Ask.com
- Zoltán Gyöngyi, Stanford University
- Monika Henzinger, Google & Ecole Polytechnique Fédérale de Lausanne (EPFL)
- Jeremy Hylton, Google
- Ronny Lempel, IBM Research
- Mark Manasse, Microsoft Research
- Gilad Mishne, University of Amsterdam
- Marc Najork, Microsoft Research
- Jan Pedersen, Yahoo!
- Tamás Sarlós, Hungarian Academy of Sciences
- Erik Selberg, Microsoft Search Labs
- Mike Thelwall, University of Wolverhampton
- Andrew Tomkins, Yahoo! Research
- Matt Wells, Gigablast
- Baoning Wu, Snap.com
- Tao Yang, Ask.com