TY - GEN
T1 - How to improve your Google ranking
T2 - 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010
AU - Su, Ao Jan
AU - Hu, Y. Charlie
AU - Kuzmanovic, Aleksandar
AU - Koh, Cheng Kok
PY - 2010/12/13
Y1 - 2010/12/13
AB - Search engines have greatly influenced the way people access information on the Internet, as such engines provide the preferred entry point to billions of pages on the Web. Therefore, highly ranked web pages generally have higher visibility to people, and pushing the ranking higher has become the top priority for webmasters. As a matter of fact, search engine optimization (SEO) has become a sizeable business that attempts to improve its clients' rankings. Still, the natural reluctance of search engine companies to reveal their internal mechanisms and the lack of ways to validate SEO methods have created numerous myths and fallacies associated with ranking algorithms, Google's in particular. In this paper, we focus on the Google ranking algorithm and design, implement, and evaluate a ranking system to systematically validate assumptions others have made about this popular ranking algorithm. We demonstrate that linear learning models, coupled with a recursive partitioning ranking scheme, are capable of reverse engineering Google's ranking algorithm with high accuracy. As an example, we manage to correctly predict 7 out of the top 10 pages for 78% of evaluated keywords. Moreover, for content-only ranking, our system can correctly predict 9 or more pages out of the top 10 for 77% of search terms. We show how our ranking system can be used to reveal the relative importance of ranking features in Google's ranking function, provide guidelines for SEOs and webmasters to optimize their web pages, validate or disprove new ranking features, and evaluate search engine ranking results for possible ranking bias.
UR - http://www.scopus.com/inward/record.url?scp=78649895829&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649895829&partnerID=8YFLogxK
U2 - 10.1109/WI-IAT.2010.195
DO - 10.1109/WI-IAT.2010.195
M3 - Conference contribution
AN - SCOPUS:78649895829
SN - 9780769541914
T3 - Proceedings - 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010
SP - 50
EP - 57
BT - 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010
Y2 - 31 August 2010 through 3 September 2010
ER -