The key aim of this project is to develop a system that enhances the ability of tutors, lecturers, and professors to provide typed feedback while evaluating their students’ work. The system is motivated by the observation that much feedback is repetitive, because students tend to make similar mistakes in their coursework. The system should therefore enable lecturers to re-use comments where appropriate.
To achieve this aim, the software system must meet the following objectives:
Allow academic staff to enter new feedback
Search and match text to feedback given in previous coursework as the tutor is typing (in real time)
Order search results by similarity
Highlight the closest matching fragments
Display similar feedback from previous coursework in a pop-up window
Provide the tutor with a mechanism to select appropriate feedback from the pop-up window
Insert the feedback the lecturer selects at the “current” location of the cursor
Display the marks breakdown for each category of feedback in the search results
For the project to succeed, a good understanding of information retrieval concepts is required, because these concepts underpin the assessment of similarity between feedback comments. Different concepts must therefore be evaluated to identify the one most appropriate for the project: one that can use the similarity between feedback comments to find appropriate text fragments. This section analyses existing literature on information retrieval, with emphasis on its history, definition, concepts and models. It then explores literature on how search engines operate, explaining the concepts behind them, their components and the algorithms they use. Finally, the literature review analyses the concept of auto-completion by outlining its history, components, and pros and cons.
3.1 Information retrieval
3.1.1 Definition of Information Retrieval
From a layman’s point of view, information retrieval combines two words, information and retrieval: information refers to knowledge that can be shared between people, and retrieval is the act of getting or picking something from the place where it is kept or stored. In simple terms, information retrieval is the process of accessing knowledge stored in a particular device or location. In information technology, however, the term “information retrieval” has a more specific meaning.
In this technical sense, information retrieval is the act of finding material of an unstructured nature (usually text documents) that satisfies an information need from within large collections of information (Manning et al., 2009).
According to Lancaster (1968), an information retrieval system does not educate or inform people about the subject they are enquiring into; rather, it informs them of the existence and whereabouts of the information they are seeking.
3.1.2 History of Information Retrieval
The concept of information retrieval is as old as humanity itself. However, information retrieval using technological tools is a fairly new concept. Like many technology-based tools, information retrieval as we know it today can be traced back to the invention of punched cards in 1801.
The major breakthrough in information retrieval came in the early 20th century, when Emanuel Goldberg patented a system that used pattern recognition to identify documents.
By the mid-20th century, the US military had become heavily involved in information retrieval because of the challenges it faced in analysing documents seized from the Germans during the Second World War. This involvement was further stimulated by the perception that the former USSR, then the main rival of the United States, was outpacing the US in technology. The US government therefore invested heavily in research intended to fast-track the development of information retrieval systems (Mooers, 2011). Around the same period, American universities also took a keen interest in the subject; the first documented experiment on computerized document retrieval took place at the Massachusetts Institute of Technology. Several conferences on information retrieval were held to find ways of applying the concept to the problems of the day.
The 1970s saw the deployment of the first online information retrieval systems, which were mainly text-based. Much research and many conferences followed, aimed at a deeper understanding of information retrieval, especially of how to make it available to end-users (at that time, only the military had access to effective information retrieval systems). These efforts eventually led to the World Wide Web, which began as a proposal by Tim Berners-Lee.
Today there are numerous information retrieval systems, ranging from student records databases to government health, tax and immigration records. The World Wide Web, itself an information retrieval system, hosts numerous other information retrieval systems, including search engines such as Google, Bing, DuckDuckGo, Baidu and Yahoo! (Howells, 2001).
3.1.3 The Concepts behind Information Retrieval
Technically, one can only retrieve what has already been stored; information therefore has to be stored somewhere before it can be retrieved. In addition, a user has to make an effort to retrieve it, most commonly by entering a query into a search box.
Queries, normally strings of text, audio or graphics, express the user’s information needs. These needs are compared with the information stored in a database, and the most similar results are displayed. To achieve this, most search algorithms rank results by their similarity to the query the user enters and display them from the most similar to the least similar (Levene, 2011).
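To illustrate ranking by similarity, the following minimal Python sketch represents the query and each stored item as word-count vectors and orders the items by cosine similarity. Cosine similarity is assumed here as one common measure; it is not taken from Levene (2011). The sample feedback comments and the function names (`rank_by_similarity`, `cosine`) are hypothetical, chosen to echo this project's aim of matching a tutor's typing against previous feedback.

```python
import math
import re
from collections import Counter


def _vector(text):
    # Bag-of-words term-frequency vector (lower-cased word counts).
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    # Cosine of the angle between two sparse count vectors (0 if either is empty).
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, stored):
    """Return (score, text) pairs ordered from most to least similar to the query."""
    q = _vector(query)
    scored = [(cosine(q, _vector(text)), text) for text in stored]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical previous feedback comments.
previous_feedback = [
    "Your references are not formatted consistently.",
    "Good use of diagrams to explain the architecture.",
    "Several references are missing from the bibliography.",
]
for score, text in rank_by_similarity("missing references", previous_feedback):
    print(f"{score:.2f}  {text}")
```

With this sample data, the comment about missing references scores highest (about 0.53), the comment about reference formatting comes next (about 0.29), and the unrelated comment scores 0.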
It is important to note that information retrieval does not necessarily happen in the primary database where the information is stored. Instead, documents and files are represented in the information retrieval system through metadata (Frakes, 1992). This allows such systems to analyze large chunks of data hosted on servers in different geographical locations. This is common among search engines that operate on the internet (Berry and Browne, 2005).
3.1.4 Information Retrieval Models
Information retrieval is based on mathematical frameworks that define how it takes place (Blank et al., n.d.). These frameworks, called models, determine how information retrieval systems work, and also how researchers and academics approach information retrieval (Hiemstra, n.d.). They therefore predict and explain the results of any query given the user’s input. Models differ in the algorithms they apply when retrieving information.
Boolean models were the first models to be implemented in information retrieval and remain among the most widely used. A Boolean model treats a query as an exact definition of a set of documents: the query retrieves every document whose index terms satisfy it, and nothing else.
Boolean models combine query terms with three Boolean operators, AND, OR and NOT, to sift through documents. AND returns only documents that contain both terms; OR returns documents that contain either term; NOT excludes documents that contain a given term. These operators help users refine their searches and hence retrieve the correct information (Hiemstra, n.d.).
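The three operators map naturally onto set operations over an index recording which documents contain which terms. The sketch below assumes a hypothetical three-document collection and simple word tokenisation; `build_index` and `docs_with` are illustrative names, and NOT is taken relative to the whole collection.

```python
import re


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(re.findall(r"\w+", text.lower())):
            index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical document collection.
docs = {
    1: "binary search tree insertion",
    2: "hash table insertion and deletion",
    3: "binary heap deletion",
}
index = build_index(docs)
all_ids = set(docs)


def docs_with(term):
    # Documents containing the term (empty set if the term is unknown).
    return index.get(term, set())


print(docs_with("binary") & docs_with("deletion"))               # AND     -> {3}
print(sorted(docs_with("binary") | docs_with("hash")))           # OR      -> [1, 2, 3]
print(docs_with("insertion") & (all_ids - docs_with("binary")))  # AND NOT -> {2}
```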
Region models function in the same way as Boolean models, except that they act on arbitrary sections of text (extents, regions or segments) rather than whole documents (Manning et al., 2009). These models define start and stop positions in linear strings of text, which makes them suitable for retrieving parts of large documents, such as excerpts from books.
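A simplified sketch of a region query follows. It assumes hypothetical `<chapter>` and `</chapter>` tokens as the start and stop markers, builds the extents between them, and keeps only the extents that contain a query word. The region models in the literature define a richer set of operators than this single "containing" operation.

```python
def extents(tokens, start, stop):
    """Return (begin, end) token positions of each region that opens at `start`
    and closes at the next following `stop`."""
    regions, begin = [], None
    for pos, tok in enumerate(tokens):
        if tok == start and begin is None:
            begin = pos
        elif tok == stop and begin is not None:
            regions.append((begin, pos))
            begin = None
    return regions


def containing(regions, tokens, word):
    """Keep only the regions whose text contains `word`."""
    return [(b, e) for b, e in regions if word in tokens[b:e + 1]]


# Hypothetical marked-up text: three chapters delimited by start/stop tokens.
text = ("<chapter> intro to sorting </chapter> "
        "<chapter> hashing and sorting trade-offs </chapter> "
        "<chapter> graphs </chapter>")
tokens = text.split()
chapters = extents(tokens, "<chapter>", "</chapter>")
hits = containing(chapters, tokens, "sorting")
print([" ".join(tokens[b + 1:e]) for b, e in hits])
```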
Statistical models use term frequencies to establish the relevance of information to a particular query. There are two main statistical models:
i. Probabilistic Model
The probabilistic model is based on the principle that an information retrieval system should rank results by their probability of relevance to the query (Allied Publishers, 2005). The model analyses the distribution of terms in both relevant and non-relevant documents and compares it with the query input. Each document is ranked by a single numeric score (Boughanem et al., 2009), and the documents with the highest probability of relevance are displayed at the top of the list.
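The sketch below illustrates one classic instance of the probabilistic model, the Binary Independence Model with Robertson-Sparck Jones term weights; it is swapped in for illustration, as the sources above do not name a specific variant. Each query term is weighted by the log-odds of its appearing in relevant versus non-relevant documents (with 0.5 added to each count for smoothing), and a document's score is the sum of the weights of the query terms it contains. The documents and the relevance judgement (document 1 marked relevant) are hypothetical.

```python
import math


def rsj_weight(N, n_t, R, r_t):
    # N: documents in the collection, n_t: documents containing the term,
    # R: known relevant documents, r_t: relevant documents containing the term.
    p = (r_t + 0.5) / (R - r_t + 0.5)                # odds of the term in relevant docs
    q = (n_t - r_t + 0.5) / (N - n_t - R + r_t + 0.5)  # odds in non-relevant docs
    return math.log(p / q)


def bim_rank(docs, query_terms, relevant_ids):
    """Score each document by summing the weights of the query terms it contains."""
    term_sets = {d: set(text.lower().split()) for d, text in docs.items()}
    N, R = len(docs), len(relevant_ids)
    weights = {}
    for t in query_terms:
        n_t = sum(t in terms for terms in term_sets.values())
        r_t = sum(t in term_sets[d] for d in relevant_ids)
        weights[t] = rsj_weight(N, n_t, R, r_t)
    scores = {d: sum(w for t, w in weights.items() if t in terms)
              for d, terms in term_sets.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical documents and relevance judgement.
docs = {
    1: "weak conclusion lacks evidence",
    2: "conclusion summarises findings well",
    3: "argument is weak and unreferenced",
    4: "excellent structure",
}
ranking = bim_rank(docs, ["weak", "evidence"], relevant_ids={1})
for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

Here document 1 scores log 105 (about 4.65) and document 3 scores log 5 (about 1.61). With no relevance information (R = r_t = 0), the weight reduces to log((N - n_t + 0.5) / (n_t + 0.5)), an idf-like measure.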
ii. Latent Semantic Indexing
Latent semantic indexing (LSI) uses associations between terms and documents to generate results during retrieval (Boughanem et al., 2009). The model applies a statistical technique, singular value decomposition of the term-document matrix, to uncover the latent structure in how words are used across documents. This allows it to look beyond exact wording, so it can retrieve relevant documents even when they do not share the query’s words (Landauer et al., 2013).
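The sketch below performs LSI on a toy term-document matrix. To stay dependency-free, it finds the leading singular vectors by power iteration with deflation on A^T A rather than calling a full SVD routine, which a linear-algebra library would normally provide. The query and the documents are then mapped into the same k-dimensional latent space and compared by cosine similarity. The documents, `lsi_rank` and its parameters are hypothetical.

```python
import math


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _normalise(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm else v


def _cosine(x, y):
    nx, ny = math.sqrt(_dot(x, x)), math.sqrt(_dot(y, y))
    return _dot(x, y) / (nx * ny) if nx and ny else 0.0


def lsi_rank(docs, query, k=2, iterations=200):
    """Return (similarity, doc_index) pairs, best first, in a k-dimensional latent space."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({w for words in tokenised for w in words})
    # Term-document matrix A: one row per term, one column per document (raw counts).
    a = [[words.count(t) for words in tokenised] for t in vocab]
    n = len(docs)
    # B = A^T A. Its eigenvectors are the right singular vectors of A and its
    # eigenvalues are the squared singular values.
    b = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    components = []  # (eigenvalue, unit eigenvector), largest first
    for _ in range(min(k, n)):
        v = _normalise([1.0 + 0.01 * i for i in range(n)])  # arbitrary non-zero start
        for _ in range(iterations):  # power iteration
            v = _normalise([_dot(row, v) for row in b])
        lam = _dot(v, [_dot(row, v) for row in b])
        if lam <= 1e-12:
            break
        components.append((lam, v))
        # Deflation: subtract the component found so the next pass finds the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    # Fold the query in: q_hat_i = (u_i . q) / sigma_i = (v_i . A^T q) / lambda_i.
    q_words = query.lower().split()
    q = [q_words.count(t) for t in vocab]
    at_q = [sum(a[t][j] * q[t] for t in range(len(vocab))) for j in range(n)]
    q_hat = [_dot(v, at_q) / lam for lam, v in components]
    # The same mapping sends document j to (v_1[j], ..., v_k[j]).
    scores = [(_cosine(q_hat, [v[j] for _, v in components]), j) for j in range(n)]
    return sorted(scores, key=lambda s: s[0], reverse=True)


docs = ["car engine", "automobile engine", "flower petal"]
for score, j in lsi_rank(docs, "car"):
    print(f"{score:.2f}  {docs[j]}")
```

Here the query "car" scores close to 1 against "automobile engine" although the two share no word, because "car" and "automobile" both co-occur with "engine"; "flower petal" scores close to 0.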
3.2 Search Engines
3.2.1 Definition of a Search Engine
A search engine is a general class of program used to search documents for given keywords and return a list of the documents in which those keywords occur. Typical examples are web search engines, which send out a web crawler or spider (a program that fetches web pages automatically) to fetch as many documents as possible from the internet. An indexer, another program, reads the documents and, based on the words each contains, creates an index (a list of keys, each of which identifies a particular record, in this case a document). Different search engines use different proprietary algorithms to create their indices, with the aim of returning only meaningful results. Bing, Yahoo and Google are examples of well-known search engines in use today (Basic Concepts of Search Engines, 2011).
3.2.2 History of Search Engines
Modern search engines can process a search query and return accurate and valuable nuggets of information from the vast amount of information available on the internet. This was not always the case. Before 1990, there was no way to search data on the internet. There was only a handful of sites, and the information on them could be downloaded via FTP; knowledge of a site’s existence spread by word of mouth. Archie, created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch, was the first program built to search the internet. It was less a search engine than a searchable list of files: one needed to know the exact name of a file, and Archie would then provide the FTP site from which it could be downloaded (A Short History of Search Engines, n.d.).
In 1991, the University of Nevada Computing Services created Veronica, a search engine for finding files on Gopher servers (servers that store only plain-text documents). At around the same time, another search engine, Jughead, appeared; it also searched Gopher servers and was therefore very similar to Veronica (Wiley, n.d.). By around 1993, the internet was taking a different shape, with websites proliferating alongside FTP, Gopher and email servers. The World Wide Web Wanderer, a program consisting of a series of robots that scoured the web for URLs and stored them in a database called Wandex, was introduced (A Short History of Search Engines, n.d.). It was the first typical search engine with robots, the first able to track the growth of the web, and the first web database. Later in 1993, ALIWEB was developed as the web-page equivalent of Archie and Veronica. Instead of using robots for indexing, it relied on webmasters who wanted their sites listed to provide an index file describing them. The advantage of this method was that no robots were needed to consume bandwidth (Kim, n.d).
Also in 1993, spiders were introduced into search engines as an advance in cataloguing site information. These programs scoured the internet for web page information, using page titles, header information and URLs as the sources of keywords. The method was still primitive, however, because results were not ranked by their relevance to the search keywords, except by one of the three search engines available at the time (Kim, n.d).
In 1994, undergraduate students at Stanford released the first popular search engine, Excite. It used statistical analysis of word relationships to search efficiently through the vast amount of information on the internet (Wiley, n.d.). Because robots had no understanding of what they were indexing, the first searchable, browsable web directory, Tradewave Galaxy, was created in 1994. As a directory, Galaxy organized its links hierarchically, from a top level down through sub-directories. Also in 1994, Jerry Yang and David Filo, Ph.D. candidates at Stanford, created web pages with links to documents and called them Yahoo. As the number of links grew, they developed a hierarchical listing and, as the pages gained popularity, a way to search through all the links. Yahoo was not considered a search engine, since all its links were added manually rather than automatically and it searched only through those links (A Short History of Search Engines, n.d.).
WebCrawler, introduced at an undergraduate seminar at the University of Washington, was the first full-text search engine on the internet. It let users search the full text of entire documents rather than only URLs and web page descriptions. July 1994 saw the introduction of Lycos by Michael Mauldin of Carnegie Mellon University; Lycos provided relevance-ranked retrieval, prefix matching and word proximity. Infoseek was the next big player in the search engine market, providing a user-friendly interface and additional services and, most importantly, replacing Yahoo as Netscape’s default search engine in December 1995. AltaVista was also introduced in 1995, running on Alpha-based computers, which had the most powerful processors of the time and allowed the search engine to handle very high traffic without slowing down. It also allowed users to enter questions rather than just keywords, and supported Boolean operators to further refine a search (Kim, n.d).
HotBot, a project of the University of California, Berkeley, was introduced later. Its primary feature was its ability to update its entire index in a day, ensuring that it had the most up-to-date information of all the major search engines; it could index up to 10 million pages a day. HotBot also used cookie technology (a cookie is a small file stored on a user’s computer by a website) to store information about a user’s search preferences (Kim, n.d).
The year 1995 also saw the introduction of a new type of search engine, the metasearch engine. A metasearch engine takes a user’s query, submits it to all the leading search engines, receives their results and combines them on a single page. Metacrawler was the first search engine of this type, and many other metasearch engines followed it (Kim, n.d).
Around this time, it became clear that search engines received more hits than any other pages on the internet, which made them an attractive venue for advertisers. Search engines began running adverts, changing the fortunes of the entire market. With the fundamentals of searching the internet in place, the number of search engines increased dramatically. Corporations also needed to keep their information private from the public while making it available to their employees, which created another niche for search engines and fuelled the production of more of them. Existing players such as Netscape and Infoseek capitalized heavily on these opportunities. As more new web pages appeared on the internet every day, search engines needed to keep updating their indices, and some websites went further and created search engines for their own pages (Wiley, n.d.).
Essentially, the stage was set, with a wealth of knowledge on how to create effective search engines and increasingly numerous and lucrative business opportunities. The search engine market has witnessed the endurance of older players such as Yahoo and Netscape and the entrance of large corporations such as Microsoft and Google, among many others. The foundations of search engines have also been improved considerably, producing the effectiveness of present-day search engines. Their growing complexity, including increasingly sophisticated search algorithms, has extended their capabilities so that they can serve end users regardless of their needs or knowledge. A particular example is the Google search engine, which can search both web documents and databases, yielding more valuable results for the user.
3.2.3 Concept: How Search Engines Work
The basic idea of a search engine is to take a user’s search phrase, consisting of keywords or simply phrased in natural language. The search engine then searches its index of documents gathered from the internet for the keywords or phrases in the user’s query and provides a set of results. Using its search algorithms, it should deliver the most meaningful and valuable results from the mass of information on the internet, and rank them according to the user’s priorities. It should also do this in the shortest possible time and display the results in the most user-friendly way (Basic Concepts of Search Engines, 2011).
Although there are different types of search engine, this basic concept applies to all of them. However, some search engines have particular features structured to fit their use. For example, database search engines are distinctive because the data held in databases has a specific and consistent structure, while mixed search engines, which search both databases and web documents, are customized to sift through both structured and unstructured resources.
3.2.4 Components of Search Engines
Following the basic concept of search engines, a search engine has three major components: a crawler (or spider), an indexer and a ranker. A crawler is a robot that downloads the pages of a website and scours (crawls) them for links. It then downloads the linked pages and crawls them for further links. A crawler periodically revisits web pages to look for changes in their content, so that the index, and hence the ranking of those pages, can be updated accordingly. These visits range from once a month to as often as daily for popular websites, depending on how frequently the site is updated and on its quality (SEO Consult, 2013).
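The crawl loop described above can be sketched as a breadth-first traversal. To keep the example self-contained, a hypothetical in-memory dictionary stands in for the web, and the `fetch_links` parameter stands in for downloading a page and extracting its links. A real crawler would also fetch pages over HTTP, respect robots.txt and schedule revisits.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl: visit a page, queue its unseen links, repeat."""
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # at this point the page would be handed to the indexer
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical miniature web: page -> outgoing links.
WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
}
print(crawl("a.example/", lambda url: WEB.get(url, [])))
```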
After crawling web pages, the crawler passes them to the indexer. The indexer stores the pages it receives in a database called an index. This is similar to the index of a paper book, where for a given word the index lists the pages on which it can be found. The index is dynamic: it changes whenever the crawler re-crawls an already indexed page or finds a new one. Because the index is so large, changes take time to be committed, so a page may have been crawled but not yet indexed (Basic Concepts of Search Engines, 2011).
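A minimal in-memory sketch of such an index follows, assuming simple word tokenisation. `InvertedIndex` is a hypothetical class name; each term maps to the documents and word positions where it occurs, much as a book index maps a word to its pages. Re-adding a document first removes its old entries, mirroring the re-crawl case; unlike the large index described above, this sketch applies changes immediately.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Term -> {doc_id: [word positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_terms = {}  # doc_id -> its terms, so a re-crawled page can be removed first

    def add(self, doc_id, text):
        self.remove(doc_id)  # re-indexing replaces the old version of the page
        terms = re.findall(r"\w+", text.lower())
        for pos, term in enumerate(terms):
            self.postings[term].setdefault(doc_id, []).append(pos)
        self.doc_terms[doc_id] = set(terms)

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, ()):
            del self.postings[term][doc_id]
            if not self.postings[term]:
                del self.postings[term]  # drop terms no document uses any more

    def lookup(self, term):
        return self.postings.get(term.lower(), {})


idx = InvertedIndex()
idx.add("page1", "Search engines build an index")
idx.add("page2", "An index maps words to pages")
print(idx.lookup("index"))                # {'page1': [4], 'page2': [1]}
idx.add("page1", "Crawlers fetch pages")  # page1 re-crawled with new content
print(idx.lookup("index"))                # {'page2': [1]}
```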
The ranker, or search engine software, is the component of the search engine that the user interacts with. It takes the user’s search query, finds pages relevant to it by sifting through all the indexed pages, sorts the results by relevance to the query and presents them to the user. These are the basic components of any crawler-based web search engine. Differences in how these components are tuned explain why different search engines return different results for the same query.
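As an illustration of the ranker, the sketch below scores pages by tf-idf (term frequency multiplied by inverse document frequency), a standard weighting assumed here purely for illustration; production rankers combine many more factors. The pages and the function name `rank` are hypothetical.

```python
import math
import re
from collections import Counter


def rank(query, pages):
    """Return (url, score) pairs for pages matching the query, highest tf-idf score first."""
    counts = {url: Counter(re.findall(r"\w+", text.lower())) for url, text in pages.items()}
    n = len(pages)
    scores = {}
    for term in set(re.findall(r"\w+", query.lower())):
        df = sum(1 for c in counts.values() if term in c)  # document frequency
        if df == 0:
            continue
        idf = math.log(n / df)  # rarer terms carry more weight
        for url, c in counts.items():
            if term in c:
                scores[url] = scores.get(url, 0.0) + c[term] * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical indexed pages.
pages = {
    "p1": "python tutorial for beginners",
    "p2": "python python packaging guide",
    "p3": "gardening tips for beginners",
}
print(rank("python packaging", pages))
```

Page p2 ranks first because it contains the rare term "packaging" and repeats "python"; p3 matches neither term and is not returned.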
3.2.5 Search Engine Algorithms
Initially, search engines depended on Boolean operators to perform searches on the internet following a user’s query. This was a significant limitation, because the search engine returned the pages that best matched the query rather than trying to answer the underlying question. Search engine algorithms were introduced to fix this issue. They are proprietary assets of search engine companies and are therefore closely guarded secrets (Gillespie, n.d).
3.2.5.1 Classes of Search Engine algorithms
3.2.5.1.1 Definition: What is a search engine algorithm?
A search engine algorithm is a unique formula or set of rules that a search engine uses to determine the significance of a web page. Using algorithms, a search engine can determine whether a page contains data of significance to users, whether it is spam or a genuine page, and which other features of the page can be used to list and rank it for specific search queries (What is Search Engine Algorithm?, 2013).
3.2.5.1.2 Characteristics of Search Engine Algorithms
Algorithms used by search engines differ from one search engine to another. However, some fundamental features are common to all search engine algorithms. Relevancy is a major characteristic: algorithms determine the relevance of a given web page to a specific search, either by checking how keywords are used on the page or simply by scanning the page for keywords. The location of keywords on the page is also an important factor; pages that have keywords in the headlines, the page title or the first few lines of text receive a higher relevance in a search. Frequent appearance of keywords on a page also increases its relevance, provided that this frequency is not the result of keyword stuffing.
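The location and frequency factors can be combined in a simple score. In the sketch below, the location weights and the cap on counted occurrences are hypothetical values chosen for illustration, not those of any real search engine; the cap is one crude way to stop keyword stuffing from inflating relevance.

```python
import re

# Hypothetical weights: a keyword counts for more in the title or a heading,
# and in the opening lines, than deeper in the body.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "opening": 1.5, "body": 1.0}
MAX_COUNTED = 3  # occurrences beyond this add nothing (a crude guard against stuffing)


def location_score(page, keyword):
    """page maps a location name to its text; returns a relevance score for keyword."""
    keyword = keyword.lower()
    score = 0.0
    for location, text in page.items():
        hits = re.findall(r"\w+", text.lower()).count(keyword)
        score += LOCATION_WEIGHTS[location] * min(hits, MAX_COUNTED)
    return score


honest = {"title": "Bread recipes", "heading": "Sourdough bread",
          "opening": "This bread is easy.", "body": "Knead the bread dough."}
stuffed = {"title": "Cheap deals", "heading": "Offers",
           "opening": "Welcome.", "body": "bread " * 50}
print(location_score(honest, "bread"), location_score(stuffed, "bread"))
```

The honest page scores 7.5, while the stuffed page, despite repeating the keyword 50 times, scores only 3.0.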
The individual factors contained in a given algorithm are another important trait of a search engine algorithm. These individual factors make an algorithm unique, and they also make one search engine different from another, since the results of a search depend on the algorithm used. A common individual factor is the number of web pages a search engine indexes: different engines produce different results because of how often they index and how many pages they index. In some cases, the factor that sets algorithms apart may be that one penalizes spamming while another does not.
Off-page factors are another characteristic of search engine algorithms. Off-page factors are measures such as linking and click-through, whose frequency can be used to determine the relevance of a web page to a specific query; a high frequency of these factors can also cause a page to be ranked higher. Depending on the algorithm used, off-page factors can have an enormous effect on a search engine’s performance.
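The best-known link-based off-page factor is Google's PageRank. The sketch below computes it by power iteration with the commonly used damping factor of 0.85; the link graph is hypothetical, and spreading the rank of pages without out-links evenly over all pages is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the pages it links to; returns page -> score (scores sum to 1)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a base share, as if a random surfer jumped to it.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no out-links spreads its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical link graph: three pages link to "c".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

Page c, which three pages link to, scores highest; b and d, which no page links to, receive only the base share of 0.0375.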
Another important trait of search engine algorithms is efficiency: how fast an algorithm runs to deliver valuable and relevant results. This is particularly important because it determines the bandwidth used by a user query and hence the time it takes to return a result. An inefficient algorithm takes a long time to produce results and wastes bandwidth.
3.2.5.1.3 Strengths of Search Engine Algorithms
The strength of an algorithm can be measured by the speed at which it runs and, consequently, the bandwidth it uses. Algorithms are expected to be efficient in both time and data. With search engines this becomes a major factor, because the amount of data involved is tremendous and keeps growing. How quickly a search engine’s algorithm delivers results is therefore critical (Silvestri, 2004).
Deep crawling refers to the ability to harvest as many links as possible, thereby providing a larger set of web pages to index and search. This gives a search engine broad reach on the web, so the strength of a search engine algorithm can be judged by the extent of web reach it gives the search engine (Brocolo, Frieder, Nardini, Perego, and Silvestri, n.d).
Determining the relevance and rank of a web page for a specific user search is fundamental to search engines. To do so, a search engine needs to consider numerous factors, determined both analytically and qualitatively. The computational capability of an algorithm is therefore an important aspect of its strength.
Search engine optimization is a technique used by webmasters and web marketers to achieve higher rankings from search engines. It can, however, be exploited by unscrupulous people through practices such as keyword stuffing and spamdexing. The strength of a search engine algorithm can thus be judged by its ability to detect and overcome such practices.
3.2.5.1.4 Weaknesses of Search Engine Algorithms
A major weakness of search engine algorithms is the influence they have on how we think about and consume information available online. When a user performs a search, the search engine algorithm returns results based on its interpretation of the query, which may not be what the user intended. As a result, the way research, both academic and professional, is conducted using information on the internet may be altered (Henzinger, n.d.).
3.2.5.1.5 Comparison of Algorithms
Across the board, the algorithms used by different search engines are closely guarded secrets; they are what separates good search engines from poor ones. These algorithms are constantly changed and reviewed to keep up with the competition and with changing trends in the search market. Essentially, however, they are designed on similar logic and differ only in how their various elements are tuned (Sun, Lebanon and Collins-Thompson, n.d.).
3.2.5.2 Ranking algorithms
A search engine query returns numerous results based on the keywords in the query. Some results are more relevant, and therefore more valuable, than others. This creates the need to list the results according to the user’s priorities.
3.2.5.2.1 Definition of Ranking Algorithms
Ranking algorithms are the formulas and rules used by search engines, specifically by the ranker component, to determine the relevance of a web page returned by a user’s search. The relevance of a web page depends on the underlying question behind the user’s search (Monash, 2004).
3.2.5.2.2 Characteristics of Ranking Algorithms
Ranking algorithms rely on a given set of factors to determine a web page’s rank, and different search engines’ algorithms use different factors. Google’s ranking algorithm, for example, uses the authority of the searched web page’s domain name, the anchor text of external links to the page, raw PageRank and on-page keyword usage, among many others. Other search engines use different sets of factors, so the ranks produced for similar searches normally differ across search engines (Beel and Gipp, 2009).
A ranking algorithm assigns a specific level of importance to each factor it uses in ranking, and the levels of importance attached to specific factors vary across search engines (Panda, n.d.).
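Assigning levels of importance can be illustrated as a weighted sum of factor scores. In the sketch below, the factor names, their normalised values and the two engines' weights are all hypothetical; it shows how the same pages can be ranked differently when engines attach different importance to the same factors. Real ranking algorithms use far more factors and are not necessarily linear.

```python
# Hypothetical normalised factor values (0 to 1) for two pages.
PAGES = {
    "page_a": {"keyword_match": 0.9, "inbound_links": 0.2, "click_through": 0.4},
    "page_b": {"keyword_match": 0.5, "inbound_links": 0.9, "click_through": 0.6},
}

# Two hypothetical engines that attach different importance to the same factors.
ENGINE_X = {"keyword_match": 0.7, "inbound_links": 0.2, "click_through": 0.1}
ENGINE_Y = {"keyword_match": 0.2, "inbound_links": 0.6, "click_through": 0.2}


def score(factors, weights):
    """Weighted sum: each factor's value times the importance the engine assigns it."""
    return sum(weights[name] * value for name, value in factors.items())


def ranking(weights):
    return sorted(PAGES, key=lambda p: score(PAGES[p], weights), reverse=True)


print(ranking(ENGINE_X))
print(ranking(ENGINE_Y))
```

Engine X places page_a first (0.71 against 0.59), whereas engine Y places page_b first (0.76 against 0.38).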
Ranking algorithms are dynamic. The number of web pages and the content on them keep growing, so ranking algorithms are changed frequently to best suit users’ needs. In addition, unscrupulous web marketers may exploit the workings of a particular algorithm to suit their own needs, contrary to what should be the norm.
3.2.5.2.3 Strengths of Ranking Algorithms
Complexity is one of the major strengths of a ranking algorithm. The purpose of a ranking algorithm is to give precedence to the most relevant results of a web search, and if the algorithm were easy to understand, search engine optimization could be used to undermine this purpose (Sharma and Sharma, n.d.).
Another strength is the ability to compare, contrast and compute relevance from a large set of factors: the larger the set of factors, the higher the quality of the results. A strong ranking algorithm should therefore be able to use a large set of factors to rank search results well (Beel and Gipp, 2009).
Given that a larger set of factors produces better results, a strong ranking algorithm needs high computational capability. This allows it to apply the assigned levels of importance to the factors under consideration and to rank the search results comparatively.
3.2.5.2.4 Weaknesses of Ranking Algorithms
For web marketers and webmasters, ranking algorithms can be a constant source of trouble. To improve the discoverability of their sites, they spend a great deal of time optimizing the content of their web pages to achieve higher rankings with search engines. This optimization, however, may work for one search engine but fail completely on another, and a change in the algorithm may throw their efforts into disarray (Fair Pay, 2011).
A weak set of factors for determining page rank is a major weakness of ranking algorithms. For example, most search engines determine the rank of a page based on the traffic it receives, that is, on the number of clicks. This means that well-designed bots could be used to generate a large number of clicks on a page and raise its ranking inappropriately (Bar-Ilan, 2007).
3.2.5.2.5 Comparison of Algorithms
In the interest of fairness, ranking algorithms across the board use a large number of factors to determine a page’s rank. These algorithms are proprietary assets of search engine companies and are known only to their owners. They also change from time to time to keep up with changing trends, the growing number of web pages and the varying content on those pages (Signorini, 2005).
3.3 Auto-completion
According to Hart-Davis (2009), auto-complete, also termed word completion, refers to a feature provided by various e-mail programs, web browsers, search engine interfaces, database query tools, command line interpreters, word processors and source code editors. Jones (2008) argued that the auto-completion feature, which helps users type text into the search box, is present in both large and small search engines. Most text editors are also fitted with an auto-complete feature to help users. The auto-complete features in Google and on mobile phones ease the typing task for users. Ishida (2010) argues that the feature can complete the word the user intends to type without the user having to type it in full. Its effectiveness depends on how easily the system can predict the word the user wants, which in turn depends on how predictable the word is from what has already been typed. This part of the literature review aims to understand the auto-completion feature provided during data retrieval in information technology systems.
Osman (2006) argued that the auto-complete feature helps users because it predicts the word or phrase they want to type. The feature helps users find the correct spelling of words and completes the words without the user typing them in full. Systems such as Google and mobile phones have auto-complete features that ease typing for most users. Nudelman (2013) held that the predictability of a word depends on what has already been typed, and that auto-completion is most effective when the number of possible or commonly used words is limited. The feature also works well when editing text written in a highly structured language, which is easy to predict; source code, for example, is usually easy to predict. Most e-mail programs, command line interpreters and web browsers have an auto-completion feature, and text editors can also use it, predicting words from one or more languages. According to White and Roth (2009), most auto-complete programs can learn new words, especially words typed into the system frequently, and can predict alternative words based on the habits of frequent users. Auto-completion can speed up human interaction with the computer, especially when used in a suitable environment.
3.3.1 Definition of Auto-completion
According to ‘Vivisimo Adapts to Employees with “Intelligent Auto-Completion”’ (2010), auto-completion helps users express their search intention without typing the whole word. The feature is usually built into a program to facilitate search and relies on how frequently words occur in the system. As users type in the search box, the word is typically completed as soon as they have entered the first one to three letters. The feature is also available for phrases. Google and some phones include the feature.
According to ‘Google Scribe Auto-Completes Text Anywhere You Type on the Web [Text Editor]’ (2010), most library federated search tools track both the common errors users make when typing words on Google and the patterns of usage of various sites and tools in searches. In library searches, Google offers spelling suggestions on approximately 29% of all searches, and 51% of searches are for known items. These patterns of search errors imply that users who are given some guidance on keyword selection would find it easier to compose searches, which would ultimately produce more relevant results.
3.3.2 The History of Auto-completion
Auto-completion was initially introduced to improve the typing experience of people with physical disabilities (Audrey, 2011). The feature helps this group type faster and reduces the number of keystrokes needed to complete words and sentences. The need for speed is illustrated by Kevin (2007), who notes that individuals who use speech generating devices produce speech at less than 10% of the rate of people using oral speech. The feature also proves helpful for people who write text more generally. Google's auto-completion service is equally helpful to people who frequently have to spell long and difficult medical or technical words, such as medical doctors.
Google first began testing the service in 2004, and the feature matured in 2008, so auto-complete is not a new idea for the company. Google was not among the first search engines to offer users auto-complete suggestions; however, its suggestions have attracted attention from a wide audience because of the search engine's popularity. Google Instant Search, launched in 2010, drew further attention to the suggestions Google provides. This interactivity led to greater scrutiny of the suggestions, including suggestions that Google blocks (Shokouhi and Radinsky, 2012).
3.3.3 The Concept of Auto-completion
The auto-complete feature lets writers type the first letters or words, after which the program predicts the following words and presents them as choices. If the intended word is among the choices, the writer can select it, in some systems by pressing a number key. Google's service and some modern phones use this feature because it eases typing. If the intended word is not in the list, the writer continues typing the next letter, and the program refines the list of choices so that it matches the letters entered so far. When the desired word appears, the user selects it (Foster, Griswold, and Lerner, 2012).
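The type-then-refine loop described above can be sketched as a prefix search over a sorted word list. This is a minimal illustration, not how Google or any particular phone implements it; the `PrefixCompleter` class and its vocabulary are hypothetical. Because every word sharing a prefix sits in one contiguous run of a sorted list, each extra letter simply narrows that run.

```python
import bisect


class PrefixCompleter:
    """Hypothetical prefix completer over a fixed, sorted vocabulary."""

    def __init__(self, words):
        # Sorted, de-duplicated, lower-case vocabulary.
        self.words = sorted(set(w.lower() for w in words))

    def suggest(self, prefix, k=5):
        """Return up to k vocabulary words that begin with prefix."""
        prefix = prefix.lower()
        # All words sharing the prefix form a contiguous run in sorted order,
        # so binary search finds where that run starts.
        start = bisect.bisect_left(self.words, prefix)
        results = []
        for word in self.words[start:]:
            if not word.startswith(prefix):
                break  # past the end of the matching run
            results.append(word)
            if len(results) == k:
                break
        return results
```

Calling `suggest("ch")`, then `suggest("chro")`, then `suggest("chrom")` on a vocabulary such as `["chroma", "chrome", "chromosome", "chronic", "chart"]` returns progressively shorter lists, mirroring how the choices change as the writer types each letter.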
Bar-Yossef and Kraus (2011) argue that Google's suggestions come from real searches that people have actually made. Google also considers a number of factors that determine how popular a suggestion is before displaying it. The suggestions displayed vary by region and language, which means that not everyone sees the same suggestions. According to McLaughlin (2013), language plays a vital part in the suggestions offered, and it is usually determined by the browser settings.
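Suggestions drawn from real searches and varied by region and language can be approximated by counting past queries per locale and ranking those that match the typed prefix. The sketch below assumes a hypothetical query log of `(query, region, language)` tuples and ranks purely by frequency; Google's actual ranking factors are not described in the sources cited here.

```python
from collections import Counter, defaultdict


def build_suggestion_index(query_log):
    """Count past queries per (region, language) pair.

    query_log is an iterable of (query, region, language) tuples; this
    format is a hypothetical stand-in for a real search log.
    """
    index = defaultdict(Counter)
    for query, region, language in query_log:
        index[(region, language)][query.lower()] += 1
    return index


def suggest(index, prefix, region, language, k=3):
    """Return the k most frequent past queries in this locale with the prefix."""
    # .get() avoids creating empty entries for unseen locales.
    counts = index.get((region, language), Counter())
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix.lower())]
    # Most frequent first; alphabetical order breaks ties.
    matches.sort(key=lambda qn: (-qn[1], qn[0]))
    return [q for q, _ in matches[:k]]
```

With a log in which "weather london" was searched three times and "weather leeds" once in the UK, and "weather lyon" five times in France, the prefix "weather l" yields different suggestions for each locale, and none for a locale with no history.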
Megan (2011) holds that other predictions are made when the system predicts that the word just written is usually followed by a particular word. This prediction is based on the most recent pair of words. Language modeling is a concept in which, given a vocabulary, the system calculates the word most likely to occur next as the writer types; this calculation is usually referred to as word prediction. AAC (augmentative and alternative communication) devices use the recency model and language modeling as the most basic forms of word prediction. According to Miguel and Antonio (n.d.), most word prediction software lets users add their own entries to prediction dictionaries, either directly or through learning, so that the system becomes familiar with the words users commonly enter.
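The pair-based prediction Megan (2011) describes corresponds to a bigram language model: count how often each word follows another in some training text, then propose the most frequent followers of the last word typed. The `BigramPredictor` class below is a hypothetical minimal sketch; it splits text on whitespace and applies no smoothing.

```python
from collections import Counter, defaultdict


class BigramPredictor:
    """Predict the next word from counts of word pairs seen in training text."""

    def __init__(self):
        # following["good"]["use"] = how often "use" came right after "good".
        self.following = defaultdict(Counter)

    def train(self, text):
        """Count every adjacent (previous, next) word pair in text."""
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.following[prev][nxt] += 1

    def predict(self, previous_word, k=3):
        """Return up to k words most often seen after previous_word."""
        counts = self.following.get(previous_word.lower(), Counter())
        return [w for w, _ in counts.most_common(k)]
```

Trained on past feedback such as "good use of evidence . good use of sources . good structure", `predict("good")` returns `["use", "structure"]`, because "use" followed "good" twice and "structure" once.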
Additionally, Luis-Fernando, Ricardo-de, Juan-Manuel, Javier, and José-Manuel (n.d.) identify stand-alone tools that complement existing auto-complete applications. These programs monitor the user's keystrokes and suggest a list of words based on the first letters typed. Examples include LetMeType and TypingAid. Perrinet, Pañeda, Cabrero, Melendi, García, and García (2011) argue that LetMeType allows individuals to continue developing existing source code typed by the author. Michel, Fabien, Guillaume, Peter, and Catherine (n.d.), on the other hand, note that the TypingAid freeware was developed some time ago. The IntelliComplete service, in both its paid and free versions, works only when the programs are linked to the IntelliComplete server. Most auto-complete functions can also be used to create a shorthand list (Bassett, 2009).
Bao, Pierce, Whittaker, and Zhai (2011) point out that the auto-complete user interface offers users a choice of results whenever they type a query into the search box. These suggestions are often referred to as incremental search or autosuggestions. Similarly, Pini, Han, and Wallace (2010) argue that autosuggestion usually depends on matching algorithms that tolerate entry errors, such as the Levenshtein algorithm and phonetic (sound-based) algorithms. However, it remains challenging to search large index lists within a few milliseconds, and the larger the index, the more diverse the candidate results that must be considered for the desired topic.
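Error-tolerant matching with the Levenshtein algorithm can be sketched as follows. The `levenshtein` function computes the standard edit distance with dynamic programming; `fuzzy_suggest` and its `max_distance` cut-off of two edits are hypothetical choices for illustration, and phonetic matching is not shown. Scanning every vocabulary word, as done here, is exactly what becomes too slow for large indices, which is one reason larger systems typically rely on specialised index structures instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (dynamic programming)."""
    # previous[j] = distance between the first i-1 chars of a and first j of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,           # delete ca
                               current[j - 1] + 1,        # insert cb
                               previous[j - 1] + cost))   # substitute
        previous = current
    return previous[-1]


def fuzzy_suggest(typed, vocabulary, max_distance=2, k=3):
    """Rank vocabulary words within max_distance edits of what was typed."""
    scored = [(levenshtein(typed.lower(), w.lower()), w) for w in vocabulary]
    # Closest words first; alphabetical order breaks ties.
    close = sorted((d, w) for d, w in scored if d <= max_distance)
    return [w for _, w in close[:k]]
```

For example, `fuzzy_suggest("feedbak", ["feedback", "feed", "fieldwork"])` returns `["feedback"]`, since a single inserted letter turns the typo into the intended word, while the other candidates are more than two edits away.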
Muşlu, Brun, Holmes, Ernst, and Notkin (2012) conducted a usability study of auto-completion on Google and on smartphones to determine the effectiveness of the service. The test followed Nielsen's guidelines, with a minimum of three users in each round, and changes were made according to the feedback received from each group. The study used qualitative measures. Participants were recruited through advertisements on the website and through referrals.
Some text editors offer a feature known as context completion. This feature completes words or phrases based on the current context and on similar words that already exist in the same document or in a set of training data. Context completion can predict the intended word quite precisely, even before the first letter of the word has been typed. However, it requires a training data set, which makes it more complex than word completion. Context completion is most commonly used in advanced programming editors and IDEs, because the training data set is naturally available there, and for their users the feature is more valuable than broad word completion would be (Hart-Davis, 2009).
Line completion is another type of context completion. It was introduced by Juraj Simlovic in 2006 in TED Notepad. In line completion, the current line serves as the context and the current document serves as the training data set. When a user begins a line with a frequently used phrase, the editor automatically completes it. The completion stops at the point where the matching lines differ, or the editor offers a list of possible continuations (Nudelman, 2013).
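The line completion behaviour described above, with the current line as context and the current document as training data, can be sketched by collecting earlier lines that start with what has been typed and completing only as far as they all agree. The `complete_line` function is a hypothetical illustration, not TED Notepad's implementation; it uses `os.path.commonprefix`, which compares strings character by character.

```python
import os


def complete_line(document: str, current_line: str) -> str:
    """Suggest text to append to current_line, using the other lines of the
    same document as training data (a sketch of line completion).

    Returns the part shared by every earlier line that begins with
    current_line, or "" if there is nothing to add.
    """
    if not current_line:
        return ""
    candidates = [line for line in document.splitlines()
                  if line.startswith(current_line) and line != current_line]
    if not candidates:
        return ""
    # Stop at the point where the matching lines start to differ.
    shared = os.path.commonprefix(candidates)
    return shared[len(current_line):]
```

Given a document containing "Good analysis, but cite more sources." and "Good analysis, but add more sources.", typing "Good an" yields "alysis, but ", stopping where the two earlier lines diverge; typing one more letter, "a", then completes the rest of the second line.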
3.3.4 The Pros and Cons of Auto-completion
According to Bao et al. (2011), the auto-complete feature helps users spell words. Most users were enthusiastic about the feature and said they would recommend it to friends. Researchers found that 60 percent of respondents relied entirely on Google's drop-down list to spell words, whereas the remaining 40 percent used the drop-down list only some of the time.
Most researchers argue that the feature makes it easy to spell a word when a person knows the word but not its correct spelling, because it guides the user toward the correct information. However, once users choose a suggested word, they may not realize that the word they chose was wrong. This increases the chance of choosing the wrong word and therefore searching for the wrong information. For example, an individual searching for "Chormo" may have the word completed as "Chroma". Users of the service therefore need accurate knowledge of wording and spelling if they are to have a good experience (Hart-Davis, 2009).
Additionally, researchers argue that the auto-complete feature lets users find information even when they only have part of it. This is particularly useful to students: a student can type part of a title given by the teacher and find the full title of the book, because Google combines various pieces of information to form a list of book titles for the student to choose from. This echoes the findings of von Reischach, Dubach, Michahelles, and Schmidt (2010), who held that the service improved the quality of initial searches for exploratory tasks.
Using the auto-completion feature on Google improved users' confidence, even when they searched for unfamiliar words. Most users reported how easy they found it to carry out research. For example, when an instructor gives students an incomplete author name, the second word appears automatically after the first word is typed (Bao et al., 2011).
Megan (2011) argues that the auto-completion feature on Google gives students confidence when working on a research paper, because the suggestions show that research on the topic already exists and that they are not the first to study it. Users thus realize that the topic they are researching is real, in the sense that someone else has already researched it. Students also reported that the feature boosts their confidence in spelling: as they type words in the search box, the complete word appears, so they become familiar with the correct spellings.
Most researchers reported that the auto-complete feature improved students' typing skills, because the word or phrase appears before the student finishes typing it. However, students sometimes ignore the feature on Google, mainly when they already know what they want to search for; these students feel that looking at the suggestions when they already know what to type will slow them down. The speed benefit of auto-complete is therefore relative: some respondents held that stopping to look at the suggestions wastes time, while others held that typing words in full wastes time (White and Roth, 2009).
The auto-complete feature has also been seen as a brainstorming tool. Most researchers held that using Google, especially when researching a given topic, helped them complete their intended thought. When a researcher types a term related to a topic, Google's auto-complete feature helps him or her phrase the ideas appropriately. This helps students relate their searches to the instructions they were given and find proper references, saving the time they would otherwise spend at the library. Where participants needed a wide range of information, shortening of words was unnecessary; this is mainly the case when the research demands a lot of information. The multiple suggestions Google displayed during research could also prompt students to change their minds: some participants said that the terms shown immediately after typing led them to change their topics, usually when a suggested topic seemed more interesting than the one they had first chosen (Bar-Yossef and Kraus, 2011).
Generally, the results showed that most users are quite satisfied with the auto-complete feature on Google and on their phones. However, most users argue that Google should increase the number of suggestions provided, because this would help them narrow down their search time and expand their field of study. Users also recommend that Google increase the display time, which they believe would reduce text wrapping. Some users propose that increasing the height of the drop-down menu would reduce the time spent scrolling to reach the desired option. Cosmetic changes would also make Google more attractive, which would in turn attract more users (Osman, 2006).
Therefore, students can use the auto-complete feature on Google as an aid to spelling. Most users recommend it to fellow students because it offers options on word usage, and most acknowledge that it boosts their confidence. The feature reduces the time students spend going back to libraries to search for recommended books, provided they have an idea of what to type, and it reduces the number of keystrokes, because the word appears as soon as the student types its first letters. Once a word has been searched on Google, users can easily access it again. The suggestions reflect overall web activity and the content of web pages indexed by Google.
Enabling Web History and signing in to Google allows individuals to see earlier searches relevant to the subjects they are studying. Google+ can also bring up a person's name when users search for it.
Allied Publishers, 2005. National Conference on Frontiers in Applied and Computational Mathematics (FACM-2005). Allied Publishers.
Audrey, W., 2011, ‘Google Scribe Updated, But Automatic Text Completion Still More Fun Than Functional’, ReadWriteWeb.com (USA), 26 May, NewsBank, EBSCOhost, viewed 14 July 2013.
Bhatia, S., Tuarob, S., Mitra, P. and Giles, C.L., n.d. An Algorithm Search Engine for Software Developers. University Park: The Pennsylvania State University.
Bao, P, Pierce, J, Whittaker, S, and Zhai, S 2011, ‘Smart phone use by non-mobile business users’, Proceedings Of The 13Th International Conference On Human Computer Interaction With Mobile Devices and Services: Mobilehci 2011, p. 445, Publisher Provided Full Text Searching File, EBSCOhost, viewed 15 July 2013.
Bar-Ilan, J., 2007. Manipulating search engine algorithms: The case of Google. Journal of Information, Communication and Ethics in Society, vol. 5, no. 2/3, pp. 155-166.
Bar-Yossef, Z, and Kraus, N 2011, ‘Context-sensitive query auto-completion’, International World Wide Web Conference, p. 107, Publisher Provided Full Text Searching File, EBSCOhost, viewed 14 July 2013.
Basic Concepts of Search Engines, 2011. Available from: <http://www.affiliateseeking.com/forums/search-engines-and-seo/7129-basic-concepts-of-search-engines.html> [17 July 2013].
Bassett, D., 2009. ‘E-Pitfalls: Ethics and E-Discovery’, Northern Kentucky Law Review, 36, p. 449, LexisNexis Academic: Law Reviews, EBSCOhost, viewed 15 July 2013.
Beel, J. and Gipp, B., 2009. Google Scholar’s Ranking Algorithm: An Introductory Overview. Otto-von-Guericke University, Magdeburg, Germany.
Berry, M.W. and Browne, M., 2005. Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia: SIAM.
Blank, D. et al., n.d. Information Retrieval: Concepts and Practical Considerations for Teaching a. [Online] Available at: <http://www.eecs.qmul.ac.uk/~thor/2009/dbspek-lehreIR-2009.pdf> [Accessed 15 July 2013].
Boughanem, M., Berrut, C., Mothe, J. and Soule-Dupuy, C., 2009. Advances in Information Retrieval. In 31st European Conference on IR Research, ECIR 2009, Toulouse, France, April 6-9, 2009, Proceedings. New York: Springer.
Brin, S. and Page, L., n.d. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Stanford, CA: Stanford University.
Broccolo, D., Frieder, O., Nardini, F.M., Perego, R. and Silvestri, F., n.d. Incremental Algorithms for Effective Query Recommendation. Washington: Georgetown University.
Elser, J. K., 2012. Search Engine Tuning with Genetic Algorithms. Montana: Montana State University.
Fair Pay, 2011. The Possible Weakness in Google’s Ranking Algorithm. Available from [17 July 2013].
Foster, S., Griswold, W. and Lerner, S., 2012. ‘WitchDoctor: IDE Support for Real-Time Auto-Completion of Refactorings’, ICSE: International Conference on Software Engineering, p. 222, Publisher Provided Full Text Searching File, EBSCOhost, viewed 14 July 2013.
Frakes, W.B., 1992. Information Retrieval Data Structures and Algorithms. Upper Saddle River: Prentice-Hall.
Gillespie, T., n.d. The Relevance of Algorithms. Cambridge, MA: MIT Press.
‘Google Scribe Auto-Completes Text Anywhere You Type on the Web [Text Editor]’, 2010, Newstex Blogs (USA), 8 September, NewsBank, EBSCOhost, viewed 14 July 2013.
Hart-Davis, G., 2009. iWork ’09. Indianapolis, IN: Wiley.
Henzinger, M., n.d. Algorithmic Challenges in Web Search Engines. Internet Mathematics, vol. 1, no. 1.
Henzinger, M., n.d. Combinatorial Algorithms for Web Search Engines: Three Success Stories.
Hiemstra, D., n.d. Information retrieval models. Enschede: University of Twente.
Howells, C., 2001. Cyndi’s List: A Comprehensive List of 70,000 Genealogy Sites on the Internet, Volume 2. Baltimore: Genealogical Publishing Com.
Hursh, P., 2005. Chasing Search Engine Algorithms: Wisdom or Folly? Available from [17 July 2013]
Information Technology for Libraries and Information Agencies n.d., A Short History of Search Engines. Available from: [17 July 2013]
Ishida, T., 2010. Culture and Computing: Computing and Communication for Crosscultural Interaction. Berlin: Springer.
Jones, W.P., 2008. Keeping Found Things Found: The Study and Practice of Personal Information Management. Amsterdam: Morgan Kaufmann Publishers.
Kevin, P., 2007, ‘Turn On Auto-Complete on Google [Google]’, Lifehacker, 27 November, NewsBank, EBSCOhost, viewed 14 July 2013.
Kim, L., n.d. The History of Search Engines: An Infographic. Available from [17 July 2013].
Lancaster, F.W., 1968. Information Retrieval Systems: Characteristics, Testing and Evaluation. New York: Wiley.
Landauer, T.K., McNamara, D.S., Dennis, S. and Kintsch, W., 2013. Handbook of Latent Semantic Analysis. Florence: Psychology Press.
Levene, M., 2011. An Introduction to Search Engines and Web Navigation. New York: John Wiley and Sons.
Luis-Fernando, D., Ricardo-de, C., Juan-Manuel, M., Javier, F., and José-Manuel, P., n.d., ‘Application of backend database contents and structure to the design of spoken dialog services’, Expert Systems With Applications, 39, pp. 5665-5680, ScienceDirect, EBSCOhost, viewed 15 July 2013.
Manning, C.D., Raghavan, P. and Schutze, H., 2009. An Introduction to Information Retrieval. Cambridge: Cambridge University Press.
McLaughlin, K. P., 2013. ‘Sharing You with You: Informational Privacy, Google, and The Limits of Use Limitation’, Albany Law Journal of Science and Technology, 23, p. 55, LexisNexis Academic: Law Reviews, EBSCOhost, viewed 14 July 2013.
Michel, B., Fabien, G., Guillaume, E., Peter, S., and Catherine, F., n.d., ‘SweetWiki: A semantic wiki’, Web Semantics: Science, Services And Agents On The World Wide Web, 6, Semantic Web and Web 2.0, pp. 84-97, ScienceDirect, EBSCOhost, viewed 15 July 2013.
Miguel, L. and Antonio, F., n.d. ‘How to make a natural language interface to query databases accessible to everyone: An example’, Computer Standards and Interfaces, ScienceDirect, EBSCOhost, viewed 15 July 2013.
Monash, C. A., 2004, Search Engine Ranking Algorithms. Available from [17 July 2013]
Mooers, C.N., 2011. Theory of Digital Handling of Non-numerical Information. Zator Technical Bulletin, 48(5).
Muşlu, K., Brun, Y., Holmes, R., Ernst, M. and Notkin, D., 2012. ‘Speculative analysis of integrated development environment recommendations’, ACM SIGPLAN Notices, 47, 10, p. 669, Publisher Provided Full Text Searching File, EBSCOhost, viewed 15 July 2013.
Nudelman, G., 2013. Android Design Patterns: Interaction Design Solutions for Developers. Indianapolis: Wiley.
Osman, H., 2006. Securing your information in an insecure world: What you must know about hackers and identity thieves. New York: BookSurge.
Panda, A., n.d., Google Search Algorithms and SEOs.
Perrinet, J., Pañeda, X., Cabrero, S., Melendi, D., García, R., and García, V., 2011, ‘Evaluation of Virtual Keyboards for Interactive Digital Television Applications’, International Journal Of Human-Computer Interaction, 27, 8, pp. 703-728, Academic Search Premier, EBSCOhost, viewed 15 July 2013.
Pini, S., Han, S., and Wallace, D., 2010. ‘Text entry for mobile devices using ad-hoc abbreviation’, AVI: International Working Conference On Advanced Visual Interfaces, p. 181, Publisher Provided Full Text Searching File, EBSCOhost, viewed 15 July 2013.
Megan, L., 2011. ‘Google updates Maps for mobile browsers’, Newstex Blogs (USA), 20 May, NewsBank, EBSCOhost, viewed 14 July 2013.
Search Engine Optimization, 2011. Available from [17 July 2013]
SEO Consult, 2013. Search Engine Algorithms. Available from [17 July 2013]
Sharma, D. K., and Sharma, A. K., n.d., A Comparative Analysis of Web Page Ranking Algorithms. International Journal of Computer Science and Engineering, vol. 02, no. 08.
Shokouhi, M, and Radinsky, K., 2012, ‘Time-sensitive query auto-completion’, SIGIR: Annual ACM Conference On Research and Development In Information Retrieval, p. 601, Publisher Provided Full Text Searching File, EBSCOhost, viewed 14 July 2013.
Signorini, A., 2005. A Survey of Ranking Algorithms. Iowa: University of Iowa.
Silvestri, F., 2004. High Performance Issues in Web Search Engines: Algorithms and Techniques. PhD thesis, Università degli Studi di Pisa, pp. 20-21.
Singla, A., White, R.W. and Huang, J., n.d. Studying Trailfinding Algorithms for Enhanced Web Search. Seattle: University of Washington.
Sun, M., Lebanon, G. and Collins-Thompson, K., n.d. Visualizing Differences in Web Search Algorithms Using the Expected Weighted Hoeffding Distance. Atlanta: Georgia Institute of Technology.
‘Vivisimo Adapts to Employees with ‘Intelligent Auto-Completion,’ Picks Up Where Google ‘Instant’ Leaves Off; New Employee-Centered Information Optimization Technologies Shifts Emphasis from ‘Searchability’ to ‘Findability’ within an Organization’, 2010, PR Newswire (USA), 9 September, NewsBank, EBSCOhost, viewed 14 July 2013.
Von Reischach, F, Dubach, E, Michahelles, F, and Schmidt, A 2010, ‘An evaluation of product review modalities for mobile phones’, ACM International Conference Proceeding Series, p. 199, Publisher Provided Full Text Searching File, EBSCOhost, viewed 15 July 2013.
What is Search Engine Algorithm? 2013. Available from [17 July 2013]
White, R.W. and Roth, R.A., 2009. Exploratory Search: Beyond the Query-Response Paradigm. San Rafael, CA: Morgan and Claypool Publishers.
Wiley, n.d., A History of Search Engines. Available from [17 July 2013]