Cognitive Style and Its Effect on Internet Searching: A Quantitative Investigation.

By Nicola Moss and Greg Hale

Department of Information Studies, University of Sheffield, Western Bank, Sheffield,

S10 2TN, UK.

E-mail: n.c.moss@sheffield.ac.uk, g.hale@sheffield.ac.uk

Paper Presented at the European Conference on Educational Research, Lahti, Finland 22 - 25 September 1999

ABSTRACT

This paper describes the methodology and analysis used in the first stage of the quantitative strand of a project investigating Internet searching behaviour [1]. The project's other strand is qualitative (see Hale and Moss, 1999). The project is funded by the Arts and Humanities Research Board, United Kingdom, and runs from May 1999 to the end of April 2000.

The quantitative strand investigates links between cognitive style and choice of Internet search strategy. Internet search strategies were explored through an examination of search patterns from pre-specified search problems (tasks), using semi-automated data gathering of one hundred and fifty-nine search episodes undertaken by fifteen participants. Though specific research hypotheses concerning links between cognitive style and search behaviour will be tested in the main phase, the emphasis of this stage of the project is to explore novel relationships between cognitive style and search strategies, facilitating new theoretical understandings.

The open, exploratory nature of the study made possible the development of novel methods of data analysis, combining both qualitative and quantitative methods. The aim of this methodological approach is to deepen our understanding of how people search the Internet, which may have implications for education and training in Internet searching. The paper discusses the analytic methods developed in the study, and illustrates some of the themes revealed by this approach to data analysis.

INTRODUCTION

The Internet Searching Project is based at the Department of Information Studies, University of Sheffield. The project has both quantitative and qualitative strands. The quantitative strand is investigating links between cognitive style and choice of Internet search strategy. The qualitative strand is using a grounded theory methodology to investigate emergent issues related to Internet search strategies. The two strands will inform each other, and the project is designed to allow specific issues of interest to be further investigated throughout its duration. The project is funded by the Arts and Humanities Research Board of the United Kingdom and runs from May 1999 to the end of April 2000. Further details can be found at the project website (open from 1st October 1999: http://dis.shef.ac.uk/ahrb/default.htm).

The quantitative strand examines search patterns from pre-specified search problems, collected using semi-automated data gathering. Though specific research hypotheses concerning links between cognitive style and search behaviour will be tested, the main emphasis of this stage of the project is to explore novel relationships between cognitive style and search strategies, leading to new theoretical understandings. The stage of the research reported here was exploratory by design, allowing for the investigation of relationships between cognitive styles and strategies that were not revealed by previous studies. This approach seeks to avoid the atomistic dangers of the quantitative paradigm by drawing on holistic perspectives related to the development of theory, as used in the qualitative strand of the research.

The qualitative strand (Hale and Moss, 1999) uses iterative interviewing, with participants' views deepened and reflected back to them by the interviewer, to explore Internet search strategies. Participants undertook out-of-system and in-system interviews (using real information problems they were facing). This facilitated not only an understanding of the individual meanings and strategies that searchers bring to search episodes but also allowed the consideration of wider issues related to Internet searching.

Information retrieval and Internet searching

Previous studies of information retrieval behaviour have reported links between cognitive style and information retrieval strategy (see for example Ford et al, 1994), with suggestions that differences in individuals’ search strategies, in the effectiveness of searches and in satisfaction with the results of searches are significantly linked to differences in cognitive style [2] (i.e. pervasive differences in the way individuals process information, learn and solve problems). This study aims to examine such findings in the context of Internet searching.

The study aims to explore the effects of different cognitive styles (and associated linguistic factors), on strategies and search effectiveness in the context of Internet searching. This paper describes the analytic methods which were developed in the study, and highlights some of the key themes which were revealed in the data, focussing particularly on those themes which are most likely to be relevant in an educational context.

Much of the previous research on information retrieval behaviour has adopted a quantitative approach to data analysis (Hsieh-Yee, 1993; Spink and Saracevic, 1997; Jansen et al, 1998; Schacter et al, 1998). However, the exploratory nature of this study has resulted in the development of a novel method of data analysis, which may more fully explain Internet searching. The method is a form of bi-modal data analysis, which synthesises quantitative and qualitative approaches in order to combine the benefits of both. The underlying approach drew on the holistic principles more commonly used in qualitative analysis, and combined these with the rigorous statistical analyses and assumptions of the quantitative paradigm. In the context of this study, bi-modal data analysis refers to the fact that both quantitative and qualitative themes emerged from the data.

METHOD

Data collection

The study examined the Internet search strategies of 15 participants from the Department of Information Studies, 13 of whom were postgraduate students in the department. The mean age of the sample was 24 years, and 60% of the participants were female. Data was also collected to determine the participants’ cognitive style, user profile (including data on gender, and Internet use and experience) and linguistic flexibility (from measures built into the search material).

The Internet search data was generated by each participant across three pre-specified tasks [3], which were specially devised to investigate different aspects of searching. One of the tasks required participants to determine the presence or absence of information on the web pertaining to a specific news event, in order to investigate persistence. A second task required searchers to formulate their query at a broader level than the task set, and the third required participants to formulate more narrowly focused queries. The participants were required to generate appropriate search terms in order to answer the questions posed by the tasks. The search terms were applied to the advanced search screen of Alta-Vista [4], and participants were encouraged to continue searching until they were satisfied with their results on each task. Most participants carried out more than one search episode per task, thereby refining their original search. The study used a JavaScript-based HTML form to record the data entered by the participants (which consisted of query terms and ancillary query information, e.g., dates). Participants e-mailed this to the investigator on completion of each query. One hundred and fifty-nine search episodes were recorded in the study, an average of 10.6 searches per participant.

Data analysis

Internet search strategies were explored through an examination of the participants' search episodes (individual queries), which were generated to resolve the queries posed by the pre-specified search problems (tasks). The development of the analysis was an iterative process, with key issues (themes) emerging from the data. The original aim had been to carry out a purely quantitative analysis, but as the analysis developed it became clear that a more qualitative, grounded approach was the most useful way of identifying all the themes in the data. Although this strand of the research still encompasses many of the underlying assumptions common to the quantitative paradigm, particularly in terms of the statistical analysis, the nature of the data provided scope for a more holistic approach.

The analytic process was two-fold: first, the identification and definition of the themes or codes within the data; and second, the coding of all 159 search episodes across each of the 32 codes identified. This process therefore converted the qualitative data obtained from the search tasks into quantitative data suitable for statistical analysis (correlation and factor analysis). The analysis was carried out on the basis of search terms, concepts, combinations and strategies, as revealed in the data. The following section reports the analytic development process and some of the findings.
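
As a minimal sketch of this conversion, coded episodes can be arranged as a table with one row per search episode and one column per code, after which a correlation between any two codes is straightforward to compute. The three code names and all values below are invented for illustration and are not taken from the study's data; Pearson's r is used only as an example statistic, since the paper does not state which correlation coefficient was applied.

```python
from math import sqrt

# Hypothetical coded episodes: one dict per search episode, one key per code.
# All names and values are invented for illustration, not data from the study.
episodes = [
    {"n_terms": 2, "n_boolean_ops": 0, "complexity": 1},
    {"n_terms": 4, "n_boolean_ops": 2, "complexity": 3},
    {"n_terms": 6, "n_boolean_ops": 3, "complexity": 4},
    {"n_terms": 3, "n_boolean_ops": 1, "complexity": 2},
]


def column(rows, code):
    """Extract the values of one code across all episodes."""
    return [row[code] for row in rows]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(column(episodes, "n_terms"), column(episodes, "n_boolean_ops"))
print(f"r(n_terms, n_boolean_ops) = {r:.3f}")
```

In the study itself the table would have 159 rows and 32 columns, and the same pairwise calculation would extend to a full correlation matrix as input to factor analysis.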

FINDINGS: DEVELOPMENT OF THE ANALYTIC PROCESS; SEARCH EPISODE PROCESSES

Development of the analytic process

This section reports the results of the developmental process underlying the coding scheme. The scheme was the result of an iterative process of analysis, and was developed through a detailed examination of the data (individual search episodes or queries). Two separate coding schemes were devised, highlighting the bi-modal nature of the analysis. These two schemes have been labelled 'quantitative themes' and 'qualitative themes'. These labels serve to highlight the distinction between the two sets of codes, particularly in terms of the underlying processes involved in their development. The quantitative themes reflect issues which have commonly been investigated in studies of information retrieval (e.g., number of terms used; number of Boolean operators), whilst the qualitative themes reflect novel issues which have emerged from the data (e.g., changes in behaviour following an error message; or the use of 'normal' strategy, defined below).

A key process underlying both the qualitative and quantitative codes was the development of the ‘custom dictionary’ [5]. This was developed prior to the coding of the data, and clearly defines each of the themes within the coding schemes, thus ensuring reliability in the coding. The categories within each theme are clearly defined, as are the parameters for inclusion within a particular category. Different levels of data are also defined, from the level of the individual search (one query), to the level of the task (multiple queries encompassed within one pre-specified search problem), to the level of the session (the entire experimental period for one participant, usually three tasks, composed of multiple queries).
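
The three levels of data can be sketched as a small data structure. This is an illustrative layout only, assuming hypothetical class and field names; it is not the format recorded by the study's HTML form. Searches are stored in session order, because participants could move between tasks, and the task level is derived by grouping. The example uses the three searches reported for Participant 1 later in the paper, with task numbers inferred here from the search terms and note [3].

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Search:
    """Search level: one query submitted to the search engine."""
    sequence: int       # position within the session (1 = first search)
    task_id: int        # pre-specified task the query relates to
    ranking: str = ""   # contents of the Ranking (keyword) box
    boolean: str = ""   # contents of the Boolean box


@dataclass
class Session:
    """Session level: all searches made by one participant."""
    participant_id: int
    searches: List[Search] = field(default_factory=list)

    def by_task(self) -> Dict[int, List[Search]]:
        """Task level: searches grouped by task, kept in session order."""
        grouped: Dict[int, List[Search]] = defaultdict(list)
        for search in sorted(self.searches, key=lambda s: s.sequence):
            grouped[search.task_id].append(search)
        return dict(grouped)


# Task numbers are an inference from the search terms, not recorded data.
session = Session(
    participant_id=1,
    searches=[
        Search(1, task_id=2, ranking="JavaScript learning"),
        Search(2, task_id=3,
               ranking="famous sportsperson attack emergency.",
               boolean="sportsperson and attack and emergency services"),
        Search(3, task_id=3,
               boolean="(sport and celebrity) and (attack and emergency services)"),
    ],
)
tasks = session.by_task()
print({task_id: len(searches) for task_id, searches in tasks.items()})
```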

In complement to the custom dictionary, a sub-dictionary ('general dictionary of terms') was also developed, in order to define specific phrases which were used in the coding schemes. For example the term 'style' refers to the use of special characters such as brackets, wildcards and speech marks, and the term 'focus' refers to the particular aspect of a task that is highlighted by the choice of search terms. The focus is therefore grounded in the way a participant chooses to interpret a particular task. Coding on the quantitative scheme involved a frequency count of the data. The qualitative data was coded using a Likert scale (3- or 5-point), or on a binomial system where more appropriate.

The quantitative coding scheme drew partly on general theoretical sensitivity related to Internet searching and information retrieval, but a strong grounded element was also introduced, such that the coding scheme reached theoretical saturation, i.e., all salient themes within the data were identified. Twenty-three different quantitative themes are identified in the custom dictionary [5], the majority of which were analysed at the level of the individual search. The themes were coded using a frequency count of the data. In most cases the Ranking and Boolean boxes (Alta-Vista advanced search) were coded separately. The coded themes include: the number of queries in which search terms were entered in both the Boolean and Ranking boxes (combined search); the number of searches carried out per task; whether a search was limited by date; the use of phrases; the occurrence of technical errors.
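
Frequency counts of this kind lend themselves to simple automation. The sketch below counts Boolean operators and quoted phrases in a query string. The operator set, the case-insensitive matching and the function names are assumptions made for illustration; the study's custom dictionary [5] defines the actual counting rules, and the coding reported here was not necessarily automated.

```python
import re

# Assumed operator set, matched case-insensitively (participants typed
# both "and" and "AND"). A compound such as "AND NOT" would count as two.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def count_phrases(query: str) -> int:
    """Count phrases enclosed in double quotation (speech) marks."""
    return len(re.findall(r'"[^"]+"', query))


def count_boolean_operators(query: str) -> int:
    """Count operators, ignoring any operator words inside quoted phrases."""
    unquoted = re.sub(r'"[^"]*"', " ", query)
    tokens = re.findall(r"[A-Za-z]+", unquoted)
    return sum(1 for token in tokens if token.upper() in BOOLEAN_OPERATORS)


query = '"injuries at work" and legal and problems'
print(count_boolean_operators(query), count_phrases(query))
```

Removing quoted phrases before counting means that a word such as "and" inside a phrase is treated as a search term rather than as an operator.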

In comparison, the main focus of the qualitative coding scheme was on the emergence of themes from the data, rather than from previous literature. The identification of the themes followed directly from the in-depth study of the data. In this way the data is meaningfully represented by the themes on which it is coded. There is potential for some of the identified themes to be context dependent, and this issue will be examined in the main stage of the project. The process of identifying the themes was iterative, to ensure that the coding was theoretically saturated. Nine key qualitative themes were identified, and then defined in the custom dictionary [5]. These themes were coded on different levels, including the level of the search, the interval (i.e., change between subsequent searches) and the results returned by Alta-Vista. Many of the codes highlight issues that have not previously been identified using conventional analysis, thereby illustrating the benefits of the bi-modal approach. The codes include: change in strategy (whether participants change their general strategy between one search and the next, see dictionary); change in behaviour following the return of nil documents (changes in search strategy following a search in which Alta-Vista could not return any results); and the complexity of a search (which was measured on a clearly defined 5-point scale, see dictionary).

Search episode processes

This section presents results selected from the searches of four study participants, in order to illustrate key differences between individuals. The searches are grouped around two different themes that were identified in the qualitative coding scheme. The examples are drawn from real searches, recorded in the initial phase of the study. These searches are used to illustrate the use of a 'normal' strategy, and the changes in strategy following a search that produced nil document returns. The data reports all search terms which were entered by participants in the ranking (keyword) and Boolean boxes (Alta-Vista advanced search screen [4]).

The searches are reported sequentially, such that Search 1 represents the first search carried out by an individual, and Search 2 represents the subsequent search. Transitions between tasks are also represented in the examples below, for example Participant 1 changes from one task to another between searches 1 and 2. The reason for the transition cannot be ascertained from the data, due to the fact that participants were not interviewed in this strand of the project. However, these transitions may represent either a positive (successful) search result on the first task, or frustration with lack of success. During the search sessions participants were free to determine the 'state' of the query, i.e., whether to use keyword terms, Boolean terms, or a combination of both.

Process One: Use of Normal Strategy

The first two searches are exemplars of the theme 'use of normal strategy'. This is defined in the custom dictionary and is used to determine whether participants begin a session with a 'normal' (schematic) strategy (which in this context refers to a strategy which is typical of an individual, and not to a common strategy). The presence of a 'normal' strategy implies that a participant used a similar strategy across (at least) the first three searches of a session. A similar strategy is defined as one in which a participant searches using the same state (e.g., Boolean query), similar style (e.g., brackets or phrases), but not (necessarily) the same terms or focus, across queries. The code is, by necessity, defined in rather general terms, e.g., participants do not have to use precisely the same number of Boolean operators across the searches, rather they must use a similar Boolean query, such as multiple terms linked with operators.

The results reported below are, by definition, the first three searches carried out by participants within a session (see above). Participants were free to change from one task to another at any point during the session, and therefore although the searches are sequential, they do not necessarily relate to only one task.

Participant 1

Search 1 (keyword search):

Ranking = JavaScript learning

Search 2 (combined search):

Boolean = sportsperson and attack and emergency services

Ranking = famous sportsperson attack emergency.

Search 3 (Boolean search):

Boolean = (sport and celebrity) and (attack and emergency services)

 

Participant 2

Search 1 (Boolean search):

Boolean = (sportsman OR sportswoman) AND (attack* OR abuse*) AND (fireman OR police* OR ambulance*)

Search 2 (Boolean search):

Boolean = employee* NEAR injur* NEAR (*self OR *selves) AND legal*

Search 3 (Boolean search):

Boolean = (employee* OR worker*) NEAR injur* NEAR (*self OR *selves) AND legal* AND sheffield university

It is clear from the examples that the first participant does not use a 'normal' strategy, whilst the second participant does, despite changing task between searches 1 and 2. Statistical analysis will enable further investigation of this theme, for example an examination of differences between individuals who tend to use a ‘normal’ strategy and those who do not.
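
The 'normal' strategy code can be approximated mechanically, as in the sketch below. It is a simplification under stated assumptions: the state is inferred from which Alta-Vista box was filled in, style is reduced to the presence of brackets, wildcards and speech marks, and 'similar' is read strictly as 'identical' for both. The custom dictionary's definition is deliberately looser, so this is not the coding procedure used in the study, and the function names are hypothetical.

```python
def state(ranking: str, boolean: str) -> str:
    """Infer the query state from which search boxes were filled in."""
    if ranking and boolean:
        return "combined"
    if boolean:
        return "boolean"
    return "keyword"


def style(query: str) -> frozenset:
    """Reduce style to the set of special characters used (an assumption)."""
    features = set()
    if "(" in query or ")" in query:
        features.add("brackets")
    if "*" in query:
        features.add("wildcards")
    if '"' in query:
        features.add("phrases")
    return frozenset(features)


def uses_normal_strategy(searches) -> bool:
    """True if the first three searches share one state and one style.

    `searches` is a list of (ranking, boolean) pairs in session order.
    Terms and focus are deliberately ignored, as in the definition above.
    """
    first_three = searches[:3]
    if len(first_three) < 3:
        return False
    states = {state(r, b) for r, b in first_three}
    styles = {style(r + " " + b) for r, b in first_three}
    return len(states) == 1 and len(styles) == 1


participant_1 = [
    ("JavaScript learning", ""),
    ("famous sportsperson attack emergency.",
     "sportsperson and attack and emergency services"),
    ("", "(sport and celebrity) and (attack and emergency services)"),
]
participant_2 = [
    ("", "(sportsman OR sportswoman) AND (attack* OR abuse*) AND "
         "(fireman OR police* OR ambulance*)"),
    ("", "employee* NEAR injur* NEAR (*self OR *selves) AND legal*"),
    ("", "(employee* OR worker*) NEAR injur* NEAR (*self OR *selves) AND "
         "legal* AND sheffield university"),
]
print(uses_normal_strategy(participant_1), uses_normal_strategy(participant_2))
```

Under these assumptions the sketch agrees with the coding of the two examples: Participant 1 moves through three different states, whereas Participant 2 keeps a bracketed, wildcarded Boolean query throughout.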

Process Two: Change in Search Behaviour Following the Return of Nil Documents

The following searches are exemplars of the theme 'change in search behaviour following the return of nil documents'. Fourteen different changes in behaviour were identified and then defined in the ‘general dictionary of terms’. These included: Change (logical) state (e.g., change from a Boolean to a keyword search); Change focus (i.e., change the highlighted topic); End session; Broaden search (e.g., by using the Boolean operator OR, or by broadening the search terms); Access help. The number of behaviours identified per search was not limited, i.e., if multiple behaviours were found to have changed, then the incidence of each was recorded.

Although the following results report the searches sequentially, the searches have been selected from the middle of search sessions, rather than the beginning (see Process One, above). The searches were selected because they provide clear illustrations of a number of the behavioural categories. The examples below report two searches per participant. The first search (e.g., Search 12 in the case of Participant 3) is that which led to the return of no documents. The second (subsequent) search is a result of behavioural changes to the strategy following the nil return. Both searches are therefore reported to enable comparisons to be made between the two.

Participant 3

Search 12 (Boolean search):

Boolean = "injuries at work" and legal and problems

Search 13 (keyword search):

Ranking = what are the health and safety responsibilities of an employer?

Following search 12 (which led to the return of nil documents by Alta-Vista), Participant 3 made changes across four behavioural categories: 1. Change (logical) state (from Boolean to keyword); 3. Change focus (from injuries and legal, to health and safety); 7. Broaden search (removes Boolean ANDs and broadens focus); 9. Use synonyms. This refined search led to the return of 104,780 documents. The searcher was therefore successful in broadening the search, although other factors would need to be cross-correlated in order to determine the effectiveness of the search (e.g., the large number of returned documents may suggest that many would not be relevant).

Participant 4

Search 5 (Boolean search):

Boolean = sportsperson AND Emergency staff

Search 6 (Boolean search):

Boolean = "Technician cuts finger" AND "Legal Implications"

Following search 5 (which led to the return of nil documents by Alta-Vista), Participant 4 made changes across three behavioural categories: 2. Change style (adopt use of phrases); 4. Change task; 10. Access help. However, this refined search also led to the return of nil documents, and therefore the behavioural change in search strategy was not successful.
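
Because the number of behaviours recorded per nil-return search was not limited, each such search contributes a set of categories rather than a single value. The sketch below tallies the categories coded for the two examples above. The dictionary layout and variable names are assumptions made for illustration, and only the two reported intervals are included, so the totals say nothing about the full data set.

```python
from collections import Counter

# Categories coded after each nil-return search, as reported in the two
# examples above. Keys are (participant, number of the nil-return search).
coded_intervals = {
    ("Participant 3", 12): [
        "change state", "change focus", "broaden search", "use synonyms",
    ],
    ("Participant 4", 5): [
        "change style", "change task", "access help",
    ],
}

# Incidence of each category across intervals; a category is counted at
# most once per interval even if it were listed twice.
incidence = Counter()
for behaviours in coded_intervals.values():
    incidence.update(set(behaviours))

# Number of distinct behavioural changes made after each nil return.
changes_per_interval = {
    key: len(set(behaviours)) for key, behaviours in coded_intervals.items()
}

print(changes_per_interval)
print(incidence.most_common())
```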

DISCUSSION

The examples reported above have illustrated some of the themes and categories identified in the qualitative coding scheme. Many of the themes defined within this scheme (and also within the quantitative coding scheme) may have implications for work which addresses applied educational issues related to training in Internet searching. A number of the themes may be particularly relevant, and appear not to have been addressed by previous research which has sought to improve information retrieval skills. One such theme was illustrated above (use of 'normal' strategy). This theme was developed following the finding that a number of participants (53%) applied the same strategy to (at least) their first three searches, irrespective of whether or not they changed task over this duration, i.e., although the task may change, the strategy remains the same. The implication of this is that individuals may be relying on a 'schema' of searching, based possibly on previous experience or on training. What is crucial, therefore, is that this tendency should be exploited positively, i.e., if searchers are relying on a normal strategy, then this strategy must be both appropriate and effective, within a topic and across novel queries. This is particularly salient if the query is within a new domain, where domain knowledge may well be lacking.

Another theme with applied educational implications is 'change in behaviour following the return of nil documents', which was also illustrated above. This theme identifies the changes in search strategy which occur after an ineffective search, i.e., a search in which Alta-Vista does not return any documents. The key issue of educational relevance is that searchers must be able to identify the possible reasons behind such a return, in order to refine and improve their search (these reasons may include the search being too narrow, or the participant having made the type of error about which the system cannot advise, e.g., a date error). However, as the examples above illustrated, the underlying problem is not always successfully identified, and even if the subsequent search does lead to document returns, these may not necessarily be relevant or effective. The results from Participant 3 (reported above) illustrate that whilst searches may be improved, refinement of the search needs to be focused, an issue which appropriate training would therefore need to address. In contrast, Participant 4 did not successfully refine his search strategy, thus highlighting the problems that arise when the reasons behind an ineffective (nil returns) search are not identified.

This theme does not imply that all participants were unsuccessful searchers, although forty-seven percent of the sample formulated at least one search which resulted in nil returns. This finding may also be context dependent, with the possibility that a different sub-set of the sample may produce nil document returns across different search tasks. Further research (see below) will therefore investigate this phenomenon.

CONCLUSION AND FURTHER RESEARCH

This paper has begun to develop deeper understandings of how people search the Internet, which may have implications for education and training in Internet searching. The findings from the development of the analytic coding procedure have highlighted the benefits of the analytic approach adopted. The exploratory nature of the study allowed the adoption of a bi-modal method of data analysis, combining both qualitative and quantitative approaches. The qualitative approach in particular was found successfully to reveal themes within the data that had not been identified in previous studies relying on conventional analysis. The synthesis of qualitative and quantitative methods therefore combined the benefits of both approaches.

However, this paper has only summarised the analytic procedure, and papers currently in preparation will report the data analysis in more detail and the results of the statistical analysis, which investigates the correlation of cognitive style and Internet searching. The results reported in this paper were drawn from the initial phase of the research, the main phase of which is about to be undertaken. This aims to test a broader sample, and will also enable the grounded themes (qualitative and quantitative) to be tested on more extensive data, to examine the validity and reliability of the codes.

NOTES

[1] Thanks to colleagues Nigel Ford and David Miller for their comments on this manuscript.

[2] "…an individual's preferred and habitual approach to organising and representing information", Riding and Rayner (1999), p.8.

[3] Search tasks:

Task 1: A technician cuts his finger badly in the Information Studies Departmental office. What are the legal implications of this for the university? Find relevant information on the Web.

Task 2: You want to learn about the JavaScript programming language but have no previous experience. Find suitable materials - at an appropriate level for people with no experience - on the Web.

Task 3: Discover as quickly as possible the answer to this question: Is there (yes or no) any information on the Web of a recent incident in which a famous sportsperson attacked a member of the emergency services?

[4] See http://www.altavista.com/ (Advanced search).

[5] The complete 'custom dictionary' (i.e., both quantitative and qualitative codes) and the 'general dictionary of terms' can be found at the project website: http://dis.shef.ac.uk/ahrb/datadic.htm

REFERENCES

Ford, N., Wood, F. and Walsh, C. (1994) Cognitive Styles and Searching. Online and CDRom Review, 18(2), 79-86.

Hale, G.G. and Moss, N.C. (1999) "So tell me about it!": a qualitative investigation of internet search strategies. Presented at the European Conference on Educational Research, 22-25 September 1999, Lahti, Finland.

Hsieh-Yee, I. (1993) Effects of Search Experience and Subject Knowledge on the Search Tactics of Novice and Experienced Searchers. Journal of the American Society for Information Science, 44(3), 161-174.

Jansen, B.J., Spink, A., Bateman, J. and Saracevic, T. (1998) Real Life Information Retrieval: A Study of User Queries on the Web. SIGIR Forum, 32(1), 5-17.

Riding, R.J. and Rayner, S. (1999) Cognitive Styles and Learning Strategies. Understanding Style Differences in Learning and Behaviour. London: David Fulton Publishers.

Schacter, J., Chung, G.K. and Dorr, A. (1998) Children's Internet Searching on Complex Problems: Performance and Process Analysis. Journal of the American Society for Information Science, 49(9), 840-849.

Spink, A. and Saracevic, T. (1997) Interactive Information Retrieval: Sources and Effectiveness of Search Terms During Mediated Online Searching. Journal of the American Society for Information Science, 48 (8), 741-761.

This document was added to the Education-line database 05 October 1999