Monday, September 26, 2016

2 Mysteries Solved!

One of the disadvantages of a long tradition is that the reasons behind certain practices can be lost over time. This is definitely the case with many practices in libraries, and in particular in practices affecting the library catalog. In U.S. libraries we tend to date our cataloging practices back to Panizzi, in the 1830's, but I suspect that he was already building on practices that preceded him.

A particular problem with this loss of history is that without the information about why a certain practice was chosen it becomes difficult to know if or when you can change the practice. This is compounded in libraries by the existence of entries in our catalogs that were created long before us and by colleagues whom we can no longer consult.


I was recently reading through volume one of the American Library Journal from the year 1876-1877. The American Library Association had been founded in 1876 and had its first meeting in Philadelphia in September, 1876. U.S. librarianship finally had a focal point for professional development. From the initial conference there were a number of ALA committees working on problems of interest to the library community. A Committee on Cooperative Cataloguing, led by Melvil Dewey, (who had not yet been able to remove the "u" from "cataloguing") was proposing that cataloging of books be done once, centrally, and shared, at a modest cost, with other libraries that purchased the same book. This was realized in 1902 when the Library of Congress began selling printed card sets. We still have cooperative cataloging, 140 years later, and it has had a profound effect on the ability of American libraries to reduce the cost of catalog creation.

Other practices were set in motion in 1876-1877, and two of these can be found in that inaugural volume. They are also practices whose rationales have not been obvious to me, so I was very glad to solve these mysteries.

Title case

Some time ago I asked on Autocat, out of curiosity, why libraries use sentence case for titles. No one who replied had more than a speculative answer. In 1877, however, Charles Ammi Cutter reports on The Use of Capitals in library cataloging and defines a set of rules that can be followed. His main impetus is "readability;" that "a profusion of capitals confuses rather than assists the eye...." (He also mentions that this is not a problem with the Bodleian library catalog, as that is written in Latin.)

Cutter would have preferred that capitals be confined to proper names, eschewing their use for titles of honor (Rev., Mrs., Earl) and initialisms (A.D). However, he said that these uses were so common that he didn't expect to see them changed, and so he conceded them.

All in all, I think you will find his rules quite compelling. I haven't looked at how they compare to any such rules in RDA. So much still to do!

Centimeters

I have often pointed out, although it would be obvious to anyone who has the time to question the practice, that books are measured in centimeters in Anglo-American catalogs, although there are few cultures as insistent on measuring in inches and feet than those. It is particularly un-helpful that books in libraries are cataloged with a height measurement in centimeters while the shelves that they are destined for are measured in inches. It is true that the measurement forms part of the description of the book, but at least one use of that is to determine on which shelves those books can be placed. (Note that in some storage facilities, book shelves are more variable in height than in general library collections and the size determination allows for more compact storage.) If I were to shout out to you "37 centimeters" you would probably be hard-pressed to reply quickly with the same measurement in inches. So why do we use centimeters?

The newly-formed American Library Association had a Committee on Sizes. This committee had been given the task of developing a set of standard size designations for books. The "size question" had to do with the then current practice to list sizes as folio, quarto, etc. Apparently the rise of modern paper making and printing meant that those were no longer the actual sizes of books. In the article by Charles Evans (pp. 56-61) he argued that actual measurements of the books, in inches, should replace the previous list of standard sizes. However, later, the use of inches was questioned. At the ALA meeting, W.F. Poole (of Poole's indexes) made the following statement (p. 109):
"The expression of measure in inches, and vulgar fractions of an inch, has many disadvantages, while the metric decimal system is simple, and doubtless will soon come into general use."
The committee agreed with this approach, and concluded:
"The committee have also reconsidered the expediency of adopting the centimeter as a unit, in accordance with the vote at Philadelphia, querying whether it were really best to substitute this for the familiar inch. They find on investigation that even the opponents of the metric system acknowledge that it is soon to come into general use in this country; that it is already adopted by nearly every other country of importance except England; that it is in itself a unit better adapted to our wants than the inch, which is too large for the measurement of books." (p. 180)

The members of the committee were James L. Whitney, Charles A. Cutter, and Melvil Dewey, the latter having formed the American Metric Bureau in July of 1876, both a kind of lobbying organization and a sales point for metric measures. My guess is that the "investigation" was a chat amongst themselves, and that Dewey was unmovable when it came to using metric measures, although he appears not to have been alone in that. I do love the fact that the inch is "too large," and that its fractions (1/16, etc.) are "vulgar."

Dewey and cohort obviously weren't around when compact discs came on the scene, because those are measured in inches ("1 sound disc : digital ; 4 3/4 in"). However, maps get the metric treatment: "1 map : col. ; 67 x 53 cm folded to 23 x 10 cm". Somewhere there is a record of these decisions, and I hope to come across them.

It would have been ideal if the U.S. had gone metric when Dewey encouraged that move. I suspect that our residual umbilical chord linking us to England is what scuppered that. Yet it is a wonder that we still use those too large, vulgar measurements. Dewey would be very disappointed to learn this.



So there it is, two of the great mysteries solved in the record of the very first year of the American library profession. Here are the readings; I created separate PDFs for the two most relevant sections:

American Library Journal, volume 1, 1876-1877 (from the Internet Archive)
Cutter, Charles A. The use of capitals. American Library Journal, v.1, n. 4-5, 1877. pp. 162-166
The Committee on Sizes of Books, American Library Journal, v.1, n. 4-5, 1877, pages 178-181

Also note that beginning on page 92 there is a near verbatim account of every meeting at the first American Library Association conference in Philadelphia, September, 1876. So verbatim that it includes the mention of who went out for a smoke and missed a key vote. And the advertisements! Give it a look.

Wednesday, August 31, 2016

User tasks, Step one

Brian C. Vickery, one of the greats of classification theory and a key person in the work of the Classification Research Group (active from 1952 to 1968), gave this list of the stages of "the process of acquiring documentary information" in his 1959 book Classification and Indexing in Science[1]:

  1. Identifying the subject of the search. 
  2. Locating this subject in a guide which refers the searcher to one or more documents. 
  3. Locating the documents. 
  4. Locating the required information in the documents. 

These overlap somewhat with FRBR's user tasks (find, identify, select, obtain) but the first step in Vickery's group is my focus here: Identifying the subject of the search. It is a step that I do not perceive as implied in the FRBR "find", and is all too often missing from library/use interactions today.

A person walks into a library... 

Presumably, libraries are an organized knowledge space. If they weren't the books would just be thrown onto the nearest shelf, and subject cataloging would not exist. However, if this organization isn't both visible and comprehended by users, we are, firstly, not getting the return on our cataloging investment and secondly, users are not getting the full benefit of the library.

In Part V of my series on Catalogs and Context, I had two salient quotes. One by Bill Katz: "Be skeptical of the of information the patron presents"[2]; the other by Pauline Cochrane: "Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"[3]. Both of these address the obvious, yet often overlooked, primary point of failure for library users, which is the disconnect between how the user expresses his information need vis-a-vis the terms assigned by the library to the items that may satisfy that need.

Vickery's Three Issues for Stage 1 


Issue 1: Formulating the topic 

Vickery talks about three issues that must be addressed in his first stage, identifying the subject on which to search in a library catalog or indexing database. The first one is "...the inability even of specialist enquirers always to state their requirements exactly..." [1 p.1] That's the "reference interview" problem that Katz writes about: the user comes to the library with an ill-formed expression of what they need. We generally consider this to be outside the boundaries of the catalog, which means that it only exists for users who have an interaction with reference staff. Given that most users of the library today are not in the physical library, and that online services (from Google to Amazon to automated courseware) have trained users that successful finding does not require human interaction, these encounters with reference staff are a minority of the user-library sessions.

In online catalogs, we take what the user types into the search box as an appropriate entry point for a search, even though another branch of our profession is based on the premise that users do not enter the library with a perfectly formulated question, and need an intelligent intervention to have a successful interaction with the library. Formulating a precise question may not be easy, even for experienced researchers. For example, in a search about serving persons who have been infected with HIV, you may need to decide whether the research requires you to consider whether the person who is HIV positive has moved along the spectrum to be medically diagnosed as having AIDS. This decision is directly related to the search that will need to be done:

HIV-positive persons--Counseling of
AIDS (Disease)--Patients--Counseling of

Issue 2: from topic to query 

The second of Vickery's caveats is that "[The researcher] may have chosen the correct concepts to express the subject, but may not have used the standard words of the index."[1 p.4] This is the "entry vocabulary" issue. What user would guess that the question "Where all did Dickens live?" would be answered with a search using "Dickens, Charles -- Homes and haunts"? And that all of the terms listed as "use for" below would translate to the term "HIV (Viruses)" in the catalog? (h/t Netanel Ganin):

As Pauline Cochrane points out[4], beginning in the latter part of the 20th century, libraries found themselves unable to include the necessary cross-reference information in their card catalogs, due to the cost of producing the cards. Instead, they asked users to look up terms in the subject heading reference books used by catalog librarians to create the headings. These books are not available to users of online catalogs, and although some current online catalogs include authorized alternate entry points in their searches, many do not.* This means that we have multiple generations of users who have not encountered "term switching" in their library catalog usage, and who probably do not understand its utility.

Even with such a terminology-switching mechanism, finding the proper entry in the catalog is not at all simple. The article by Thomas Mann (of Library of Congress, not the German author) on “The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries” [5] shows not only how complex that process might be, but it also indicates that the translation can only be accomplished by a library-trained expert. This presents us with a great difficulty because there are not enough such experts available to guide users, and not all users are willing to avail themselves of those services. How would a user discover that literature is French, but performing arts are in France?:

French literature
Performing arts -- France -- History

Or, using the example in Mann's piece, the searcher looking for in information on tribute payments in the Peloponnesian war needed to look under "Finance, public–Greece–Athens".  This type of search failure fuels the argument that full text search is a better solution, and a search of Google Books on "tribute payments Peloponnesian war" does yield some results. The other side of the argument is that full text searches fail to retrieve documents not in the search language, while library subject headings apply to all materials in all languages. Somehow, this latter argument, in my experience, doesn't convince.

Issue 3: term order 

The third point by Vickery is one that keyword indexing has solved, which is "...the searcher may use the correct words to express the subject, but may not choose the correct combination order."[1 p.4] In 1959, when Vickery was writing this particular piece, having the wrong order of terms resulted in a failed search. Mann, however, would say that with keyword searching the user does not encounter the context that the pre-coordinated headings provide; thus keyword searching is not a solution at all. I'm with him part way, because I think keyword searching as an entry to a vocabulary can be useful if the syndetic structure is visible with such a beginning. Keyword searching directly against bibliographic records, less so.

Comparison to FRBR "find" 

FRBR's "find" is described as "to find entities that correspond to the user’s stated search criteria". [6 p. 79] We could presume that in FRBR the "user's stated search criteria" has either been modified through a prior process (although I hardly know what that would be, other than a reference interview), or that the library system has the capability to interact with the user in such a way that the user's search is optimized to meet the terminology of the library's knowledge organization system. This latter would require some kind of artificial intelligence and seems unlikely. The former simply does not happen often today, with most users being at a computer rather than a reference desk. FRBR's find seems to carry the same assumption as has been made functional in online catalogs, which is that the appropriateness of the search string is not questioned.

Summary 

There are two take-aways from this set of observations:

  1. We are failing to help users refine their query, which means that they may actually be basing their searches on concepts that will not fulfill their information need in the library catalog. 
  2. We are failing to help users translate their query into the language of the catalog(s). 

I would add that the language of the catalog should show users how the catalog is organized and how the knowledge universe is addressed by the library. This is implied in the second take-away, but I wanted to bring it out specifically, because it is a failure that particularly bothers me.

Notes

*I did a search in various catalogs on "cancer" and "carcinoma". Cancer is the form used in LCSH-cataloged bibliographic records, and carcinoma is a cross reference. I found a local public library whose Bibliocommons catalog did retrieve all of the records with "cancer" in them when the search was on "carcinoma"; and that the same search in the Harvard Hollis system did not (carcinoma: 1889 retrievals; cancer 21,311). These are just two catalogs, and not a representative sample, to say the least, but the fact seems to be shown.

References

[1] Vickery, B C. Classification and Indexing in Science. New York: Academic Press, 1959.
[2] Katz, Bill. Introduction to Reference Work: Reference Services and Reference Processes. New York: McGraw-Hill, 1992. p. 82 http://www.worldcat.org/oclc/928951754. Cited in: Brown, Stephanie Willen. The Reference Interview: Theories and Practice. Library Philosophy and Practice 2008. ISSN 1522-0222
[3] Modern Subject Access in the Online Age: Lesson 3 Author(s): Pauline A. Cochrane, Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, Stephen E. Wiberley and Jr. Source: American Libraries, Vol. 15, No. 4 (Apr., 1984), pp. 250-252, 254-255 Stable URL: http://www.jstor.org/stable/25626708
[4] Modern Subject Access in the Online Age: Lesson 2 Pauline A. Cochrane American Libraries Vol. 15, No. 3 (Mar., 1984), pp. 145-148, 150 Stable URL: http://www.jstor.org/stable/25626647
[5] Thomas Mann, “The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries” (June 13, 2007). PDF, 41 pp. http://guild2910.org/Pelopponesian%20War%20June%2013%202007.pdf
[6] IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records, 2009. http://archive.ifla.org/VII/s13/frbr/frbr_2008.pdf.

Sunday, August 21, 2016

Wikipedia and the numbers falacy

One of the main attempts at solutions to the lack of women on Wikipedia is to encourage more women to come to Wikipedia and edit. The idea is that greater numbers of women on Wikipedia will result in greater equality on the platform; that there will be more information about women and women's issues, and a hoped for "civilizing influence" on the brutish culture.

This argument is so obviously specious that it is hard for me to imagine that it is being put forth by educated and intelligent people. Women are not a minority - we are around 52% of the world's population and, with a few pockets of exception, we are culturally, politically, sexually, and financially oppressed throughout the planet. If numbers created more equality, where is that equality for women?

The "woman problem" is not numerical and it cannot solved with numbers. The problem is cultural; we know this because attacks against women can be traced to culture, not numbers: the brutal rapes in India, the harassment of German women by recent-arrived immigrant men at the Hamburg railway station on New Year's eve, the racist and sexist attacks on Leslie Jones on Twitter -- none of these can be explained by numbers. In fact, the stats show that over 60% of Twitter users are female, and yet Jones was horribly attacked. Gamergate arose at a time when the number of women in gaming is quite high, with data varying from 40% to over 50% of gamers being women. Women gamers are attacked not because there are too few of them, and there does not appear to be any safety in numbers.

The numbers argument is not only provably false, it is dangerous if mis-applied. Would women be safer walking home alone at night if we encouraged more women to do it?  Would having more women at frat parties reduce the rape culture on campus? Would women on Wikipedia be safer if there were more of them? (The statistics from 2011 showed that 13% of editors were female. The Wikimedia Foundation had a goal to increase the number to 25% by 2015, but Jimmy Wales actually stated in 2015 that the number of women was closer to 10% than 25%.) I think that gamergate and Twitter show us that the numbers are not the issue.

In fact, Wikipedia's efforts may have exacerbated the problem. The very public efforts to bring more women editors into Wikipedia (there have been and are organized campaigns both for women and about women) and the addition of more articles by and about women is going to be threatening to some members of the Wikipedia culture. In a recent example, an edit-a-thon produced twelve new articles about women artists. They were immediately marked for deletion, and yet, after analysis, ten of the articles were determined to be suitable, and only two were lost. It is quite likely that twelve new articles about male scientists (Wikipedia greatly values science over art, another bias) would not have produced this reaction; in fact, they might have sailed into the encyclopedia space without a hitch. Some editors are rebelling against the addition of information about women on Wikipedia, seeing it as a kind of reverse sexism (something that came up frequently in the attack on me).

Wikipedia's culture is a "self-run" society. So was the society in the Lord of the Flies. If you are one of the people who believe that we don't need government, that individuals should just battle it out and see who wins, then Wikipedia might be for you. If, instead, you believe that we have a social obligation to provide a safe environment for people, then this self-run society is not going to be appealing. I've felt what it's like to be "Piggy" and I can tell you that it's not something I would want anyone else to go through.

I'm not saying that we do not want more women editing Wikipedia. I am saying that more women does not equate to more safety for women. The safety problem is a cultural problem, not a numbers problem. One of the big challenges is how we can define safety in an actionable way. Title IX, the US statute mandating equality of the sexes in education,  revolutionized education and education-related sports. Importantly, it comes under the civil rights area of the Department of Justice. We need a Title IX for the Internet; one that requires those providing public services to make sure that there is no discrimination based on sex. Before we can have such a solution, we need to determine how to define "non-discrimination" in that context. It's not going to be easy, but it is a pre-requisite to solving the problem.

Wednesday, August 17, 2016

Classification, RDF, and promiscuous vowels

"[He] provided (i) a classified schedule of things and concepts, (ii) a series of conjunctions or 'particles' whereby the elementary terms can be combined to express composite subjects, (iii) various kinds of notational devices ... as a means of displaying relationships between terms." [1]

"By reducing the complexity of natural language to manageable sets of nouns and verbs that are well-defined and unambiguous, sentence-like statements can be interpreted...."[2]

The "he" in the first quote is John Wilkins, and the date is 1668.[3] His goal was to create a scientifically correct language that would have one and only one term for each thing, and then would have a set of particles that would connect those things to make meaning. His one and only one term is essentially an identifier. His particles are linking elements.

The second quote is from a publication about OCLC's linked data experiements, and is about linked data, or RDF. The goals are so obviously similar that it can't be overlooked. Of course there are huge differences, not the least of which is the technology of the time.*

What I find particularly interesting about Wilkins is that he did not distinguish between classification of knowledge and language. In fact, he was creating a language, a vocabulary, that would be used to talk about the world as classified knowledge. Here we are at a distance of about 350 years, and the language basis of both his work and the abstract grammar of the semantic web share a lot of their DNA. They are probably proof of some Chomskian theory of our brain and language, but I'm really not up to reading Chomsky at this point.

The other interesting note is how similar Wilkins is to Melvil Dewey. He wanted to reform language and spelling. Here's the section where he decries alphabetization because the consonants and vowels are "promiscuously huddled together without distinction." This was a fault of language that I have not yet found noted in Dewey's work. Could he have missed some imperfection?!





*Also, Wilkins was a Bishop in the Anglican church, and so his description of the history of language is based literally on the Bible, which makes for some odd conclusions.



[1]Schulte-Albert, Hans G. Classificatory Thinking from Kinner to Wilkins: Classification and Thesaurus Construction, 1645-1668. Quoting from Vickery, B. C. "The Significance of John Wilkins in the History of Bibliographical Classification." Libri 2 (1953): 326-43.
[2]Godby, Carol J, Shenghui Wang, and Jeffrey Mixter. Library Linked Data in the Cloud: Oclc's Experiments with New Models of Resource Description. , 2015.
[3] Wilkins, John. Essay Towards a Real Character, and a Philosophical Language. S.l: Printed for Sa. Gellibrand, and for John Martyn, 1668.

Tuesday, August 16, 2016

The case of the disappearing classification

I'm starting some research into classification in libraries (now that I have more time due to having had to drop social media from my life; see previous post). The main question I want to answer is: why did research into classification drop off at around the same time that library catalogs computerized? This timing may just be coincidence, but I'm suspecting that it isn't.

 I was in library school in 1971-72, and then again in 1978-80. In 1971 I took the required classes of cataloging (two semesters), reference, children's librarianship, library management, and an elective in law librarianship. Those are the ones I remember. There was not a computer in the place, nor do I remember anyone mentioning them in relation to libraries. I was interested in classification theory, but not much was happening around that topic in the US. In England, the Classification Research Group was very active, with folks like D.J. Foskett and Brian Vickery as mainstays of thinking about faceted classification. I wrote my first published article about a faceted classification being used by a UN agency.[1]

 In 1978 the same school had only a few traditional classes. I'd been out of the country, so the change to me was abrupt. Students learned to catalog on OCLC. (We had typed cards!) I was hired as a TA to teach people how to use DIALOG for article searching, even though I'd never seen it used, myself. (I'd already had a job as a computer programmer, so it was easy to learn the rules of DIALOG searching.) The school was now teaching "information science". Here's what that consisted of at the time: research into term frequency of texts; recall and precision; relevance ranking; database development.

I didn't appreciate it at the time, but the school had some of the bigger names in these areas, including William Cooper and M. E. "Bill" Maron. (I only just today discovered why he called himself Bill - the M. E., which is what he wrote under in academia, stands for "Melvin Earl". Even for a nerdy computer scientist, that was too much nerdity.) 1978 was still the early days of computing, at least unless you were on a military project grant or worked for the US Census Bureau. The University of California, Berkeley, did not have visible Internet access. Access to OCLC or DIALOG was via dial-up to their proprietary networks. (I hope someone has or will write that early history of the OCLC network. For its time it must have been amazing.)

The idea that one could search actual text was exciting, but how best to do it was (and still is, to a large extent) unclear. There was one paper, although I so far have not found it, that was about relevance ranking, and was filled with mathematical formulas for calculating relevance. I was determined to understand it, and so I spent countless hours on that paper with a cheat sheet beside me so I could remember what uppercase italic R was as opposed to lower case script r. I made it through the paper to the very end, where the last paragraph read (as I recall): "Of course, there is no way to obtain a value for R[elevance], so this theory cannot be tested." I could have strangled the author (one of my profs) with my bare hands.

Looking at the articles, now, though, I see that they were prescient; or at least that they were working on the beginnings of things we now take for granted. One statement by Maron especially strikes me today:
A second objective of this paper is to show that about is, in fact, not the central concept in a theory of document retrieval. A document retrieval system ought to provide a ranked output (in response to a search query) not according to the degree that they are about the topic sought by the inquiring patron, but rather according to the probability that they will satisfy that person‘s information need. This paper shows how aboutness is related to probability of satisfaction.[2] 
This is from 1977, and it essentially describes the basic theory behind Google ranking. It doesn't anticipate hyperlinking, of course, but it does anticipate that "about" is not the main measure of what will satisfy a searcher's need. Classification, in the traditional sense, is the quintessence of about. Is this the crux of the issue? As yet, I don't know. More to come.

[1]Coyle, Karen (1975). "A Faceted Classification for Occupational Safety and Health". Special Libraries. 66 (5-6): 256–9.
[2]Maron, M. E. (1977) "On Indexing, Retrieval, and the Meaning of About". Journal of the American Society for Information Science, January, 1977, pp. 38-43

Thursday, August 11, 2016

This is what sexism looks like: Wikipedia

We've all heard that there are gender problems on Wikipedia. Honestly there are a lot of problems on Wikipedia, but gender disparity is one of them. Like other areas of online life, on Wikipedia there are thinly disguised and not-so thinly disguised attacks on women. I am at the moment the victim of one of those attacks.

Wikipedia runs on a set of policies that are used to help make decisions about content and to govern behavior. In a sense, this is already a very male approach, as we know from studies of boys and girls at play: boys like a sturdy set of rules, and will spend considerable time arguing whether or not rules are being followed; girls begin play without establishing a set of rules, develop agreed rules as play goes on if needed, but spend little time on discussion of rules.

If you've been on Wikipedia and have read discussions around various articles, you know that there are members of the community that like to "wiki-lawyer" - who will spend hours arguing whether something is or is not within the rules. Clearly, coming to a conclusion is not what matters; this is blunt force, nearly content-less arguing. It eats up hours of time, and yet that is how some folks choose to spend their time. There are huge screaming fights that have virtually no real meaning; it's a kind of fantasy sport.

Wiki-lawyering is frequently used to harass. It is currently going on to an amazing extent in harassment of me, although since I'm not participating, it's even emptier. The trigger was that I sent back for editing two articles about men that two wikipedians thought should not have been sent back. Given that I have reviewed nearly 4000 articles, sending back 75% of those for more work, these two are obviously not significant. What is significant, of course, is that a woman has looked at an article about a man and said: "this doesn't cut it". And that is the crux of the matter, although the only person to see that is me. It is all being discussed as violations of policy, although there are none. But sexism, as with racism, homophobia, transphobia, etc., is almost never direct (and even when it is, it is often denied). Regulating what bathrooms a person can use, or denying same sex couples marriage, is a kind of lawyering around what the real problem is. The haters don't say "I hate transexuals" they just try to make them as miserable as possible by denying them basic comforts. In the past, and even the present, no one said "I don't want to hire women because I consider them inferior" they said "I can't hire women because they just get pregnant and leave."

Because wiki-lawyering is allowed, this kind of harassment is allowed. It's now gone on for two days and the level of discourse has gotten increasingly hysterical. Other than one statement in which I said I would not engage because the issue is not policy but sexism (which no one can engage with), it has all been between the wiki-lawyers, who are working up to a lynch mob. This is gamer-gate, in action, on Wikipedia.

It's too bad. I had hopes for Wikipedia. I may have to leave. But that means one less woman editing, and we were starting to gain some ground.

The best read on this topic, mainly about how hard it is to get information that is threatening to men (aka about women) into Wikipedia: WP:THREATENING2MEN: Misogynist Infopolitics and the Hegemony of the Asshole Consensus on English Wikipedia

I have left Wikipedia, and I also had to delete my Twitter account because they started up there. I may not be very responsive on other media for a while. Thanks to everyone who has shown support, but if by any chance you come across a kinder, gentler planet available for habitation, do let me know. This one's desirability quotient is dropping fast.

Sunday, July 10, 2016

Catalogs and Context: Part V

This entire series is available a single file on my web site.

Before we can posit any solutions to the problems that I have noted in these posts, we need to at least know what questions we are trying to answer. To me, the main question is:

What should happen between the search box and the bibliographic display?

Or as Pauline Cochrane asked: "Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"[1] I really like the "suggestion about how to proceed" that she included there. Although I can think of some exceptions, I do consider this an important question.

If you took a course in reference work at library school (and perhaps such a thing is no longer taught - I don't know), then you learned a technique called "the reference interview." The Wikipedia article on this is not bad, and defines the concept as an interaction at the reference desk "in which  the librarian responds to the user's initial explanation of his or her information need by first attempting to clarify that need and then by directing the user to appropriate information resources." The assumption of the reference interview is that the user arrives at the library with either an ill-formed query, or one that is not easily translated to the library's sources. Bill Katz's textbook "Introduction to Reference Work" makes the point bluntly:

"Be skeptical of the of information the patron presents" [2]

If we're so skeptical that the user could approach the library with the correct search in mind/hand, then why then do we think that giving the user a search box in which to put that poorly thought out or badly formulated search is a solution? This is another mind-boggler to me.

So back to our question, what SHOULD happen between the search box and the bibliographic display? This is not an easy question, and it will not have a simple answer. Part of the difficulty of the answer is that there will not be one single right answer. Another difficulty is that we won't know a right answer until we try it, give it some time, open it up for tweaking, and carefully observe. That's the kind of thing that Google does when they make changes in their interface, but we haven't got either Google's money nor its network (we depend on vendor systems, which define what we can and cannot do with our catalog).

Since I don't have answers (I don't even have all of the questions) I'll pose some questions, but I really want input from any of you who have ideas on this, since your ideas are likely to be better informed than mine. What do we want to know about this problem and its possible solutions?

(Some of) Karen's Questions

Why have we stopped evolving subject access?

Is it that keyword access is simply easier for users to understand? Did the technology deceive us into thinking that a "syndetic apparatus" is unnecessary? Why have the cataloging rules and bibliographic description been given so much more of our profession's time and development resources than subject access has? [3]

Is it too late to introduce knowledge organization to today's users?

The user of today is very different to the user of pre-computer times. Some of our users have never used a catalog with an obvious knowledge organization structure that they must/can navigate. Would they find such a structure intrusive? Or would they suddenly discover what they had been missing all along? [4]

Can we successfully use the subject access that we already have in library records?

Some of the comments in the articles organized by Cochrane in my previous post were about problems in the Library of Congress Subject Headings (LCSH), in particular that the relationships between headings were incomplete and perhaps poorly designed.[5] Since LCSH is what we have as headings, could we make them better? Another criticism was the sparsity of "see" references, once dictated by the difficulty of updating LCSH. Can this be ameliorated? Crowdsourced? Localized?

We still do not have machine-readable versions of the Library of Congress Classification (LCC), and the machine-readable Dewey Decimal Classification (DDC) has been taken off-line (and may be subject to licensing). Could we make use of LCC/DDC for knowledge navigation if they were available as machine-readable files?

Given that both LCSH and LCC/DDC have elements of post-composition and are primarily instructions for subject catalogers, could they be modified for end-user searching, or do we need to develop a different instrument altogether?

How can we measure success?

Without Google's user laboratory apparatus, the answer to this may be: we can't. At least, we cannot expect to have a definitive measure. How terrible would it be to continue to do as we do today and provide what we can, and presume that it is better than nothing? Would we really see, for example, a rise in use of library catalogs that would confirm that we have done "the right thing?"


Notes

[1]*Modern Subject Access in the Online Age: Lesson 3
Author(s): Pauline A. Cochrane, Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, Stephen E. Wiberley and Jr.
Source: American Libraries, Vol. 15, No. 4 (Apr., 1984), pp. 250-252, 254-255
Stable URL: http://www.jstor.org/stable/25626708

[2] Katz, Bill. Introduction to Reference Work: Reference Services and Reference Processes. New York: McGraw-Hill, 1992. p. 82 http://www.worldcat.org/oclc/928951754. Cited in: Brown, Stephanie Willen. The Reference Interview: Theories and Practice. Library Philosophy and Practice 2008. ISSN 1522-0222

[3] One answer, although it doesn't explain everything, is economic: the cataloging rules are published by the professional association and are a revenue stream for it. That provides an incentive to create new editions of rules. There is no economic gain in making updates to the LCSH. As for the classifications, the big problem there is that they are permanently glued onto the physical volumes making retroactive changes prohibitive. Even changes to descriptive cataloging must be moderated so as to minimize disruption to existing catalogs, which we saw happen during the development of RDA, but with some adjustments the new and the old have been made to coexist in our catalogs.

[4] Note that there are a few places online, in particular Wikipedia, where there is a mild semblance of organized knowledge and with which users are generally familiar. It's not the same as the structure that we have in subject headings and classification, but users are prompted to select pre-formed headings, with a keyword search being secondary.

[5] Simon Spero did a now famous (infamous?) analysis of LCSH's structure that started with Biology and ended with Doorbells.