This is a simple utility that attempts to classify names as male or female.
It identifies the component of a name that corresponds to the first name and
then matches it against the lists of male and female first names provided by
the US Census. If a name appears in both the male and the female list, the
program compares the frequency with which the name appears in each list and
chooses the gender with the higher frequency.
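The census lists can be pictured as a lookup table from name to frequency. The sketch below is one way to load them. It assumes a hypothetical plain-text layout with one name per line, whitespace-separated columns, the name in the first column and its frequency in the second. The function name `load_census_list` is illustrative, not part of the utility.

```python
def load_census_list(text):
    """Parse a name list into a dict of {NAME: frequency}.

    Assumed (hypothetical) layout: one entry per line, whitespace-separated
    columns, name first, frequency second; any extra columns are ignored.
    """
    freqs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Store names upper-cased so later lookups are case-insensitive.
        freqs[parts[0].upper()] = float(parts[1])
    return freqs


# Illustrative values only, not real census figures.
sample = "JOHN 1.0 1.0 1\n\nMARY 2.5 2.5 1\n"
print(load_census_list(sample))
```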
The program first attempts to isolate the first name by looking for a comma
in the name string. The first name is whatever follows the comma.
If several names follow the comma, only the first one is used. For example,
the name "Smith, John" yields the first name "John", and the name
"Smith, John Henry" also yields the first name "John".
If there is no comma in the name ("J. S.", or "Constant Reader"), there
is no first name, and the name cannot be disambiguated.
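The comma rule above can be sketched as a small function. This is a minimal illustration under the rules stated on this page, not the utility's actual code. The name `extract_first_name` is hypothetical, and returning `None` for "no first name" is an assumed convention.

```python
def extract_first_name(name):
    """Return the first given name from a "Last, First Middle" string, or None.

    No comma means there is no first name. With a comma, only the first
    word after it is kept.
    """
    if "," not in name:
        return None  # e.g. "J. S." or "Constant Reader": cannot be disambiguated
    _, _, given = name.partition(",")  # split at the first comma only
    words = given.split()
    return words[0] if words else None


print(extract_first_name("Smith, John Henry"))  # John
print(extract_first_name("Constant Reader"))    # None
```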
If there is a first name, it is compared to the names in the male and
female census lists. If it appears in only one list, then it is marked as
male or female, accordingly. If it appears in both lists, then it is marked
according to the higher frequency. For example, the name "Andrew" appears in
the male list with a frequency of xx and in the female list with a frequency
of 0.002. In this case, the person whose first name is "Andrew" is marked as
male, because that is the more likely of the two.
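The comparison described above can be sketched as follows. It assumes the two census lists have already been loaded into dicts that map an upper-case name to its frequency (a hypothetical representation). It also folds in the default stated on this page: a first name found in neither list is marked male. The source does not say how exact ties are broken, so resolving them as male here is an assumption.

```python
def classify(first_name, male_freqs, female_freqs):
    """Return "male", "female", or None when there is no first name.

    male_freqs / female_freqs: hypothetical dicts of {NAME: frequency}.
    """
    if first_name is None:
        return None  # no first name, so the name cannot be disambiguated
    key = first_name.upper()
    m = male_freqs.get(key)
    f = female_freqs.get(key)
    if m is not None and f is not None:
        # In both lists: pick the list where the name is more frequent.
        # Ties go to "male" (assumption; the source does not specify).
        return "male" if m >= f else "female"
    if f is not None:
        return "female"
    # Only in the male list, or in neither list: the page's default is male.
    return "male"


# Illustrative frequencies only.
print(classify("Andrew", {"ANDREW": 0.5}, {"ANDREW": 0.002}))  # male
```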
The results are displayed in the box at the center of the page as a series
of rows whose fields are delimited by the pipe (|) character, so that they can
be imported into a database or spreadsheet.
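A pipe-delimited row can be produced with a simple join. The sketch below assumes a hypothetical column order (full name, first name, gender) and writes missing values as empty fields. The real utility's columns may differ. Because a stray pipe inside a field would shift the columns on import, this sketch replaces any such pipe with a space.

```python
def to_row(name, first_name, gender):
    """Format one result as a pipe-delimited row (hypothetical column order)."""
    fields = [name, first_name or "", gender or ""]
    # Replace pipes inside fields so they cannot break the column layout.
    return "|".join(field.replace("|", " ") for field in fields)


print(to_row("Smith, John Henry", "John", "male"))  # Smith, John Henry|John|male
print(to_row("Constant Reader", None, None))        # Constant Reader||
```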
If a name has a first name but that first name does not appear in either
list, it is marked as male. We chose this default because we are working with
historical materials; it is not necessarily appropriate for all periods.
Caveat: This utility is intended to work on prepared lists and to give only
a rough categorization. The results will still have to be checked and
disambiguated by hand.
For more information about the lists of names and the methodology behind
them, please see the following pages: