Tuesday, December 14, 2010
Sweet victory!
After struggling for a week with the same problem, I have a solution (recorded here for the aeons, of course).
Challenge: To get a drop-down menu, populated by a value list drawn from a field in a related table, to display *all* the records' data in that field, not just the first record's. Apparently you can't use a calculation field (which I was trying to do) if you want to do this.
In my case, after the user enters a first name, last name, suffix, and honorific, I want a field to calculate the person's complete name and title. Then I want this info to appear in a drop-down menu elsewhere.
What threw me off is that calculation fields don't work for this, since they aren't directly editable. The workaround I finally used is to make the field a Text field (not a calculation) and set it to auto-enter a calculated value.
I'm not sure that I've got all the bugs worked out yet. Nevertheless, it feels good to get this far!
Update: To keep the field from being modified by hand (as opposed to by the calculation), just select the auto-enter option that prohibits modification of the value during data entry.
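For the record, the auto-enter calculation itself was something along these lines (FileMaker's & operator concatenates text; the field names here are hypothetical):

Honorific & " " & FirstName & " " & LastName & ", " & Suffix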
Monday, December 6, 2010
Fixed MS Office on Mac!
Well, that was a royal pain!
I've had MS Office 2008, which worked fine except for Excel. "Not enough memory" every time it tried to start up. No macros to blame.
Went round and round with our (very helpful!) IT crew, to no avail. We tried:
- Uninstall MS Office 2008 and install 2011—Now everything crashed!
- Revert to 2008—Again, no Excel.
- Re-install just Excel 2008—It worked, albeit with a different error message that still allowed me to use the program.
Anyway, I got tired of not being able to do my work, so I tried troubleshooting on my own. Long story short, I wiped out 2008, reinstalled 2011, and booted the computer (not Word) in Safe Mode.
It worked!
I checked my extensions, and lo and behold! Only one thing was checked at startup: iTunesHelper.
I unchecked it.
Now Office works just fine. I really don't care all that much about iTunes, so we're good! (What is iTunesHelper anyway? Is it anything like HamburgerHelper?)
Monday, October 11, 2010
"No 'Kindle' Freedom in Libraries"
An interesting take on whether libraries and e-publishing will ever get along. I especially like the analysis of the strengths and weaknesses of each at the end.
http://www.insidehighered.com/blogs/library_babel_fish/why_there_s_no_kindle_freedom_in_libraries
Wednesday, September 29, 2010
Epson/Colorburst workaround
So, how do we get our color printer to print now?
- Send the file to the ColorBurst software as normal.
- Activate it and Rip it.
If it doesn't print:
- The CB software will automatically switch to "Rip Off."
- Make sure that the file is not selected.
- Click "Rip On."
- Repeat as necessary.
Wednesday, September 15, 2010
Been kind of quiet lately...
...at least on the IT front.
Other than the erratic un/willingness of the Epson printer to print. We still haven't figured that one out.
Thursday, August 26, 2010
ColorBurst explodes on me
We have an Epson 7600 color printer--a monster--and print with ColorBurst 5.8.5. A series of problems and solutions have alternately bedeviled and beangeled me since yesterday.
Problem: CB crashes on opening.
Solution: Delete "HotFolder".
Problem: Settings are lost.
Solution: Figure them out again. Test printing.
Problem: Documents printed don't show up in CB.
Solution: Print to the right printer, Dodo!
Problem: Documents show up but can't be deleted.
Solution (thanks to tech support forums!): Delete preferences and RIP support folders, then restart CB.
Problem: CB can't find printer.
Solution: Edit the printer and select "Test IP Connection." Get message: "Printer found." Voila'! It prints!
Kind of anti-climactic, but who am I to complain? :-)
Thursday, August 12, 2010
The Trouble with Textile
...is that there doesn't seem to be any good online source of documentation. Granted, Textile doesn't seem to need a lot of documentation. But it would be helpful. As a new Textiler, I wanted to create a hyperlink to an anchor, and I had to search around until I found this helpful post:
p(#anchorName). Inserts an anchor with "anchorName" as the ID into a "p" tag.
"Link":/path/#anchorName Links to the anchor; "/path/" can be omitted if the anchor is on the same page.
Perhaps it should have been obvious to the author of the post, but it wasn't to me. Also unclear to me was the fact that the "." at the end is part of the Textile markup, not the English sentence--the Textile code won't compile properly without it. That shows how much of a novice I am, but it would be nice for novices like myself to have a resource to find this sort of thing out rather than hunting and testing for aeons (ok, tens of minutes).
I'm posting it here for posterity. :-)
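For comparison, assuming a stock Textile processor, those two bits of markup should compile to roughly this XHTML:

<p id="anchorName"> ... </p>
<a href="/path/#anchorName">Link</a>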
An additional reason to use Acrobat
As a follow-up to the previous post, I should mention that 30 minutes before the design meeting I received an urgent call to print SIX more covers.
Our cover printer prints... S... L... O... W...
which meant I had to get the covers *ready* ASAP. And Acrobat helped me get them ready really, really quickly. Meeting was still 5-10 minutes late, but not bad. :-)
Thursday, August 5, 2010
Printing Book Covers
We have a design meeting coming up, and since our design person just retired, it fell to me to print the flashy new cover proposals. VERY cool printer, VERY big and intimidating if you've never used it, printing on VERY VERY expensive card stock!
= Low margin for error. *gulp*
Anyway, the printer paper is 24" x 12", large enough for several book covers to be printed off in a row. The design person used to manually copy and paste each cover into a massive Adobe Photoshop document, then work some Page Setup/Print magic and voila'! Multiple covers!
I'm not a Photoshop guy--more of an Acrobat person, myself. My solution, carefully selected after hours of wrangling with PS to follow my wishes as well as my commands, was more along the lines of:
- Combine the PDFs into a single file in Acrobat.
- Change the Page Setup to print 24x12.
- At the Print dialogue, tell Acrobat to print multiple pages per sheet, namely 3x1.
EXCEPT that one of the covers was twice as large as the others (front AND back instead of just the front). Which meant that Acrobat wanted to squish it (to use the technical vocabulary) down to a normal 8.5x11 sheet like the others when printing three-across.
But I hated to throw out a good, time-efficient idea for one little reason like that.
Picture package was one option, but I'd already determined that it wouldn't work for other reasons. Same with its twin, Contact Sheet II. PS was out of the question, as I now have an eternal despising of copying and pasting various layers in that artist's dreamworld.
Final Answer: Duplicate the last page (i.e., merge the same file into the compiled PDF again). Use Acrobat's "Crop Page" option to crop one image down to the back cover (and spine), and the other down to the front cover.
Worked like a dream. :-)
Friday, July 30, 2010
A new job! - and - Meet Textile!
So, at long last, I have a full-time (temp) job! I'll be helping with IT at a university press. :-)
I haven't started yet, but I've been hanging around to learn more about the press and my responsibilities therein. My job will in large part consist of overseeing a MySQL database/FileMaker Pro installation and doing tech support for my coworkers.
Today, I got my hands dirty in the press for the first time. It was also my introduction to Textile. Let me explain.
My coworkers use Textile to code their website. I've never used it before. But thanks to a good search engine (Google) and Wikipedia, I quickly tracked down the solution to a problem plaguing a coworker in Marketing.
Problem: With perfect syntax, a link was nonetheless not showing up as a link.
Solution: She had to change smart-quotes to straight-quotes. Suddenly, it worked fine!
What's more interesting to me is the tools I found along the way: the Full Syntax Reference for Textile. The really nifty part of this site, though, is that it has a Textile translator. Suppose you enter
"http://www.google.com":http://www.google.com
The translator converts this into XHTML...
<p><a href="http://www.google.com">http://www.google.com</a></p>
and shows how it will appear on a webpage!
http://www.google.com
Voila'!
I ALSO learned that FileMaker Pro 10 moved its "Toggle Smart Quotes" setting out of Preferences into File Options (under the File Menu).
Unresolved: Even though the Smart Quotes option was turned off on my coworker's machine, it was still putting them in! Go figure...
Thursday, July 22, 2010
Project 1 - Displaying the Results
This class is unfinished. It was meant to display the results of the previous operations as an HTML page in the user's default browser. But as it turned out, so few results were returned from the Internet Archive--and so prohibitive would it have been to sift the good results from the false positives--that the project was scrapped.
Up side: I saved my department time and money by showing that this wasn't worth pursuing!
/**
*
*/
package iasearcher;
import java.io.*;
import java.util.*;
/**
* @author slittle2
*
* IAResults.java 1.0 displays an HTML page with the results of
* the IASearcher.java search. It divides the results into three
* parts: No hits, one hit, and multiple hits. Each gives the title
* and author of the work, plus links to search the IA full text
* and the CRRA record so that the user can compare the two. It also
* prints the IA "key" next to the links. The multiple results page
* displays the multiple results in a subordinate list.
*
* To do in the future: Generate HTML and open it without having to
* save to a file (unless having the HTML results is desirable?).
*
* Uses http://java.sun.com/javase/6/docs/api/java/awt/Desktop.html#browse%28java.net.URI%29
*
*/
public class IAResults{
/**
* @param args
*/
public static void main(String[] args) throws IOException {
/* Web page should look like this:
*
* Report 1 - No hits
* * Title/Author [(Search IA) (Search CRRA) ?]
* ...
*
* Report 2 - 1 hit
* * Title/Author (Search IA) (Search CRRA) (Key)
* ...
*
* Report 3 - Multiple hits
* * Title/Author
* * (Search IA) (Search CRRA) (Key)
* * (Search IA) (Search CRRA) (Key)
* ...
*
*
* Basic code:
*
* <html>
*
* <body>
*
* <h1>Report 1 - No hits</h1>
* <ul>
* <li>Title/Author</li>
* ... (more results)
* </ul>
*
* <h1>Report 2 - 1 hit</h1>
* <ul>
* <li>Title/Author (Search IA) (Search CRRA) (Key)</li>
* ... (more results)
* </ul>
*
* <h1>Report 3 - Multiple hits</h1>
* <ul>
* <li>Title/Author
* <ul>
* <li>(Search IA) (Search CRRA) (Key)</li>
* ... (more results)
* </ul>
* </li>
* ... (more results)
* </ul>
*
* </body>
*
* </html>
*
*/
// Initialize variables
BufferedReader noResultsFile = null;
BufferedReader oneResultFile = null;
BufferedReader manyResultsFile = null;
LinkedHashSet noResultsSet = new LinkedHashSet(); // Sets to import the results data into
LinkedHashSet oneResultSet = new LinkedHashSet();
LinkedHashSet manyResultsSet = new LinkedHashSet();
String data = " "; // Generic variable used for reading Strings
// Open files and load results into appropriate sets
try {
noResultsFile = new BufferedReader((Reader) new FileReader("C:/Documents and Settings/slittle2/workspace/MarcRetriever/noResults.txt"));
oneResultFile = new BufferedReader((Reader) new FileReader("C:/Documents and Settings/slittle2/workspace/MarcRetriever/oneResult.txt"));
manyResultsFile = new BufferedReader((Reader) new FileReader("C:/Documents and Settings/slittle2/workspace/MarcRetriever/manyResults.txt"));
while ((data = noResultsFile.readLine()) != null) {
noResultsSet.add(data);
}
while ((data = oneResultFile.readLine()) != null) {
oneResultSet.add(data);
}
while ((data = manyResultsFile.readLine()) != null) {
manyResultsSet.add(data);
}
// System.out.println(noResultsSet.toString()); TODO remove test code
// System.out.println(oneResultSet.toString());
// System.out.println(manyResultsSet.toString());
}catch (FileNotFoundException e){
System.err.println("*** File Not Found ***");
e.printStackTrace();
}finally{
if(noResultsFile != null) noResultsFile.close();
if(oneResultFile != null) oneResultFile.close();
if(manyResultsFile != null) manyResultsFile.close();
}
// Output strings into a single HTML file
Iterator iter = noResultsSet.iterator(); // TODO find author/title pairs
while(iter.hasNext()){
data = (String) iter.next();
System.out.println(data); // TODO remove test code
}
iter = oneResultSet.iterator(); // TODO find author/title pairs
while(iter.hasNext()){
data = (String) iter.next();
System.out.println(data); // TODO remove test code
}
iter = manyResultsSet.iterator(); // TODO find author/title pairs; break down strings into substrings
while(iter.hasNext()){
data = (String) iter.next();
System.out.println(data); // TODO remove test code
}
// Open HTML file with .awt.Desktop class
}
}
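Had the project continued, the unwritten last step (the "Open HTML file" TODO above) might have looked something like this minimal sketch using java.awt.Desktop; the class name and file path are hypothetical:

package iasearcher;
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
public class OpenResults {
public static void main(String[] args) throws IOException {
// Hypothetical location of the generated report
File report = new File("C:/Documents and Settings/slittle2/workspace/MarcRetriever/results.html");
// Open the page in the user's default browser, if the platform allows it
if (Desktop.isDesktopSupported() && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
Desktop.getDesktop().browse(report.toURI());
}
}
}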
Project 1 - Updating the MARC Records
Once the Internet Archive's documents were mirrored locally, I had to add the local and IA URLs to the MARC records. In practice, since I was using a file and not directly accessing the MARC database, I saved the revised records to a new file, which could then be added to the database.
/**
*
*/
package iasearcher;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcStreamWriter;
import org.marc4j.MarcWriter;
import org.marc4j.marc.DataField;
import org.marc4j.marc.MarcFactory;
import org.marc4j.marc.Record;
import org.marc4j.marc.Subfield;
/**
* @author slittle2
*
* Once files have been retrieved from the Internet Archive,
* UpdateMarc updates the MARC records with two things:
*
* - the URL of the IA directory
* - the URL of the local copy
*
* Each is saved into a new 856 subfield U
*
*/
public class UpdateMarc {
/**
* @param args
* @throws IOException
*/
// Here for testing purposes
public static void main(String[] args) throws IOException {
// Values for test run - may be changed as needed
String marcFile = "C:/Documents and Settings/slittle2/Desktop/updated.marc";
String tempFile = "C:/Documents and Settings/slittle2/Desktop/temp.marc";
String oneHitLog = "C:/Documents and Settings/slittle2/workspace/MarcRetriever/Success Files 5-26/oneResult.txt";
updater(marcFile, oneHitLog, tempFile);
}
public static void updater(String marcFile, String oneHitLog, String tempFile)
throws IOException {
LinkedHashSet<KeyDatum> keyData = searchKATIL(oneHitLog);
boolean append = true;
// Find and update the appropriate MARC record:
// Open MARC database
InputStream in = null;
OutputStream out = null;
try {
in = new FileInputStream(marcFile);
out = new FileOutputStream(tempFile, append);
MarcReader reader = new MarcStreamReader(in);
MarcWriter writer = new MarcStreamWriter(out);
// While iterator.hasNext(), search the MARC records for all
// matching author/title
while (reader.hasNext()) {
Record record = reader.next();
String author = "";
String title = "";
// Create iterator over keyData
Iterator<KeyDatum> iter = keyData.iterator();
// Match current record author/title against entire keyData list
author = getFullAuthor(record);
title = getTitle(record);
while(iter.hasNext()){
KeyDatum datum = (KeyDatum) iter.next();
// If found:
// Add 856$U w/ $Z "Original Location" & IA URL
// Add 856$U w/ $Z "Local Mirror" & local URL
if(author.equalsIgnoreCase(datum.author) && title.equalsIgnoreCase(datum.title)){
System.out.println("It matches!\t" + record);
// add a data field for IA URL
MarcFactory factory = MarcFactory.newInstance();
DataField df = factory.newDataField("856", '0', '4');
df.addSubfield(factory.newSubfield('u', datum.iaURL));
df.addSubfield(factory.newSubfield('z', "ORIGINAL LOCATION"));
record.addVariableField(df);
// add another data field for local URL
DataField dq = factory.newDataField("856", '0', '4');
dq.addSubfield(factory.newSubfield('u', datum.localURL));
dq.addSubfield(factory.newSubfield('z', "LOCAL MIRROR"));
record.addVariableField(dq);
writer.write(record);
System.out.println("Updated Record:\t" + record);
break;
}
} // end while
} // end while
writer.close();
} finally {
// Close input/output streams
if (out != null)
out.close();
if (in != null)
in.close();
}
}
private static String getTitle(Record record) {
// get data field 245
DataField field = (DataField) record.getVariableField("245");
Subfield subfield;
String title = "";
try {
// get the title proper
subfield = field.getSubfield('a');
title = subfield.getData();
} catch (NullPointerException npe) {
title = " ";
}
return title;
}
private static String getFullAuthor(Record record) {
String author1 = "";
String author2 = "";
String author3 = "";
// get data field 100
DataField field = (DataField) record
.getVariableField("100");
// get the author proper, part 1
Subfield subfield;
try {
subfield = field.getSubfield('a');
author1 = subfield.getData();
} catch (NullPointerException npe) {
author1 = " ";
}
// get the author proper, part 2
try {
subfield = field.getSubfield('b');
author2 = subfield.getData();
} catch (NullPointerException npe) {
author2 = " ";
}
// get the author proper, part 3
try {
subfield = field.getSubfield('c');
author3 = subfield.getData();
} catch (NullPointerException npe) {
author3 = " ";
}
return author1 + author2 + author3;
}
// Gets the Key, Author, Title, and IA & Local URL
private static LinkedHashSet<KeyDatum> searchKATIL(String oneHitLog) throws IOException {
LinkedHashSet<KeyDatum> kati = new LinkedHashSet<KeyDatum>();
LinkedHashSet<KeyDatum> previous = new LinkedHashSet<KeyDatum>();
// Open file
BufferedReader inFile = null; // create a new stream to open a file
BufferedReader inFile2 = null;
BufferedWriter outFile = null;
final String addressRoot = "http://www.archive.org/download/";
final String localRoot = "http://zoia.library.nd.edu//sandbox/books";
final String outFileLocation = "C:/Documents and Settings/slittle2/Desktop/outFile.txt";
try {
inFile = new BufferedReader((Reader) new FileReader(oneHitLog));
inFile2 = new BufferedReader ((Reader) new FileReader(outFileLocation));
String data = " ";
String data2 = " ";
boolean old = true; // This is true because all the results should be stored in a local file now.
// Load previous results into memory
while((data2 = inFile2.readLine()) != null) {
String[] splitData2 = data2.split("\t");
previous.add(new KeyDatum(splitData2[0],splitData2[1],splitData2[2],splitData2[3],splitData2[4]));
}
inFile2.close();
outFile = new BufferedWriter((Writer) new FileWriter(outFileLocation, true));
// Retrieve URLs from file & send to Internet Archive
while ((data = inFile.readLine()) != null) {
// Extract keys
String[] splitData = data.split("\t");
// Load each Key, Author, Title into a KeyDatum; leave other two
// blank
KeyDatum keyDatum = new KeyDatum(splitData[2], splitData[0],
splitData[1], "", "");
// Check and see if already in previous results
Iterator<KeyDatum> iter = previous.iterator();
while (iter.hasNext()) {
KeyDatum next = iter.next();
if (keyDatum.compareQuick(next)) {
old = true;
kati.add(next);
break;
}
}
if (!old) {
// Generate IA URL
keyDatum.iaURL = addressRoot + keyDatum.key + "/";
// Generate local URL
data = (keyDatum.iaURL).toString();
data = redirectAndTrim(data);
keyDatum.localURL = data.replaceFirst("http:/", localRoot);
outFile.append(keyDatum.toString("\t"));
// Adds the new KeyDatum to the LHS
kati.add(keyDatum);
System.out.println(keyDatum.toString("\t"));
}
}
} catch (MalformedURLException e) {
System.err.println("*** Malformed URL Exception ***");
} catch (FileNotFoundException e) {
System.err.println("*** File not found! ***");
e.printStackTrace();
} catch (IOException e) {
System.err.println("*** IO Exception ***");
e.printStackTrace();
} finally {
if (inFile != null)
inFile.close();
if (outFile != null)
outFile.close();
}
return kati;
}
// TODO Can't I just use the one in IASearcher?
protected static String redirectAndTrim(String key) throws IOException {
// Retrieve the redirected URL from IA
URI uri = null;
URL url = null;
InputStream inURI = null;
String newURL = "";
try {
// Open connection to IA
uri = new URI(key);
url = uri.toURL();
URLConnection yc = url.openConnection();
HttpURLConnection h = (HttpURLConnection) yc;
HttpURLConnection.setFollowRedirects(true);
h.getInputStream(); // Necessary to force redirect!
newURL = h.getURL().toString();
return newURL;
// Catching errors
} catch (URISyntaxException e) {
System.err.println("*** URI Syntax Exception ***");
e.printStackTrace();
} catch (MalformedURLException e) {
System.err.println("*** Malformed URL Exception ***");
e.printStackTrace();
} catch (FileNotFoundException e) {
System.err.println("*** File not found! ***");
e.printStackTrace();
} catch (IOException e) {
System.err.println("*** IO Exception ***");
e.printStackTrace();
} finally {
if (inURI != null)
inURI.close();
}
return null;
}
}
// Class for handling the various kinds of data
// Each key maps to 1 each of: author, title, IA URL, & local URL
class KeyDatum {
protected String key;
protected String author;
protected String title;
protected String iaURL;
protected String localURL;
KeyDatum() {
key = "";
author = "";
title = "";
iaURL = "";
localURL = "";
}
KeyDatum(String k, String a, String t, String i, String l) {
key = k;
author = a;
title = t;
iaURL = i;
localURL = l;
}
// Returns all fields as a single string separated by the passed delimiter (e.g., \n or \t)
public String toString(String c){
return key + c + author + c + title + c + iaURL + c + localURL + c + "\n";
}
public boolean compare(KeyDatum datum){
return this.key.equalsIgnoreCase(datum.key) &&
this.author.equalsIgnoreCase(datum.author) &&
this.title.equalsIgnoreCase(datum.title) &&
this.iaURL.equalsIgnoreCase(datum.iaURL) &&
this.localURL.equalsIgnoreCase(datum.localURL);
}
public boolean compareQuick(KeyDatum datum) {
return this.key.equalsIgnoreCase(datum.key);
}
}
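For reference, each matched record picks up two new 856 fields; in marc4j's text dump they come out roughly like this (the IA key shown is hypothetical):

856 04$uhttp://www.archive.org/download/somekey/$zORIGINAL LOCATION
856 04$uhttp://zoia.library.nd.edu//sandbox/books/somekey/$zLOCAL MIRROR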
Project 1 - Parsing the XML File
Returning to Project 1: This class parsed the resulting XML file.
package iasearcher;
/**
*
*/
/**
* @author slittle2
*
* This class contains code modified from http://www.exampledepot.com/egs/javax.xml.parsers/BasicSax.html
*
*
*
*/
import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class IASearcherXMLReader {
private static int recordQuantity = 0;
private static String identifiers = "";
private static boolean grabCharacters = false;
private static boolean failFlag = false;
public static void main(String[] args) { // Create a handler to handle the SAX events generated during parsing
parse();
}
// Method called by main or another class to parse an XML file
public static void parse() {
// Re-initialize variables -- necessary!
recordQuantity = 0;
identifiers = "";
grabCharacters = false;
failFlag = false;
DefaultHandler handler = new XMLHandler(); // Parse the file using the handler
parseXmlFile("C:/Documents and Settings/slittle2/workspace/MarcRetriever/output.xml", handler, false);
}
// To get the number of records found
public static int getRecordQuantity(){
return recordQuantity;
}
// To get the String of identifiers
public static String getIdentifiers() {
return identifiers;
}
// To return whether the parse succeeded
public static boolean getFailFlag() {
return failFlag;
}
// DefaultHandler contains no-op implementations for all SAX events.
// This class should override methods to capture the events of interest.
static class XMLHandler extends DefaultHandler {
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if(qName.equals("result")) { // "numFound" attribute is second, i.e. "1"
recordQuantity = (Integer.valueOf(attributes.getValue("numFound"))).intValue();
}
try {
if (attributes.getValue(0).equals("identifier")) {
grabCharacters = true;
}
} catch (NullPointerException npe) {
// No attributes present! Move along!
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
if (grabCharacters) {
identifiers += new String(ch, start, length) + "\t";
grabCharacters = false;
}
}
}
// Parses an XML file using a SAX parser.
// If validating is true, the contents are validated against the DTD
// specified in the file.
public static void parseXmlFile(String filename, DefaultHandler handler, boolean validating) {
try { // Create a builder factory
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(validating); // Create the builder and parse the file
factory.newSAXParser().parse(new File(filename), handler);
} catch (SAXException e) { // A parsing error occurred; the xml input is not valid
System.err.println("*** SAX Exception ***");
failFlag = true;
} catch (ParserConfigurationException e) {
System.err.println("*** Parser Configuration Exception ***");
failFlag = true;
} catch (IOException e) {
System.err.println("*** IO Exception ***");
failFlag = true;
} // End try-catch
}
}
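For completeness, here's a minimal sketch of how another class would drive this one (the demo class itself is hypothetical):

package iasearcher;
public class IASearcherXMLReaderDemo {
public static void main(String[] args) {
// Parse the previously saved output.xml
IASearcherXMLReader.parse();
if (!IASearcherXMLReader.getFailFlag()) {
System.out.println("Records found: " + IASearcherXMLReader.getRecordQuantity());
// Identifiers come back as one tab-separated string
for (String id : IASearcherXMLReader.getIdentifiers().split("\t")) {
System.out.println(id);
}
}
}
}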
Wednesday, July 21, 2010
Project 2 - JUnit
Well, it isn't very fancy, but it provided a decent intro to JUnit testing! :-)
/**
*
*/
package crrasolrindexer;
import static org.junit.Assert.*;
import java.io.IOException;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
/**
* @author slittle2
*
*/
public class CRRA_DatumTest {
private CRRA_Datum testCD = null;
CRRA_Datum crraCD = null;
/**
* @throws java.lang.Exception
*/
@Before
public void setUp() throws Exception {
crraCD = new CRRA_Datum("id allfields institution collection building language format author author-letter authorStr auth_author auth_authorStr title title_sort title_sub title_short title_full title_fullStr title_auth physical publisher publisherStr publishDate edition description contents url thumbnail lccn ctrlnum isbn issn callnumber callnumber-a callnumber-first callnumber-first-code callnumber-subject callnumber-subject-code callnumber-label dewey-hundreds dewey-tens dewey-ones dewey-full dewey-sort author2 author2Str author2-role auth_author2 auth_author2Str author_additional author_additionalStr title_alt title_old title_new dateSpan series series2 topic genre geographic illustrated recordtype");
}
/**
* @throws java.lang.Exception
*/
@After
public void tearDown() throws Exception {
crraCD = null;
}
/**
* Test method for {@link crrasolrindexer.CRRA_Datum#CRRA_Datum(java.lang.String)}.
*/
@Test
public final void testCRRA_DatumString() {
testCD = new CRRA_Datum("foo bar");
assertNotNull(testCD);
}
/**
* Test method for {@link crrasolrindexer.CRRA_Datum#CRRA_Datum()}.
*/
@Test
public final void testCRRA_Datum() {
testCD = new CRRA_Datum();
assertNotNull(testCD);
assertEquals(testCD.toString(), crraCD.toString());
assertTrue(testCD.equals(crraCD));
}
/**
* Test method for {@link crrasolrindexer.CRRA_Datum#returnField(java.lang.String)}.
* @throws IOException
*/
@Test
public final void testReturnField() throws IOException {
testCD = new CRRA_Datum("foo bar");
testCD.setField("foo", "boo!");
assertEquals("boo!", testCD.returnField("foo"));
}
/**
* Test method for {@link crrasolrindexer.CRRA_Datum#concatenateField(java.lang.String, java.lang.String)}.
* @throws IOException
*/
@Test
public final void testConcatenateField() throws IOException {
crraCD.setField("author", "Bob Shakespeare");
crraCD.concatenateField("author", " & Joe Shakespeare");
assertEquals("Bob Shakespeare & Joe Shakespeare", crraCD.returnField("author"));
}
/**
* Test method for {@link crrasolrindexer.CRRA_Datum#toString()}.
*/
@Test
public final void testToString() {
assertNotNull(crraCD.toString());
}
}
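Outside Eclipse, the suite can also be run with JUnit 4's console runner (assuming junit.jar and the compiled classes are on the classpath):

java org.junit.runner.JUnitCore crrasolrindexer.CRRA_DatumTest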
Project 2 - Enhanced Data Class
The class just posted required an enhanced data class to model the VuFind records. Here it is!
/**
*
*/
package crrasolrindexer;
import java.io.*;
import java.util.Iterator;
import java.util.LinkedHashSet;
/**
* @author slittle2
*
* The CRRA_Datum class is just like the IndexDatum class, but modified to work
* with the CRRA schema (or any other). It holds the kinds of data that are to be extracted
* from an indexed field, whether MARC or EAD (or anything else). All fields
* are private; only a protected method allows setting them, and a public
* method allows retrieving their contents. (The setField() method is
* protected so that one has to change the data through the appropriate
* class that indexes the data.
*
* This class is designed to be easily extensible for use with other kinds
* of data. UNLIKE IndexDatum, it may be passed a string of schema names from
* which it builds a new schema for its entries.
*
* Note that the schema names used here are independent of the schema *map* that
* the CRRA_EADRetriever class uses to map from EAD to VuFind. If inconsistencies
* occur between the schema here and the schema_map (or between these and the actual
* VuFind schema), unpredictable behavior may result.
*
*/
public class CRRA_Datum {
private class Entry {
String name;
String content;
Entry() {
name = "";
content = "";
}
Entry(String n, String c){
name = n;
content = c;
}
}
// The default schema is that used in Vufind as of this coding (June 2010).
private String schema_names = "id fullrecord allfields institution collection building language format author author-letter authorStr auth_author auth_authorStr title title_sort title_sub title_short title_full title_fullStr title_auth physical publisher publisherStr publishDate edition description contents url thumbnail lccn ctrlnum isbn issn callnumber callnumber-a callnumber-first callnumber-first-code callnumber-subject callnumber-subject-code callnumber-label dewey-hundreds dewey-tens dewey-ones dewey-full dewey-sort author2 author2Str author2-role auth_author2 auth_author2Str author_additional author_additionalStr title_alt title_old title_new dateSpan series series2 topic genre geographic illustrated recordtype";
private LinkedHashSet<Entry> entries = null;
// Pass a string containing schema field names separated by a ' '.
public CRRA_Datum(String schema_names) {
entries = new LinkedHashSet<Entry>();
String[] schema = schema_names.split(" ");
for(int i = 0; i < schema.length; i++){
entries.add(new Entry(schema[i], ""));
}
}
// Default constructor uses the current (2010) VuFind schema names
public CRRA_Datum() {
entries = new LinkedHashSet<Entry>();
String[] schema = schema_names.split(" ");
for(int i = 0; i < schema.length; i++){
entries.add(new Entry(schema[i], ""));
}
}
// Return the names of the schema fields as a single string. This can then be parsed/tokenized as needed.
public String returnSchemaNames(){
return schema_names;
}
// Return a given field's value.
public String returnField(String fieldName) throws IOException {
Iterator<Entry> iter = entries.iterator();
while(iter.hasNext()){
Entry entry = (Entry) iter.next();
if(entry.name.equalsIgnoreCase(fieldName)){
return entry.content;
}
}
throw new IOException();
}
// Set the value of a given field. Completely overwrites the original.
protected void setField(String fieldName, String data) throws IOException {
Iterator<Entry> iter = entries.iterator();
while(iter.hasNext()){
Entry entry = (Entry) iter.next();
if(entry.name.equalsIgnoreCase(fieldName)){
entry.content = data;
return;
}
}
throw new IOException();
}
// Adds data to a field without overwriting it.
protected void concatenateField(String fieldName, String data) throws IOException {
Iterator<Entry> iter = entries.iterator();
while(iter.hasNext()){
Entry entry = (Entry) iter.next();
if(entry.name.equalsIgnoreCase(fieldName)){
entry.content += data;
return;
}
}
throw new IOException();
}
// Displays the fields/values of the entire Datum.
public String toString(){
String contents = "";
Iterator<Entry> iter = entries.iterator();
while(iter.hasNext()){
Entry entry = (Entry) iter.next();
contents += "\n " + entry.name + "\t\t"+ entry.content;
}
return contents;
}
}
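And a quick sketch of the class in use, from inside the crrasolrindexer package (setField() and concatenateField() are protected); the demo class and values are hypothetical:

package crrasolrindexer;
import java.io.IOException;
public class CRRA_DatumDemo {
public static void main(String[] args) throws IOException {
// Build a datum with a custom two-field schema
CRRA_Datum datum = new CRRA_Datum("title author");
datum.setField("title", "Summa Theologica");
datum.setField("author", "Thomas ");
datum.concatenateField("author", "Aquinas");
System.out.println(datum.returnField("author")); // prints "Thomas Aquinas"
System.out.println(datum); // dumps every field/value pair
}
}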
Project 2 - Beefing Up EAD File-Handling
This class is far more powerful than its predecessor, EADDataRetriever. Added functionality includes sending data to multiple VuFind fields at once, handling multiple kinds of schema files and EAD-VuFind crosswalks, and better documentation! Hypothetical samples of the two file formats follow.
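Both the crosswalk and the presets are plain text files, read one line at a time and split on the first space. In the crosswalk, the first word names a VuFind field and the rest of the line lists the EAD tags that must all be open for character data to flow into that field; a line reading strictElementPaths switches on exact-path matching. In the presets file, the first word names a VuFind field and the rest of the line is a literal value stamped onto every record. The samples below are hypothetical; real tag sets and values depend on your EAD markup and your VuFind instance.

A possible crosswalk file (say, ead2vufind.map):

strictElementPaths
title ead archdesc did unittitle
author ead archdesc did origination persname
description ead archdesc scopecontent p

A possible presets file (say, ead_presets.map):

institution Hypothetical University Archives
recordtype ead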
package crrasolrindexer;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.apache.solr.client.solrj.SolrServerException;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
/**
* @author slittle2
*
 * CRRA_EADRetriever does the same thing as EADDataRetriever, except that
 * it uses the new CRRA_Datum class.
 *
 * To this end, it cycles through the files in the given directory,
 * parses each, extracts the relevant data, and puts it in a
 * LinkedHashSet<CRRA_Datum>. Voilà!
*
*/
public class CRRA_EADRetriever {
// This is used only in testing; if called from TextUICRRASI, it uses whatever is in the given properties file.
private static String testPathName = "C:/Documents and Settings/slittle2/Desktop/Index Data/ead/xml/";
// The set of records to send to the Indexer
private static LinkedHashSet<CRRA_Datum> eadRecords = new LinkedHashSet<CRRA_Datum>();
private static int recordQuantity = 0; // The quantity of EAD records parsed
private static boolean grabCharacters = false; // Whether or not to stuff data into fields besides "allfields"
private static boolean failFlag = false; // Whether or not the parsing operation succeeded
private static CRRA_Datum datum = new CRRA_Datum(); // Using default schema here
// The user-defined mapping from EAD to VuFind will be stored here
private static LinkedHashMap<LinkedHashSet<String>, String> schema_map = new LinkedHashMap<LinkedHashSet<String>, String>();
private static LinkedList<String> currentFieldSet = new LinkedList<String>(); // for keeping track of the current fields to send data to
private static LinkedList<String> tagStack = new LinkedList<String>(); // for keeping track of tag nesting while parsing
// fieldName sets the field in datum that the parser passes data to,
// and then datum is sent to eadRecords
private static String fieldName = "";
// Name for file with schema
public static String schema_filename = "";
// Whether to force parser to evaluate whether the tagStack and a given element "path" are identical before passing data
public static boolean strictElementPaths = false;
// Name for file with schema presets
public static String schema_presets = "";
// Stores the presets to add to each record right before moving on to the next parsing
private static LinkedHashMap<String, String> presets_map = new LinkedHashMap<String, String>();
// Main() included more or less for testing purposes
public static void main(String[] args) throws IOException, SolrServerException {
eadLoader(testPathName);
System.out.println("Number of records = " + eadRecords.size());
Iterator<CRRA_Datum> iter = eadRecords.iterator();
while (iter.hasNext()) {
datum = (CRRA_Datum) iter.next();
System.out.println(datum.toString());
}
Indexer.indexCD(eadRecords, "http://localhost:8983/solr/core0/");
System.out.println("Successfully indexed... we hope.");
}
// Cycling through the files in the directory and loading each in turn
public static LinkedHashSet<CRRA_Datum> eadLoader(String pathname) throws IOException, SolrServerException {
String filename = "";
// Initialize variables -- must be cleared every time parser is run!
recordQuantity = 0;
failFlag = false;
datum = new CRRA_Datum();
eadRecords = new LinkedHashSet<CRRA_Datum>();
// Read in schema file here, from filename indicated above.
// MUST set schema_filename before calling!
if (schema_filename.isEmpty()) throw new IOException("schema_filename must be set before calling eadLoader()");
// Open schema_map file
BufferedReader inFile = null; // create a new stream to open a file
try {
inFile = new BufferedReader((Reader) new FileReader(schema_filename));
String data = " ";
// Read in each line until termination
while ((data = inFile.readLine()) != null) {
// Split off the first word; the rest of the line is the EAD tag set
String[] schema_entry = data.split(" ", 2);
// Check for schema meta-info
if (schema_entry[0].equalsIgnoreCase("strictElementPaths"))
strictElementPaths = true;
else
// Map the EAD tag set (rest of line) to the VuFind field name (first word)
schema_map.put(addTagSet(schema_entry[1]), schema_entry[0]);
}
} finally {
if (inFile != null)
inFile.close();
}
// Open schema presets file, if there is one
if (!schema_presets.equalsIgnoreCase("")) {
// Open schema_map file
inFile = null; // create a new stream to open a file
try {
inFile = new BufferedReader((Reader) new FileReader(
schema_presets));
String data = " ";
// Read in each line until termination
while ((data = inFile.readLine()) != null) {
// Split off the field name; the rest of the line is the preset value
String[] schema_entry = data.split(" ", 2);
presets_map.put(schema_entry[0], schema_entry[1]);
}
} finally {
if (inFile != null)
inFile.close();
}
}
// Cycle through all files; XmlFilter (below) makes sure each is
// an XML file.
File directory = new File( pathname );
String[] eadFiles = directory.list( new XmlFilter() );
// for (int i = 0; i < eadFiles.length; i++) { // Uncomment this and comment out the following line
for (int i = 0; i < Math.min(4, eadFiles.length); i++) { // Limits records parsed for testing; also guards directories with fewer than four files
filename = eadFiles[i];
eadRecords.add(parse(pathname + filename)); // Returns all the EAD data
// from ONE file to
// eadRecords
System.out.println("Successfully parsed " + pathname + filename + "!");
datum = new CRRA_Datum(); // This is REALLY, REALLY IMPORTANT! Bad things will happen if it is deleted!
}
System.out.println("Number of records = " + eadRecords.size());
return eadRecords;
}
// Used to create schema for parsing
private static LinkedHashSet<String> addTagSet(String string) {
// Parse tags from the string and add them individually to the set
LinkedHashSet<String> returnSet = new LinkedHashSet<String>();
for (String tag : string.split(" ")) {
returnSet.add(tag);
}
return returnSet;
}
// To get the number of records found
public static int getRecordQuantity() {
return recordQuantity;
}
// To get the LinkedHashSet<CRRA_Datum> eadRecords
public static LinkedHashSet<CRRA_Datum> getEadRecords() {
return eadRecords;
}
// To return whether the parse succeeded
public static boolean getFailFlag() {
return failFlag;
}
// Parsing a given file into a CRRA_Datum
private static CRRA_Datum parse(String filename)
throws IOException {
DefaultHandler handler = new EADHandler(); // Parse the file using the
// handler and given schema
parseXmlFile(filename, handler, false);
// Add preset fields
// Put every preset value into the corresponding field of the new datum
for (String field : presets_map.keySet()) {
datum.setField(field, presets_map.get(field));
}
// Open file for sending to 'fullrecord' field
BufferedReader inFile = null; // create a new stream to open a file
try {
inFile = new BufferedReader((Reader) new FileReader(filename));
String data = " ";
while ((data = inFile.readLine()) != null) {
datum.concatenateField("fullrecord", data);
}
} finally {
if (inFile != null)
inFile.close();
}
return datum;
}
public static CRRA_Datum returnCurrentCD(){
return datum;
}
/*
* EADHandler looks for the appropriate parts of the EAD record to grab.
*
* A stack (tagStack) is used to keep track of the element "pathname".
* Every time a new element is reached, its name is put on the stack;
* when a close-element is encountered, its name is popped off, along
* with any elements "on top" of it (like <p> tags and the like).
*
 * If strictElementPaths is 'true', then data will be collected for VuFind
* if and only if the elements on the tagStack match exactly at least one
* set of elements in the schema.
*
* The currentFieldSet is the set of VuFind fields to send the current data
* to. Every time an element is encountered, the CFS gets wiped out and
* recalculated from scratch. Inelegant, but effective.
*
* If the CFS is not empty, then grabCharacters is true and data will be sent
* to at least one field.
*
*/
static class EADHandler extends DefaultHandler {
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// qName determines which field, or possible fields, the characters
// go in.
// Add the tag name to the stack
tagStack.addFirst(qName);
// Initialize the CFS
currentFieldSet = new LinkedList<String>();
// Update the CFS to include only fields corresponding to the tags currently on the stack.
updateCFS();
if(currentFieldSet.isEmpty())
grabCharacters = false;
else
grabCharacters = true;
}
// Overridden to remove closed tags from the tagStack and update the CFS.
public void endElement(String uri, String localName, String qName)
throws SAXException {
// Removes any non-closed tags on the front of the stack, plus the closed tag.
while(tagStack.contains(qName)){
tagStack.removeFirst();
}
updateCFS();
}
public void characters(char[] ch, int start, int length)
throws SAXException {
try {
datum.concatenateField("allfields", new String(ch, start, length));
} catch (IOException e1) {
e1.printStackTrace();
}
if (grabCharacters) {
try {
// Update each field in 'currentFieldSet' in the current 'datum'
for (String cfsField : currentFieldSet) {
fieldName = cfsField;
datum.concatenateField(fieldName, new String(ch, start, length) + "\n\t");
// Eliminate unnecessary whitespace in every field, not just the last one visited
datum.setField(fieldName, datum.returnField(fieldName).trim());
}
} catch (IOException e) {
System.err
.println("*** Saving parsed data to CRRA Datum failed! ***");
e.printStackTrace();
} finally {
grabCharacters = false;
}
}
}
}
// Parses an XML file using a SAX parser.
// If validating is true, the contents are validated against the DTD
// specified in the file.
public static void parseXmlFile(String filename, DefaultHandler handler,
boolean validating) {
try { // Create a builder factory
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(validating); // Create the builder and parse
// the file
factory.newSAXParser().parse(new File(filename), handler);
} catch (SAXException e) { // A parsing error occurred; the XML input is
// not valid
System.err.println("*** SAX Exception ***");
e.printStackTrace();
failFlag = true;
} catch (ParserConfigurationException e) {
System.err.println("*** Parser Configuration Exception ***");
e.printStackTrace();
failFlag = true;
} catch (IOException e) {
System.err.println("*** IO Exception in parseXmlFile ***");
e.printStackTrace();
failFlag = true;
} // End try-catch
}
public static void updateCFS() {
for (LinkedHashSet<String> tempSet : schema_map.keySet()) {
// A mapping matches when every tag in its EAD tag set is on the stack;
// under strictElementPaths the stack may contain no extra tags either.
if (tagStack.containsAll(tempSet) && (!strictElementPaths ||
tempSet.containsAll(tagStack))) {
currentFieldSet.add(schema_map.get(tempSet));
}
}
}
}
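Putting it together: here is a hypothetical driver (the file names and directory are placeholders) that sets the public configuration fields and harvests a directory of EAD files. Note that, as posted, eadLoader() parses at most four files per run until the test limit in its loop is lifted.

package crrasolrindexer;

import java.io.IOException;
import java.util.LinkedHashSet;
import org.apache.solr.client.solrj.SolrServerException;

// Hypothetical driver; paths are placeholders.
public class RetrieverDemo {
    public static void main(String[] args) throws IOException, SolrServerException {
        CRRA_EADRetriever.schema_filename = "ead2vufind.map";  // crosswalk (sample above)
        CRRA_EADRetriever.schema_presets = "ead_presets.map";  // optional; "" skips presets
        LinkedHashSet<CRRA_Datum> records = CRRA_EADRetriever.eadLoader("C:/ead/xml/");
        System.out.println(records.size() + " records parsed; failures? "
                + CRRA_EADRetriever.getFailFlag());
        // Records could then be sent on with Indexer.indexCD(records, solrUrl),
        // as main() does above.
    }
}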