org.apache.hadoop.streaming
Class StreamXmlRecordReader

java.lang.Object
  extended by org.apache.hadoop.streaming.StreamBaseRecordReader
      extended by org.apache.hadoop.streaming.StreamXmlRecordReader
All Implemented Interfaces:
RecordReader

public class StreamXmlRecordReader
extends StreamBaseRecordReader

A way to interpret XML fragments as Mapper input records. Values are XML subtrees delimited by configurable tags. Keys could be the value of a certain attribute in the XML subtree, but this is left to the stream processor application.

The name-value properties that StreamXmlRecordReader understands are:
String begin - characters marking the beginning of a record
String end - characters marking the end of a record
int maxrec - maximum record size
int lookahead - maximum lookahead to sync CDATA
boolean slowmatch
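
The delimiting idea described above can be illustrated with a small, self-contained sketch. It is not the StreamXmlRecordReader source: the class name XmlFragmentSketch, the method extractRecords, and the policy of skipping fragments longer than maxrec are all assumptions made for illustration, and the sketch works on an in-memory String rather than on a file split.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and policies here are hypothetical. */
public class XmlFragmentSketch {

    /**
     * Returns every substring of text that starts with the begin marker
     * and ends with the next end marker after it.
     * Assumption: fragments longer than maxrec characters are skipped.
     */
    public static List<String> extractRecords(String text, String begin,
                                              String end, int maxrec) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record begins in the remaining text
            }
            int stop = text.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // the record is never closed; stop scanning
            }
            int recordEnd = stop + end.length();
            if (recordEnd - start <= maxrec) {
                records.add(text.substring(start, recordEnd));
            }
            pos = recordEnd; // continue after the record just consumed
        }
        return records;
    }
}
```

Calling extractRecords with begin set to "<page" and end set to "</page>" would return one value per page element, with the surrounding markup discarded.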

Author:
Michel Tourn

Field Summary
 
Fields inherited from class org.apache.hadoop.streaming.StreamBaseRecordReader
LOG
 
Constructor Summary
StreamXmlRecordReader(FSDataInputStream in, FileSplit split, Reporter reporter, JobConf job, FileSystem fs)
           
 
Method Summary
 void init()
           
 boolean next(Writable key, Writable value)
          Read a record.
 void seekNextRecordBoundary()
          Implementations should seek the input stream in_ forward to the first byte of the next record.
 
Methods inherited from class org.apache.hadoop.streaming.StreamBaseRecordReader
close, createKey, createValue, getPos, getProgress, validateInput
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StreamXmlRecordReader

public StreamXmlRecordReader(FSDataInputStream in,
                             FileSplit split,
                             Reporter reporter,
                             JobConf job,
                             FileSystem fs)
                      throws IOException
Throws:
IOException
Method Detail

init

public void init()
          throws IOException
Throws:
IOException

next

public boolean next(Writable key,
                    Writable value)
             throws IOException
Description copied from class: StreamBaseRecordReader
Read a record. Implementations should call numRecStats at the end.

Specified by:
next in interface RecordReader
Specified by:
next in class StreamBaseRecordReader
Parameters:
key - the key to read data into
value - the value to read data into
Returns:
true iff a key/value was read, false if at EOF
Throws:
IOException
See Also:
Writable.readFields(DataInput)
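
The return contract of next (true while a key/value pair was read, false at EOF) suggests the usual caller loop. The sketch below shows that loop against a hypothetical stand-in interface, SimpleReader, so that it runs without Hadoop on the classpath; SimpleReader, over, and drain are illustrative names, and leaving the key empty is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; SimpleReader is a hypothetical stand-in. */
public class NextLoopSketch {

    /** Mirrors the next contract: fill key and value, return false at EOF. */
    public interface SimpleReader {
        boolean next(StringBuilder key, StringBuilder value);
    }

    /** Hypothetical in-memory reader; each value is one whole XML fragment. */
    public static SimpleReader over(List<String> fragments) {
        Iterator<String> it = fragments.iterator();
        return (key, value) -> {
            if (!it.hasNext()) {
                return false; // EOF: nothing was read
            }
            key.setLength(0);   // assumption: the key is left empty
            value.setLength(0); // clear the reused value before refilling it
            value.append(it.next());
            return true;
        };
    }

    /** Drains the reader, reusing one key object and one value object. */
    public static List<String> drain(SimpleReader reader) {
        List<String> values = new ArrayList<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            values.add(value.toString()); // copy, because value is reused
        }
        return values;
    }
}
```

Because the same key and value objects are refilled on every call, a caller that wants to keep a record has to copy it before calling next again, as drain does.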

seekNextRecordBoundary

public void seekNextRecordBoundary()
                            throws IOException
Description copied from class: StreamBaseRecordReader
Implementations should seek the input stream in_ forward to the first byte of the next record. The initial byte offset in the stream is arbitrary.

Specified by:
seekNextRecordBoundary in class StreamBaseRecordReader
Throws:
IOException
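
Since the initial offset is arbitrary, a reader has to scan forward until it finds the start of a record. A minimal sketch of that scan over an in-memory byte array follows; it assumes the record start is identified by a literal begin marker, and the class SeekBoundarySketch and its static methods are hypothetical helpers, not part of the Hadoop API.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the StreamXmlRecordReader source. */
public class SeekBoundarySketch {

    /**
     * Returns the offset of the first occurrence of marker at or after from,
     * or -1 if the marker does not occur in the rest of data.
     */
    public static int seekNextRecordBoundary(byte[] data, int from, byte[] marker) {
        for (int i = Math.max(from, 0); i + marker.length <= data.length; i++) {
            int j = 0;
            // Compare the marker byte by byte at candidate offset i.
            while (j < marker.length && data[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return i; // first byte of the next record
            }
        }
        return -1; // no further record in this buffer
    }

    /** Convenience helper for building byte arrays from text. */
    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

A start offset that falls in the middle of a record simply skips the remainder of that record and lands on the next begin marker.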


Copyright © 2006 The Apache Software Foundation