org.apache.hadoop.streaming
Class StreamXmlRecordReader
java.lang.Object
org.apache.hadoop.streaming.StreamBaseRecordReader
org.apache.hadoop.streaming.StreamXmlRecordReader
- All Implemented Interfaces:
- RecordReader
public class StreamXmlRecordReader
- extends StreamBaseRecordReader
A way to interpret XML fragments as Mapper input records.
Values are XML subtrees delimited by configurable tags.
Keys could be the value of a certain attribute in the XML subtree,
but this is left to the stream processor application.
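The splitting idea described above can be illustrated with a small, self-contained sketch. It is not Hadoop's implementation. It scans a string for a begin marker, then for the next end marker, and emits each fragment (both markers included) as one record. The attributeKey helper is hypothetical. It shows one way a stream processor might derive a key from an attribute, which this class leaves to the application. The marker strings and sample input are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch only (not Hadoop code): splits text into records that
 * start with a begin marker and end with an end marker.
 */
public class XmlFragmentSplitter {

    /** Returns every fragment from begin through end, both markers included. */
    static List<String> split(String text, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record starts
            }
            int stop = text.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // an unterminated trailing record is dropped in this sketch
            }
            int after = stop + end.length();
            records.add(text.substring(start, after));
            pos = after; // resume scanning just past this record
        }
        return records;
    }

    /** Hypothetical helper: what a stream processor might do to derive a key. */
    static String attributeKey(String record, String attr) {
        Matcher m = Pattern.compile(Pattern.quote(attr) + "=\"([^\"]*)\"").matcher(record);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<doc><page id=\"1\">a</page>junk<page id=\"2\">b</page></doc>";
        for (String rec : split(input, "<page", "</page>")) {
            System.out.println(attributeKey(rec, "id") + "\t" + rec);
        }
    }
}
```

Text between records ("junk" above) is simply skipped, because scanning resumes at the next begin marker.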
The name-value properties that StreamXmlRecordReader understands are:
- String begin (characters marking the beginning of a record)
- String end (characters marking the end of a record)
- int maxrec (maximum record size)
- int lookahead (maximum lookahead to sync CDATA)
- boolean slowmatch
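One way to picture how these five properties fit together is to gather them into a single settings object. The sketch below is an assumption, not the reader's real configuration code. It reads the bare property names from a java.util.Properties, whereas the real job configuration keys may carry a prefix. It treats begin and end as required. Its default values are placeholders chosen for this example, so check the Hadoop source for the actual defaults. It also shows the maxrec limit applied to a candidate record, with size measured here in UTF-8 bytes as an assumption. slowmatch is only read, since its exact effect is not documented above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/**
 * Sketch: collects the properties listed above into one object.
 * Names are the bare names from the list; defaults are placeholders
 * for this example, not authoritative Hadoop values.
 */
public class XmlReaderSettings {
    final String begin;
    final String end;
    final int maxrec;
    final int lookahead;
    final boolean slowmatch;

    XmlReaderSettings(Properties p) {
        begin = required(p, "begin");
        end = required(p, "end");
        maxrec = Integer.parseInt(p.getProperty("maxrec", "50000"));                            // placeholder default
        lookahead = Integer.parseInt(p.getProperty("lookahead", String.valueOf(2 * maxrec)));   // placeholder default
        slowmatch = Boolean.parseBoolean(p.getProperty("slowmatch", "false"));                  // read only, semantics not modeled
    }

    private static String required(Properties p, String name) {
        String v = p.getProperty(name);
        if (v == null || v.isEmpty()) {
            throw new IllegalArgumentException("missing required property: " + name);
        }
        return v;
    }

    /** True if a candidate record fits within maxrec, measured in UTF-8 bytes. */
    boolean withinLimit(String record) {
        return record.getBytes(StandardCharsets.UTF_8).length <= maxrec;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("begin", "<page");
        p.setProperty("end", "</page>");
        p.setProperty("maxrec", "16");
        XmlReaderSettings s = new XmlReaderSettings(p);
        System.out.println(s.withinLimit("<page>a</page>"));       // 14 bytes
        System.out.println(s.withinLimit("<page>toolong</page>")); // 20 bytes
    }
}
```

Failing fast on a missing begin or end marker is a design choice of this sketch. Without both markers, no record boundaries can be found.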
- Author:
- Michel Tourn
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
StreamXmlRecordReader
public StreamXmlRecordReader(FSDataInputStream in,
FileSplit split,
Reporter reporter,
JobConf job,
FileSystem fs)
throws IOException
- Throws:
IOException
init
public void init()
throws IOException
- Throws:
IOException
next
public boolean next(Writable key,
Writable value)
throws IOException
- Description copied from class:
StreamBaseRecordReader
- Reads a record. Implementations should call numRecStats at the end.
- Specified by:
next
in interface RecordReader
- Specified by:
next
in class StreamBaseRecordReader
- Parameters:
key
- the key to read data into
value
- the value to read data into
- Returns:
- true iff a key/value pair was read; false if at EOF
- Throws:
IOException
- See Also:
Writable.readFields(DataInput)
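The next(key, value) contract above has three parts. The caller owns reusable key and value holders, each call overwrites them, and a false return signals EOF. The sketch below illustrates that contract. It is not Hadoop code. StringBuilder stands in for Writable. InMemoryXmlReader is a hypothetical reader over records that were already split out. Following the class description, this sketch puts the XML fragment into the value and leaves the key empty. The recordsRead counter is a hypothetical stand-in for the record statistics mentioned in the description and does not model numRecStats itself.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of the next(key, value) contract using StringBuilder in place
 * of Writable. Hypothetical classes; not Hadoop's implementation.
 */
public class NextContractDemo {

    /** Hypothetical reader over pre-split XML fragments. */
    static class InMemoryXmlReader {
        private final Iterator<String> records;
        private int recordsRead = 0; // hypothetical per-reader record counter

        InMemoryXmlReader(List<String> records) {
            this.records = records.iterator();
        }

        /** Fills the caller's holders and returns true, or returns false at EOF. */
        boolean next(StringBuilder key, StringBuilder value) {
            if (!records.hasNext()) {
                return false; // EOF: holders are left untouched
            }
            key.setLength(0);            // key left empty in this sketch
            value.setLength(0);
            value.append(records.next()); // the XML fragment becomes the value
            recordsRead++;
            return true;
        }

        int recordsRead() {
            return recordsRead;
        }
    }

    public static void main(String[] args) {
        InMemoryXmlReader reader = new InMemoryXmlReader(List.of("<page>a</page>", "<page>b</page>"));
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) { // loop until next() reports EOF
            System.out.println(value);
        }
        System.out.println("records: " + reader.recordsRead());
    }
}
```

Reusing the same holders across calls avoids allocating a new key/value pair per record. A caller that needs to keep a record must therefore copy it before calling next() again.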
seekNextRecordBoundary
public void seekNextRecordBoundary()
throws IOException
- Description copied from class:
StreamBaseRecordReader
- Implementations should seek the input stream in_ forward to the first byte of the next record.
The initial byte offset in the stream is arbitrary.
- Specified by:
seekNextRecordBoundary
in class StreamBaseRecordReader
- Throws:
IOException
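Because the starting offset is arbitrary, for example the start of a file split that may fall in the middle of a record, the reader must skip ahead to the next record start. The sketch below shows that step as a naive byte-by-byte search for the begin marker in an in-memory byte array. It assumes a record boundary is simply the first byte of the begin marker. It does not model CDATA handling, lookahead limits, or slowmatch, and it is not Hadoop's implementation.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch: from an arbitrary starting offset, find the offset of the next
 * begin marker. Naive search over an in-memory byte array; not Hadoop code.
 */
public class RecordBoundarySeeker {

    /** Returns the first offset >= from where marker occurs in data, or -1 if none. */
    static int seekNextBoundary(byte[] data, int from, byte[] marker) {
        for (int i = Math.max(from, 0); i + marker.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1; // no further record starts in this data
    }

    public static void main(String[] args) {
        byte[] data = "<page>a</page><page>b</page>".getBytes(StandardCharsets.UTF_8);
        byte[] begin = "<page>".getBytes(StandardCharsets.UTF_8);
        // Offset 3 lands inside the first record, so the seek skips to the second one.
        System.out.println(seekNextBoundary(data, 3, begin));
    }
}
```

A real reader would search a stream rather than an array, reading forward until it finds the marker or runs past the end of its split.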
Copyright © 2006 The Apache Software Foundation