CubeTwister 2.0alpha141 2011-10-13

ch.randelshofer.io
Class CSVTokenizer

java.lang.Object
  extended by ch.randelshofer.io.CSVTokenizer

public class CSVTokenizer
extends java.lang.Object

Parses a comma-separated values (CSV) stream into tokens.

EBNF rules for the CSV format:

 CSV = Record {RecordSeparator, Record}
 
 RecordSeparator = linebreak
 Record = Field {FieldSeparator, Field}
 
 FieldSeparator = {whitespace} comma {whitespace}
 Field = UnquotedField | DQuotedField
 
 UnquotedField = (simplechar) {{simplechar|space}, (simplechar)}
 
 DQuotedField = dquote {simplechar|whitespace|stuffeddquote|linebreak|comma} dquote
 
 
 simplechar = (* every character except specialchar *)
 specialchar = linebreak | comma | dquote | whitespace | space
 linebreak = lf | cr | cr, lf
 comma = ','
 dquote = '"'
 stuffeddquote = '""'
 lf = 0x0a
 cr = 0x0d
 space = ' '
 whitespace = ' ' | tab
 tab = 0x09
 
 

Simple example with unquoted fields:

 Jacques, Mayol, Rue St. Claire 8, Antibes
 Enzo, Maiorca, Via Roma 2, Taormina
 
Example with quoted fields:
 Trio, "Uno, due, tre!", Pop
 Alice, "Did you go?
 Did you stay?", Rock
 The Pringles, "He said ""I like it""", Pop
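
The stuffeddquote rule can be illustrated with a small sketch that decodes a single DQuotedField. This is not the code used by CSVTokenizer; the class name QuotedFieldSketch and the method unquote are hypothetical, and the input is assumed to be one complete field including its enclosing double quotes.

```java
final class QuotedFieldSketch {

    /**
     * Decodes one DQuotedField (hypothetical helper, not part of CSVTokenizer).
     * The argument must start and end with a double quote.
     */
    static String unquote(String field) {
        int end = field.length() - 1;
        if (end < 1 || field.charAt(0) != '"' || field.charAt(end) != '"') {
            throw new IllegalArgumentException("not a double-quoted field: " + field);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = field.charAt(i);
            if (c == '"') {
                // Inside the field a dquote may only appear as a stuffed pair "".
                if (i + 1 >= end || field.charAt(i + 1) != '"') {
                    throw new IllegalArgumentException("unescaped quote at index " + i);
                }
                i++; // skip the second quote of the pair
            }
            sb.append(c); // commas and line breaks are copied unchanged
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"He said \"\"I like it\"\"\""));
        System.out.println(unquote("\"Uno, due, tre!\""));
    }
}
```

Each stuffed pair collapses to one double quote, while commas and line breaks inside the quotes remain part of the field value.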
 

Version:
1.0.1 2011-10-13 New class comments.
1.0 2004-04-18 Created.
Author:
Werner Randelshofer

Field Summary
static int TT_DELIMITER
          A constant indicating that a delimiter token has been read.
static int TT_EOF
          A constant indicating that the end of the stream has been read.
static int TT_EOL
          A constant indicating that the end of the line has been read.
static int TT_VALUE
          A constant indicating that a value token has been read.
 int ttype
          After a call to the nextToken method, this field contains the type of the token just read.
static java.lang.String value
          If the current token is a value token, this field contains a string giving the characters of the value token.
 
Constructor Summary
CSVTokenizer(java.io.Reader in)
          Creates a new instance.
CSVTokenizer(java.io.Reader in, char delimiterChar, char quoteChar)
          Creates a new instance.
 
Method Summary
 int getLineNumber()
          Returns the current line number.
 int nextToken()
          Parses the next token from the input stream of this tokenizer.
 void pushBack()
          Causes the next call to the nextToken method of this tokenizer to return the current value in the ttype field, and not to modify the value in the value field.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ttype

public int ttype
After a call to the nextToken method, this field contains the type of the token just read. For a single character token, its value is the single character, converted to an integer. For a quoted string token, its value is the quote character. Otherwise, its value is one of the TT_ constants defined by this class: TT_VALUE, TT_DELIMITER, TT_EOL, or TT_EOF.

The initial value of this field is -4.

See Also:
nextToken(), TT_EOF, TT_EOL, TT_VALUE

TT_EOF

public static final int TT_EOF
A constant indicating that the end of the stream has been read.

See Also:
Constant Field Values

TT_EOL

public static final int TT_EOL
A constant indicating that the end of the line has been read.

See Also:
Constant Field Values

TT_DELIMITER

public static final int TT_DELIMITER
A constant indicating that a delimiter token has been read.

See Also:
Constant Field Values

TT_VALUE

public static final int TT_VALUE
A constant indicating that a value token has been read.

See Also:
Constant Field Values

value

public static java.lang.String value
If the current token is a value token, this field contains a string giving the characters of the value token.

The current token is a value when the value of the ttype field is TT_VALUE.

The initial value of this field is null.

See Also:
TT_VALUE, ttype

Constructor Detail

CSVTokenizer

public CSVTokenizer(java.io.Reader in)
Creates a new instance.


CSVTokenizer

public CSVTokenizer(java.io.Reader in,
                    char delimiterChar,
                    char quoteChar)
Creates a new instance.

Parameters:
in - reader from which to read.
delimiterChar - The new delimiter character to use.
quoteChar - The new character to use for quoting.
Throws:
java.lang.IllegalArgumentException - if one of the given characters cannot be used.

Method Detail

nextToken

public int nextToken()
              throws java.io.IOException
Parses the next token from the input stream of this tokenizer. The type of the next token is returned in the ttype field. If the token is a value token, its characters are available in the value field of this tokenizer.

Typical clients of this class create a tokenizer and then call nextToken in a loop to parse successive tokens until TT_EOF is returned.

Returns:
the value of the ttype field.
Throws:
java.io.IOException - if an I/O error occurs.
See Also:
StreamTokenizer.nval, StreamTokenizer.sval, StreamTokenizer.ttype
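
The token loop described for nextToken might look like the following sketch. To keep it runnable without the CubeTwister library, it uses a hypothetical stand-in named MiniTokenizer that mirrors only the documented names (nextToken, ttype, value and the TT_ constants). The stand-in handles unquoted fields only, its constant values are placeholders, and the assumption that a TT_DELIMITER token is reported between values is not confirmed by this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenLoopSketch {

    /** Hypothetical stand-in for CSVTokenizer: unquoted fields, ',' and '\n' only. */
    static final class MiniTokenizer {
        // Placeholder values; the real ones are listed under "Constant Field Values".
        static final int TT_EOF = -1;
        static final int TT_EOL = -2;
        static final int TT_DELIMITER = -3;
        static final int TT_VALUE = -5;

        int ttype = -4; // documented initial value of ttype
        String value;   // characters of the last value token

        private static final int NONE = Integer.MIN_VALUE;
        private final Reader in;
        private int pending = NONE; // character read ahead while scanning a value

        MiniTokenizer(Reader in) {
            this.in = in;
        }

        int nextToken() throws IOException {
            int c = (pending != NONE) ? pending : in.read();
            pending = NONE;
            if (c == -1) return ttype = TT_EOF;
            if (c == '\n') return ttype = TT_EOL;
            if (c == ',') return ttype = TT_DELIMITER;
            StringBuilder sb = new StringBuilder();
            while (c != -1 && c != '\n' && c != ',') {
                sb.append((char) c);
                c = in.read();
            }
            pending = c;                  // the terminator becomes the next token
            value = sb.toString().trim(); // whitespace around a comma belongs to the separator
            return ttype = TT_VALUE;
        }
    }

    /** Collects the fields of each record by looping until TT_EOF. */
    static List<List<String>> readRecords(String csv) {
        MiniTokenizer tt = new MiniTokenizer(new StringReader(csv));
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        try {
            while (tt.nextToken() != MiniTokenizer.TT_EOF) {
                switch (tt.ttype) {
                    case MiniTokenizer.TT_VALUE:
                        record.add(tt.value);
                        break;
                    case MiniTokenizer.TT_EOL:
                        records.add(record);
                        record = new ArrayList<>();
                        break;
                    default: // TT_DELIMITER: nothing to collect
                        break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!record.isEmpty()) {
            records.add(record); // last record without a trailing line break
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readRecords("Jacques, Mayol\nEnzo, Maiorca\n"));
    }
}
```

With the real class, the same loop would presumably construct a CSVTokenizer from a java.io.Reader and compare ttype against the constants of CSVTokenizer instead of the placeholders used here.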

pushBack

public void pushBack()
Causes the next call to the nextToken method of this tokenizer to return the current value in the ttype field, and not to modify the value in the value field.

See Also:
nextToken(), value, ttype
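
The push-back behaviour can be sketched with a single flag, similar in spirit to java.io.StreamTokenizer. The class PushBackSketch below is hypothetical and is not the CSVTokenizer implementation; its token source is a plain iterator of strings, its constant values are placeholders, and the guard that ignores pushBack before the first token is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;

final class PushBackSketch {
    // Placeholder values, not the library's constants.
    static final int TT_EOF = -1;
    static final int TT_VALUE = -5;
    static final int TT_NOTHING = -4; // hypothetical name for the documented initial ttype

    int ttype = TT_NOTHING;
    String value;

    private final Iterator<String> source; // hypothetical token source
    private boolean pushedBack;

    PushBackSketch(Iterator<String> source) {
        this.source = source;
    }

    int nextToken() {
        if (pushedBack) {
            // Report the current token again; ttype and value stay untouched.
            pushedBack = false;
            return ttype;
        }
        if (source.hasNext()) {
            value = source.next();
            ttype = TT_VALUE;
        } else {
            value = null;
            ttype = TT_EOF;
        }
        return ttype;
    }

    void pushBack() {
        // Assumption: pushing back before any token has been read has no effect.
        if (ttype != TT_NOTHING) {
            pushedBack = true;
        }
    }

    public static void main(String[] args) {
        PushBackSketch t = new PushBackSketch(Arrays.asList("a", "b").iterator());
        t.nextToken();
        t.pushBack();
        t.nextToken();
        System.out.println(t.value); // the token that was pushed back
        t.nextToken();
        System.out.println(t.value); // the following token
    }
}
```

This pattern gives a parser one token of lookahead: it can read a token, decide that the token belongs to the next syntactic unit, and push it back for the code that handles that unit.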

getLineNumber

public int getLineNumber()
Returns the current line number.

Returns:
the current line number of this tokenizer.

(c) Werner Randelshofer.
All rights reserved.