Caigen CSV JDBC Drivers for raw data, flat text, CSV file, TSV file, PSV file, fixed-length, and variable-length binary file

Cagen CSV JDBC Driver v4.2 Documentation

Index:

Setting the CLASSPATH
Loading the Driver
Connecting to the Database
Supported SQL Syntax
Supported SQL Function

To put csvdriver.jar into your class path, you should use "export CLASSPATH=/usr/share/lib/csvdriver.jar:$CLASSPATH" on Solaris and Linux, and "SET CLASSPATH=\javalib\csvdriver.jar;%classpath%" on Windows.

Loading the Driver

Any source that uses JDBC needs to import the java.sql package by using " import java.sql.*;".

Caigen CSV driver' name is com.caigen.sql.text.CSVDriver, and you can uses it without involving hard coding the driver into your code. You do this by setting the jdbc.drivers system property. For example, for command line apps you can use:
java -Djdbc.drivers=com.caigen.sql.text.CSVDriver yourApp
Then, the JVM upon startup will load the drivers automatically. Some applications (JBoss, Tomcat etc) support a .properties file which they use to save putting this on the command line.

The second method is the most common and involves you loading the driver yourself. It's simple:
Class.forName("com.caigen.sql.text.CSVDriver");
From then on you can get connections from DriverManager.
Note: If Class.forName() throws ClassNotFoundException, you should check your classpath.

Data Source class is com.caigen.sql.CaigenDataSource, and Connection Pool Data Source class is com.caigen.sql.CaigenConnectionPoolDataSource.

Connecting to the Database

After the driver has been registered with the DriverManager, you can obtain a Connection instance that is connected to a particular database by calling DriverManager.getConnection(). With JDBC, a database is represented by a URL (Uniform Resource Locator).

        Embedded:
                jdbc:csv:[//]/[DatabasePath][?prop1=value1[;prop2=value2]] (You can omit that "//" characters sometimes)
            or
                jdbc:%csv:[//]/[DatabasePath][?prop1=value1[;prop2=value2]] (You can omit that "//" characters sometimes)
            The only difference for two kinds of url format is: with jdbc:csv prefix, the default table type for CREATE TABLE is CSV.
                        For example:
                                "jdbc:csv:/."
                                "jdbc:csv:/c:/data" for Windows driver
                                "jdbc:csv:///c:/data" for Windows driver
                                "jdbc:csv:////usr/data" for unix or linux
                                "jdbc:csv://///192.168.10.2/sharedir" for UNC path
                                "jdbc:csv:/./data"
                                "jdbc:csv:/."
                                "jdbc:csv:/c:/data"
                                "jdbc:csv:////usr/data" 
        Compressed Database:(.ZIP, .JAR, .GZ, .TAR, .BZ2, .TGZ, .TAR.GZ, .TAR.BZ2) 
                jdbc url format is the same as embedded url and remote url.
                        For example:
                                "jdbc:csv:/c:/test/testcsv.zip
        Memory-only Database:
                jdbc:csv:/_memory_/
        URL Database:(http protocol, https protocol, ftp protocol)
                jdbc:csv:http://httpURL
                jdbc:csv:https://httpsURL
                jdbc:csv:ftp://ftpURL
                        For example:
                                "jdbc:csv:http://www.caigen.com/test" //Note: FTP site's user/password should be set in ftpURL, and cannot be set in JDBC connection property because user/password JDBC connection property belongs to server/client connection.
        SAMBA Database:(smb protocol)
                                jdbc:csv:smb://[[[domain;]username[:password]@]server[:port]/[[share/[dir/]file]]][?[param=value]]
                        For example:
                                "jdbc:csv:smb://test1:123@100.100.13.94/csvfiles/zone" //Note: SAMBA user/password should be set in SMB url, and cannot be set in JDBC connection property because user/password JDBC connection property belongs to server/client connection.
	UNC path JDBC url:
                jdbc:csv:/uncpath
                jdbc:csv:///uncpath
                        For example:
                                "jdbc:csv:/\\PC17\c$\values" 
                                "jdbc:csv:/\\PC17\val"
	Free JDBC url:(Warning: only use it for special project)
                jdbc:csv:/" or "jdbc:csv:///". Then you can use some full UNC path names in SQL to visit anywhere where your Java VM has right to access.
                        For instance:
                                select * from \\amd2500\e$\csvfiles\test;
                                elect * from "\\amd2500\d$\csviles".test;
                                select * from ".".test;

         Caigen CSV supports seamlessly data mining on memory-only table, physical table, url table, compressed table, SAMBA table in a sql.

To connect, you need to get a Connection instance from JDBC. To do this, you use the DriverManager.getConnection() method:

Connection con = DriverManager.getConnection(url, properties);

There are a few different signatures for the getConnection() method. You should see the API documentation that comes with your JDK for more specific information on how to use them. You can specify additional properties to the JDBC driver by placing them in a java.util.Properties instance and passing that instance to the DriverManager when you connect.

Property Name	Definition	Default Value
user	The user to connect as	null
password	The password to use when connecting	null
charSet	To specify a Character Encoding Scheme other than the client default. You can find a Supported Encodings list of file:///c\|/jdk1.2/docs/guide/internat/encoding.doc.html. Cp895(Czech MS - DOS 895), Cp620(Polish MS - DOS 620) and Mazovia are extra supported although JVM doesn't support those.	null
lockTimeout	To specify Text driver's timeout in milliseconds to wait until processes released record lock or table lock. 0 means a default value, and <0 means no wait.	1000
tmpdir	Indicates whether set a temp directory, Default: the value of JVM's "java.io.tmpdir" property. If that value is incorrect, uing the directory of JDBC url. _memory_ means large data in memory.	null
delayedClose	Indicates the delayed seconds for close transaction. That option is used to avoid frequent close/open table operations for following sqls. Automatic temporary index is disabled when delayedClose<=60s. You can use 0~120 seconds. Default: 3.	null
maxCacheSize	Indicates the max memory utilization for per table on automatic temporary index or matched result cache. You can use 16~65536 kilo bytes. Default: 1024.	null
schemaFile	Indicates whether loads some SQL statements for table definition from a text file on database directory. '#' is used to start a remark line. schemaFile and odbcSchemaFile can be load from any (absolute or relative) path, even you can load it from http url or compressed file. Default: null.	null
odbcSchemaFile	Indicates whether loads ODBC schema file for table definition. Default: null.	null
createMissingFile4odbcSchemaFile	Indicates whether creates an empty CSV file which is defined in ODBC schema file, but is missing.	false
fileExtension	To specify other suffix as default extension of raw data or flat file.	BIN
csvfileExtension	To specify other suffix as default csv file extension.	CSV
_CSV_Separator	To specify a character sequence used to separate the values of the fields. It supports escape processing so that you can use \t, \r, \u001A, \x32, and so on.	,
_CSV_EOL	To specify 1~2 character sequence to terminate one line. An end-of-line line sequence is any one of a line feed ('\n', 0x0A), a carriage return ('\r', 0x0D), or a carriage return followed immediately by a linefeed. In most occasions, you needn't to care that connection property, since Caigen CSV supports even to mix three styles in one file. For data update, Caigen CSV can detect automatically OS version and choose the suitable EOL style in UNIX-style, DOS-style, and Mac-style. If you wish to create unix-style file on Windows, then you need to assign that connection porperty. It supports escape processing so that you can use \t, \r, \u001A, \x32, and so on. For instance, your data file is using $ as EOL, Caigen CSV can support that file.	null
_CSV_Quoter	To specify a character used to quote the string value. It supports escape processing so that you can use \t, \r, \u001A, \x32, and so on.	"
_CSV_Header	Indicates whether the first record of text file to consist of the names of the fields in the data following. The first CSV line information is used to verify predefined CREATE TABLE sql when _CSV_Header=true. You can use the following code to know the error in your predefined CREATE TABLE sql. If you make sure that predefined CREATE TABLE sql is correct, you can ignore those warnings. SQLWarning warnings=con.getWarnings(); if(warnings!=null){ do{//SQState: 2A000 Direct SQL syntax error or access rule violation System.out.println(warnings.getMessage()); System.out.println("Error Code:"+warnings.getErrorCode()); System.out.println("SQL State:"+warnings.getSQLState()); }while((warnings=warnings.getNextWarning())!=null); con.clearWarnings(); } For CSV/TSV file, the preferable solution is using maxScanRows to detect automatically table information, not using CREATE TABLE sql.	false
_CSV_Comment	To specify whether there're some comment lines before csv header and data rows. If you use n (integer values), the first n lines will be ingored as comment. If you use some paragraphs, these paragraphs will be used for csv file creation, the total line number of comment will be used for existent file.	null
_CSV_ColumnCount4EmptyFile	To specify the default column count for an empty text file without header information and data rows.	25
soloMode	Indicates whether uses solo mode for speed optimization. Solo mode means single user read/write mode or multi-user read mode.	true
readOnlyMode	Indicates whether uses readOnly mode for speed optimization. ReadOnly mode means multi-user read mode and need more memory.	false
maxScanRows	Indicates how many rows should be scanned when determining the column types. If you set maxScanRows to 0, the entire file is scanned. If you set maxScanRows to a negative value, the file won't be scanned. For those tables with predefined table structure, that option will be ignored.	-1
maxScanRows4ColumnCount	Indicates how many rows should be scanned when determining the possible maximum column count. If you set maxScanRows to 0, the entire file is scanned. If you set maxScanRows to a negative value, the file won't be scanned, and will use the absolute value of that negative value as column count. For those tables with predefined table structure, that option will be ignored.	1
emptyStringAsNull	Indicates whether returns empty string as null value. You can use null, true, false	true
ignoreDirtyData	Indicates whether ignores all dirty data and return null value when failed to parse number value or date value. You can use the following code to know where's dirty data in your file. warnings=rs.getWarnings(); if(warnings!=null){ do{//SQState: C0106 Convert dirty data into null value System.out.println(warnings.getMessage()); System.out.println("Error Code:"+warnings.getErrorCode()); System.out.println("SQL State:"+warnings.getSQLState()); }while((warnings=warnings.getNextWarning())!=null); rs.clearWarnings(); }	false
ignoreDirtyXML	Indicates whether ignores all dirty data and return null value when found invalid XML characters in string value. You can use the following code to know where's dirty xml data in your file. warnings=rs.getWarnings(); if(warnings!=null){ do{//SQState: C0106 Convert dirty data into null value System.out.println(warnings.getMessage()); System.out.println("Error Code:"+warnings.getErrorCode()); System.out.println("SQL State:"+warnings.getSQLState()); }while((warnings=warnings.getNextWarning())!=null); rs.clearWarnings(); }	false
locale	locale is used to specify a default local for parse. You can use CANADA, CANADA_FRENCH, CHINA, CHINESE, ENGLISH, FRANCE, FRENCH, GERMAN, GERMANY, ITALIAN, ITALY, JAPAN, JAPANESE, KOREA, KOREAN, PRC, ROOT, SIMPLIFIED_CHINESE, TAIWAN, TRADITIONAL_CHINESE, UK, or US.	null
dateFormat	dateFormat is used to specify a default parse sequence of date(Default: 'yyyy-MM-dd') format. Date and Time patterns follow the Java java.text.SimpleDateFormat Format (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html) standard.	yyyy-MM-dd
timeFormat	timeFormat is used to specify a default parse sequence of time(Default: 'hh:mm:ss') format. Date and Time patterns follow the Java java.text.SimpleDateFormat Format (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html) standard.	hh:mm:ss
timestampFormat	timestampFormat is used to specify a default parse sequence of timestamp(Default: 'yyyy-MM-dd hh:mm:ss') format. Date and Time patterns follow the Java java.text.SimpleDateFormat Format (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html) standard.	yyyy-MM-dd hh:mm:ss
decimalFormat	decimalFormat is used to specify a default parse sequence of decimal number format.	null
decimalSeparator	decimalSeparator is used to specify a default character for decimal sign. Different for French, etc.	null
groupingSeparator	groupingSeparator is used to specify a default character for thousands separator. Different for French, etc.	null

When your code then tries to open a Connection, and you get a No driver available SQLException being thrown, this is probably caused by the driver not being in the class path, or the JDBC url not being correct.

To close the database connection, simply call the close() method to the Connection:

con.close();