Open Source Relational Database System

Overview of SQLite benefits including compatibility with R, Python, and ability to work with converted Microsoft Access databases.

Light, Open Source RDBMS SQLite

Tanner Jessel Spring 2014 IS 592 – Big Data Analytics School of Information Sciences College of Communication & Information

What is SQLite?

• Big Data in a Little Package – Portable, No Server – Writes directly to common media – Cross-platform (Mac, Windows, Linux)
• Portable – Small Gadgets (Windows Phone, Android, iOS) • (500 KB, 300 KB) • Embeddable – Computer Applications (Chrome, Firefox, Skype) • Single file database – Triggers – Tables – Indices – Views

http://en.wikipedia.org/wiki/SQLite

What it is not…
• Not a full database application – No forms – No reports – No saved queries

Why should you be interested?
*Free*

• No Server (no mess)

• Command Line Interface

• GUI Interface

• Works with R

• Works with Python

• Good for students to learn advanced SQL queries, command line interaction

• Good for small database projects

RSQLite
Alter, analyze, attatch, begin, commit, create (table, trigger, view, virtual table), delete, select, insert… SQLite Query Language “Mostly” SQL

Database Interface R driver for SQLite. This package embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine (version 3.7.17) is included.” –Comprehensive R Archive Network (CRAN) http://cran.r-project.org/web/packages/RSQLite/ 
Sqlite3 http://www.sqlite.org/cli.html

Command Line Shell

$ sqlite3 myexampledb SQLite version 3.8.4 2014-02-11 16:24:34 Enter ".help" for usage hints. sqlite> CREATE table tbl1(one varchar(10), two smallint); sqlite> INSERT into tbl1 values('hello!',10); sqlite> INSERT into tbl1 values('goodbye', 20); sqlite> SELECT * from tbl1; hello!|10 goodbye|20 sqlite>

*Free* • Feature Rich GUI • Cross-platform – Windows – Mac – Linux – Solaris • http://sqlitestudio.pl/ SQLite Studio
10. • Rstudio Console

> install.packages("RSQLite") > library(RSQLite) > dbDIR <- "/Users/apple/Documents/IS592-Big- Data-Analytics/SQLite-Presentation" > dbFileName <- paste(dbDIR,"classHOBOData2013.db3", sep="/") >drv <- dbDriver("SQLite”) >con <- dbConnect(drv, dbname = dbFileName)

Connect to Data
• Rstudio Console

> allDf <- dbGetQuery(con, "select * from HoboTable ") >View(allDf) Load Data to Frame

Pendant monitor data from SQLite database in data frame SQLite Table Data in R Studio 15,186 rows

>summary(allDf) obsNum serialNo recNo dateTime temperatureC Min. : 1 Length:15186 Min. : 1.0 Length:15186 Min. :14.42 1st Qu.: 3797 Class :character 1st Qu.: 422.0 Class :character 1st Qu.:23.58 Median : 7594 Mode :character Median : 844.0 Mode :character Median :23.87 Mean : 7594 Mean : 844.5 Mean :26.43 3rd Qu.:11390 3rd Qu.:1266.0 3rd Qu.:25.12 Max. :15186 Max. :1722.0 Max. :59.81 intensityLight Min. : 0.0 1st Qu.: 0.0 Median : 10.8 Mean : 2524.8 3rd Qu.: 21.5 Max. :231468.2

Pendant monitor data from SQLite database in plot

>plot(allDf$temperatureC) Temperature Data
Rstudio Console > temperatureF <- (allDf$temperatureC * (9/5.0))+32.0 Transformation C to F
Rstudio Console > temperatureF <- (allDf$temperatureC * (9/5.0))+32.0 > newDf <- cbind(allDf,temperatureF) > dbWriteTable(con, "newTable", newDf) Update Data Frame

Pendant data .db3 file now has two tables
SQLite Table Data Modified by R 15,186 rows, new TemperatureF column added

>view(hotDf) serialNo dateTime temperatureC 1 10081427 2013-06-04 12:32:30 37.824 2 10081427 2013-06-04 12:32:40 37.935 3 10081427 2013-06-04 12:32:50 37.935 4 10081427 2013-06-04 12:33:00 37.935 5 10081427 2013-06-04 12:33:10 38.046 6 10081427 2013-06-04 12:33:20 38.046 7 10081427 2013-06-04 12:33:30 38.490 8 10081427 2013-06-04 12:33:40 38.490 9 10081427 2013-06-04 12:33:50 38.602

10081427 2013-06-04 12:34:00 38.602

Quality Assurance / Control • Import SQLite3 import sqlite3 conn=sqlite3.connect(‘/Users/apple/D ocuments/IS592-Big-Data- Analytics/SQLite- Presentation/classHOBOData2013.db 3’) c=conn.cursor()

Quality Assurance / Control • Find, list “errors” • Whatever you define them to be • > 100 F in Antarctica, for example c.execute(‘SELECT obsNum,serialNo,temperatureF FROM newTable WHERE temperatureF > 100 OR temperatureF < 10’)

Quality Assurance / Control • Check for Errors • Nice printed messsage with your selected “errors”
python QAQCdemo.py

#Fetch all the query results myListOfNames=c.fetchall() # print them out print print("Range Errors for Temperature:") for myTuple in myListOfNames: print("obsNum: "+str(myTuple[0])+" SerialNo: "+str(myTuple[1])+" Temperature: "+str(myTuple[2])) conn.commit() conn.close()
'SELECT obsNum,serialNo,temperatureF FROM newTable WHERE temperatureF &gt; 100 OR temperatureF &lt; 10' &gt;100 F: SQLite/Python QAQC 1,538 records where the temperature was over 100 F

Access SQLite Convert Access to SQLite

https://code.google.com/p/mdb-sqlite/

http://convertdb.com/access/sqlite

https://code.google.com/p/access2sqlite/

http://mdbtools.sourceforge.net/

http://sites.fastspring.com/eggerapps/product/mdbaccdbviewer

Further reading http://www.sqlite.org/books.html

 

Advertisements

About Tanner Jessel

I am a recent M.S. in Information Science graduate from the University of Tennessee School of Information Science. I was formerly a graduate research assistant funded by DataONE (Data Observation Network for Earth). Prior, I worked for four years as a content lead and biodiversity scientist with the U.S. Geological Survey's Biodiversity Informatics Program. Building on my work experience in biodiversity and environmental informatics, my work with DataONE focused on exploring the nature of scientific collaborations necessary for scientific inquiry. I also conducted research concerning user experience and usability, and assisted in development of member nodes with an emphasis on spatial data and infrastructure. I assisted with research designed to understand sociocultural issues within collaborative research communities. Through August 1, 2014, I was based at the Center for Information and Communication Studies at the University of Tennessee School of Information Science in Knoxville, Tennessee.

Posted on April 24, 2014, in Big Data Analytics, Coursework and tagged , , , , , . Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: