Monthly Archives: April 2014

Open Source Relational Database System

Overview of SQLite benefits including compatibility with R, Python, and ability to work with converted Microsoft Access databases.

Light, Open Source RDBMS SQLite

Tanner Jessel Spring 2014 IS 592 – Big Data Analytics School of Information Sciences College of Communication & Information

What is SQLite?

• Big Data in a Little Package – Portable, No Server – Writes directly to common media – Cross-platform (Mac, Windows, Linux)
• Portable – Small Gadgets (Windows Phone, Android, iOS) • (500 KB, 300 KB) • Embeddable – Computer Applications (Chrome, Firefox, Skype) • Single file database – Triggers – Tables – Indices – Views

http://en.wikipedia.org/wiki/SQLite

What it is not…
• Not a full database application – No forms – No reports – No saved queries

Why should you be interested?
*Free*

• No Server (no mess)

• Command Line Interface

• GUI Interface

• Works with R

• Works with Python

• Good for students to learn advanced SQL queries, command line interaction

• Good for small database projects

RSQLite
Alter, analyze, attatch, begin, commit, create (table, trigger, view, virtual table), delete, select, insert… SQLite Query Language “Mostly” SQL

Database Interface R driver for SQLite. This package embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine (version 3.7.17) is included.” –Comprehensive R Archive Network (CRAN) http://cran.r-project.org/web/packages/RSQLite/ 
Sqlite3 http://www.sqlite.org/cli.html

Command Line Shell

$ sqlite3 myexampledb SQLite version 3.8.4 2014-02-11 16:24:34 Enter ".help" for usage hints. sqlite> CREATE table tbl1(one varchar(10), two smallint); sqlite> INSERT into tbl1 values('hello!',10); sqlite> INSERT into tbl1 values('goodbye', 20); sqlite> SELECT * from tbl1; hello!|10 goodbye|20 sqlite>

*Free* • Feature Rich GUI • Cross-platform – Windows – Mac – Linux – Solaris • http://sqlitestudio.pl/ SQLite Studio
10. • Rstudio Console

> install.packages("RSQLite") > library(RSQLite) > dbDIR <- "/Users/apple/Documents/IS592-Big- Data-Analytics/SQLite-Presentation" > dbFileName <- paste(dbDIR,"classHOBOData2013.db3", sep="/") >drv <- dbDriver("SQLite”) >con <- dbConnect(drv, dbname = dbFileName)

Connect to Data
• Rstudio Console

> allDf <- dbGetQuery(con, "select * from HoboTable ") >View(allDf) Load Data to Frame

Pendant monitor data from SQLite database in data frame SQLite Table Data in R Studio 15,186 rows

>summary(allDf) obsNum serialNo recNo dateTime temperatureC Min. : 1 Length:15186 Min. : 1.0 Length:15186 Min. :14.42 1st Qu.: 3797 Class :character 1st Qu.: 422.0 Class :character 1st Qu.:23.58 Median : 7594 Mode :character Median : 844.0 Mode :character Median :23.87 Mean : 7594 Mean : 844.5 Mean :26.43 3rd Qu.:11390 3rd Qu.:1266.0 3rd Qu.:25.12 Max. :15186 Max. :1722.0 Max. :59.81 intensityLight Min. : 0.0 1st Qu.: 0.0 Median : 10.8 Mean : 2524.8 3rd Qu.: 21.5 Max. :231468.2

Pendant monitor data from SQLite database in plot

>plot(allDf$temperatureC) Temperature Data
Rstudio Console > temperatureF <- (allDf$temperatureC * (9/5.0))+32.0 Transformation C to F
Rstudio Console > temperatureF <- (allDf$temperatureC * (9/5.0))+32.0 > newDf <- cbind(allDf,temperatureF) > dbWriteTable(con, "newTable", newDf) Update Data Frame

Pendant data .db3 file now has two tables
SQLite Table Data Modified by R 15,186 rows, new TemperatureF column added

>view(hotDf) serialNo dateTime temperatureC 1 10081427 2013-06-04 12:32:30 37.824 2 10081427 2013-06-04 12:32:40 37.935 3 10081427 2013-06-04 12:32:50 37.935 4 10081427 2013-06-04 12:33:00 37.935 5 10081427 2013-06-04 12:33:10 38.046 6 10081427 2013-06-04 12:33:20 38.046 7 10081427 2013-06-04 12:33:30 38.490 8 10081427 2013-06-04 12:33:40 38.490 9 10081427 2013-06-04 12:33:50 38.602

10081427 2013-06-04 12:34:00 38.602

Quality Assurance / Control • Import SQLite3 import sqlite3 conn=sqlite3.connect(‘/Users/apple/D ocuments/IS592-Big-Data- Analytics/SQLite- Presentation/classHOBOData2013.db 3’) c=conn.cursor()

Quality Assurance / Control • Find, list “errors” • Whatever you define them to be • > 100 F in Antarctica, for example c.execute(‘SELECT obsNum,serialNo,temperatureF FROM newTable WHERE temperatureF > 100 OR temperatureF < 10’)

Quality Assurance / Control • Check for Errors • Nice printed messsage with your selected “errors”
python QAQCdemo.py

#Fetch all the query results myListOfNames=c.fetchall() # print them out print print("Range Errors for Temperature:") for myTuple in myListOfNames: print("obsNum: "+str(myTuple[0])+" SerialNo: "+str(myTuple[1])+" Temperature: "+str(myTuple[2])) conn.commit() conn.close()
'SELECT obsNum,serialNo,temperatureF FROM newTable WHERE temperatureF &gt; 100 OR temperatureF &lt; 10' &gt;100 F: SQLite/Python QAQC 1,538 records where the temperature was over 100 F

Access SQLite Convert Access to SQLite

https://code.google.com/p/mdb-sqlite/

http://convertdb.com/access/sqlite

https://code.google.com/p/access2sqlite/

http://mdbtools.sourceforge.net/

http://sites.fastspring.com/eggerapps/product/mdbaccdbviewer

Further reading http://www.sqlite.org/books.html

 

Options for Open Source, Public Access for ATBI Data

When I worked for the NBII we used Rackspace and Amazon’s cloud service, along with MySQL for the “Species of Greatest Conservation Need” database that held a similar number of species, but no spatial data.
Since your data has a spatial component, something like PostgreSQL or SQLite might be worthwhile to look at. I took a class in environmental information management last summer and was introduced to SQLite – which apparently has some advantages over databases created with Access, including native support for GIS applications like GRASS and QGIS.
SQLite works with a CMS called Django, Scott Simmerman suggested I look at that for improvements on the ATBI mapping project’s Web interface.  There some free hosting services for small or dev projects:https://wiki.python.org/moin/FreeHosts Might be worthwhile to look at that. Another “free” hosting site here: https://www.pythonanywhere.com
Finally Tom Colson mentioned something he’s working on with GSMIT – the Otter Spotter. Tom said Google Earth Engine might be worthwhile looking at for that.  Google offers grants and hosting of data for non-profits to use Earth Engine, so I’m curious how much of the ATBI database data might be translatable to KML.
When I worked for the NBII I was interested in serving up spatial data in KML files – I like them because you can open KML in robust GIS packages along with popular virtual globe tools like Google Earth or ArcGIS Explorer – things the everyday person has access to.
Even if not all the data can translate, it’s another “view” of the ATBI data that’s pretty useful for outreach, if not doing actual science.
I will explore at this issue as I have time and hopefully help as I am able.  I am a student of RDBMS and the ATBI is one of the more interesting datasets to learn with.

Jackson Avenue Redevelopment

Sharing my comments in response to a survey from the City of Knoxville:
The City of Knoxville seeking public ideas and input for redevelopment of City-owned properties on the 400 and 500 blocks of West Jackson Avenue, including the Jackson Avenue Parking Lot and the site of the former McClung Warehouses. Please give us your thoughts on any or all of the following questions:

1. USE: What kind of development would you like to see along that corridor? i.e., residential, retail, office, other?

Office space at ground floor leads to closed shops in the evening. There should be an emphasis on businesses at ground floor open in the evening when people are free to be downtown. This can create an inviting corridor that safely conveys pedestrians from the Old City towards World’s Fair Park and vice-versa. The low volume of traffic makes a good opportunity for window shopping and patios.
USE: What kind of development would you like to see along that corridor? i.e., residential, retail, office, other?

2. FORM: What architectural and design characteristics are important to you? What should the development look like? How should it relate to its surroundings? i.e. a few big buildings, many smaller buildings, a mix?

With the loss of the McClung buildings, there’s an opportunity to incorporate modern design elements. The Knoxville Convention Center’s World’s Fair Park facade is a good template for incorporating light weight design elements that can balance the industrial heft of Jackson Avenue’s border with the rail yard and Interstate viaduct. All design should consider that the area is highly visible from the Interstate and may provide many passers-by their only impression of Knoxville. Public art such as murals, mosaics, or colored lighting cast on interesting metalwork is highly appropriate.
FORM: What architectural and design characteristics are important to you? What should the development look like? How should it relate to its surroundings? i.e. a few big buildings, many smaller buildings, a mix?

3. PUBLIC AMENITIES: What kinds of public amenities would you like to see incorporated into private redevelopment of the area?

This formerly was a warehouse district for loading and unloading via rail. The L&N station and Southern Railway formerly provided regional transportation for Knoxville. Jackson avenue’s proximity to abandoned rail infrastructure makes it an ideal connecting “hub” for a local heavy rail system connecting to the Knoxville & Holston River Railroad’s lease, which extends to John Sevier Highway along the French Broad, and reaches deep into South Knoxville as far as Ijams Nature Center. A high speed rail link from Atlanta to Chattanooga to Knoxville may someday be an option – Jackson will be the hub for transportation as it was before. This possibility should be not only preserved but actively pursued.

4. OTHER COMMENTS: Anything else you would like to add?
The space between the railroad and Jackson should incorporate a greenway that travels *under* Gay Street , parallel to the railroad right-of-way, and *under* Broadway to connect to World’s Fair Park. Jackson at Gay Street should provide a welcoming atmosphere for interpretation of Knoxville’s “underground” rather than the frightening scene presented today. TDOT needs to be more open about their designs for the Broadway viaduct so the public can provide input.

Feedback on Magnolia Avenue Streetscape Plan

Hello Dawn,

Thanks for the note.

Also great to see you in action last night.

The slide sharing site I mentioned is called “slideshare.net.”

When I used to build web sites for a living, one of my favorite features was that you can take a powerpoint slide deck, upload it to the site, and then get some code to insert into your Web page to provide an interactive presentation.

Or you can just view the presentation on the slideshare site.  Works great on mobile devices, too.

Here are some examples –

http://www.slideshare.net/cltchamber/transit-in-the-charlotte-region

http://www.slideshare.net/cltchamber/meet-the-primes-lynx-blue-line-extension

I’m sending those two particular links because the proposed streetscape enhancement is is exciting, but I feel like we’re taking another step away from what “Park St.” used to be – and that is a dual track light rail public transit system from downtown to the zoo.  After the meeting last night, I dug around in online photo archives to find out of my favorite photos – Chilhowee Park with tree-lined light rail that looks straight out of the English countryside. Here’s that photo:

Chilhowee Park, Knoxville Tenn

Entrance to Chilhowee Park, Knoxville, Tenn

I’m also disapointed the consultant’s presentation did not start by taking a moment to reflect on the rich history of East Knoxville.  This oversight echoed throughout the consultant’s work, from a stylistic perspective to a practical perspective (for example, displacing residents to extend Cherry Street to MLK, when the same connector already exists at Olive Street). Further, I felt the consultant did not effectively engage the public for input in the earliest stages of the design process.  I overheard remarks that the consultant “did not walk around to get a sense of the neighborhoods.” That too is a shame – since in strolling past homes built a century or more ago, you get the sense there is a story to tell here – and diverse personalities to convey.  The consultant’s plan failed to capture these characteristics of the community.

Many residents of my own East Knoxville community see the area as Knoxville’s “Parks and Gardens” district. I found the consultant’s concepts overly generic and therefore at odds with the unique, richly storied community of East Knoxville.”Parks and Gardens” is a designation quite apt, with the Zoological Gardens, Botanical Gardens, and myriad parks and gardens in the area. These attractions were all part of why Magnolia was once known as Park Street and is today “as wide as an interstate,” in the consultant’s words. In the old days Knoxville Traction Company trolley cars rolled from the old car barn at the current KAT facility out to Chilhowee Park.

If we go forward with this streetscape plan as is, then we’ll be taking another step away from this past. This is unfortunate, as it is an affirmation of the abandonment of a transit oriented community design the community sprung from in the first place. Also, in terms of wayfinding – nothing is more clear to me than a line of track. Even in an uneasy environment like a dingy subway platform, trains running on time is comforting.  I can see out-of-town families starting their day at Market Square confidently catching a Downtown trolley to a Magnolia light rail terminus for a quick visit to the Zoo.

Charlotte laid track for their LYNX system for 9 miles. Magnolia from Knoxville Station to city limits is less than that, something like 6.  I can’t think of a community more deserving of transit infrastructure investment or a corridor more appropriate than East Knoxville.  A re-born transit line from downtown to the city limits at the Holston River could connect to an AMP style “fast bus” jetting travellers down John Sevier Highway towards McGhee Tyson Airport.  The County has always wanted to develop East Knox County – collaboration with the city on transit might play a key part in that.  Future collaboration with Sevier County might extend the transit line down Asheville Highway to Exit 407, Smokies Stadium, Bass Pro Shops, and the planned Dumplin Creek shopping attraction, not to mention connecting to the existing transit hub in Sevierville, Pigeon Forge, and Gatlinburg.

I’m grateful for the energy and attention the consultants brought to this project.  But I’m sad to think that if we take a step forward on this, we’re taking yet another step away from a proven, transit oriented design. Worse, we’re limiting our options to return to transit-centric design in the future.

On a related note – it sounds like Linden avenue would be another great choice for a “neighborhood greenway.”

Thank you for your invitation to share my thoughts with you. I hope the slideshare site proves helpful.  I’ll likely reflect further with additional review of the material shared last night.

-Tanner

Volunteer Service to National Amphibian Conservation Organization Highlighted

Along with volunteering as the webmaster for SEPARC, I volunteer to help out with content management tasks for the national Partners in Amphibian and Reptile Conservation (PARC) web site.

Mostly, I post new job opportunities to the PARC jobs board. If jobs are particularly interesting or relevant to the Southeast, I’ll post them on the Southeast PARC (SEPARC) Facebook Page that I help manage, which then is linked to that organization’s Twitter.

This volunteer work not only helps out the animals, but helps me keep my HTML and social media skills up-to-date, and lets me add experience with the Joomla content management system to my list of skills.

Visit PARC’S website at http://www.parcplace.org! The website has been updated and new content has been added. In particular, see the new information under Resources, including a new Department of Defense PARC (DoD PARC) link regarding our new initiative with military instal- lation natural resource personnel to provide stewardship for threatened and endangered herpetofauna. You can also join our announcement listserv to stay up-to-date on PARC and partner happenings. Check it out! PARC thanks JD Willson (University of Arkansas) and Tanner Jessel (University of Tennessee, Knoxville) for their volunteer time to keep our website up to date and attractive, and to Brian Todd (University of California, Davis) for maintaining our listserv!

Read more about PARC in the 2012 Annual Report linked here <http://www.parcplace.org/images/stories/documents/2012_PARC_Annual_Report_MedQ.pdf>.

Semi-Automated Dissemmination of Geographically Linked Policing Data

I post Regional Analyisis and Information Data Sharing (RAIDS) online data for my neighborhood on a separate blog. I made a matching twitter account @alertparkridge to tweet a link when new data is posted on the blog.

I’m trying to automate updates to the extent possible – presently a subscription to RAIDS data for all reports within 1 mile of Olive and Washington comes to my e-mail inbox. Now it’s automatically forwarded to my blog via an e-mail filter and my blog’s publish-by-e-mail address.

The flow of information is as follows

1. Incident report from Knoxville Police Department officer

2. Weekly upload to RAIDS Online, Knoxville Police Department (I assume on Fridays).

3. Push of information from RAIDS online database via e-mail to my personal gmail account

4. Personal g-mail account filter instruction forwarding for all incoming mail from RAIDS online database to personal blog (alertparkridge.blogspot.com) “publish-by-e-mail” url.

5.  Saved as draft.  Editing to remove unsubscribe link.

6. Publish with tags matching incident codes within proper boundaries of neighborhood (cataloguing, classification, tagging)

7. “push” via blogger RSS to @alertparkridge using twitterfeed.

8. Potential last step, if possible, is to sync with neighborhood organization’s Facebook page. This would save me the step of posting anything myself.  I don’t know if this works for Groups, but I do know it works on my personal Facebook and a page I help admin (SEPARC).

Unfortunately the workflow as-is still requires me to interact with it; I set the publish-by-email setting to save a draft because the e-mail includes a URL that, when clicked, can disable my RAIDS online subscription, so I have to edit that out.

The point of the Twitter is that residents can get incident updates for the neighborhood on twitter or text the following: “follow alertparkridge” to the number 40404 to get SMS updates. This is explained here:

http://alertparkridge.blogspot.com/2014/04/alerts-via-twitter-or-text-message.html

Custom Plant Model Request

Eastern Hemlock (Tsuga canadensis) model on PC versus Supercomputer

I recently ran a model of Hemlock on my personal computer.

I modeled ~489 records; the Nautilus model used over 2,000.

From the model output:

This is a representation of the Maxent model for Tsuga_canadensis. Warmer colors show areas with better predicted conditions. White dots show the presence locations used for training, while violet dots show test locations.

 

This slideshow requires JavaScript.

This is not really a fair comparison, but the difference between the models with 489 records and 2000+ is interesting for comparing the predictions.

EECS model for Eastern hemlock is different.  It uses more data, and it was run 20 times with 10% of the records reserved before being synthesized into one image.

I will post some additional comparisons for other trees in some later posts.

Convert Access Database to SQLite

Converting a database:

BioDB_EECS.accdb

to

BioDB_EECS.db

I found this to be the best (and easiest) tool:

https://eggerapps.at/mdbviewer/

http://sites.fastspring.com/eggerapps/product/mdbaccdbviewer

This took a 440 MB file in Access down to a 80 MB file (although, it should be noted the demo version only saves half the table data; still, assuming the other half is 80 MB, we see a huge reduction in file size).

Also I like that this goes to an open access file format. With SQLite3 in the command line, or SQLite Studio across platforms, a file format that was once proprietary and accessible only to those with Windows software is now an open -access format.

I’ll update this post with examples at a later point.

Other options I evaluated:

https://code.google.com/p/mdb-sqlite/

  • This one did not work for me, although I appreciated that it’s open source.  I don’t think it works with sqlite 3. 
  • Also, it appears to work for .mdb only; you have to convert an .accdb file to a .mdb file

http://convertdb.com/access/sqlite

https://code.google.com/p/access2sqlite/

http://mdbtools.sourceforge.net/