Monthly Archives: April 2014
Light, Open Source RDBMS SQLite
Tanner Jessel Spring 2014 IS 592 – Big Data Analytics School of Information Sciences College of Communication & Information
What is SQLite?
• Big Data in a Little Package – Portable, No Server – Writes directly to common media – Cross-platform (Mac, Windows, Linux)
• Portable – Small Gadgets (Windows Phone, Android, iOS) • (500 KB, 300 KB) • Embeddable – Computer Applications (Chrome, Firefox, Skype) • Single file database – Triggers – Tables – Indices – Views
What it is not…
• Not a full database application – No forms – No reports – No saved queries
Why should you be interested?
• No Server (no mess)
• Command Line Interface
• GUI Interface
• Works with R
• Works with Python
• Good for students to learn advanced SQL queries, command line interaction
• Good for small database projects
Alter, analyze, attatch, begin, commit, create (table, trigger, view, virtual table), delete, select, insert… SQLite Query Language “Mostly” SQL
Database Interface R driver for SQLite. This package embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine (version 3.7.17) is included.” –Comprehensive R Archive Network (CRAN) http://cran.r-project.org/web/packages/RSQLite/
$ sqlite3 myexampledb SQLite version 3.8.4 2014-02-11 16:24:34 Enter ".help" for usage hints. sqlite> CREATE table tbl1(one varchar(10), two smallint); sqlite> INSERT into tbl1 values('hello!',10); sqlite> INSERT into tbl1 values('goodbye', 20); sqlite> SELECT * from tbl1; hello!|10 goodbye|20 sqlite>
*Free* • Feature Rich GUI • Cross-platform – Windows – Mac – Linux – Solaris • http://sqlitestudio.pl/ SQLite Studio
10. • Rstudio Console
> install.packages("RSQLite") > library(RSQLite) > dbDIR <- "/Users/apple/Documents/IS592-Big- Data-Analytics/SQLite-Presentation" > dbFileName <- paste(dbDIR,"classHOBOData2013.db3", sep="/") >drv <- dbDriver("SQLite”) >con <- dbConnect(drv, dbname = dbFileName)
Connect to Data
• Rstudio Console
> allDf <- dbGetQuery(con, "select * from HoboTable ") >View(allDf) Load Data to Frame
Pendant monitor data from SQLite database in data frame SQLite Table Data in R Studio 15,186 rows
>summary(allDf) obsNum serialNo recNo dateTime temperatureC Min. : 1 Length:15186 Min. : 1.0 Length:15186 Min. :14.42 1st Qu.: 3797 Class :character 1st Qu.: 422.0 Class :character 1st Qu.:23.58 Median : 7594 Mode :character Median : 844.0 Mode :character Median :23.87 Mean : 7594 Mean : 844.5 Mean :26.43 3rd Qu.:11390 3rd Qu.:1266.0 3rd Qu.:25.12 Max. :15186 Max. :1722.0 Max. :59.81 intensityLight Min. : 0.0 1st Qu.: 0.0 Median : 10.8 Mean : 2524.8 3rd Qu.: 21.5 Max. :231468.2
Pendant monitor data from SQLite database in plot
>plot(allDf$temperatureC) Temperature Data Rstudio Console > temperatureF <- (allDf$temperatureC * (9/5.0))+32.0 Transformation C to F Rstudio Console > temperatureF <- (allDf$temperatureC * (9/5.0))+32.0 > newDf <- cbind(allDf,temperatureF) > dbWriteTable(con, "newTable", newDf) Update Data Frame
Pendant data .db3 file now has two tables
SQLite Table Data Modified by R 15,186 rows, new TemperatureF column added
>view(hotDf) serialNo dateTime temperatureC 1 10081427 2013-06-04 12:32:30 37.824 2 10081427 2013-06-04 12:32:40 37.935 3 10081427 2013-06-04 12:32:50 37.935 4 10081427 2013-06-04 12:33:00 37.935 5 10081427 2013-06-04 12:33:10 38.046 6 10081427 2013-06-04 12:33:20 38.046 7 10081427 2013-06-04 12:33:30 38.490 8 10081427 2013-06-04 12:33:40 38.490 9 10081427 2013-06-04 12:33:50 38.602
10081427 2013-06-04 12:34:00 38.602
Quality Assurance / Control • Import SQLite3 import sqlite3 conn=sqlite3.connect(‘/Users/apple/D ocuments/IS592-Big-Data- Analytics/SQLite- Presentation/classHOBOData2013.db 3’) c=conn.cursor()
Quality Assurance / Control • Find, list “errors” • Whatever you define them to be • > 100 F in Antarctica, for example c.execute(‘SELECT obsNum,serialNo,temperatureF FROM newTable WHERE temperatureF > 100 OR temperatureF < 10’)
Quality Assurance / Control • Check for Errors • Nice printed messsage with your selected “errors”
#Fetch all the query results myListOfNames=c.fetchall() # print them out print print("Range Errors for Temperature:") for myTuple in myListOfNames: print("obsNum: "+str(myTuple)+" SerialNo: "+str(myTuple)+" Temperature: "+str(myTuple)) conn.commit() conn.close()
'SELECT obsNum,serialNo,temperatureF FROM newTable WHERE temperatureF > 100 OR temperatureF < 10' >100 F: SQLite/Python QAQC 1,538 records where the temperature was over 100 F
Access SQLite Convert Access to SQLite
Further reading http://www.sqlite.org/books.html
Sharing my comments in response to a survey from the City of Knoxville:
The City of Knoxville seeking public ideas and input for redevelopment of City-owned properties on the 400 and 500 blocks of West Jackson Avenue, including the Jackson Avenue Parking Lot and the site of the former McClung Warehouses. Please give us your thoughts on any or all of the following questions:
1. USE: What kind of development would you like to see along that corridor? i.e., residential, retail, office, other?
Office space at ground floor leads to closed shops in the evening. There should be an emphasis on businesses at ground floor open in the evening when people are free to be downtown. This can create an inviting corridor that safely conveys pedestrians from the Old City towards World’s Fair Park and vice-versa. The low volume of traffic makes a good opportunity for window shopping and patios.
USE: What kind of development would you like to see along that corridor? i.e., residential, retail, office, other?
2. FORM: What architectural and design characteristics are important to you? What should the development look like? How should it relate to its surroundings? i.e. a few big buildings, many smaller buildings, a mix?
With the loss of the McClung buildings, there’s an opportunity to incorporate modern design elements. The Knoxville Convention Center’s World’s Fair Park facade is a good template for incorporating light weight design elements that can balance the industrial heft of Jackson Avenue’s border with the rail yard and Interstate viaduct. All design should consider that the area is highly visible from the Interstate and may provide many passers-by their only impression of Knoxville. Public art such as murals, mosaics, or colored lighting cast on interesting metalwork is highly appropriate.
FORM: What architectural and design characteristics are important to you? What should the development look like? How should it relate to its surroundings? i.e. a few big buildings, many smaller buildings, a mix?
3. PUBLIC AMENITIES: What kinds of public amenities would you like to see incorporated into private redevelopment of the area?
This formerly was a warehouse district for loading and unloading via rail. The L&N station and Southern Railway formerly provided regional transportation for Knoxville. Jackson avenue’s proximity to abandoned rail infrastructure makes it an ideal connecting “hub” for a local heavy rail system connecting to the Knoxville & Holston River Railroad’s lease, which extends to John Sevier Highway along the French Broad, and reaches deep into South Knoxville as far as Ijams Nature Center. A high speed rail link from Atlanta to Chattanooga to Knoxville may someday be an option – Jackson will be the hub for transportation as it was before. This possibility should be not only preserved but actively pursued.
4. OTHER COMMENTS: Anything else you would like to add?
The space between the railroad and Jackson should incorporate a greenway that travels *under* Gay Street , parallel to the railroad right-of-way, and *under* Broadway to connect to World’s Fair Park. Jackson at Gay Street should provide a welcoming atmosphere for interpretation of Knoxville’s “underground” rather than the frightening scene presented today. TDOT needs to be more open about their designs for the Broadway viaduct so the public can provide input.
Thanks for the note.
Also great to see you in action last night.
The slide sharing site I mentioned is called “slideshare.net.”
When I used to build web sites for a living, one of my favorite features was that you can take a powerpoint slide deck, upload it to the site, and then get some code to insert into your Web page to provide an interactive presentation.
Or you can just view the presentation on the slideshare site. Works great on mobile devices, too.
Here are some examples –
I’m sending those two particular links because the proposed streetscape enhancement is is exciting, but I feel like we’re taking another step away from what “Park St.” used to be – and that is a dual track light rail public transit system from downtown to the zoo. After the meeting last night, I dug around in online photo archives to find out of my favorite photos – Chilhowee Park with tree-lined light rail that looks straight out of the English countryside. Here’s that photo:
I’m also disapointed the consultant’s presentation did not start by taking a moment to reflect on the rich history of East Knoxville. This oversight echoed throughout the consultant’s work, from a stylistic perspective to a practical perspective (for example, displacing residents to extend Cherry Street to MLK, when the same connector already exists at Olive Street). Further, I felt the consultant did not effectively engage the public for input in the earliest stages of the design process. I overheard remarks that the consultant “did not walk around to get a sense of the neighborhoods.” That too is a shame – since in strolling past homes built a century or more ago, you get the sense there is a story to tell here – and diverse personalities to convey. The consultant’s plan failed to capture these characteristics of the community.
Many residents of my own East Knoxville community see the area as Knoxville’s “Parks and Gardens” district. I found the consultant’s concepts overly generic and therefore at odds with the unique, richly storied community of East Knoxville.”Parks and Gardens” is a designation quite apt, with the Zoological Gardens, Botanical Gardens, and myriad parks and gardens in the area. These attractions were all part of why Magnolia was once known as Park Street and is today “as wide as an interstate,” in the consultant’s words. In the old days Knoxville Traction Company trolley cars rolled from the old car barn at the current KAT facility out to Chilhowee Park.
If we go forward with this streetscape plan as is, then we’ll be taking another step away from this past. This is unfortunate, as it is an affirmation of the abandonment of a transit oriented community design the community sprung from in the first place. Also, in terms of wayfinding – nothing is more clear to me than a line of track. Even in an uneasy environment like a dingy subway platform, trains running on time is comforting. I can see out-of-town families starting their day at Market Square confidently catching a Downtown trolley to a Magnolia light rail terminus for a quick visit to the Zoo.
Charlotte laid track for their LYNX system for 9 miles. Magnolia from Knoxville Station to city limits is less than that, something like 6. I can’t think of a community more deserving of transit infrastructure investment or a corridor more appropriate than East Knoxville. A re-born transit line from downtown to the city limits at the Holston River could connect to an AMP style “fast bus” jetting travellers down John Sevier Highway towards McGhee Tyson Airport. The County has always wanted to develop East Knox County – collaboration with the city on transit might play a key part in that. Future collaboration with Sevier County might extend the transit line down Asheville Highway to Exit 407, Smokies Stadium, Bass Pro Shops, and the planned Dumplin Creek shopping attraction, not to mention connecting to the existing transit hub in Sevierville, Pigeon Forge, and Gatlinburg.
I’m grateful for the energy and attention the consultants brought to this project. But I’m sad to think that if we take a step forward on this, we’re taking yet another step away from a proven, transit oriented design. Worse, we’re limiting our options to return to transit-centric design in the future.
On a related note – it sounds like Linden avenue would be another great choice for a “neighborhood greenway.”
Thank you for your invitation to share my thoughts with you. I hope the slideshare site proves helpful. I’ll likely reflect further with additional review of the material shared last night.
Note: the following is correspondence in response to a call for nominations to “Beautification Mobs” as part of “Keep Knoxville Beautiful.”
I took note of the recent press release, “Keep Knoxville Beautiful Beautification Mob Call for Locations.”
I believe you’re already familiar with the 6th Avenue Mural Project, which was awarded an Orchid for beautification (https://www.facebook.com/parkridge.mural.project).
While the mural itself made great strides in beautifying the site, the vegetated margin is a prime spot for further work.
Along with restoring the intersection of Hall of Fame and North Sixth Avenue occupied by the former TDOT SmartFix40 community center, this corridor connecting a major thoroughfare to the mural site could be important for creating an inviting atmosphere for tourists interested in cruising Knoxville’s public art circuit, promoted by Visit Knoxville at http://www.visitknoxville.com/knoxvillemurals.
I’m attaching a photo from Google’s Street View <https://firstname.lastname@example.org,-83.914576,3a,90y,32.71h,85.29t/data=!3m4!1e1!3m2!1shBXEPmyZBD31n_xuxey5fQ!2e0!6m1!1e1> of the area in need of attention at the intersection of Washington Avenue and Sixth Avenue – currently bare soil and unmaintained vegetation contrast with the beautification effort of the 6th Avenue Mural Project.
I post Regional Analyisis and Information Data Sharing (RAIDS) online data for my neighborhood on a separate blog. I made a matching twitter account @alertparkridge to tweet a link when new data is posted on the blog.
I’m trying to automate updates to the extent possible – presently a subscription to RAIDS data for all reports within 1 mile of Olive and Washington comes to my e-mail inbox. Now it’s automatically forwarded to my blog via an e-mail filter and my blog’s publish-by-e-mail address.
The flow of information is as follows
1. Incident report from Knoxville Police Department officer
2. Weekly upload to RAIDS Online, Knoxville Police Department (I assume on Fridays).
3. Push of information from RAIDS online database via e-mail to my personal gmail account
5. Saved as draft. Editing to remove unsubscribe link.
6. Publish with tags matching incident codes within proper boundaries of neighborhood (cataloguing, classification, tagging)
7. “push” via blogger RSS to @alertparkridge using twitterfeed.
8. Potential last step, if possible, is to sync with neighborhood organization’s Facebook page. This would save me the step of posting anything myself. I don’t know if this works for Groups, but I do know it works on my personal Facebook and a page I help admin (SEPARC).
Unfortunately the workflow as-is still requires me to interact with it; I set the publish-by-email setting to save a draft because the e-mail includes a URL that, when clicked, can disable my RAIDS online subscription, so I have to edit that out.
The point of the Twitter is that residents can get incident updates for the neighborhood on twitter or text the following: “follow alertparkridge” to the number 40404 to get SMS updates. This is explained here:
I recently ran a model of Hemlock on my personal computer.
I modeled ~489 records; the Nautilus model used over 2,000.
From the model output:
This is a representation of the Maxent model for Tsuga_canadensis. Warmer colors show areas with better predicted conditions. White dots show the presence locations used for training, while violet dots show test locations.
This is not really a fair comparison, but the difference between the models with 489 records and 2000+ is interesting for comparing the predictions.
EECS model for Eastern hemlock is different. It uses more data, and it was run 20 times with 10% of the records reserved before being synthesized into one image.
I will post some additional comparisons for other trees in some later posts.
Converting a database:
I found this to be the best (and easiest) tool:
This took a 440 MB file in Access down to a 80 MB file (although, it should be noted the demo version only saves half the table data; still, assuming the other half is 80 MB, we see a huge reduction in file size).
Also I like that this goes to an open access file format. With SQLite3 in the command line, or SQLite Studio across platforms, a file format that was once proprietary and accessible only to those with Windows software is now an open -access format.
I’ll update this post with examples at a later point.
Other options I evaluated:
- This one did not work for me, although I appreciated that it’s open source. I don’t think it works with sqlite 3.
- Also, it appears to work for .mdb only; you have to convert an .accdb file to a .mdb file