{"name":"q","tagline":"q - Text as Data","body":"# q - Text as Data\r\nq is a command line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).\r\n\r\nMain features:\r\n* Seamless multi-table SQL support, including joins. Filenames are simply used in place of table names (use - for stdin)\r\n* Automatic column name and column type detection (allows working more naturally with the data)\r\n* Full encoding support (input, output and query)\r\n\r\n## Examples\r\nA beginner's tutorial can be found [here](examples/EXAMPLES.markdown).\r\n\r\n__Example 1:__\r\n\r\n    q -H -t \"select count(distinct(uuid)) from ./clicks.csv\"\r\n\r\n__Output 1:__\r\n```bash\r\n229\r\n```\r\n\r\n__Example 2:__\r\n\r\n    q -H -t \"select request_id,score from ./clicks.csv where score > 0.7 order by score desc limit 5\"\r\n\r\n__Output 2:__\r\n```bash\r\n2cfab5ceca922a1a2179dc4687a3b26e\t1.0\r\nf6de737b5aa2c46a3db3208413a54d64\t0.986665809568\r\n766025d25479b95a224bd614141feee5\t0.977105183282\r\n2c09058a1b82c6dbcf9dc463e73eddd2\t0.703255121794\r\n```\r\n\r\n__Example 3:__\r\n\r\n    q -t -H \"select strftime('%H:%M',date_time) hour_and_minute,count(*) from ./clicks.csv group by hour_and_minute\"\r\n\r\n__Output 3:__\r\n```bash\r\n07:00\t138148\r\n07:01\t140026\r\n07:02\t121826\r\n```\r\n\r\n__Example 4:__\r\n\r\n    q -t -H \"select hashed_source_machine,count(*) from ./clicks.csv group by hashed_source_machine\"\r\n\r\n__Output 4:__\r\n```bash\r\n47d9087db433b9ba.domain.com\t400000\r\n```\r\n\r\n__Example 5 (total size per user/group in the /tmp subtree):__\r\n\r\n    sudo find /tmp -ls | q \"select c5,c6,sum(c7)/1024.0/1024 as total from - group by c5,c6 order by total desc\"\r\n\r\n__Output 5:__\r\n```bash\r\nmapred hadoop 304.00390625\r\nroot root 8.0431451797485\r\nsmith smith 4.34389972687\r\n```\r\n\r\n__Example 6 (top 3 user ids with the largest number of owned processes, sorted in descending order):__\r\n\r\nNote the usage of the autodetected column name UID in the query.\r\n\r\n    ps -ef | q -H \"select UID,count(*) cnt from - group by UID order by cnt desc limit 3\"\r\n\r\n__Output 6:__\r\n```bash\r\nroot 152\r\nharel 119\r\navahi 2\r\n```\r\n\r\n## Installation\r\nThe current stable version is `1.4.0`.\r\n\r\nRequirements: Python 2.5 and up, or Python 2.4 with the sqlite3 module installed. Python 3.x is not supported yet.\r\n\r\n### Mac Users\r\nRun `brew update` first, then run `brew install q`.\r\n\r\nThanks to [@stuartcarnie](https://github.com/stuartcarnie) for the initial Homebrew formula.\r\n\r\n### RPM-based Linux distributions\r\nDownload the version `1.4.0` RPM **[here](https://github.com/harelba/packages-for-q/raw/master/rpms/q-text-as-data-1.4.0-1.noarch.rpm)**.\r\n\r\nInstall using `rpm -ivh <rpm-name>`.\r\n\r\nRPM releases also contain a man page. Just enter `man q`.\r\n\r\n**NOTE** In version `1.4.0`, the RPM package name has been changed to `q-text-as-data`. If you already have the old version, remove it with `rpm -e q` before installing.\r\n\r\n### Manual installation (very simple, since there are no dependencies)\r\n\r\n1. Download the main q executable from **[here](https://raw.github.com/harelba/q/1.4.0/bin/q)** into a folder in the PATH.\r\n2. Make the file executable.\r\n\r\nFor `Windows` machines, also download q.bat **[here](https://raw.github.com/harelba/q/1.4.0/bin/q.bat)** into the same folder and use it to run q.\r\n\r\n### Debian-based Linux distributions\r\nIf you're interested in Debian packaging, please drop me a line at harelba@gmail.com.\r\n\r\n## Overview\r\nHave you ever stared at a text file on the screen, wishing it were a database so you could ask anything you want about it? I had that feeling many times, and I've finally understood that it's not the _database_ that I want. It's the language - SQL.\r\n\r\nSQL is a declarative language for data, and as such it allows me to define what I want without caring about how exactly it's done. This is the reason SQL is so powerful: it treats data as data and not as bits and bytes (and chars).\r\n\r\nThe goal of this tool is to provide a bridge between the world of text files and the world of SQL.\r\n\r\n## Usage\r\nq's basic usage is very simple: `q <flags> <query>`, but it has lots of features under the hood and in the flags that can be passed to the command.\r\n\r\nThe simplest execution is `q \"SELECT * FROM myfile\"`, which prints the entire file.\r\n\r\nComplete information can be found [here](doc/USAGE.markdown).\r\n\r\n## Implementation\r\nSome implementation details can be found [here](doc/IMPLEMENTATION.markdown).\r\n\r\n## Limitations\r\n* No checks and bounds on data size\r\n* Spaces in file names are not supported yet. I'm working on it.\r\n* It is possible that some rare cases of subqueries are not supported yet. Please open an issue if you find such a case. This will be fixed once the tool performs its own full-blown SQL parsing.\r\n\r\n## Future Ideas\r\n* Faster reuse of previous data loading\r\n* Allow working with an external DB\r\n* Real parsing of the SQL, allowing smarter execution of queries.\r\n* Smarter batch insertion to the database\r\n* Provide mechanisms beyond SELECT - INSERT and CREATE TABLE SELECT and such.\r\n\r\n## Rationale\r\nSome information regarding the rationale for this tool and related philosophy can be found [here](doc/RATIONALE.markdown).\r\n\r\n## Change log\r\nHistory of changes can be found [here](doc/CHANGELOG.markdown).\r\n\r\n## Contact\r\nAny feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.\r\n\r\nHarel Ben-Attia, harelba@gmail.com, [@harelba](https://twitter.com/harelba) on Twitter\r\n\r\nq on Twitter: #qtextasdata\r\n\r\n","google":"UA-48316355-1","note":"Don't delete this file! It's used internally to help with page regeneration."}