{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "ISRC Python Workshop: Using APIs\n", "\n", "___Getting data using APIs___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "@author: Zhiya Zuo\n", "\n", "@email: zhiya-zuo@uiowa.edu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "APIs (application programming interfaces) are services designed to make software development easier. APIs come in many different forms, including software libraries and database systems. Generally, you can think of APIs as Lego pieces used to build specific models. I found a brief but interesting read on APIs [here](https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-apis-application-programming-interfaces-5-apis-a-data-scientist-must-know/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we dive into playing with APIs using Python, let's first try some simple examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### [Open Geocoding APIs](https://developer.mapquest.com/documentation/open/geocoding-api/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use this API product to convert between addresses and latitude/longitude." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "API Request Format:\n", "\n", "http://open.mapquestapi.com/geocoding/v1/address?key=KEY&location=LOCATION\n", "\n", "where:\n", "- `KEY` is the [API key](https://stackoverflow.com/questions/1453073/what-is-an-api-key)\n", "- `LOCATION`: address name (e.g., the University of Iowa)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first thing we need to do is to register for API keys: [link here](https://developer.mapquest.com/plan_purchase/steps/business_edition/business_edition_free/register)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a simple example, let's try to search where the university is by: http://open.mapquestapi.com/geocoding/v1/address?key=f08MixE5Tcq8uFS8ltbu3xeJ7qPF4SRw&location=The%20University%20of%20Iowa\n", "\n", "And now we get:\n", "```json\n", "{\"info\":{\"statuscode\":0,\"copyright\":{\"text\":\"\\u00A9 2019 MapQuest, Inc.\",\"imageUrl\":\"http://api.mqcdn.com/res/mqlogo.gif\",\"imageAltText\":\"\\u00A9 2019 MapQuest, Inc.\"},\"messages\":[]},\"options\":{\"maxResults\":-1,\"thumbMaps\":true,\"ignoreLatLngInput\":false},\"results\":[{\"providedLocation\":{\"location\":\"The University of Iowa\"},\"locations\":[{\"street\":\"200 Hawkins Drive\",\"adminArea6\":\"\",\"adminArea6Type\":\"Neighborhood\",\"adminArea5\":\"Iowa City\",\"adminArea5Type\":\"City\",\"adminArea4\":\"Johnson 
County\",\"adminArea4Type\":\"County\",\"adminArea3\":\"IA\",\"adminArea3Type\":\"State\",\"adminArea1\":\"US\",\"adminArea1Type\":\"Country\",\"postalCode\":\"52245\",\"geocodeQualityCode\":\"P1XXX\",\"geocodeQuality\":\"POINT\",\"dragPoint\":false,\"sideOfStreet\":\"N\",\"linkId\":\"0\",\"unknownInput\":\"\",\"type\":\"s\",\"latLng\":{\"lat\":41.659429,\"lng\":-91.548385},\"displayLatLng\":{\"lat\":41.659429,\"lng\":-91.548385},\"mapUrl\":\"http://open.mapquestapi.com/staticmap/v5/map?key=f08MixE5Tcq8uFS8ltbu3xeJ7qPF4SRw&type=map&size=225,160&locations=41.65942875,-91.5483848030346|marker-sm-50318A-1&scalebar=true&zoom=15&rand=-925387094\"},{\"street\":\"21 North Clinton Street\",\"adminArea6\":\"\",\"adminArea6Type\":\"Neighborhood\",\"adminArea5\":\"Iowa City\",\"adminArea5Type\":\"City\",\"adminArea4\":\"Johnson County\",\"adminArea4Type\":\"County\",\"adminArea3\":\"IA\",\"adminArea3Type\":\"State\",\"adminArea1\":\"US\",\"adminArea1Type\":\"Country\",\"postalCode\":\"52242\",\"geocodeQualityCode\":\"P1XXX\",\"geocodeQuality\":\"POINT\",\"dragPoint\":false,\"sideOfStreet\":\"N\",\"linkId\":\"0\",\"unknownInput\":\"\",\"type\":\"s\",\"latLng\":{\"lat\":41.661337,\"lng\":-91.536149},\"displayLatLng\":{\"lat\":41.661337,\"lng\":-91.536149},\"mapUrl\":\"http://open.mapquestapi.com/staticmap/v5/map?key=f08MixE5Tcq8uFS8ltbu3xeJ7qPF4SRw&type=map&size=225,160&locations=41.6613368,-91.5361491|marker-sm-50318A-2&scalebar=true&zoom=15&rand=-1788318072\"}]}]}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the return object in JSON format, where there are key-value pairs to store specific values for different attributes. For example, the ___street___ is ___200 Hawkins Drive___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that API key is not needed here if we only make a couple of requests. 
If you want to build an app or query more often, you will need to pay attention to the [rate limit](https://developer.mapquest.com/user/me/plan)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### [Weather API](https://openweathermap.org/api)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a second example, let's try to get some weather data. This time, we will need an API key!\n", "\n", "Whenever we go to a webpage with a list of API choices, we should first find what we really want. Suppose we want the current weather data; we then go to the [___api doc___ for that API](https://openweathermap.org/current). Let's try the first method: getting weather by city name:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "API call:\n", "\n", "- https://api.openweathermap.org/data/2.5/weather?q={city}\n", "\n", "- https://api.openweathermap.org/data/2.5/weather?q={city},{country}\n", "\n", "Parameters:\n", "___q___: city name and country code separated by a comma; use [___ISO 3166 country codes___](https://en.wikipedia.org/wiki/ISO_3166-1#Officially_assigned_code_elements)\n", "\n", "Examples of API calls:\n", "\n", "- https://api.openweathermap.org/data/2.5/weather?q=London\n", "\n", "- https://api.openweathermap.org/data/2.5/weather?q=London,uk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, we get an error without an API key, saying that:\n", "\n", "```json\n", "{\"cod\":401, \"message\": \"Invalid API key. Please see http://openweathermap.org/faq#error401 for more info.\"}\n", "```\n", "\n", "Note that this is also a JSON object, with an error code of 401 and an error message." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_I just found that it takes up to 10 minutes for new accounts' keys to be activated. 
For this reason, let's use my key: `a236f384f5bced47bbba86335cdb1d2a`, which will be deleted after this workshop_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try to get an API key [here](http://openweathermap.org/appid). After creating an account, you can find your API key [here](https://home.openweathermap.org/api_keys). For privacy and security reasons, I will save my API key locally in a file called `weather_keys.csv`. Now that you have the key, you can open the following URL in your browser:\n", "\n", "https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=apikey\n", "\n", "In our case, it should be https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=a236f384f5bced47bbba86335cdb1d2a\n", "\n", "Looking at the structure of the API call, we can see that the parameters follow the `?` and are separated by `&` signs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output is:\n", "```json\n", "{\"coord\":{\"lon\":121.49,\"lat\":31.23},\"weather\":[{\"id\":701,\"main\":\"Mist\",\"description\":\"mist\",\"icon\":\"50d\"}],\"base\":\"stations\",\"main\":{\"temp\":286.11,\"pressure\":1024,\"humidity\":87,\"temp_min\":283.15,\"temp_max\":289.82},\"visibility\":6000,\"wind\":{\"speed\":4,\"deg\":220,\"gust\":9},\"clouds\":{\"all\":48},\"dt\":1552610491,\"sys\":{\"type\":1,\"id\":9659,\"message\":0.0044,\"country\":\"CN\",\"sunrise\":1552601099,\"sunset\":1552644100},\"id\":1796236,\"name\":\"Shanghai\",\"cod\":200}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Overall we see that this is very easy and straightforward." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Send API requests in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While using APIs is pretty simple, we might not want to do all this copying and pasting manually. 
Python can help us send requests and parse results automatically with less human supervision." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do this, we need to know how to send requests first. We will use an amazing package called [`requests`](http://docs.python-requests.org/en/master/). If you do not have it, please install it with `pip` or `conda`:\n", "\n", "```bash\n", "$ pip install requests\n", "```\n", "\n", "or \n", "\n", "```bash\n", "$ conda install requests\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:42:55.803058Z", "start_time": "2019-03-15T00:42:55.383982Z" } }, "outputs": [], "source": [ "# Let's load the library first\n", "import requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using weather as an example, we first need to know the request URL (where the request goes) and its inputs (e.g., API key and city name). In our case, we know our API key and the city to query, so we can do the following." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:43:40.225801Z", "start_time": "2019-03-15T00:43:40.222667Z" } }, "outputs": [], "source": [ "apikey = 'a236f384f5bced47bbba86335cdb1d2a'" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:43:40.485509Z", "start_time": "2019-03-15T00:43:40.478147Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://api.openweathermap.org/data/2.5/weather\n", "Shanghai\n", "a236f384f5bced47bbba86335cdb1d2a\n" ] } ], "source": [ "weather_url = \"https://api.openweathermap.org/data/2.5/weather\"\n", "city_name = \"Shanghai\"\n", "print(weather_url)\n", "print(city_name)\n", "print(apikey)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let `requests` do its work." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:43:43.068322Z", "start_time": "2019-03-15T00:43:42.662055Z" } }, "outputs": [ { "data": { "text/plain": [ "'https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=a236f384f5bced47bbba86335cdb1d2a'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r = requests.get(weather_url, params={'q': city_name, 'APPID': apikey})\n", "r.url # `requests` helps us encode the URL in the correct format" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:43:43.883679Z", "start_time": "2019-03-15T00:43:43.875204Z" } }, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r.status_code # 200 means success" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a side note, `requests.get` means we want to use the `GET` method, as opposed to the `POST` method. The former retrieves data, whereas the latter sends data to the server, typically to create or update a resource. See [this post](https://www.w3schools.com/tags/ref_httpmethods.asp) for more details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the JSON response, we call the `r.json()` method." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:45:38.562894Z", "start_time": "2019-03-15T00:45:38.545341Z" } }, "outputs": [ { "data": { "text/plain": [ "{'coord': {'lon': 121.49, 'lat': 31.23},\n", " 'weather': [{'id': 701,\n", " 'main': 'Mist',\n", " 'description': 'mist',\n", " 'icon': '50d'}],\n", " 'base': 'stations',\n", " 'main': {'temp': 286.1,\n", " 'pressure': 1024,\n", " 'humidity': 87,\n", " 'temp_min': 283.15,\n", " 'temp_max': 289.82},\n", " 'visibility': 6000,\n", " 'wind': {'speed': 4, 'deg': 220, 'gust': 9},\n", " 'clouds': {'all': 48},\n", " 'dt': 1552610612,\n", " 'sys': {'type': 1,\n", " 'id': 9659,\n", " 'message': 0.0043,\n", " 'country': 'CN',\n", " 'sunrise': 1552601099,\n", " 'sunset': 1552644100},\n", " 'id': 1796236,\n", " 'name': 'Shanghai',\n", " 'cod': 200}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = r.json()\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The JSON object is converted into a `dict`, the Python data structure that holds key-value pairs. To access particular values, we index it like any other `dict`." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:45:40.429973Z", "start_time": "2019-03-15T00:45:40.422244Z" } }, "outputs": [ { "data": { "text/plain": [ "'Shanghai'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result['name']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:50:01.845027Z", "start_time": "2019-03-15T00:50:01.838639Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "temp 286.1\n", "pressure 1024\n", "humidity 87\n", "temp_min 283.15\n", "temp_max 289.82\n" ] } ], "source": [ "for key, value in result['main'].items():\n", " print(key, value) # default temperature is in Kelvin" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use packages: Twitter API as an example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many web services have Python packages that wrap their APIs. With these convenient tools, we can get started right away by following their documentation and examples, without constructing requests by hand. We will use the Twitter API as an example. First, install the `python-twitter` package as shown [here](https://python-twitter.readthedocs.io/en/latest/installation.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we have to register for a Twitter Developer account and register an app. Let's go to https://dev.twitter.com/ and put an app together. Here's a quick start on how you can do this. After we obtain the *__consumer key__*, *__consumer secret__*, *__access token__*, and *__access token secret__*, we are ready to retrieve some data from Twitter!" 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:52:24.406197Z", "start_time": "2019-03-15T00:52:24.402935Z" } }, "outputs": [], "source": [ "## suppress warnings\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "I saved my own keys in a text file, one per line, which is equivalent to the four assignments below:\n", "```\n", "consumer_key = \"your_consumer_key\" \n", "consumer_secret = \"your_consumer_secret\"\n", "access_token = \"your_access_token\"\n", "access_secret = \"your_access_secret\"\n", "```" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:53:55.354910Z", "start_time": "2019-03-15T00:53:55.350856Z" } }, "outputs": [], "source": [ "with open(\"./twitter_keys.csv\", \"r\") as twitter_keys:\n", " keys = twitter_keys.read()\n", " consumer_key, consumer_secret, access_token, access_secret = \\\n", " keys.split(\"\\n\")[:-1]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:53:55.948550Z", "start_time": "2019-03-15T00:53:55.655583Z" }, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "User(ID=2740697738, ScreenName=zhiyzuo)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## load twitter package, which is a well-written Python package for Twitter APIs\n", "import twitter\n", "api = twitter.Api(consumer_key=consumer_key,\n", " consumer_secret=consumer_secret, \n", " access_token_key=access_token,\n", " access_token_secret=access_secret)\n", "\n", "## check status\n", "api.VerifyCredentials()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try some simple tasks: get my own statuses" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:53:57.983227Z", "start_time": 
"2019-03-15T00:53:57.847295Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Paper with @kangzhao and @ChaoqunNi on faculty hiring within iSchools is out: https://t.co/XYvd0xqauZ …; @iSchools… https://t.co/uiX9QCknhX\n", "I'm using Overleaf, the free online collaborative LaTeX editor - it's awesome and easy to use! https://t.co/fcNXTDoXPs\n", "RT @kangzhao: Paper with @zhiyzuo: \"The More Multidisciplinary the Better?--The Prevalence and Interdisciplinarity of Research Collaboratio…\n", "@ulfaslak @ScienceNews no, that’s dr. phalange!\n", "@ulfaslak @m_rosvall @suneman Congrats !\n", "RT @jcdl2018: We have extended the #jcdl2018 deadline for panels, posters, and demonstrations to February 2, 2018. https://t.co/OrA407HlcT\n", "Say Trello to boards in @Bitbucket Cloud. https://t.co/bZFOhGIDqH #BitbucketTrends\n", "The state and evolution of U.S. iSchools: From talent acquisitions to research... https://t.co/t5wY6YvxQl\n", "RT @kangzhao: Our paper on @iSchools published--The state and evolution of U.S. 
iSchools: from talent acquisitions to research https://t.co…\n", "@JASIST 😆\n", "#fabric #myowntwitterapp fun app!\n", "test my fabric composer\n", "Test pic http://t.co/C8CJbKg19b\n", "Test url twitter api http://t.co/3eXsFUEZPo\n", "This is a test tweet for the api.\n", "Test tweet\n" ] } ], "source": [ "statuses = api.GetUserTimeline(screen_name=\"zhiyzuo\")\n", "for s in statuses:\n", " print(s.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use our `user id`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:04.770520Z", "start_time": "2019-03-15T00:56:04.557062Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sat Feb 02 02:51:28 +0000 2019\n", "Fri Dec 28 18:02:15 +0000 2018\n", "Sat Jul 07 04:39:16 +0000 2018\n", "Mon Jun 25 22:04:10 +0000 2018\n", "Fri Jun 22 18:48:15 +0000 2018\n", "Sat Jan 27 07:24:13 +0000 2018\n", "Thu Sep 14 17:55:58 +0000 2017\n", "Tue May 23 15:15:19 +0000 2017\n", "Tue May 23 15:04:10 +0000 2017\n", "Tue May 23 14:56:41 +0000 2017\n", "Mon Oct 31 16:40:09 +0000 2016\n", "Mon Oct 31 04:03:32 +0000 2016\n", "Wed Dec 10 17:11:50 +0000 2014\n", "Wed Dec 10 17:01:40 +0000 2014\n", "Fri Dec 05 22:28:49 +0000 2014\n", "Fri Dec 05 17:54:31 +0000 2014\n" ] } ], "source": [ "statuses = api.GetUserTimeline(user_id=\"2740697738\")\n", "for s in statuses:\n", " print(s.created_at)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also get a friend list" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:06.657979Z", "start_time": "2019-03-15T00:56:06.252535Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Liangfei Qiu\n", "Rong Su\n", "GSOM at Clark Univ.\n", "Yuqing Ren\n", "Twitter API\n", "Amin Vahedian\n", "PHD Comics\n", "Nitesh Chawla\n", "Inside Higher Ed\n", "The Chronicle of Higher Education\n", "JCDL 
Conference\n", "Quantitative Science Studies\n", "ISSI President\n", "Associate Deans\n", "Tippie College of Business\n", "Ann Melissa Campbell\n", "INFORMS\n", "INFORMS2019\n", "Peter Fennell\n", "Chaomei Chen\n", "Academic JobTracker\n", "Shangguan Wang\n", "Weiguo Fan\n", "IOWAInformsStuChap\n", "Jason Dou\n", "CSS\n", "yrCSS\n", "Bodo Winter\n", "Penny Skateboards\n", "Skateboarding\n", "ASIS&T SIG SI\n", "Jiepu Jiang\n", "Tippie Analytics\n", "Blei Lab\n", "Ulf Aslak\n", "Center For Open Science\n", "Time.Graphics\n", "GHTorrent\n", "Complexity Challenge\n", "Brian Uzzi\n", "Yang\n", "Andrej Karpathy\n", "NeurIPS Conference\n", "IC2S2\n", "USC ISI\n", "Albert-László Barabási\n", "Roberta Sinatra\n", "Dashun Wang\n", "Kristina Lerman\n", "Not just Google Scholar's Digest\n", "Loet Leydesdorff\n", "Vincent Larivière\n", "Cassidy R. Sugimoto\n", "Lutz Bornmann\n", "Ludo Waltman\n", "ASIS&T Social Media Special Interest Group\n", "Emilio Ferrara\n", "David Mimno\n", "/r/datasets\n", "JCDL 2018\n", "Duncan Watts\n", "Command Line Magic\n", "Unix tool tip\n", "Data Science Fact\n", "ASIS&T SIG/MET\n", "Nick Street\n", "Yuanyang Liu\n", "Kristina Bigsby\n", "Mendeley Support\n", "Xun Zhou\n", "New York Internships\n", "Santa Fe Institute\n", "Microsoft Academic\n", "Network Science PhDs\n", "JASIST\n", "ASIS&T\n", "iSchools iConference\n", "iSchools\n", "WASD Keyboards\n", "Yong-Yeol Ahn\n", "Lada Adamic\n", "Jure Leskovec\n", "Network Fact\n", "Dan Larremore\n", "Simon DeDeo\n", "Jevin West\n", "Aaron Clauset\n", "John Jairo Rios R.\n", "Michael Lash\n", "Kang Zhao\n", "sharelatex\n", "Lincoln Wang\n", "Xi Wang\n", "qix\n", "Overleaf\n", "Andrew Ng\n", "Iowa Memorial Union\n", "University of Iowa\n", "kevin Garnet\n" ] } ], "source": [ "friends = api.GetFriends()\n", "for f in friends:\n", " print(f.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More interestingly, let's go get some tweets from Twitter. 
Let's try to search for any popular tweets (limited to 20) related to `uiowa` since 12/01/2014 in English.\n", "- See https://dev.twitter.com/rest/public/search for more information on how to construct a query\n", "- How to set the `lang` parameter -> https://dev.twitter.com/rest/reference/get/help/languages" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:19.186357Z", "start_time": "2019-03-15T00:56:19.066968Z" } }, "outputs": [], "source": [ "results = api.GetSearch(\n", " raw_query=\"q=uiowa&result_type=popular&since=2014-12-01&count=20&lang=en\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We only got 14 results though." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:20.064369Z", "start_time": "2019-03-15T00:56:20.058646Z" } }, "outputs": [ { "data": { "text/plain": [ "14" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show all the text in the retrieved tweets, with the user screen name highlighted" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:21.479323Z", "start_time": "2019-03-15T00:56:21.471343Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We'll have a slice of that 👇 #PiDay https://t.co/Gfzhl5n9Ly. Tweeted by \u001b[41muiowa\u001b[0m\n", "That smile you get on your face when you realize you're #AlwaysAHawkeye. https://t.co/5wiAl9sbiX. Tweeted by \u001b[41muiowa\u001b[0m\n", "In a first of its kind study, #uiowa researchers proved that early intervention for children with hearing loss can… https://t.co/7dcmQ4k63G. Tweeted by \u001b[41muiowa\u001b[0m\n", "Hawkeyes are #B1GTourney champions! Congratulations, @IowaWBB! It's time to go dancing. #FightForIowa 🖤💛🏀 https://t.co/cS4P336JfI. 
Tweeted by \u001b[41muiowa\u001b[0m\n", "Judges have a history of finding ways to avoid big sentences in white collar economic crimes, with the exception of… https://t.co/KeGgfeiUSs. Tweeted by \u001b[41mmayawiley\u001b[0m\n", "Creator and co-star of the hit Netflix series Love, Hawkeye Paul Rust said it best this weekend when he visited Iow… https://t.co/jLpGSp3XCd. Tweeted by \u001b[41muiowa\u001b[0m\n", "On #InternationalWomensDay we're celebrating the many incredibly talented and inspirational women of #uiowa. 🖤💛 https://t.co/3y2U8EyF2U. Tweeted by \u001b[41muiowa\u001b[0m\n", "Current Mood: 😁🏆 \n", "#MondayMotivation 🖤💛🏀\n", "@IowaWBB https://t.co/klUrInwN7c. Tweeted by \u001b[41muiowa\u001b[0m\n", "New: Montserrat Fuentes, dean and professor in the College of Humanities & Sciences at Virginia Commonwealth Univer… https://t.co/Oto8jMe88I. Tweeted by \u001b[41muiowa\u001b[0m\n", "Emily Maxwell and @BlairBraverman have three things in common: they're dogsledders, Hawkeyes, and they're both comp… https://t.co/hrNhZirAwg. Tweeted by \u001b[41muiowa\u001b[0m\n", "10 seasons, 134 children, and a lifetime of memories make up the Kid Captain program. Read about how the special bo… https://t.co/A40ALrvgWx. Tweeted by \u001b[41muiowa\u001b[0m\n", "When you love living the Hawkeye life.🖤💛🐶 https://t.co/YHs0WLOZUj. Tweeted by \u001b[41muiowa\u001b[0m\n", "Proud to see my article co-written w/ Dr. @tigerlilyrocks focused on online feminism, Wikipedia & digital humanitie… https://t.co/RHGGfd41bi. Tweeted by \u001b[41mSarahEBond\u001b[0m\n", "Calling young writers in India, USA and Pakistan! \n", "- Are you between the ages of 18 and 22?\n", "- Do you want to spend… https://t.co/YelZzFk8M9. Tweeted by \u001b[41mSiyandaWrites\u001b[0m\n" ] } ], "source": [ "from IPython.display import clear_output\n", "for tw in results:\n", " print(\"%s. 
Tweeted by \\033[41m%s\\033[0m\"%(tw.text, tw.user.screen_name))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can save this text to a file for further analysis. Note that we may want to remove the newlines within each tweet so that each tweet occupies a single line." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:24.882760Z", "start_time": "2019-03-15T00:56:24.875959Z" } }, "outputs": [ { "data": { "text/plain": [ "\"We'll have a slice of that 👇\\xa0#PiDay https://t.co/Gfzhl5n9Ly\"" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_list = [tw.text.replace('\\n', ' ') for tw in results]\n", "tweet_list[0]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:25.609776Z", "start_time": "2019-03-15T00:56:25.605955Z" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2019-03-15T00:56:26.908060Z", "start_time": "2019-03-15T00:56:26.886221Z" } }, "outputs": [], "source": [ "np.savetxt('sample-data/sample_tweets.csv', tweet_list, encoding='utf-8', fmt='%s')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Conclusions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this workshop, we went through some examples of using APIs to get various types of data in Python. The last Twitter example is relatively superficial and does not go deep enough to get meaningful data for social media analysis. 
I recommend reading further, especially on the ___streaming API___:\n", "\n", "- http://adilmoujahid.com/posts/2014/07/twitter-analytics/\n", "- http://socialmedia-class.org/twittertutorial.html\n", "\n", "\n", "Further, the [`tweepy`](http://www.tweepy.org/) package seems to be pretty popular as well." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "toc": { "base_numbering": 1, "nav_menu": { "height": "133px", "width": "254px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "959px", "left": "0px", "right": "1662.34px", "top": "126.003px", "width": "204px" }, "toc_section_display": "block", "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }