{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numpy Data Types\n", "Numpy has the following data types: \n", "- ```int```\n", "- ```float```\n", "- ```complex```\n", "- ```bool```\n", "- ```string```\n", "- ```unicode```\n", "- ```object```\n", "\n", "The numeric data types have various precisions like 32-bit or 64-bit. \n", "\n", "Numpy data types can be represented using either __Type__ or __Type Code__" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
" <thead>\n",
" <tr><th></th><th>Type</th><th>Type Code</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>int8</td><td>i1</td></tr>\n",
" <tr><th>1</th><td>uint8</td><td>u1</td></tr>\n",
" <tr><th>2</th><td>int16</td><td>i2</td></tr>\n",
" <tr><th>3</th><td>uint16</td><td>u2</td></tr>\n",
" <tr><th>4</th><td>int or int32</td><td>i4 or i</td></tr>\n",
" <tr><th>5</th><td>uint32</td><td>u4</td></tr>\n",
" <tr><th>6</th><td>int64</td><td>i8</td></tr>\n",
" <tr><th>7</th><td>uint64</td><td>u8</td></tr>\n",
" <tr><th>8</th><td>float16</td><td>f2</td></tr>\n",
" <tr><th>9</th><td>float32</td><td>f4 or f</td></tr>\n",
" <tr><th>10</th><td>float or float64</td><td>f8 or d</td></tr>\n",
" <tr><th>11</th><td>float128</td><td>f16 or g</td></tr>\n",
" <tr><th>12</th><td>complex64</td><td>c8</td></tr>\n",
" <tr><th>13</th><td>complex or complex128</td><td>c16</td></tr>\n",
" <tr><th>14</th><td>bool</td><td>?</td></tr>\n",
" <tr><th>15</th><td>object</td><td>O</td></tr>\n",
" <tr><th>16</th><td>string_</td><td>S</td></tr>\n",
" <tr><th>17</th><td>unicode_</td><td>U</td></tr>\n",
" </tbody>\n",
"</table>" ], "text/plain": [ " Type Type Code\n", "0 int8 i1\n", "1 uint8 u1\n", "2 int16 i2\n", "3 uint16 u2\n", "4 int or int32 i4 or i\n", "5 uint32 u4\n", "6 int64 i8\n", "7 uint64 u8\n", "8 float16 f2\n", "9 float32 f4 or f\n", "10 float or float64 f8 or d\n", "11 float128 f16 or g\n", "12 complex64 c8\n", "13 complex or complex128 c16\n", "14 bool ?\n", "15 object O\n", "16 string_ S\n", "17 unicode_ U" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dtypes = pd.DataFrame(\n", " {\n", " 'Type': [\n", " 'int8', \n", " 'uint8', \n", " 'int16', \n", " 'uint16', \n", " 'int or int32', \n", " 'uint32', \n", " 'int64', \n", " 'uint64', \n", " 'float16', \n", " 'float32', \n", " 'float or float64',\n", " 'float128', \n", " 'complex64', \n", " 'complex or complex128', \n", " 'bool', \n", " 'object', \n", " 'string_',\n", " 'unicode_',\n", " ],\n", " \n", " 'Type Code': [\n", " 'i1', \n", " 'u1', \n", " 'i2', \n", " 'u2', \n", " 'i4 or i', \n", " 'u4', \n", " 'i8', \n", " 'u8', \n", " 'f2', \n", " 'f4 or f', \n", " 'f8 or d', \n", " 'f16 or g', \n", " 'c8', \n", " 'c16', \n", " '?', \n", " 'O', \n", " 'S', \n", " 'U',\n", " ]\n", " }\n", ")\n", "\n", "dtypes" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A data type can be set when the NumPy array is created and converted to another type later. \n", "\n", "You can specify the data type of an array with a _type_ string, a _type code_ string, or an ```np.``` attribute such as ```np.float32```. The ```np.``` form accepts only _type_ names, not _type codes_: ```np.complex64``` works, but ```np.c8``` raises an ```AttributeError```."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float32')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([1,2,3], dtype='f4')\n", "arr.dtype" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float32')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Identical to the above\n", "arr = np.array([1,2,3], dtype='float32')\n", "arr.dtype" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('complex64')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([1+2j, 3-4j], dtype=np.complex64)\n", "arr.dtype" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('complex64')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Identical to the above\n", "arr = np.array([1+2j, 3-4j], dtype='c8')\n", "arr.dtype" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "module 'numpy' has no attribute 'c8'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# ERROR\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0marr\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0marray\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m+\u001b[0m\u001b[1;36m2j\u001b[0m\u001b[1;33m,\u001b[0m 
\u001b[1;36m3\u001b[0m\u001b[1;33m-\u001b[0m\u001b[1;36m4j\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mc8\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 3\u001b[0m \u001b[0marr\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mAttributeError\u001b[0m: module 'numpy' has no attribute 'c8'" ] } ], "source": [ "# ERROR\n", "arr = np.array([1+2j, 3-4j], dtype=np.c8)\n", "arr.dtype" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Type Conversion" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The ```astype``` method converts an array to another data type. \n", "\n", "Note that ```astype``` returns a new array rather than converting the data type in place, so assign the result back to the original name or to a new variable." ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original Data Type: int16\n", "Data Type After Conversion: float32\n" ] } ], "source": [ "arr = np.array([1,2,3], dtype='int16')\n", "print('Original Data Type: ' + str(arr.dtype))\n", "\n", "arr = arr.astype(np.float32)\n", "print('Data Type After Conversion: ' + str(arr.dtype))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "__WARNING__: watch for overflow when you downcast (convert from a wider type to a narrower one). Out-of-range values are replaced with unexpected ones (integers wrap around, as the next example shows, without raising an error), and such bugs are usually hard to trace." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "np array before type conversion: [126 127 256]\n", "np array after type conversion: [126 127 0]\n" ] } ], "source": [ "# An example of integer overflow when downcasting\n", "arr = np.array([126,127,256], dtype='int16')\n", "print('np array before type conversion: ' + str(arr))\n", "\n", "# int8 holds [-128, 127], so 256 wraps around to 0 after conversion\n", "arr = arr.astype('int8')\n", "print('np array after type conversion: ' + str(arr))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### String and Unicode Data Type" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The ```string_``` and ```unicode_``` data types are both _fixed-length_. \n", "\n", "The length is set by appending a number to the type code. For example, ```S3``` is a byte string of length 3 and ```U10``` is a unicode string of length 10. If no number is given, the length defaults to that of the longest string in the array.\n", "\n", "If a string in the array is longer than the length of the data type it is defined with or converted to, the string is truncated." ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[b'abc' b'def']\n", "['abc' 'efg']\n" ] } ], "source": [ "# An example of a truncated byte string\n", "s = np.array(['abc', 'defg'], dtype='S3')\n", "print(s)\n", "\n", "# An example of a truncated unicode string\n", "s = np.array(['abcd', 'efghi'], dtype='U3')\n", "print(s)" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The array is [b'a' b'ab' b'abc']\n", "The data type is |S3 because the longest string in the array is \"abc\" and its length is 3.\n", "The array is ['a' 'abc' 'abcd']\n", "The data type is