
```python
import numpy as np
import pandas as pd
```

### Numpy Data Types
NumPy has the following data types:
- ```int```
- ```float```
- ```complex```
- ```bool```
- ```string```
- ```unicode```
- ```object```

The numeric data types come in several precisions, such as 32-bit or 64-bit.

A NumPy data type can be specified either by its __Type__ name or by its __Type Code__.
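Each precision corresponds to a fixed number of bytes per element. A minimal sketch, assuming only NumPy (```names``` and ```sizes``` are illustrative variable names), reads that size from ```np.dtype(...).itemsize```. For the numeric types, the number in a type code is this byte count, so ```i4``` means 4 bytes, or 32 bits.

```python
import numpy as np

# Numeric dtypes to inspect (illustrative selection).
names = ['int8', 'int16', 'int32', 'int64',
         'float16', 'float32', 'float64',
         'complex64', 'complex128']

# Map each dtype name to its size in bytes per element.
sizes = {name: np.dtype(name).itemsize for name in names}

for name, nbytes in sizes.items():
    # Bit width is 8 times the byte count.
    print(f'{name:>10}: {nbytes} bytes ({nbytes * 8} bits)')
```

The complex types are twice as large as the float type of the same precision because each element stores a real and an imaginary part.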

```python
dtypes = pd.DataFrame(
    {
        'Type': [
            'int8', 'uint8', 'int16', 'uint16', 'int or int32', 'uint32', 'int64', 'uint64',
            'float16', 'float32', 'float or float64', 'float128',
            'complex64', 'complex or complex128',
            'bool', 'object', 'string_', 'unicode_',
        ],
        'Type Code': [
            'i1', 'u1', 'i2', 'u2', 'i4 or i', 'u4', 'i8', 'u8',
            'f2', 'f4 or f', 'f8 or d', 'f16 or g',
            'c8', 'c16',
            '? or b1', 'O', 'S', 'U',
        ],
    }
)

dtypes
```

| | Type | Type Code |
|---:|---|---|
| 0 | int8 | i1 |
| 1 | uint8 | u1 |
| 2 | int16 | i2 |
| 3 | uint16 | u2 |
| 4 | int or int32 | i4 or i |
| 5 | uint32 | u4 |
| 6 | int64 | i8 |
| 7 | uint64 | u8 |
| 8 | float16 | f2 |
| 9 | float32 | f4 or f |
| 10 | float or float64 | f8 or d |
| 11 | float128 | f16 or g |
| 12 | complex64 | c8 |
| 13 | complex or complex128 | c16 |
| 14 | bool | ? or b1 |
| 15 | object | O |
| 16 | string_ | S |
| 17 | unicode_ | U |

__NOTE__: some entries depend on the platform and the NumPy version. ```int``` maps to the platform's default integer, which is ```int32``` on Windows with NumPy 1.x and ```int64``` on 64-bit Linux and macOS (and on 64-bit Windows from NumPy 2.0). ```float128``` (```f16```) only exists where the C ```long double``` occupies 16 bytes (for example most x86-64 Linux builds, but not Windows), whereas ```g``` always means ```longdouble```. The ```np.string_``` and ```np.unicode_``` aliases were removed in NumPy 2.0 in favor of ```np.bytes_``` and ```np.str_```; the type codes ```S``` and ```U``` still work.

Data types can be set when creating a NumPy array and converted to other types later.

You can set the data type of an array with a _type_ name, a _type code_, or an _```np``` dot_ attribute such as ```np.float32```. The ```np``` dot form only works with _type_ names: NumPy defines ```np.complex64```, but there is no ```np.c8```.

```python
arr = np.array([1,2,3], dtype='f4')
arr.dtype
```
```
dtype('float32')
```

```python
# Identical to the above
arr = np.array([1,2,3], dtype='float32')
arr.dtype
```
```
dtype('float32')
```

```python
arr = np.array([1+2j, 3-4j], dtype=np.complex64)
arr.dtype
```
```
dtype('complex64')
```

```python
# Identical to the above
arr = np.array([1+2j, 3-4j], dtype='c8')
arr.dtype
```
```
dtype('complex64')
```
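The cells above check ```arr.dtype``` one spelling at a time. As a compact check, this sketch (assuming only NumPy; ```f32``` and ```c64``` are illustrative names) normalizes the type name, the type code and the ```np``` attribute with ```np.dtype``` and compares them. It also shows the ```?``` code for ```bool``` and the single-character code reported by ```dtype.char```.

```python
import numpy as np

# A type name, a type code and an np attribute all resolve to the same dtype.
f32 = [np.dtype('float32'), np.dtype('f4'), np.dtype(np.float32)]
c64 = [np.dtype('complex64'), np.dtype('c8'), np.dtype(np.complex64)]
print(all(dt == f32[0] for dt in f32), all(dt == c64[0] for dt in c64))  # True True

# The bool type code is '?' ('b1' is also accepted).
print(np.dtype('?'), np.dtype('b1'))  # bool bool

# .char gives the single-character code of a dtype.
print(np.dtype('float32').char, np.dtype('float64').char, np.dtype('bool').char)  # f d ?
```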

```python
# ERROR
arr = np.array([1+2j, 3-4j], dtype=np.c8)
arr.dtype
```
```
AttributeError: module 'numpy' has no attribute 'c8'
```

### Type Conversion

```astype``` method: converts an array to another data type.

Note that ```astype``` returns a new array (a copy) instead of converting the data type in place. You need to assign the result to the original variable or to a new one.

```python
arr = np.array([1,2,3], dtype='int16')
print('Original Data Type: ' + str(arr.dtype))

arr = arr.astype(np.float32)
print('Data Type After Conversion: ' + str(arr.dtype))
```
```
Original Data Type: int16
Data Type After Conversion: float32
```

__WARNING__: be cautious about overflow when you downcast a data type (from higher to lower precision). Integer values outside the target range wrap around silently, with no error or warning, and the resulting wrong values are usually hard to debug.

```python
# An example of integer overflow when downcasting
arr = np.array([126,127,256], dtype='int16')
print('np array before type conversion: ' + str(arr))

# The range of int8 is [-128, 127], so 256 overflows (wraps around to 0) after conversion
arr = arr.astype('int8')
print('np array after type conversion: ' + str(arr))
```
```
np array before type conversion: [126 127 256]
np array after type conversion: [126 127 0]
```

### String and Unicode Data Type

The ```string_``` and ```unicode_``` data types are both _fixed-length_.

The length is set by appending a number to the type code. For example, ```S3``` is a byte string of length 3, and ```U10``` is a unicode string of length 10. If no length is given, the length of the longest string in the array is used.

If a string in the array is longer than the length of the data type it is defined with or converted to, the string is truncated.

```python
# An example of truncated string
s = np.array(['abc', 'defg'], dtype='S3')
print(s)

# An example of truncated unicode
s = np.array(['abcd', 'efghi'], dtype='U3')
print(s)
```
```
[b'abc' b'def']
['abc' 'efg']
```

```
The array is [b'a' b'ab' b'abc']
The data type is |S3 because the longest string in the array is "abc" and its length is 3.
The array is ['a' 'abc' 'abcd']
The data type is
```

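The fixed-length behavior described in the String and Unicode section can be checked with a short sketch, assuming only NumPy (```u```, ```b``` and ```short``` are illustrative names). It shows the default length taken from the longest element, the storage per character (4 bytes for ```U```, 1 byte for ```S```), and truncation on both conversion and assignment.

```python
import numpy as np

# No explicit length: NumPy sizes the dtype to the longest element (3 characters).
u = np.array(['a', 'ab', 'abc'])
b = np.array(['a', 'ab', 'abc'], dtype='S')  # 'S' without a number is also sized to the longest

# Unicode stores 4 bytes per character; byte strings store 1 byte per character.
print(u.dtype.kind, u.dtype.itemsize)  # U 12
print(b.dtype.kind, b.dtype.itemsize)  # S 3

# Converting to a shorter fixed length truncates each element.
short = u.astype('U2')
print(short)  # ['a' 'ab' 'ab']

# Assignment into an existing array is truncated to the array's length as well.
u[0] = 'xyzw'
print(u)  # ['xyz' 'ab' 'abc']
```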
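For the downcasting __WARNING__ in the Type Conversion section, one way to catch overflow before it happens is to compare the data with the target type's range from ```np.iinfo```. ```safe_int_downcast``` is a hypothetical helper, not a NumPy function; this is a minimal sketch for integer-to-integer casts only.

```python
import numpy as np


def safe_int_downcast(arr, dtype):
    """Hypothetical helper (not part of NumPy): cast an integer array to a
    smaller integer dtype, raising instead of silently wrapping around."""
    info = np.iinfo(dtype)  # min and max representable by the target type
    if arr.size and (arr.min() < info.min or arr.max() > info.max):
        raise OverflowError(
            f'values outside [{info.min}, {info.max}] do not fit in {np.dtype(dtype)}'
        )
    return arr.astype(dtype)


print(np.iinfo('int8').min, np.iinfo('int8').max)  # -128 127

arr = np.array([126, 127, 256], dtype='int16')
print(safe_int_downcast(arr[:2], 'int8'))  # [126 127]

try:
    safe_int_downcast(arr, 'int8')  # 256 does not fit in int8
except OverflowError as err:
    print('refused:', err)
```

```np.can_cast(np.int16, np.int8)``` returns ```False``` because it judges the types rather than the data: it asks whether every possible ```int16``` value fits. A check on the actual values, as above, is what tells you whether a particular downcast is safe.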