1. Sample Data
2. Import Module
import pandas as pd
from print_df import print_df
import matplotlib.pyplot as plt
3. Data Analysis
- TSV (tab-separated values): a file in which the data fields are separated by tabs.
- Load the TSV file (sep = the delimiter)
df = pd.read_csv('data/gapminder.tsv', sep='\t')
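To see what sep='\t' changes without needing the gapminder file, here is a minimal sketch that reads the same kind of text from memory. The two data rows are copied from the head() output in this post; the names tsv_text, sample, and wrong are only for illustration.

```python
import io
import pandas as pd

# Header plus two rows taken from the head() output in this post (tab-separated)
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.4453145\n"
    "Afghanistan\tAsia\t1967\t34.02\t11537966\t836.1971382\n"
)

# sep='\t' splits each line on tabs, giving six columns
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample.shape)

# Without sep, read_csv splits on commas, so each line stays one wide column
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)
```

If a loaded file shows up with a single column, a wrong sep value is a likely cause.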
- Check the number of rows and columns in the data
print('shape:', df.shape)
shape: (1704, 6)
Process finished with exit code 0
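Since shape is an ordinary tuple of (rows, columns), it can be unpacked directly. A small sketch, using a made-up three-row frame (df_small) in place of the gapminder data:

```python
import pandas as pd

# Hypothetical small frame standing in for the gapminder data
df_small = pd.DataFrame({
    "year": [1952, 1957, 1962],
    "pop": [8425333, 9240934, 10267083],
})

# shape is a plain tuple: (number of rows, number of columns)
n_rows, n_cols = df_small.shape
print(n_rows, n_cols)

# len() of a DataFrame is its row count
print(len(df_small))
```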
- Check the first rows of the data (head)
print_df(df.head())
+---+-------------+-----------+------+--------------------+----------+-------------------+
| | country | continent | year | lifeExp | pop | gdpPercap |
+---+-------------+-----------+------+--------------------+----------+-------------------+
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453145 |
| 1 | Afghanistan | Asia | 1957 | 30.331999999999997 | 9240934 | 820.8530296 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007099999999 |
| 3 | Afghanistan | Asia | 1967 | 34.02 | 11537966 | 836.1971382 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811057999999 |
+---+-------------+-----------+------+--------------------+----------+-------------------+
- Check the last rows of the data (tail)
print_df(df.tail())
+------+----------+-----------+------+--------------------+----------+--------------------+
| | country | continent | year | lifeExp | pop | gdpPercap |
+------+----------+-----------+------+--------------------+----------+--------------------+
| 1699 | Zimbabwe | Africa | 1987 | 62.351000000000006 | 9216418 | 706.1573059 |
| 1700 | Zimbabwe | Africa | 1992 | 60.376999999999995 | 10704340 | 693.4207856 |
| 1701 | Zimbabwe | Africa | 1997 | 46.809 | 11404948 | 792.4499602999999 |
| 1702 | Zimbabwe | Africa | 2002 | 39.989000000000004 | 11926563 | 672.0386227000001 |
| 1703 | Zimbabwe | Africa | 2007 | 43.486999999999995 | 12311143 | 469.70929810000007 |
+------+----------+-----------+------+--------------------+----------+--------------------+
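head() and tail() return 5 rows by default, and both accept a row count. A short sketch on a made-up single-column frame (sample is not part of the original code):

```python
import pandas as pd

# Hypothetical frame with seven rows, so the default of five is visible
sample = pd.DataFrame({"year": [1952, 1957, 1962, 1967, 1972, 1977, 1982]})

first_two = sample.head(2)    # first 2 rows
last_three = sample.tail(3)   # last 3 rows; the original index labels are kept

print(first_two)
print(last_three)
print(len(sample.head()))     # with no argument, 5 rows are returned
```

This is why the tail() output above starts at index 1699 rather than 0: the rows keep the labels they had in the full frame.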
- Print the DataFrame's column names
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
- Print the DataFrame's column names one per line
for col in df.columns:
    print(col)
country
continent
year
lifeExp
pop
gdpPercap
- Check the data type of each DataFrame column (object: string / int: integer / float: real number)
print(type(df.dtypes))
print(df.dtypes)
<class 'pandas.core.series.Series'>
country object
continent object
year int64
lifeExp float64
pop int64
gdpPercap float64
dtype: object
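Once the dtypes are known, columns can be picked by type. A minimal sketch using DataFrame.select_dtypes on a made-up frame; the names sample, numeric_cols, and text_cols are only for illustration.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns
sample = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "year": [1952, 2007],
    "lifeExp": [28.801, 43.487],
})

# include="number" keeps the integer and float columns
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# exclude="number" keeps everything else, here the text column
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```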
- Print DataFrame information (rows / columns / data types, etc.)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country 1704 non-null object
continent 1704 non-null object
year 1704 non-null int64
lifeExp 1704 non-null float64
pop 1704 non-null int64
gdpPercap 1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB
- Extract data by column from the DataFrame
countries = df['continent']
print(countries)
0 Asia
1 Asia
2 Asia
3 Asia
4 Asia
5 Asia
6 Asia
7 Asia
8 Asia
9 Asia
10 Asia
11 Asia
12 Europe
13 Europe
14 Europe
15 Europe
16 Europe
17 Europe
18 Europe
19 Europe
20 Europe
21 Europe
22 Europe
23 Europe
24 Africa
25 Africa
26 Africa
27 Africa
28 Africa
29 Africa
...
1674 Asia
1675 Asia
1676 Asia
1677 Asia
1678 Asia
1679 Asia
1680 Africa
1681 Africa
1682 Africa
1683 Africa
1684 Africa
1685 Africa
1686 Africa
1687 Africa
1688 Africa
1689 Africa
1690 Africa
1691 Africa
1692 Africa
1693 Africa
1694 Africa
1695 Africa
1696 Africa
1697 Africa
1698 Africa
1699 Africa
1700 Africa
1701 Africa
1702 Africa
1703 Africa
Name: continent, Length: 1704, dtype: object
print(type(countries))
<class 'pandas.core.series.Series'>
- Check the first rows of the variable countries (it holds the continent column)
print(countries.head())
0 Asia
1 Asia
2 Asia
3 Asia
4 Asia
Name: continent, dtype: object
print(countries.tail())
1699 Africa
1700 Africa
1701 Africa
1702 Africa
1703 Africa
Name: continent, dtype: object
subset_df = df[['country', 'year', 'lifeExp']]
print(type(subset_df))
print_df(subset_df.head())
print_df(subset_df.tail())
+------+----------+------+--------------------+
| | country | year | lifeExp |
+------+----------+------+--------------------+
| 1699 | Zimbabwe | 1987 | 62.351000000000006 |
| 1700 | Zimbabwe | 1992 | 60.376999999999995 |
| 1701 | Zimbabwe | 1997 | 46.809 |
| 1702 | Zimbabwe | 2002 | 39.989000000000004 |
| 1703 | Zimbabwe | 2007 | 43.486999999999995 |
+------+----------+------+--------------------+
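The bracket style decides what comes back: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one name. A small sketch on a made-up frame (sample and the result names are only for illustration):

```python
import pandas as pd

# Hypothetical frame using two Zimbabwe rows from the tail() output
sample = pd.DataFrame({
    "country": ["Zimbabwe", "Zimbabwe"],
    "year": [2002, 2007],
    "lifeExp": [39.989, 43.487],
})

one_col = sample["lifeExp"]               # single name -> Series
one_col_frame = sample[["lifeExp"]]       # list with one name -> DataFrame
two_cols = sample[["country", "year"]]    # list of names -> DataFrame

print(type(one_col).__name__)
print(type(one_col_frame).__name__)
print(two_cols.shape)
```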
- Extract data by row, method 1 (df.loc[index label]; here the labels are the row numbers)
print(df.loc[0])
country Afghanistan
continent Asia
year 1952
lifeExp 28.801
pop 8425333
gdpPercap 779.445
Name: 0, dtype: object
- Extract data by row, method 2 (df.iloc[integer position])
print(df.iloc[0])
country Afghanistan
continent Asia
year 1952
lifeExp 28.801
pop 8425333
gdpPercap 779.445
Name: 0, dtype: object
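Both methods return the same row above because the index labels of df happen to equal the row positions. The difference shows up once they diverge. A minimal sketch, assuming a made-up frame whose labels start at 1701, as they would after tail():

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0, 1, 2
sample = pd.DataFrame(
    {"country": ["Zimbabwe", "Zimbabwe", "Zimbabwe"], "year": [1997, 2002, 2007]},
    index=[1701, 1702, 1703],
)

by_label = sample.loc[1703]          # loc looks up the index label 1703
by_position = sample.iloc[-1]        # iloc takes the last row by position
first_by_position = sample.iloc[0]   # position 0 is the row labeled 1701

print(by_label["year"], by_position["year"], first_by_position["year"])

# loc does not count from the end: -1 is not a label in this index
try:
    sample.loc[-1]
except KeyError:
    print("KeyError: -1 is not an index label")
```

In short, loc matches what is printed in the index column, and iloc counts rows from 0 regardless of their labels.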