nba_rosters_2017_18

Malcolm J Gyagenda June 06, 20115

Background

I hope to learn how to use set up histograms, scatter plots, and also use simple statistics functions like min, max, standrad deviation in python I also hope to learn how to create and update repositories using data from the National Basketball Association which includes height, weight, team, birht city and jersey number

Data

The variables of interest are weight of the player and the age of the player. How the old the player is versus how much they weigh. Do the older players weight more than the younger ones and viceversa

In [10]:
import pandas as pd 
import numpy as np 

s=pd.Series()
print(s)
nba=pd.read_csv("nba_rosters_2017_18.csv")
# print(nba.head())

nba_sub = nba[nba['Age']>0 & (nba['Weight']>0)]

print(len(nba))
print(len(nba_sub))
nba_sub.head()


plt.scatter(df_sub["Weight"], df_sub["Age"])
Series([], dtype: float64)
724
682
Out[10]:
Player_ID LastName FirstName Jersey_Num Position Height Weight Birth_Date Age Birth_City Birth_Country Rookie Team_ID Team_Abbr. Team_City Team_Name
0 10138 Abrines Alex 8.0 SG 6'6" 190.0 8/1/93 24 Palma de Mallorca Spain N 96.0 OKL Oklahoma City Thunder
1 9466 Acy Quincy 13.0 PF 6'7" 240.0 10/6/90 27 Tyler, TX USA N 84.0 BRO Brooklyn Nets
2 9301 Adams Jordan 3.0 SG 6'5" 209.0 7/8/94 23 Atlanta, GA USA N NaN NaN NaN NaN
3 9390 Adams Steven 12.0 C 7'0" 255.0 7/20/93 24 Rotorua New Zealand N 96.0 OKL Oklahoma City Thunder
4 13742 Adebayo Bam 13.0 C 6'10" 255.0 7/18/97 20 Newark, NJ USA Y 92.0 MIA Miami Heat
In [19]:
#weight = nba_sub['Age','Weight']
young_sample = nba_sub[nba_sub["Age"]<30]
#old_sample = nba_sub[nba_sub["Age"]>=30]
young_sample.describe()
#old_sample.describe()
y_wt = np.mean(young_sample['Weight'])
o_wt = np.mean(old_sample['Weight'])
print("Young:", y_wt, "\nOld:", o_wt)
Young: 218.0431818181818 
Old: 223.6
In [7]:
print(nba[600:610])
     Player_ID   LastName FirstName  Jersey_Num Position Height  Weight  \
600       9241      Smith      Josh         5.0       PF    NaN     NaN   
601       9411      Smith     Jason        14.0       PF   7'0"   240.0   
602       9350      Smith      Greg         4.0       PF  6'10"   250.0   
603       9542      Smith      Russ         NaN       PG    NaN     NaN   
604       9167      Smith      J.R.         5.0       SG   6'6"   225.0   
605      13736  Smith Jr.    Dennis         1.0       PG   6'3"   195.0   
606       9145      Snell      Tony        21.0       SG   6'7"   221.0   
607      13879     Snyder      Quin         NaN        C    NaN     NaN   
608       9225   Speights  Marreese         5.0        C  6'10"   255.0   
609       9090   Splitter     Tiago        11.0        C  6'11"   245.0   

     Birth_Date  Age          Birth_City Birth_Country Rookie  Team_ID  \
600   12/5/1985   32    College Park, GA           USA      N    110.0   
601    3/2/1986   32         Greeley, CO           USA      N     94.0   
602    1/8/1991   27         Vallejo, CA           USA      N      NaN   
603   4/19/1991   27        New York, NY           USA      N      NaN   
604    9/9/1985   32        Freehold, NJ           USA      N     86.0   
605  11/25/1997   20    Fayetteville, NC           USA      Y    108.0   
606  11/10/1991   26       Riverside, CA           USA      N     90.0   
607   1/26/2018    0                 NaN           NaN      N     98.0   
608    8/4/1987   30  St. Petersburg, FL           USA      N     95.0   
609    1/1/1985   33           Joinville        Brazil      N      NaN   

    Team_Abbr.    Team_City  Team_Name  
600        NOP  New Orleans   Pelicans  
601        WAS   Washington    Wizards  
602        NaN          NaN        NaN  
603        NaN          NaN        NaN  
604        CLE    Cleveland  Cavaliers  
605        DAL       Dallas  Mavericks  
606        MIL    Milwaukee      Bucks  
607        UTA         Utah       Jazz  
608        ORL      Orlando      Magic  
609        NaN          NaN        NaN  
In [21]:
print(nba["Weight"][600:610])
print(nba["Age"][600:610]) 
600      NaN
601    240.0
602    250.0
603      NaN
604    225.0
605    195.0
606    221.0
607      NaN
608    255.0
609    245.0
Name: Weight, dtype: float64
600    32
601    32
602    27
603    27
604    32
605    20
606    26
607     0
608    30
609    33
Name: Age, dtype: int64

Summary statistics for Age

In [22]:
print(nba["Age"][600:610].describe())
count    10.000000
mean     25.900000
std       9.926955
min       0.000000
25%      26.250000
50%      28.500000
75%      32.000000
max      33.000000
Name: Age, dtype: float64

Summary Statistics for Weight

In [30]:
print(nba["Weight"][600:610].describe())
count      7.000000
mean     233.000000
std       20.888593
min      195.000000
25%      223.000000
50%      240.000000
75%      247.500000
max      255.000000
Name: Weight, dtype: float64
In [54]:
print(type(Weight))
#Weight=(nba["Weight"][600:610])
print(sum(Weight.isnull()))
<class 'pandas.core.series.Series'>
3
In [29]:
Age=(nba["Age"][600:610])
print(sum(Age.isnull()))
0
In [51]:
import numpy as np 
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

plt.hist(Age)
Out[51]:
(array([  2.,  18.,  83., 201., 271., 245., 136.,  37.,   5.,   2.]),
 array([-3.40398576, -2.6936193 , -1.98325284, -1.27288637, -0.56251991,
         0.14784655,  0.85821301,  1.56857947,  2.27894594,  2.9893124 ,
         3.69967886]),
 <a list of 10 Patch objects>)
In [48]:
fig2=plt.figure()
plt.scatter(Age,Weight)
plt.show()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-48-39ec7e3e427a> in <module>()
      1 fig2=plt.figure()
----> 2 plt.scatter(Age,Weight)
      3 plt.show()

~\Newfolder\lib\site-packages\matplotlib\pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, hold, data, **kwargs)
   3468                          vmin=vmin, vmax=vmax, alpha=alpha,
   3469                          linewidths=linewidths, verts=verts,
-> 3470                          edgecolors=edgecolors, data=data, **kwargs)
   3471     finally:
   3472         ax._hold = washold

~\Newfolder\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1853                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1854                         RuntimeWarning, stacklevel=2)
-> 1855             return func(ax, *args, **kwargs)
   1856 
   1857         inner.__doc__ = _add_data_doc(inner.__doc__,

~\Newfolder\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4241         y = np.ma.ravel(y)
   4242         if x.size != y.size:
-> 4243             raise ValueError("x and y must be the same size")
   4244 
   4245         if s is None:

ValueError: x and y must be the same size